[2024-06-15 11:31:04,762][1648981] Saving configuration to train_dir/atari_2B_atari_airraid_1111/config.json...
[2024-06-15 11:31:04,777][1648981] Rollout worker 0 uses device cpu
[2024-06-15 11:31:04,778][1648981] Rollout worker 1 uses device cpu
[2024-06-15 11:31:04,778][1648981] Rollout worker 2 uses device cpu
[2024-06-15 11:31:04,778][1648981] Rollout worker 3 uses device cpu
[2024-06-15 11:31:08,653][1648981] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-15 11:31:08,654][1648981] InferenceWorker_p0-w0: min num requests: 1
[2024-06-15 11:31:08,701][1648981] Starting all processes...
[2024-06-15 11:31:08,715][1648981] Starting process learner_proc0
[2024-06-15 11:31:11,557][1648981] Starting all processes...
[2024-06-15 11:31:11,560][1651274] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-15 11:31:11,560][1651274] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2024-06-15 11:31:11,561][1648981] Starting process inference_proc0-0
[2024-06-15 11:31:11,562][1648981] Starting process rollout_proc0
[2024-06-15 11:31:11,562][1648981] Starting process rollout_proc1
[2024-06-15 11:31:11,562][1648981] Starting process rollout_proc2
[2024-06-15 11:31:11,563][1648981] Starting process rollout_proc3
[2024-06-15 11:31:11,608][1651274] Num visible devices: 1
[2024-06-15 11:31:11,658][1651274] Setting fixed seed 1111
[2024-06-15 11:31:11,661][1651274] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-15 11:31:11,661][1651274] Initializing actor-critic model on device cuda:0
[2024-06-15 11:31:11,662][1651274] RunningMeanStd input shape: (4, 84, 84)
[2024-06-15 11:31:11,664][1651274] RunningMeanStd input shape: (1,)
[2024-06-15 11:31:11,701][1651274] ConvEncoder: input_channels=4
[2024-06-15 11:31:11,806][1651274] Conv encoder output size: 512
[2024-06-15 11:31:11,807][1651274] Created Actor Critic model with architecture:
[2024-06-15 11:31:11,808][1651274] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): MultiInputEncoder(
    (encoders): ModuleDict(
      (obs): ConvEncoder(
        (enc): RecursiveScriptModule(
          original_name=ConvEncoderImpl
          (conv_head): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Conv2d)
            (1): RecursiveScriptModule(original_name=ReLU)
            (2): RecursiveScriptModule(original_name=Conv2d)
            (3): RecursiveScriptModule(original_name=ReLU)
            (4): RecursiveScriptModule(original_name=Conv2d)
            (5): RecursiveScriptModule(original_name=ReLU)
          )
          (mlp_layers): RecursiveScriptModule(
            original_name=Sequential
            (0): RecursiveScriptModule(original_name=Linear)
            (1): RecursiveScriptModule(original_name=ReLU)
          )
        )
      )
    )
  )
  (core): ModelCoreIdentity()
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=6, bias=True)
  )
)
[2024-06-15 11:31:12,416][1651274] Using optimizer
[2024-06-15 11:31:13,327][1651274] No checkpoints found
[2024-06-15 11:31:13,327][1651274] Did not load from checkpoint, starting from scratch!
[2024-06-15 11:31:13,327][1651274] Initialized policy 0 weights for model version 0
[2024-06-15 11:31:13,329][1651274] LearnerWorker_p0 finished initialization!
[2024-06-15 11:31:13,330][1651274] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-15 11:31:13,870][1651669] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2024-06-15 11:31:13,870][1651669] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2024-06-15 11:31:13,915][1651669] Num visible devices: 1
[2024-06-15 11:31:13,930][1651672] Worker 3 uses CPU cores [72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95]
[2024-06-15 11:31:13,938][1651671] Worker 2 uses CPU cores [48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71]
[2024-06-15 11:31:13,938][1651668] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]
[2024-06-15 11:31:14,138][1651669] RunningMeanStd input shape: (4, 84, 84)
[2024-06-15 11:31:14,139][1651669] RunningMeanStd input shape: (1,)
[2024-06-15 11:31:14,160][1651669] ConvEncoder: input_channels=4
[2024-06-15 11:31:14,263][1648981] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2024-06-15 11:31:14,268][1651670] Worker 1 uses CPU cores [24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47]
[2024-06-15 11:31:14,323][1651669] Conv encoder output size: 512
[2024-06-15 11:31:14,344][1648981] Inference worker 0-0 is ready!
[2024-06-15 11:31:14,345][1648981] All inference workers are ready! Signal rollout workers to start!
[2024-06-15 11:31:14,345][1651670] EnvRunner 1-0 uses policy 0
[2024-06-15 11:31:14,345][1651672] EnvRunner 3-0 uses policy 0
[2024-06-15 11:31:14,345][1651668] EnvRunner 0-0 uses policy 0
[2024-06-15 11:31:14,345][1651671] EnvRunner 2-0 uses policy 0
[2024-06-15 11:31:15,766][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0.
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:20,793][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:25,772][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:28,637][1648981] Heartbeat connected on Batcher_0 [2024-06-15 11:31:28,656][1648981] Heartbeat connected on LearnerWorker_p0 [2024-06-15 11:31:28,705][1648981] Heartbeat connected on InferenceWorker_p0-w0 [2024-06-15 11:31:30,789][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:35,766][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:39,886][1648981] Heartbeat connected on RolloutWorker_w2 [2024-06-15 11:31:40,766][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:43,677][1648981] Heartbeat connected on RolloutWorker_w1 [2024-06-15 11:31:44,265][1648981] Heartbeat connected on RolloutWorker_w3 [2024-06-15 11:31:45,766][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 438.8. Samples: 13824. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:46,831][1648981] Heartbeat connected on RolloutWorker_w0 [2024-06-15 11:31:50,811][1648981] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2241.4. Samples: 81920. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2024-06-15 11:31:51,890][1651671] Worker 2, sleep for 0.500 sec to decorrelate experience collection [2024-06-15 11:31:52,183][1651274] Signal inference workers to stop experience collection... [2024-06-15 11:31:52,213][1651669] InferenceWorker_p0-w0: stopping experience collection [2024-06-15 11:31:52,394][1651671] Worker 2 awakens! [2024-06-15 11:31:55,527][1651274] Signal inference workers to resume experience collection... [2024-06-15 11:31:55,529][1651669] InferenceWorker_p0-w0: resuming experience collection [2024-06-15 11:31:55,766][1648981] Fps is (10 sec: 3276.8, 60 sec: 789.5, 300 sec: 789.5). Total num frames: 32768. Throughput: 0: 2491.9. Samples: 103424. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2024-06-15 11:31:56,822][1651669] Updated weights for policy 0, policy_version 61 (0.0012) [2024-06-15 11:31:57,244][1651672] Worker 3, sleep for 0.750 sec to decorrelate experience collection [2024-06-15 11:31:57,250][1651670] Worker 1, sleep for 0.250 sec to decorrelate experience collection [2024-06-15 11:31:57,501][1651670] Worker 1 awakens! [2024-06-15 11:31:57,998][1651672] Worker 3 awakens! [2024-06-15 11:31:58,416][1651669] Updated weights for policy 0, policy_version 112 (0.0012) [2024-06-15 11:31:59,851][1651669] Updated weights for policy 0, policy_version 163 (0.0098) [2024-06-15 11:32:00,769][1648981] Fps is (10 sec: 39487.7, 60 sec: 8455.1, 300 sec: 8455.1). Total num frames: 393216. Throughput: 0: 3504.1. Samples: 157696. Policy #0 lag: (min: 105.0, avg: 158.2, max: 169.0) [2024-06-15 11:32:01,687][1651669] Updated weights for policy 0, policy_version 240 (0.0012) [2024-06-15 11:32:05,706][1651669] Updated weights for policy 0, policy_version 288 (0.0011) [2024-06-15 11:32:05,766][1648981] Fps is (10 sec: 55705.6, 60 sec: 11452.2, 300 sec: 11452.2). Total num frames: 589824. Throughput: 0: 5248.2. Samples: 236032. 
Policy #0 lag: (min: 14.0, avg: 108.2, max: 254.0) [2024-06-15 11:32:08,591][1651669] Updated weights for policy 0, policy_version 352 (0.0012) [2024-06-15 11:32:10,108][1651669] Updated weights for policy 0, policy_version 401 (0.0023) [2024-06-15 11:32:10,780][1648981] Fps is (10 sec: 49099.1, 60 sec: 15654.3, 300 sec: 15654.3). Total num frames: 884736. Throughput: 0: 6040.5. Samples: 271872. Policy #0 lag: (min: 111.0, avg: 192.0, max: 335.0) [2024-06-15 11:32:10,781][1648981] Avg episode reward: [(0, '1.000')] [2024-06-15 11:32:11,153][1651274] Saving new best policy, reward=1.000! [2024-06-15 11:32:11,791][1651669] Updated weights for policy 0, policy_version 465 (0.0011) [2024-06-15 11:32:12,775][1651669] Updated weights for policy 0, policy_version 509 (0.0029) [2024-06-15 11:32:15,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 17476.2, 300 sec: 17049.0). Total num frames: 1048576. Throughput: 0: 7649.7. Samples: 344064. Policy #0 lag: (min: 111.0, avg: 192.0, max: 335.0) [2024-06-15 11:32:15,767][1648981] Avg episode reward: [(0, '1.739')] [2024-06-15 11:32:16,326][1651274] Saving new best policy, reward=1.739! [2024-06-15 11:32:16,866][1651669] Updated weights for policy 0, policy_version 569 (0.0011) [2024-06-15 11:32:19,972][1651669] Updated weights for policy 0, policy_version 624 (0.0013) [2024-06-15 11:32:20,766][1648981] Fps is (10 sec: 42657.8, 60 sec: 21854.9, 300 sec: 19709.1). Total num frames: 1310720. Throughput: 0: 9113.6. Samples: 410112. Policy #0 lag: (min: 5.0, avg: 98.9, max: 261.0) [2024-06-15 11:32:20,767][1648981] Avg episode reward: [(0, '2.044')] [2024-06-15 11:32:21,195][1651274] Saving new best policy, reward=2.044! [2024-06-15 11:32:21,855][1651669] Updated weights for policy 0, policy_version 695 (0.0012) [2024-06-15 11:32:23,787][1651669] Updated weights for policy 0, policy_version 755 (0.0012) [2024-06-15 11:32:25,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 26217.0, 300 sec: 21997.1). Total num frames: 1572864. 
Throughput: 0: 9705.2. Samples: 436736. Policy #0 lag: (min: 114.0, avg: 201.3, max: 359.0) [2024-06-15 11:32:25,767][1648981] Avg episode reward: [(0, '2.630')] [2024-06-15 11:32:25,770][1651274] Saving new best policy, reward=2.630! [2024-06-15 11:32:27,311][1651669] Updated weights for policy 0, policy_version 784 (0.0011) [2024-06-15 11:32:29,326][1651669] Updated weights for policy 0, policy_version 837 (0.0014) [2024-06-15 11:32:30,688][1651669] Updated weights for policy 0, policy_version 897 (0.0012) [2024-06-15 11:32:30,810][1648981] Fps is (10 sec: 52198.9, 60 sec: 30572.8, 300 sec: 23972.2). Total num frames: 1835008. Throughput: 0: 11400.8. Samples: 527360. Policy #0 lag: (min: 40.0, avg: 135.6, max: 296.0) [2024-06-15 11:32:30,811][1648981] Avg episode reward: [(0, '3.480')] [2024-06-15 11:32:31,142][1651274] Signal inference workers to stop experience collection... (50 times) [2024-06-15 11:32:31,169][1651669] InferenceWorker_p0-w0: stopping experience collection (50 times) [2024-06-15 11:32:31,446][1651274] Signal inference workers to resume experience collection... (50 times) [2024-06-15 11:32:31,446][1651274] Saving new best policy, reward=3.480! [2024-06-15 11:32:31,447][1651669] InferenceWorker_p0-w0: resuming experience collection (50 times) [2024-06-15 11:32:32,244][1651669] Updated weights for policy 0, policy_version 957 (0.0011) [2024-06-15 11:32:34,128][1651669] Updated weights for policy 0, policy_version 1000 (0.0012) [2024-06-15 11:32:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 34952.5, 300 sec: 25730.9). Total num frames: 2097152. Throughput: 0: 11218.3. Samples: 586240. Policy #0 lag: (min: 10.0, avg: 166.2, max: 274.0) [2024-06-15 11:32:35,767][1648981] Avg episode reward: [(0, '4.310')] [2024-06-15 11:32:35,769][1651274] Saving new best policy, reward=4.310! 
[2024-06-15 11:32:38,828][1651669] Updated weights for policy 0, policy_version 1042 (0.0013) [2024-06-15 11:32:39,607][1651669] Updated weights for policy 0, policy_version 1083 (0.0016) [2024-06-15 11:32:40,729][1651669] Updated weights for policy 0, policy_version 1120 (0.0015) [2024-06-15 11:32:40,767][1648981] Fps is (10 sec: 46076.4, 60 sec: 38229.1, 300 sec: 26516.4). Total num frames: 2293760. Throughput: 0: 11673.5. Samples: 628736. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 11:32:40,768][1648981] Avg episode reward: [(0, '4.620')] [2024-06-15 11:32:41,123][1651274] Saving new best policy, reward=4.620! [2024-06-15 11:32:42,415][1651669] Updated weights for policy 0, policy_version 1186 (0.0012) [2024-06-15 11:32:45,326][1651669] Updated weights for policy 0, policy_version 1248 (0.0011) [2024-06-15 11:32:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 43144.5, 300 sec: 28290.5). Total num frames: 2588672. Throughput: 0: 11856.4. Samples: 691200. Policy #0 lag: (min: 15.0, avg: 158.8, max: 271.0) [2024-06-15 11:32:45,767][1648981] Avg episode reward: [(0, '5.400')] [2024-06-15 11:32:45,979][1651274] Saving new best policy, reward=5.400! [2024-06-15 11:32:50,766][1648981] Fps is (10 sec: 36046.0, 60 sec: 44270.1, 300 sec: 27503.8). Total num frames: 2654208. Throughput: 0: 11719.1. Samples: 763392. Policy #0 lag: (min: 15.0, avg: 95.5, max: 271.0) [2024-06-15 11:32:50,767][1648981] Avg episode reward: [(0, '6.350')] [2024-06-15 11:32:51,064][1651274] Saving new best policy, reward=6.350! [2024-06-15 11:32:51,834][1651669] Updated weights for policy 0, policy_version 1344 (0.0104) [2024-06-15 11:32:53,244][1651669] Updated weights for policy 0, policy_version 1402 (0.0013) [2024-06-15 11:32:54,633][1651669] Updated weights for policy 0, policy_version 1456 (0.0012) [2024-06-15 11:32:55,773][1648981] Fps is (10 sec: 42569.3, 60 sec: 49692.5, 300 sec: 29698.1). Total num frames: 3014656. Throughput: 0: 11493.3. Samples: 788992. 
Policy #0 lag: (min: 31.0, avg: 158.1, max: 287.0) [2024-06-15 11:32:55,774][1648981] Avg episode reward: [(0, '6.600')] [2024-06-15 11:32:55,783][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000001472_3014656.pth... [2024-06-15 11:32:55,892][1651274] Saving new best policy, reward=6.600! [2024-06-15 11:32:58,185][1651669] Updated weights for policy 0, policy_version 1520 (0.0011) [2024-06-15 11:33:00,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45877.5, 300 sec: 29536.5). Total num frames: 3145728. Throughput: 0: 11252.7. Samples: 850432. Policy #0 lag: (min: 15.0, avg: 148.2, max: 271.0) [2024-06-15 11:33:00,767][1648981] Avg episode reward: [(0, '7.450')] [2024-06-15 11:33:00,768][1651274] Saving new best policy, reward=7.450! [2024-06-15 11:33:03,545][1651669] Updated weights for policy 0, policy_version 1554 (0.0011) [2024-06-15 11:33:04,927][1651669] Updated weights for policy 0, policy_version 1618 (0.0018) [2024-06-15 11:33:05,767][1648981] Fps is (10 sec: 36069.1, 60 sec: 46421.3, 300 sec: 30269.1). Total num frames: 3375104. Throughput: 0: 11286.7. Samples: 918016. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 11:33:05,767][1648981] Avg episode reward: [(0, '7.230')] [2024-06-15 11:33:06,252][1651669] Updated weights for policy 0, policy_version 1680 (0.0013) [2024-06-15 11:33:09,313][1651669] Updated weights for policy 0, policy_version 1744 (0.0145) [2024-06-15 11:33:10,778][1648981] Fps is (10 sec: 52365.8, 60 sec: 46422.7, 300 sec: 31498.2). Total num frames: 3670016. Throughput: 0: 11420.2. Samples: 950784. Policy #0 lag: (min: 47.0, avg: 188.0, max: 303.0) [2024-06-15 11:33:10,779][1648981] Avg episode reward: [(0, '8.060')] [2024-06-15 11:33:10,806][1651274] Saving new best policy, reward=8.060! [2024-06-15 11:33:15,013][1651669] Updated weights for policy 0, policy_version 1796 (0.0055) [2024-06-15 11:33:15,766][1648981] Fps is (10 sec: 36045.3, 60 sec: 44783.2, 300 sec: 30744.5). 
Total num frames: 3735552. Throughput: 0: 10978.9. Samples: 1020928. Policy #0 lag: (min: 15.0, avg: 82.0, max: 271.0) [2024-06-15 11:33:15,767][1648981] Avg episode reward: [(0, '8.730')] [2024-06-15 11:33:16,284][1651274] Saving new best policy, reward=8.730! [2024-06-15 11:33:16,416][1651274] Signal inference workers to stop experience collection... (100 times) [2024-06-15 11:33:16,464][1651669] InferenceWorker_p0-w0: stopping experience collection (100 times) [2024-06-15 11:33:16,720][1651274] Signal inference workers to resume experience collection... (100 times) [2024-06-15 11:33:16,721][1651669] InferenceWorker_p0-w0: resuming experience collection (100 times) [2024-06-15 11:33:16,723][1651669] Updated weights for policy 0, policy_version 1872 (0.0013) [2024-06-15 11:33:18,739][1651669] Updated weights for policy 0, policy_version 1952 (0.0013) [2024-06-15 11:33:20,767][1648981] Fps is (10 sec: 39368.2, 60 sec: 45875.0, 300 sec: 32119.5). Total num frames: 4063232. Throughput: 0: 11025.0. Samples: 1082368. Policy #0 lag: (min: 20.0, avg: 160.9, max: 283.0) [2024-06-15 11:33:20,768][1648981] Avg episode reward: [(0, '8.460')] [2024-06-15 11:33:21,011][1651669] Updated weights for policy 0, policy_version 2002 (0.0014) [2024-06-15 11:33:22,106][1651669] Updated weights for policy 0, policy_version 2048 (0.0013) [2024-06-15 11:33:25,767][1648981] Fps is (10 sec: 45874.2, 60 sec: 43690.5, 300 sec: 31895.0). Total num frames: 4194304. Throughput: 0: 10797.5. Samples: 1114624. Policy #0 lag: (min: 63.0, avg: 207.8, max: 319.0) [2024-06-15 11:33:25,768][1648981] Avg episode reward: [(0, '8.840')] [2024-06-15 11:33:25,773][1651274] Saving new best policy, reward=8.840! [2024-06-15 11:33:28,458][1651669] Updated weights for policy 0, policy_version 2101 (0.0012) [2024-06-15 11:33:29,735][1651669] Updated weights for policy 0, policy_version 2173 (0.0013) [2024-06-15 11:33:30,766][1648981] Fps is (10 sec: 39322.7, 60 sec: 43722.8, 300 sec: 32647.2). 
Total num frames: 4456448. Throughput: 0: 10763.4. Samples: 1175552. Policy #0 lag: (min: 15.0, avg: 87.4, max: 271.0) [2024-06-15 11:33:30,767][1648981] Avg episode reward: [(0, '8.530')] [2024-06-15 11:33:32,554][1651669] Updated weights for policy 0, policy_version 2232 (0.0025) [2024-06-15 11:33:34,054][1651669] Updated weights for policy 0, policy_version 2288 (0.0012) [2024-06-15 11:33:35,767][1648981] Fps is (10 sec: 52428.8, 60 sec: 43690.5, 300 sec: 33346.1). Total num frames: 4718592. Throughput: 0: 10422.0. Samples: 1232384. Policy #0 lag: (min: 15.0, avg: 140.1, max: 271.0) [2024-06-15 11:33:35,768][1648981] Avg episode reward: [(0, '8.340')] [2024-06-15 11:33:40,775][1648981] Fps is (10 sec: 29464.4, 60 sec: 40954.0, 300 sec: 32429.8). Total num frames: 4751360. Throughput: 0: 10637.7. Samples: 1267712. Policy #0 lag: (min: 12.0, avg: 78.0, max: 268.0) [2024-06-15 11:33:40,776][1648981] Avg episode reward: [(0, '8.650')] [2024-06-15 11:33:41,667][1651669] Updated weights for policy 0, policy_version 2352 (0.0014) [2024-06-15 11:33:43,395][1651669] Updated weights for policy 0, policy_version 2422 (0.0014) [2024-06-15 11:33:44,760][1651669] Updated weights for policy 0, policy_version 2451 (0.0010) [2024-06-15 11:33:45,766][1648981] Fps is (10 sec: 36045.3, 60 sec: 41506.1, 300 sec: 33524.3). Total num frames: 5079040. Throughput: 0: 10638.2. Samples: 1329152. Policy #0 lag: (min: 15.0, avg: 142.7, max: 271.0) [2024-06-15 11:33:45,767][1648981] Avg episode reward: [(0, '9.140')] [2024-06-15 11:33:45,783][1651274] Saving new best policy, reward=9.140! [2024-06-15 11:33:46,422][1651669] Updated weights for policy 0, policy_version 2499 (0.0012) [2024-06-15 11:33:50,772][1648981] Fps is (10 sec: 49170.4, 60 sec: 43140.7, 300 sec: 33499.0). Total num frames: 5242880. Throughput: 0: 10489.1. Samples: 1390080. 
Policy #0 lag: (min: 15.0, avg: 142.7, max: 271.0) [2024-06-15 11:33:50,772][1648981] Avg episode reward: [(0, '8.340')] [2024-06-15 11:33:52,798][1651669] Updated weights for policy 0, policy_version 2561 (0.0015) [2024-06-15 11:33:54,312][1651669] Updated weights for policy 0, policy_version 2630 (0.0013) [2024-06-15 11:33:55,679][1651669] Updated weights for policy 0, policy_version 2688 (0.0012) [2024-06-15 11:33:55,779][1648981] Fps is (10 sec: 42546.7, 60 sec: 41502.4, 300 sec: 34083.6). Total num frames: 5505024. Throughput: 0: 10592.7. Samples: 1427456. Policy #0 lag: (min: 15.0, avg: 87.6, max: 271.0) [2024-06-15 11:33:55,780][1648981] Avg episode reward: [(0, '8.240')] [2024-06-15 11:33:58,723][1651669] Updated weights for policy 0, policy_version 2755 (0.0013) [2024-06-15 11:34:00,016][1651669] Updated weights for policy 0, policy_version 2816 (0.0012) [2024-06-15 11:34:00,767][1648981] Fps is (10 sec: 52455.0, 60 sec: 43690.5, 300 sec: 34636.9). Total num frames: 5767168. Throughput: 0: 10296.8. Samples: 1484288. Policy #0 lag: (min: 12.0, avg: 136.2, max: 268.0) [2024-06-15 11:34:00,767][1648981] Avg episode reward: [(0, '8.280')] [2024-06-15 11:34:04,544][1651274] Signal inference workers to stop experience collection... (150 times) [2024-06-15 11:34:04,593][1651669] InferenceWorker_p0-w0: stopping experience collection (150 times) [2024-06-15 11:34:04,876][1651274] Signal inference workers to resume experience collection... (150 times) [2024-06-15 11:34:04,877][1651669] InferenceWorker_p0-w0: resuming experience collection (150 times) [2024-06-15 11:34:05,767][1648981] Fps is (10 sec: 36087.5, 60 sec: 41506.0, 300 sec: 34200.3). Total num frames: 5865472. Throughput: 0: 10387.9. Samples: 1549824. 
Policy #0 lag: (min: 15.0, avg: 96.7, max: 271.0) [2024-06-15 11:34:05,768][1648981] Avg episode reward: [(0, '8.180')] [2024-06-15 11:34:06,952][1651669] Updated weights for policy 0, policy_version 2912 (0.0182) [2024-06-15 11:34:07,805][1651669] Updated weights for policy 0, policy_version 2944 (0.0013) [2024-06-15 11:34:10,780][1648981] Fps is (10 sec: 29452.5, 60 sec: 39866.8, 300 sec: 34342.8). Total num frames: 6062080. Throughput: 0: 10248.4. Samples: 1575936. Policy #0 lag: (min: 12.0, avg: 116.4, max: 268.0) [2024-06-15 11:34:10,780][1648981] Avg episode reward: [(0, '8.560')] [2024-06-15 11:34:11,768][1651669] Updated weights for policy 0, policy_version 3012 (0.0130) [2024-06-15 11:34:12,775][1651669] Updated weights for policy 0, policy_version 3067 (0.0020) [2024-06-15 11:34:15,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 42598.4, 300 sec: 34663.0). Total num frames: 6291456. Throughput: 0: 10376.5. Samples: 1642496. Policy #0 lag: (min: 12.0, avg: 116.4, max: 268.0) [2024-06-15 11:34:15,767][1648981] Avg episode reward: [(0, '9.840')] [2024-06-15 11:34:15,768][1651274] Saving new best policy, reward=9.840! [2024-06-15 11:34:18,470][1651669] Updated weights for policy 0, policy_version 3126 (0.0013) [2024-06-15 11:34:20,456][1651669] Updated weights for policy 0, policy_version 3196 (0.0013) [2024-06-15 11:34:20,766][1648981] Fps is (10 sec: 49217.6, 60 sec: 41506.2, 300 sec: 35139.3). Total num frames: 6553600. Throughput: 0: 10331.1. Samples: 1697280. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 11:34:20,767][1648981] Avg episode reward: [(0, '8.620')] [2024-06-15 11:34:24,555][1651669] Updated weights for policy 0, policy_version 3281 (0.0014) [2024-06-15 11:34:25,520][1651669] Updated weights for policy 0, policy_version 3327 (0.0013) [2024-06-15 11:34:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 43690.8, 300 sec: 35590.7). Total num frames: 6815744. Throughput: 0: 10424.1. Samples: 1736704. 
Policy #0 lag: (min: 15.0, avg: 112.4, max: 271.0) [2024-06-15 11:34:25,767][1648981] Avg episode reward: [(0, '8.530')] [2024-06-15 11:34:30,768][1648981] Fps is (10 sec: 32764.1, 60 sec: 40412.9, 300 sec: 35018.4). Total num frames: 6881280. Throughput: 0: 10592.4. Samples: 1805824. Policy #0 lag: (min: 15.0, avg: 90.4, max: 262.0) [2024-06-15 11:34:30,769][1648981] Avg episode reward: [(0, '11.120')] [2024-06-15 11:34:31,213][1651274] Saving new best policy, reward=11.120! [2024-06-15 11:34:32,117][1651669] Updated weights for policy 0, policy_version 3425 (0.0013) [2024-06-15 11:34:35,222][1651669] Updated weights for policy 0, policy_version 3457 (0.0013) [2024-06-15 11:34:35,767][1648981] Fps is (10 sec: 32767.9, 60 sec: 40413.9, 300 sec: 35450.6). Total num frames: 7143424. Throughput: 0: 10628.1. Samples: 1868288. Policy #0 lag: (min: 15.0, avg: 108.9, max: 255.0) [2024-06-15 11:34:35,767][1648981] Avg episode reward: [(0, '9.530')] [2024-06-15 11:34:36,332][1651669] Updated weights for policy 0, policy_version 3520 (0.0012) [2024-06-15 11:34:37,855][1651669] Updated weights for policy 0, policy_version 3582 (0.0014) [2024-06-15 11:34:40,766][1648981] Fps is (10 sec: 45880.5, 60 sec: 43150.9, 300 sec: 35544.4). Total num frames: 7340032. Throughput: 0: 10436.2. Samples: 1896960. Policy #0 lag: (min: 15.0, avg: 108.9, max: 255.0) [2024-06-15 11:34:40,767][1648981] Avg episode reward: [(0, '10.780')] [2024-06-15 11:34:42,748][1651669] Updated weights for policy 0, policy_version 3618 (0.0014) [2024-06-15 11:34:44,176][1651669] Updated weights for policy 0, policy_version 3683 (0.0013) [2024-06-15 11:34:45,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 42052.1, 300 sec: 35943.5). Total num frames: 7602176. Throughput: 0: 10570.0. Samples: 1959936. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 11:34:45,767][1648981] Avg episode reward: [(0, '11.310')] [2024-06-15 11:34:45,768][1651274] Saving new best policy, reward=11.310! 
[2024-06-15 11:34:47,832][1651274] Signal inference workers to stop experience collection... (200 times) [2024-06-15 11:34:47,868][1651669] InferenceWorker_p0-w0: stopping experience collection (200 times) [2024-06-15 11:34:48,169][1651274] Signal inference workers to resume experience collection... (200 times) [2024-06-15 11:34:48,180][1651669] InferenceWorker_p0-w0: resuming experience collection (200 times) [2024-06-15 11:34:48,182][1651669] Updated weights for policy 0, policy_version 3744 (0.0031) [2024-06-15 11:34:50,313][1651669] Updated weights for policy 0, policy_version 3828 (0.0014) [2024-06-15 11:34:50,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 43694.4, 300 sec: 36324.2). Total num frames: 7864320. Throughput: 0: 10388.0. Samples: 2017280. Policy #0 lag: (min: 13.0, avg: 114.5, max: 269.0) [2024-06-15 11:34:50,767][1648981] Avg episode reward: [(0, '10.630')] [2024-06-15 11:34:55,767][1648981] Fps is (10 sec: 29491.2, 60 sec: 39875.7, 300 sec: 35652.2). Total num frames: 7897088. Throughput: 0: 10607.2. Samples: 2053120. Policy #0 lag: (min: 10.0, avg: 82.2, max: 232.0) [2024-06-15 11:34:55,768][1648981] Avg episode reward: [(0, '10.980')] [2024-06-15 11:34:55,914][1651669] Updated weights for policy 0, policy_version 3872 (0.0014) [2024-06-15 11:34:56,327][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000003888_7962624.pth... [2024-06-15 11:34:57,728][1651669] Updated weights for policy 0, policy_version 3952 (0.0016) [2024-06-15 11:35:00,111][1651669] Updated weights for policy 0, policy_version 3984 (0.0015) [2024-06-15 11:35:00,766][1648981] Fps is (10 sec: 32768.5, 60 sec: 40414.1, 300 sec: 36167.2). Total num frames: 8192000. Throughput: 0: 10581.3. Samples: 2118656. 
Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 11:35:00,767][1648981] Avg episode reward: [(0, '10.370')] [2024-06-15 11:35:02,110][1651669] Updated weights for policy 0, policy_version 4064 (0.0024) [2024-06-15 11:35:05,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 42052.6, 300 sec: 36235.4). Total num frames: 8388608. Throughput: 0: 10717.9. Samples: 2179584. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 11:35:05,767][1648981] Avg episode reward: [(0, '10.550')] [2024-06-15 11:35:08,186][1651669] Updated weights for policy 0, policy_version 4157 (0.0015) [2024-06-15 11:35:09,627][1651669] Updated weights for policy 0, policy_version 4213 (0.0012) [2024-06-15 11:35:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 43154.1, 300 sec: 36577.7). Total num frames: 8650752. Throughput: 0: 10490.3. Samples: 2208768. Policy #0 lag: (min: 15.0, avg: 104.0, max: 271.0) [2024-06-15 11:35:10,767][1648981] Avg episode reward: [(0, '10.770')] [2024-06-15 11:35:12,530][1651669] Updated weights for policy 0, policy_version 4256 (0.0123) [2024-06-15 11:35:14,792][1651669] Updated weights for policy 0, policy_version 4351 (0.0030) [2024-06-15 11:35:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 36905.9). Total num frames: 8912896. Throughput: 0: 10445.1. Samples: 2275840. Policy #0 lag: (min: 5.0, avg: 103.1, max: 261.0) [2024-06-15 11:35:15,767][1648981] Avg episode reward: [(0, '10.920')] [2024-06-15 11:35:19,445][1651669] Updated weights for policy 0, policy_version 4403 (0.0013) [2024-06-15 11:35:20,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 42052.0, 300 sec: 36821.9). Total num frames: 9076736. Throughput: 0: 10626.8. Samples: 2346496. 
Policy #0 lag: (min: 3.0, avg: 88.2, max: 259.0) [2024-06-15 11:35:20,768][1648981] Avg episode reward: [(0, '10.850')] [2024-06-15 11:35:21,669][1651669] Updated weights for policy 0, policy_version 4467 (0.0015) [2024-06-15 11:35:24,051][1651669] Updated weights for policy 0, policy_version 4496 (0.0013) [2024-06-15 11:35:25,416][1651669] Updated weights for policy 0, policy_version 4560 (0.0012) [2024-06-15 11:35:25,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 42598.4, 300 sec: 37262.5). Total num frames: 9371648. Throughput: 0: 10831.7. Samples: 2384384. Policy #0 lag: (min: 0.0, avg: 99.6, max: 256.0) [2024-06-15 11:35:25,767][1648981] Avg episode reward: [(0, '10.480')] [2024-06-15 11:35:26,522][1651669] Updated weights for policy 0, policy_version 4604 (0.0012) [2024-06-15 11:35:30,722][1651669] Updated weights for policy 0, policy_version 4665 (0.0013) [2024-06-15 11:35:30,767][1648981] Fps is (10 sec: 45876.5, 60 sec: 44237.6, 300 sec: 37174.9). Total num frames: 9535488. Throughput: 0: 10968.2. Samples: 2453504. Policy #0 lag: (min: 0.0, avg: 101.4, max: 256.0) [2024-06-15 11:35:30,768][1648981] Avg episode reward: [(0, '9.630')] [2024-06-15 11:35:31,940][1651274] Signal inference workers to stop experience collection... (250 times) [2024-06-15 11:35:31,970][1651669] InferenceWorker_p0-w0: stopping experience collection (250 times) [2024-06-15 11:35:32,219][1651274] Signal inference workers to resume experience collection... (250 times) [2024-06-15 11:35:32,220][1651669] InferenceWorker_p0-w0: resuming experience collection (250 times) [2024-06-15 11:35:32,695][1651669] Updated weights for policy 0, policy_version 4720 (0.0012) [2024-06-15 11:35:33,161][1651669] Updated weights for policy 0, policy_version 4736 (0.0012) [2024-06-15 11:35:35,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 44236.7, 300 sec: 37466.5). Total num frames: 9797632. Throughput: 0: 11309.5. Samples: 2526208. 
Policy #0 lag: (min: 2.0, avg: 98.4, max: 258.0) [2024-06-15 11:35:35,768][1648981] Avg episode reward: [(0, '10.910')] [2024-06-15 11:35:35,789][1651669] Updated weights for policy 0, policy_version 4800 (0.0029) [2024-06-15 11:35:37,038][1651669] Updated weights for policy 0, policy_version 4856 (0.0014) [2024-06-15 11:35:40,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 45328.9, 300 sec: 37747.3). Total num frames: 10059776. Throughput: 0: 11377.8. Samples: 2565120. Policy #0 lag: (min: 9.0, avg: 109.3, max: 265.0) [2024-06-15 11:35:40,767][1648981] Avg episode reward: [(0, '10.800')] [2024-06-15 11:35:40,953][1651669] Updated weights for policy 0, policy_version 4924 (0.0014) [2024-06-15 11:35:43,944][1651669] Updated weights for policy 0, policy_version 4985 (0.0013) [2024-06-15 11:35:45,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 43690.6, 300 sec: 37655.5). Total num frames: 10223616. Throughput: 0: 11275.3. Samples: 2626048. Policy #0 lag: (min: 9.0, avg: 109.3, max: 265.0) [2024-06-15 11:35:45,768][1648981] Avg episode reward: [(0, '12.770')] [2024-06-15 11:35:45,770][1651274] Saving new best policy, reward=12.770! [2024-06-15 11:35:46,800][1651669] Updated weights for policy 0, policy_version 5024 (0.0013) [2024-06-15 11:35:47,400][1651669] Updated weights for policy 0, policy_version 5056 (0.0011) [2024-06-15 11:35:48,836][1651669] Updated weights for policy 0, policy_version 5113 (0.0142) [2024-06-15 11:35:50,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 43690.8, 300 sec: 37922.7). Total num frames: 10485760. Throughput: 0: 11525.7. Samples: 2698240. Policy #0 lag: (min: 3.0, avg: 101.4, max: 259.0) [2024-06-15 11:35:50,767][1648981] Avg episode reward: [(0, '11.160')] [2024-06-15 11:35:52,473][1651669] Updated weights for policy 0, policy_version 5168 (0.0013) [2024-06-15 11:35:55,622][1651669] Updated weights for policy 0, policy_version 5232 (0.0015) [2024-06-15 11:35:55,767][1648981] Fps is (10 sec: 49152.6, 60 sec: 46967.5, 300 sec: 38063.9). 
Total num frames: 10715136. Throughput: 0: 11639.4. Samples: 2732544. Policy #0 lag: (min: 24.0, avg: 130.0, max: 280.0) [2024-06-15 11:35:55,767][1648981] Avg episode reward: [(0, '10.560')] [2024-06-15 11:35:57,696][1651669] Updated weights for policy 0, policy_version 5280 (0.0015) [2024-06-15 11:35:59,610][1651669] Updated weights for policy 0, policy_version 5317 (0.0043) [2024-06-15 11:36:00,775][1648981] Fps is (10 sec: 49113.4, 60 sec: 46415.3, 300 sec: 38313.6). Total num frames: 10977280. Throughput: 0: 11569.2. Samples: 2796544. Policy #0 lag: (min: 15.0, avg: 126.3, max: 271.0) [2024-06-15 11:36:00,776][1648981] Avg episode reward: [(0, '11.660')] [2024-06-15 11:36:03,051][1651669] Updated weights for policy 0, policy_version 5380 (0.0013) [2024-06-15 11:36:04,391][1651669] Updated weights for policy 0, policy_version 5439 (0.0014) [2024-06-15 11:36:05,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 45874.7, 300 sec: 38219.5). Total num frames: 11141120. Throughput: 0: 11582.5. Samples: 2867712. Policy #0 lag: (min: 6.0, avg: 111.2, max: 262.0) [2024-06-15 11:36:05,768][1648981] Avg episode reward: [(0, '12.310')] [2024-06-15 11:36:07,169][1651669] Updated weights for policy 0, policy_version 5502 (0.0015) [2024-06-15 11:36:09,949][1651669] Updated weights for policy 0, policy_version 5552 (0.0012) [2024-06-15 11:36:10,767][1648981] Fps is (10 sec: 42631.3, 60 sec: 45875.1, 300 sec: 38655.1). Total num frames: 11403264. Throughput: 0: 11400.5. Samples: 2897408. Policy #0 lag: (min: 15.0, avg: 134.2, max: 271.0) [2024-06-15 11:36:10,767][1648981] Avg episode reward: [(0, '12.710')] [2024-06-15 11:36:12,801][1651669] Updated weights for policy 0, policy_version 5621 (0.0013) [2024-06-15 11:36:14,773][1651669] Updated weights for policy 0, policy_version 5664 (0.0011) [2024-06-15 11:36:15,766][1648981] Fps is (10 sec: 52432.2, 60 sec: 45875.3, 300 sec: 39547.3). Total num frames: 11665408. Throughput: 0: 11423.3. Samples: 2967552. 
Policy #0 lag: (min: 31.0, avg: 150.1, max: 287.0) [2024-06-15 11:36:15,767][1648981] Avg episode reward: [(0, '11.310')] [2024-06-15 11:36:17,552][1651669] Updated weights for policy 0, policy_version 5712 (0.0014) [2024-06-15 11:36:17,707][1651274] Signal inference workers to stop experience collection... (300 times) [2024-06-15 11:36:17,736][1651669] InferenceWorker_p0-w0: stopping experience collection (300 times) [2024-06-15 11:36:17,867][1651274] Signal inference workers to resume experience collection... (300 times) [2024-06-15 11:36:17,868][1651669] InferenceWorker_p0-w0: resuming experience collection (300 times) [2024-06-15 11:36:20,126][1651669] Updated weights for policy 0, policy_version 5764 (0.0013) [2024-06-15 11:36:20,768][1648981] Fps is (10 sec: 45868.5, 60 sec: 46420.4, 300 sec: 40210.8). Total num frames: 11862016. Throughput: 0: 11263.6. Samples: 3033088. Policy #0 lag: (min: 19.0, avg: 138.2, max: 275.0) [2024-06-15 11:36:20,768][1648981] Avg episode reward: [(0, '13.460')] [2024-06-15 11:36:21,180][1651274] Saving new best policy, reward=13.460! [2024-06-15 11:36:21,526][1651669] Updated weights for policy 0, policy_version 5824 (0.0034) [2024-06-15 11:36:25,058][1651669] Updated weights for policy 0, policy_version 5887 (0.0011) [2024-06-15 11:36:25,766][1648981] Fps is (10 sec: 39321.2, 60 sec: 44782.9, 300 sec: 40879.9). Total num frames: 12058624. Throughput: 0: 11082.0. Samples: 3063808. Policy #0 lag: (min: 19.0, avg: 138.2, max: 275.0) [2024-06-15 11:36:25,767][1648981] Avg episode reward: [(0, '11.970')] [2024-06-15 11:36:26,871][1651669] Updated weights for policy 0, policy_version 5940 (0.0012) [2024-06-15 11:36:29,890][1651669] Updated weights for policy 0, policy_version 5970 (0.0014) [2024-06-15 11:36:30,769][1648981] Fps is (10 sec: 42592.4, 60 sec: 45873.0, 300 sec: 41653.8). Total num frames: 12288000. Throughput: 0: 11422.6. Samples: 3140096. 
Policy #0 lag: (min: 29.0, avg: 149.9, max: 285.0) [2024-06-15 11:36:30,770][1648981] Avg episode reward: [(0, '12.110')] [2024-06-15 11:36:32,817][1651669] Updated weights for policy 0, policy_version 6051 (0.0123) [2024-06-15 11:36:35,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 44236.9, 300 sec: 42209.6). Total num frames: 12451840. Throughput: 0: 11127.5. Samples: 3198976. Policy #0 lag: (min: 15.0, avg: 138.4, max: 271.0) [2024-06-15 11:36:35,767][1648981] Avg episode reward: [(0, '12.570')] [2024-06-15 11:36:37,463][1651669] Updated weights for policy 0, policy_version 6144 (0.0045) [2024-06-15 11:36:39,011][1651669] Updated weights for policy 0, policy_version 6202 (0.0013) [2024-06-15 11:36:40,767][1648981] Fps is (10 sec: 42609.3, 60 sec: 44236.7, 300 sec: 43098.2). Total num frames: 12713984. Throughput: 0: 10979.5. Samples: 3226624. Policy #0 lag: (min: 79.0, avg: 167.5, max: 287.0) [2024-06-15 11:36:40,767][1648981] Avg episode reward: [(0, '12.550')] [2024-06-15 11:36:42,999][1651669] Updated weights for policy 0, policy_version 6267 (0.0012) [2024-06-15 11:36:45,433][1651669] Updated weights for policy 0, policy_version 6329 (0.0013) [2024-06-15 11:36:45,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 45875.3, 300 sec: 43993.6). Total num frames: 12976128. Throughput: 0: 10992.8. Samples: 3291136. Policy #0 lag: (min: 15.0, avg: 139.7, max: 271.0) [2024-06-15 11:36:45,768][1648981] Avg episode reward: [(0, '13.520')] [2024-06-15 11:36:45,769][1651274] Saving new best policy, reward=13.520! [2024-06-15 11:36:49,499][1651669] Updated weights for policy 0, policy_version 6369 (0.0013) [2024-06-15 11:36:50,766][1648981] Fps is (10 sec: 42600.4, 60 sec: 44236.8, 300 sec: 44431.2). Total num frames: 13139968. Throughput: 0: 10900.1. Samples: 3358208. 
Policy #0 lag: (min: 47.0, avg: 150.5, max: 303.0) [2024-06-15 11:36:50,767][1648981] Avg episode reward: [(0, '13.110')] [2024-06-15 11:36:50,797][1651669] Updated weights for policy 0, policy_version 6432 (0.0011) [2024-06-15 11:36:54,926][1651669] Updated weights for policy 0, policy_version 6503 (0.0013) [2024-06-15 11:36:55,767][1648981] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 43987.3). Total num frames: 13369344. Throughput: 0: 11047.8. Samples: 3394560. Policy #0 lag: (min: 47.0, avg: 150.5, max: 303.0) [2024-06-15 11:36:55,767][1648981] Avg episode reward: [(0, '12.380')] [2024-06-15 11:36:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000006528_13369344.pth... [2024-06-15 11:36:55,941][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000001472_3014656.pth [2024-06-15 11:36:57,135][1651669] Updated weights for policy 0, policy_version 6582 (0.0015) [2024-06-15 11:37:00,674][1651669] Updated weights for policy 0, policy_version 6612 (0.0011) [2024-06-15 11:37:00,766][1648981] Fps is (10 sec: 39321.2, 60 sec: 42603.9, 300 sec: 43875.8). Total num frames: 13533184. Throughput: 0: 10922.6. Samples: 3459072. Policy #0 lag: (min: 31.0, avg: 162.4, max: 287.0) [2024-06-15 11:37:00,767][1648981] Avg episode reward: [(0, '14.150')] [2024-06-15 11:37:01,198][1651274] Saving new best policy, reward=14.150! [2024-06-15 11:37:02,542][1651669] Updated weights for policy 0, policy_version 6692 (0.0013) [2024-06-15 11:37:05,767][1648981] Fps is (10 sec: 39321.8, 60 sec: 43690.9, 300 sec: 43655.7). Total num frames: 13762560. Throughput: 0: 11048.2. Samples: 3530240. Policy #0 lag: (min: 99.0, avg: 194.0, max: 342.0) [2024-06-15 11:37:05,768][1648981] Avg episode reward: [(0, '15.570')] [2024-06-15 11:37:05,769][1651274] Saving new best policy, reward=15.570! [2024-06-15 11:37:06,538][1651274] Signal inference workers to stop experience collection... 
(350 times) [2024-06-15 11:37:06,577][1651669] InferenceWorker_p0-w0: stopping experience collection (350 times) [2024-06-15 11:37:06,858][1651274] Signal inference workers to resume experience collection... (350 times) [2024-06-15 11:37:06,859][1651669] InferenceWorker_p0-w0: resuming experience collection (350 times) [2024-06-15 11:37:07,308][1651669] Updated weights for policy 0, policy_version 6752 (0.0014) [2024-06-15 11:37:09,158][1651669] Updated weights for policy 0, policy_version 6818 (0.0014) [2024-06-15 11:37:10,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 43986.9). Total num frames: 14024704. Throughput: 0: 10888.5. Samples: 3553792. Policy #0 lag: (min: 111.0, avg: 182.7, max: 303.0) [2024-06-15 11:37:10,767][1648981] Avg episode reward: [(0, '15.480')] [2024-06-15 11:37:13,651][1651669] Updated weights for policy 0, policy_version 6882 (0.0012) [2024-06-15 11:37:15,222][1651669] Updated weights for policy 0, policy_version 6944 (0.0014) [2024-06-15 11:37:15,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 43144.4, 300 sec: 43875.8). Total num frames: 14254080. Throughput: 0: 10582.0. Samples: 3616256. Policy #0 lag: (min: 111.0, avg: 182.7, max: 303.0) [2024-06-15 11:37:15,767][1648981] Avg episode reward: [(0, '14.380')] [2024-06-15 11:37:15,900][1651669] Updated weights for policy 0, policy_version 6976 (0.0021) [2024-06-15 11:37:20,320][1651669] Updated weights for policy 0, policy_version 7040 (0.0013) [2024-06-15 11:37:20,767][1648981] Fps is (10 sec: 39320.4, 60 sec: 42599.3, 300 sec: 43542.5). Total num frames: 14417920. Throughput: 0: 10786.1. Samples: 3684352. Policy #0 lag: (min: 98.0, avg: 199.9, max: 339.0) [2024-06-15 11:37:20,767][1648981] Avg episode reward: [(0, '14.120')] [2024-06-15 11:37:21,769][1651669] Updated weights for policy 0, policy_version 7099 (0.0136) [2024-06-15 11:37:25,766][1648981] Fps is (10 sec: 32768.0, 60 sec: 42052.2, 300 sec: 43215.8). Total num frames: 14581760. Throughput: 0: 10865.9. 
Samples: 3715584. Policy #0 lag: (min: 15.0, avg: 101.4, max: 239.0) [2024-06-15 11:37:25,767][1648981] Avg episode reward: [(0, '13.830')] [2024-06-15 11:37:26,373][1651669] Updated weights for policy 0, policy_version 7152 (0.0011) [2024-06-15 11:37:27,852][1651669] Updated weights for policy 0, policy_version 7224 (0.0013) [2024-06-15 11:37:30,766][1648981] Fps is (10 sec: 39323.0, 60 sec: 42054.4, 300 sec: 43098.3). Total num frames: 14811136. Throughput: 0: 10911.4. Samples: 3782144. Policy #0 lag: (min: 15.0, avg: 101.4, max: 239.0) [2024-06-15 11:37:30,767][1648981] Avg episode reward: [(0, '14.460')] [2024-06-15 11:37:31,682][1651669] Updated weights for policy 0, policy_version 7280 (0.0023) [2024-06-15 11:37:35,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 43690.7, 300 sec: 43320.5). Total num frames: 15073280. Throughput: 0: 10820.3. Samples: 3845120. Policy #0 lag: (min: 58.0, avg: 151.8, max: 298.0) [2024-06-15 11:37:35,767][1648981] Avg episode reward: [(0, '15.520')] [2024-06-15 11:37:37,589][1651669] Updated weights for policy 0, policy_version 7363 (0.0014) [2024-06-15 11:37:39,292][1651669] Updated weights for policy 0, policy_version 7440 (0.0019) [2024-06-15 11:37:40,213][1651669] Updated weights for policy 0, policy_version 7487 (0.0019) [2024-06-15 11:37:40,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 43690.9, 300 sec: 43209.3). Total num frames: 15335424. Throughput: 0: 10797.6. Samples: 3880448. Policy #0 lag: (min: 47.0, avg: 129.4, max: 303.0) [2024-06-15 11:37:40,767][1648981] Avg episode reward: [(0, '13.720')] [2024-06-15 11:37:43,829][1651669] Updated weights for policy 0, policy_version 7553 (0.0016) [2024-06-15 11:37:45,103][1651669] Updated weights for policy 0, policy_version 7616 (0.0012) [2024-06-15 11:37:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 43690.8, 300 sec: 43875.8). Total num frames: 15597568. Throughput: 0: 10752.0. Samples: 3942912. 
Policy #0 lag: (min: 47.0, avg: 129.4, max: 303.0) [2024-06-15 11:37:45,767][1648981] Avg episode reward: [(0, '15.050')] [2024-06-15 11:37:49,419][1651274] Signal inference workers to stop experience collection... (400 times) [2024-06-15 11:37:49,461][1651669] InferenceWorker_p0-w0: stopping experience collection (400 times) [2024-06-15 11:37:49,597][1651274] Signal inference workers to resume experience collection... (400 times) [2024-06-15 11:37:49,598][1651669] InferenceWorker_p0-w0: resuming experience collection (400 times) [2024-06-15 11:37:49,975][1651669] Updated weights for policy 0, policy_version 7680 (0.0014) [2024-06-15 11:37:50,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 44236.7, 300 sec: 43321.4). Total num frames: 15794176. Throughput: 0: 10740.6. Samples: 4013568. Policy #0 lag: (min: 15.0, avg: 102.7, max: 271.0) [2024-06-15 11:37:50,767][1648981] Avg episode reward: [(0, '15.490')] [2024-06-15 11:37:53,447][1651669] Updated weights for policy 0, policy_version 7746 (0.0014) [2024-06-15 11:37:54,783][1651669] Updated weights for policy 0, policy_version 7801 (0.0036) [2024-06-15 11:37:55,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 44783.1, 300 sec: 43764.7). Total num frames: 16056320. Throughput: 0: 11161.6. Samples: 4056064. Policy #0 lag: (min: 15.0, avg: 102.7, max: 271.0) [2024-06-15 11:37:55,767][1648981] Avg episode reward: [(0, '15.820')] [2024-06-15 11:37:56,280][1651669] Updated weights for policy 0, policy_version 7867 (0.0012) [2024-06-15 11:37:56,340][1651274] Saving new best policy, reward=15.820! [2024-06-15 11:38:00,769][1648981] Fps is (10 sec: 42589.0, 60 sec: 44781.2, 300 sec: 43542.2). Total num frames: 16220160. Throughput: 0: 11308.9. Samples: 4125184. 
Policy #0 lag: (min: 63.0, avg: 183.6, max: 303.0) [2024-06-15 11:38:00,770][1648981] Avg episode reward: [(0, '16.820')] [2024-06-15 11:38:00,822][1651669] Updated weights for policy 0, policy_version 7936 (0.0012) [2024-06-15 11:38:01,274][1651274] Saving new best policy, reward=16.820! [2024-06-15 11:38:02,363][1651669] Updated weights for policy 0, policy_version 7995 (0.0020) [2024-06-15 11:38:05,648][1651669] Updated weights for policy 0, policy_version 8034 (0.0012) [2024-06-15 11:38:05,774][1648981] Fps is (10 sec: 39291.1, 60 sec: 44777.3, 300 sec: 43321.0). Total num frames: 16449536. Throughput: 0: 11353.1. Samples: 4195328. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 11:38:05,775][1648981] Avg episode reward: [(0, '14.640')] [2024-06-15 11:38:07,291][1651669] Updated weights for policy 0, policy_version 8117 (0.0012) [2024-06-15 11:38:10,766][1648981] Fps is (10 sec: 42608.1, 60 sec: 43690.6, 300 sec: 43764.7). Total num frames: 16646144. Throughput: 0: 11320.9. Samples: 4225024. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 11:38:10,767][1648981] Avg episode reward: [(0, '16.090')] [2024-06-15 11:38:11,498][1651669] Updated weights for policy 0, policy_version 8163 (0.0013) [2024-06-15 11:38:12,289][1651669] Updated weights for policy 0, policy_version 8193 (0.0011) [2024-06-15 11:38:13,514][1651669] Updated weights for policy 0, policy_version 8253 (0.0013) [2024-06-15 11:38:15,767][1648981] Fps is (10 sec: 45910.0, 60 sec: 44236.7, 300 sec: 43542.6). Total num frames: 16908288. Throughput: 0: 11377.7. Samples: 4294144. 
Policy #0 lag: (min: 18.0, avg: 144.6, max: 287.0) [2024-06-15 11:38:15,767][1648981] Avg episode reward: [(0, '16.550')] [2024-06-15 11:38:17,014][1651669] Updated weights for policy 0, policy_version 8308 (0.0090) [2024-06-15 11:38:18,358][1651669] Updated weights for policy 0, policy_version 8354 (0.0013) [2024-06-15 11:38:20,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 45875.5, 300 sec: 43986.9). Total num frames: 17170432. Throughput: 0: 11673.6. Samples: 4370432. Policy #0 lag: (min: 18.0, avg: 144.6, max: 287.0) [2024-06-15 11:38:20,767][1648981] Avg episode reward: [(0, '17.450')] [2024-06-15 11:38:21,104][1651274] Saving new best policy, reward=17.450! [2024-06-15 11:38:21,502][1651669] Updated weights for policy 0, policy_version 8416 (0.0012) [2024-06-15 11:38:25,168][1651669] Updated weights for policy 0, policy_version 8471 (0.0014) [2024-06-15 11:38:25,767][1648981] Fps is (10 sec: 49152.0, 60 sec: 46967.3, 300 sec: 43875.7). Total num frames: 17399808. Throughput: 0: 11537.0. Samples: 4399616. Policy #0 lag: (min: 15.0, avg: 118.7, max: 271.0) [2024-06-15 11:38:25,768][1648981] Avg episode reward: [(0, '17.000')] [2024-06-15 11:38:25,959][1651669] Updated weights for policy 0, policy_version 8511 (0.0016) [2024-06-15 11:38:28,331][1651669] Updated weights for policy 0, policy_version 8574 (0.0015) [2024-06-15 11:38:30,112][1651274] Signal inference workers to stop experience collection... (450 times) [2024-06-15 11:38:30,257][1651669] InferenceWorker_p0-w0: stopping experience collection (450 times) [2024-06-15 11:38:30,430][1651274] Signal inference workers to resume experience collection... (450 times) [2024-06-15 11:38:30,430][1651669] InferenceWorker_p0-w0: resuming experience collection (450 times) [2024-06-15 11:38:30,647][1651669] Updated weights for policy 0, policy_version 8632 (0.0123) [2024-06-15 11:38:30,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 47513.4, 300 sec: 43875.8). Total num frames: 17661952. Throughput: 0: 11719.1. 
Samples: 4470272. Policy #0 lag: (min: 55.0, avg: 156.8, max: 311.0) [2024-06-15 11:38:30,767][1648981] Avg episode reward: [(0, '16.430')] [2024-06-15 11:38:33,463][1651669] Updated weights for policy 0, policy_version 8688 (0.0012) [2024-06-15 11:38:35,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 45875.1, 300 sec: 44321.5). Total num frames: 17825792. Throughput: 0: 11559.8. Samples: 4533760. Policy #0 lag: (min: 55.0, avg: 156.8, max: 311.0) [2024-06-15 11:38:35,767][1648981] Avg episode reward: [(0, '17.400')] [2024-06-15 11:38:37,916][1651669] Updated weights for policy 0, policy_version 8761 (0.0011) [2024-06-15 11:38:39,953][1651669] Updated weights for policy 0, policy_version 8801 (0.0011) [2024-06-15 11:38:40,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 45875.3, 300 sec: 44098.0). Total num frames: 18087936. Throughput: 0: 11411.9. Samples: 4569600. Policy #0 lag: (min: 15.0, avg: 112.6, max: 271.0) [2024-06-15 11:38:40,767][1648981] Avg episode reward: [(0, '17.820')] [2024-06-15 11:38:40,771][1651274] Saving new best policy, reward=17.820! [2024-06-15 11:38:41,226][1651669] Updated weights for policy 0, policy_version 8836 (0.0018) [2024-06-15 11:38:42,382][1651669] Updated weights for policy 0, policy_version 8892 (0.0013) [2024-06-15 11:38:44,529][1651669] Updated weights for policy 0, policy_version 8929 (0.0019) [2024-06-15 11:38:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 44432.0). Total num frames: 18350080. Throughput: 0: 11355.6. Samples: 4636160. Policy #0 lag: (min: 47.0, avg: 176.4, max: 303.0) [2024-06-15 11:38:45,767][1648981] Avg episode reward: [(0, '18.410')] [2024-06-15 11:38:45,768][1651274] Saving new best policy, reward=18.410! [2024-06-15 11:38:49,120][1651669] Updated weights for policy 0, policy_version 8992 (0.0013) [2024-06-15 11:38:49,939][1651669] Updated weights for policy 0, policy_version 9024 (0.0009) [2024-06-15 11:38:50,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 45329.2, 300 sec: 44099.8). 
Total num frames: 18513920. Throughput: 0: 11402.5. Samples: 4708352. Policy #0 lag: (min: 47.0, avg: 176.4, max: 303.0) [2024-06-15 11:38:50,767][1648981] Avg episode reward: [(0, '17.720')] [2024-06-15 11:38:51,684][1651669] Updated weights for policy 0, policy_version 9076 (0.0013) [2024-06-15 11:38:52,928][1651669] Updated weights for policy 0, policy_version 9109 (0.0016) [2024-06-15 11:38:54,966][1651669] Updated weights for policy 0, policy_version 9156 (0.0014) [2024-06-15 11:38:55,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 44209.1). Total num frames: 18808832. Throughput: 0: 11446.0. Samples: 4740096. Policy #0 lag: (min: 15.0, avg: 144.3, max: 271.0) [2024-06-15 11:38:55,767][1648981] Avg episode reward: [(0, '16.200')] [2024-06-15 11:38:56,053][1651669] Updated weights for policy 0, policy_version 9203 (0.0144) [2024-06-15 11:38:56,207][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000009216_18874368.pth... [2024-06-15 11:38:56,276][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000003888_7962624.pth [2024-06-15 11:39:00,784][1648981] Fps is (10 sec: 42521.0, 60 sec: 45317.1, 300 sec: 44317.4). Total num frames: 18939904. Throughput: 0: 11509.7. Samples: 4812288. Policy #0 lag: (min: 27.0, avg: 118.2, max: 283.0) [2024-06-15 11:39:00,785][1648981] Avg episode reward: [(0, '17.900')] [2024-06-15 11:39:00,971][1651669] Updated weights for policy 0, policy_version 9264 (0.0012) [2024-06-15 11:39:02,853][1651669] Updated weights for policy 0, policy_version 9315 (0.0093) [2024-06-15 11:39:04,525][1651669] Updated weights for policy 0, policy_version 9348 (0.0016) [2024-06-15 11:39:05,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 46427.4, 300 sec: 44655.4). Total num frames: 19234816. Throughput: 0: 11252.6. Samples: 4876800. 
Policy #0 lag: (min: 27.0, avg: 118.2, max: 283.0) [2024-06-15 11:39:05,767][1648981] Avg episode reward: [(0, '19.100')] [2024-06-15 11:39:05,788][1651669] Updated weights for policy 0, policy_version 9405 (0.0021) [2024-06-15 11:39:05,851][1651274] Saving new best policy, reward=19.100! [2024-06-15 11:39:07,702][1651669] Updated weights for policy 0, policy_version 9465 (0.0014) [2024-06-15 11:39:10,767][1648981] Fps is (10 sec: 45957.5, 60 sec: 45875.1, 300 sec: 44431.2). Total num frames: 19398656. Throughput: 0: 11264.0. Samples: 4906496. Policy #0 lag: (min: 27.0, avg: 118.2, max: 283.0) [2024-06-15 11:39:10,768][1648981] Avg episode reward: [(0, '18.240')] [2024-06-15 11:39:13,274][1651669] Updated weights for policy 0, policy_version 9523 (0.0020) [2024-06-15 11:39:14,113][1651669] Updated weights for policy 0, policy_version 9568 (0.0012) [2024-06-15 11:39:15,774][1648981] Fps is (10 sec: 42564.9, 60 sec: 45869.4, 300 sec: 44430.0). Total num frames: 19660800. Throughput: 0: 11205.2. Samples: 4974592. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 11:39:15,775][1648981] Avg episode reward: [(0, '18.990')] [2024-06-15 11:39:17,548][1651669] Updated weights for policy 0, policy_version 9648 (0.0014) [2024-06-15 11:39:18,315][1651274] Signal inference workers to stop experience collection... (500 times) [2024-06-15 11:39:18,362][1651669] InferenceWorker_p0-w0: stopping experience collection (500 times) [2024-06-15 11:39:18,547][1651274] Signal inference workers to resume experience collection... (500 times) [2024-06-15 11:39:18,548][1651669] InferenceWorker_p0-w0: resuming experience collection (500 times) [2024-06-15 11:39:19,381][1651669] Updated weights for policy 0, policy_version 9714 (0.0109) [2024-06-15 11:39:20,773][1648981] Fps is (10 sec: 52394.2, 60 sec: 45869.9, 300 sec: 44430.2). Total num frames: 19922944. Throughput: 0: 11262.3. Samples: 5040640. 
Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 11:39:20,775][1648981] Avg episode reward: [(0, '20.300')] [2024-06-15 11:39:20,776][1651274] Saving new best policy, reward=20.300! [2024-06-15 11:39:24,985][1651669] Updated weights for policy 0, policy_version 9761 (0.0013) [2024-06-15 11:39:25,766][1648981] Fps is (10 sec: 42631.5, 60 sec: 44783.1, 300 sec: 44764.6). Total num frames: 20086784. Throughput: 0: 11309.5. Samples: 5078528. Policy #0 lag: (min: 2.0, avg: 80.5, max: 258.0) [2024-06-15 11:39:25,767][1648981] Avg episode reward: [(0, '20.620')] [2024-06-15 11:39:25,946][1651274] Saving new best policy, reward=20.620! [2024-06-15 11:39:26,619][1651669] Updated weights for policy 0, policy_version 9856 (0.0013) [2024-06-15 11:39:30,384][1651669] Updated weights for policy 0, policy_version 9936 (0.0112) [2024-06-15 11:39:30,775][1648981] Fps is (10 sec: 45865.2, 60 sec: 45322.4, 300 sec: 44874.1). Total num frames: 20381696. Throughput: 0: 11307.3. Samples: 5145088. Policy #0 lag: (min: 2.0, avg: 80.5, max: 258.0) [2024-06-15 11:39:30,776][1648981] Avg episode reward: [(0, '19.280')] [2024-06-15 11:39:31,244][1651669] Updated weights for policy 0, policy_version 9982 (0.0012) [2024-06-15 11:39:35,767][1648981] Fps is (10 sec: 36044.1, 60 sec: 43690.5, 300 sec: 44431.2). Total num frames: 20447232. Throughput: 0: 11195.7. Samples: 5212160. Policy #0 lag: (min: 2.0, avg: 80.5, max: 258.0) [2024-06-15 11:39:35,768][1648981] Avg episode reward: [(0, '19.150')] [2024-06-15 11:39:36,969][1651669] Updated weights for policy 0, policy_version 10036 (0.0014) [2024-06-15 11:39:38,508][1651669] Updated weights for policy 0, policy_version 10101 (0.0013) [2024-06-15 11:39:40,778][1648981] Fps is (10 sec: 32759.0, 60 sec: 43682.1, 300 sec: 44429.5). Total num frames: 20709376. Throughput: 0: 11090.5. Samples: 5239296. 
Policy #0 lag: (min: 15.0, avg: 97.6, max: 271.0) [2024-06-15 11:39:40,779][1648981] Avg episode reward: [(0, '22.980')] [2024-06-15 11:39:41,409][1651274] Saving new best policy, reward=22.980! [2024-06-15 11:39:42,077][1651669] Updated weights for policy 0, policy_version 10170 (0.0014) [2024-06-15 11:39:43,713][1651669] Updated weights for policy 0, policy_version 10240 (0.0013) [2024-06-15 11:39:45,771][1648981] Fps is (10 sec: 52404.7, 60 sec: 43687.2, 300 sec: 44430.5). Total num frames: 20971520. Throughput: 0: 10891.8. Samples: 5302272. Policy #0 lag: (min: 15.0, avg: 97.6, max: 271.0) [2024-06-15 11:39:45,772][1648981] Avg episode reward: [(0, '23.750')] [2024-06-15 11:39:45,773][1651274] Saving new best policy, reward=23.750! [2024-06-15 11:39:49,397][1651669] Updated weights for policy 0, policy_version 10309 (0.0014) [2024-06-15 11:39:50,669][1651669] Updated weights for policy 0, policy_version 10364 (0.0011) [2024-06-15 11:39:50,766][1648981] Fps is (10 sec: 52490.8, 60 sec: 45329.0, 300 sec: 45208.8). Total num frames: 21233664. Throughput: 0: 10854.4. Samples: 5365248. Policy #0 lag: (min: 2.0, avg: 88.8, max: 258.0) [2024-06-15 11:39:50,767][1648981] Avg episode reward: [(0, '20.910')] [2024-06-15 11:39:54,698][1651669] Updated weights for policy 0, policy_version 10436 (0.0014) [2024-06-15 11:39:55,767][1648981] Fps is (10 sec: 52453.8, 60 sec: 44782.9, 300 sec: 45097.6). Total num frames: 21495808. Throughput: 0: 11047.9. Samples: 5403648. Policy #0 lag: (min: 2.0, avg: 88.8, max: 258.0) [2024-06-15 11:39:55,767][1648981] Avg episode reward: [(0, '21.460')] [2024-06-15 11:40:00,595][1651669] Updated weights for policy 0, policy_version 10512 (0.0023) [2024-06-15 11:40:00,766][1648981] Fps is (10 sec: 29491.5, 60 sec: 43157.7, 300 sec: 44542.3). Total num frames: 21528576. Throughput: 0: 10947.4. Samples: 5467136. 
Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 11:40:00,767][1648981] Avg episode reward: [(0, '22.890')] [2024-06-15 11:40:02,503][1651274] Signal inference workers to stop experience collection... (550 times) [2024-06-15 11:40:02,555][1651669] InferenceWorker_p0-w0: stopping experience collection (550 times) [2024-06-15 11:40:02,580][1651669] Updated weights for policy 0, policy_version 10577 (0.0015) [2024-06-15 11:40:02,858][1651274] Signal inference workers to resume experience collection... (550 times) [2024-06-15 11:40:02,859][1651669] InferenceWorker_p0-w0: resuming experience collection (550 times) [2024-06-15 11:40:05,654][1651669] Updated weights for policy 0, policy_version 10640 (0.0021) [2024-06-15 11:40:05,766][1648981] Fps is (10 sec: 29491.6, 60 sec: 42598.4, 300 sec: 44542.3). Total num frames: 21790720. Throughput: 0: 10890.2. Samples: 5530624. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 11:40:05,767][1648981] Avg episode reward: [(0, '24.300')] [2024-06-15 11:40:06,347][1651274] Saving new best policy, reward=24.300! [2024-06-15 11:40:07,226][1651669] Updated weights for policy 0, policy_version 10706 (0.0017) [2024-06-15 11:40:10,771][1648981] Fps is (10 sec: 49131.3, 60 sec: 43687.9, 300 sec: 44430.6). Total num frames: 22020096. Throughput: 0: 10682.8. Samples: 5559296. Policy #0 lag: (min: 15.0, avg: 94.0, max: 271.0) [2024-06-15 11:40:10,773][1648981] Avg episode reward: [(0, '24.560')] [2024-06-15 11:40:10,789][1651274] Saving new best policy, reward=24.560! [2024-06-15 11:40:12,460][1651669] Updated weights for policy 0, policy_version 10755 (0.0013) [2024-06-15 11:40:13,912][1651669] Updated weights for policy 0, policy_version 10820 (0.0012) [2024-06-15 11:40:15,190][1651669] Updated weights for policy 0, policy_version 10874 (0.0020) [2024-06-15 11:40:15,786][1648981] Fps is (10 sec: 49054.1, 60 sec: 43681.9, 300 sec: 44761.5). Total num frames: 22282240. Throughput: 0: 10760.8. Samples: 5629440. 
Policy #0 lag: (min: 15.0, avg: 88.3, max: 255.0) [2024-06-15 11:40:15,787][1648981] Avg episode reward: [(0, '23.050')] [2024-06-15 11:40:17,326][1651669] Updated weights for policy 0, policy_version 10944 (0.0013) [2024-06-15 11:40:19,230][1651669] Updated weights for policy 0, policy_version 11002 (0.0013) [2024-06-15 11:40:20,806][1648981] Fps is (10 sec: 52241.5, 60 sec: 43666.6, 300 sec: 44647.3). Total num frames: 22544384. Throughput: 0: 10719.8. Samples: 5694976. Policy #0 lag: (min: 15.0, avg: 88.3, max: 255.0) [2024-06-15 11:40:20,807][1648981] Avg episode reward: [(0, '21.860')] [2024-06-15 11:40:25,681][1651669] Updated weights for policy 0, policy_version 11072 (0.0014) [2024-06-15 11:40:25,766][1648981] Fps is (10 sec: 39399.8, 60 sec: 43144.5, 300 sec: 44542.3). Total num frames: 22675456. Throughput: 0: 10959.7. Samples: 5732352. Policy #0 lag: (min: 12.0, avg: 74.9, max: 268.0) [2024-06-15 11:40:25,767][1648981] Avg episode reward: [(0, '22.060')] [2024-06-15 11:40:27,097][1651669] Updated weights for policy 0, policy_version 11132 (0.0014) [2024-06-15 11:40:28,895][1651669] Updated weights for policy 0, policy_version 11184 (0.0041) [2024-06-15 11:40:30,481][1651669] Updated weights for policy 0, policy_version 11232 (0.0013) [2024-06-15 11:40:30,766][1648981] Fps is (10 sec: 46059.4, 60 sec: 43697.3, 300 sec: 44764.5). Total num frames: 23003136. Throughput: 0: 10935.2. Samples: 5794304. Policy #0 lag: (min: 12.0, avg: 74.9, max: 268.0) [2024-06-15 11:40:30,767][1648981] Avg episode reward: [(0, '22.560')] [2024-06-15 11:40:35,790][1648981] Fps is (10 sec: 39228.2, 60 sec: 43673.5, 300 sec: 44094.4). Total num frames: 23068672. Throughput: 0: 11167.1. Samples: 5868032. 
Policy #0 lag: (min: 12.0, avg: 74.9, max: 268.0) [2024-06-15 11:40:35,791][1648981] Avg episode reward: [(0, '23.530')] [2024-06-15 11:40:36,842][1651669] Updated weights for policy 0, policy_version 11296 (0.0014) [2024-06-15 11:40:38,495][1651669] Updated weights for policy 0, policy_version 11380 (0.0014) [2024-06-15 11:40:40,602][1651669] Updated weights for policy 0, policy_version 11451 (0.0012) [2024-06-15 11:40:40,770][1648981] Fps is (10 sec: 45859.5, 60 sec: 45881.6, 300 sec: 44875.0). Total num frames: 23461888. Throughput: 0: 10921.9. Samples: 5895168. Policy #0 lag: (min: 10.0, avg: 77.8, max: 266.0) [2024-06-15 11:40:40,772][1648981] Avg episode reward: [(0, '23.020')] [2024-06-15 11:40:42,071][1651669] Updated weights for policy 0, policy_version 11492 (0.0012) [2024-06-15 11:40:45,766][1648981] Fps is (10 sec: 52553.9, 60 sec: 43694.2, 300 sec: 44431.2). Total num frames: 23592960. Throughput: 0: 11081.9. Samples: 5965824. Policy #0 lag: (min: 10.0, avg: 77.8, max: 266.0) [2024-06-15 11:40:45,767][1648981] Avg episode reward: [(0, '24.920')] [2024-06-15 11:40:45,768][1651274] Saving new best policy, reward=24.920! [2024-06-15 11:40:47,991][1651669] Updated weights for policy 0, policy_version 11541 (0.0013) [2024-06-15 11:40:48,283][1651274] Signal inference workers to stop experience collection... (600 times) [2024-06-15 11:40:48,346][1651669] InferenceWorker_p0-w0: stopping experience collection (600 times) [2024-06-15 11:40:48,494][1651274] Signal inference workers to resume experience collection... (600 times) [2024-06-15 11:40:48,495][1651669] InferenceWorker_p0-w0: resuming experience collection (600 times) [2024-06-15 11:40:49,940][1651669] Updated weights for policy 0, policy_version 11636 (0.0014) [2024-06-15 11:40:50,766][1648981] Fps is (10 sec: 39334.9, 60 sec: 43690.7, 300 sec: 44542.3). Total num frames: 23855104. Throughput: 0: 11252.6. Samples: 6036992. 
Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 11:40:50,767][1648981] Avg episode reward: [(0, '25.430')] [2024-06-15 11:40:51,020][1651274] Saving new best policy, reward=25.430! [2024-06-15 11:40:52,000][1651669] Updated weights for policy 0, policy_version 11705 (0.0078) [2024-06-15 11:40:53,219][1651669] Updated weights for policy 0, policy_version 11735 (0.0019) [2024-06-15 11:40:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 44543.4). Total num frames: 24117248. Throughput: 0: 11208.1. Samples: 6063616. Policy #0 lag: (min: 15.0, avg: 76.3, max: 271.0) [2024-06-15 11:40:55,767][1648981] Avg episode reward: [(0, '24.900')] [2024-06-15 11:40:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000011776_24117248.pth... [2024-06-15 11:40:55,869][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000006528_13369344.pth [2024-06-15 11:40:59,666][1651669] Updated weights for policy 0, policy_version 11796 (0.0016) [2024-06-15 11:41:00,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45329.0, 300 sec: 44431.3). Total num frames: 24248320. Throughput: 0: 11451.1. Samples: 6144512. Policy #0 lag: (min: 7.0, avg: 70.8, max: 263.0) [2024-06-15 11:41:00,767][1648981] Avg episode reward: [(0, '25.980')] [2024-06-15 11:41:00,948][1651669] Updated weights for policy 0, policy_version 11856 (0.0013) [2024-06-15 11:41:01,369][1651274] Saving new best policy, reward=25.980! [2024-06-15 11:41:01,997][1651669] Updated weights for policy 0, policy_version 11904 (0.0013) [2024-06-15 11:41:04,563][1651669] Updated weights for policy 0, policy_version 11985 (0.0011) [2024-06-15 11:41:05,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 47513.6, 300 sec: 44875.5). Total num frames: 24641536. Throughput: 0: 11160.1. Samples: 6196736. 
Policy #0 lag: (min: 7.0, avg: 70.8, max: 263.0) [2024-06-15 11:41:05,767][1648981] Avg episode reward: [(0, '26.170')] [2024-06-15 11:41:05,768][1651274] Saving new best policy, reward=26.170! [2024-06-15 11:41:10,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 43693.6, 300 sec: 43986.9). Total num frames: 24641536. Throughput: 0: 11184.4. Samples: 6235648. Policy #0 lag: (min: 7.0, avg: 70.8, max: 263.0) [2024-06-15 11:41:10,767][1648981] Avg episode reward: [(0, '26.880')] [2024-06-15 11:41:10,785][1651274] Saving new best policy, reward=26.880! [2024-06-15 11:41:11,382][1651669] Updated weights for policy 0, policy_version 12039 (0.0012) [2024-06-15 11:41:12,556][1651669] Updated weights for policy 0, policy_version 12091 (0.0010) [2024-06-15 11:41:13,985][1651669] Updated weights for policy 0, policy_version 12133 (0.0011) [2024-06-15 11:41:15,477][1651669] Updated weights for policy 0, policy_version 12192 (0.0012) [2024-06-15 11:41:15,766][1648981] Fps is (10 sec: 32767.5, 60 sec: 44797.7, 300 sec: 44431.4). Total num frames: 24969216. Throughput: 0: 11366.4. Samples: 6305792. Policy #0 lag: (min: 14.0, avg: 77.2, max: 270.0) [2024-06-15 11:41:15,767][1648981] Avg episode reward: [(0, '27.220')] [2024-06-15 11:41:16,341][1651274] Saving new best policy, reward=27.220! [2024-06-15 11:41:17,549][1651669] Updated weights for policy 0, policy_version 12257 (0.0012) [2024-06-15 11:41:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 43719.8, 300 sec: 44431.2). Total num frames: 25165824. Throughput: 0: 10985.4. Samples: 6362112. Policy #0 lag: (min: 14.0, avg: 77.2, max: 270.0) [2024-06-15 11:41:20,767][1648981] Avg episode reward: [(0, '27.380')] [2024-06-15 11:41:20,768][1651274] Saving new best policy, reward=27.380! [2024-06-15 11:41:23,594][1651669] Updated weights for policy 0, policy_version 12306 (0.0012) [2024-06-15 11:41:25,766][1648981] Fps is (10 sec: 32768.3, 60 sec: 43690.7, 300 sec: 44098.4). Total num frames: 25296896. Throughput: 0: 11185.2. 
Samples: 6398464. Policy #0 lag: (min: 15.0, avg: 88.8, max: 271.0) [2024-06-15 11:41:25,767][1648981] Avg episode reward: [(0, '27.820')] [2024-06-15 11:41:25,794][1651669] Updated weights for policy 0, policy_version 12356 (0.0012) [2024-06-15 11:41:26,404][1651274] Saving new best policy, reward=27.820! [2024-06-15 11:41:27,902][1651669] Updated weights for policy 0, policy_version 12434 (0.0043) [2024-06-15 11:41:29,117][1651274] Signal inference workers to stop experience collection... (650 times) [2024-06-15 11:41:29,184][1651669] InferenceWorker_p0-w0: stopping experience collection (650 times) [2024-06-15 11:41:29,434][1651274] Signal inference workers to resume experience collection... (650 times) [2024-06-15 11:41:29,435][1651669] InferenceWorker_p0-w0: resuming experience collection (650 times) [2024-06-15 11:41:29,437][1651669] Updated weights for policy 0, policy_version 12496 (0.0011) [2024-06-15 11:41:30,717][1651669] Updated weights for policy 0, policy_version 12543 (0.0015) [2024-06-15 11:41:30,802][1648981] Fps is (10 sec: 52245.0, 60 sec: 44756.7, 300 sec: 44870.1). Total num frames: 25690112. Throughput: 0: 10936.9. Samples: 6458368. Policy #0 lag: (min: 15.0, avg: 88.8, max: 271.0) [2024-06-15 11:41:30,802][1648981] Avg episode reward: [(0, '28.790')] [2024-06-15 11:41:30,803][1651274] Saving new best policy, reward=28.790! [2024-06-15 11:41:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 44254.4, 300 sec: 44098.0). Total num frames: 25722880. Throughput: 0: 10945.4. Samples: 6529536. Policy #0 lag: (min: 7.0, avg: 89.0, max: 263.0) [2024-06-15 11:41:35,767][1648981] Avg episode reward: [(0, '31.390')] [2024-06-15 11:41:36,301][1651274] Saving new best policy, reward=31.390! 
[2024-06-15 11:41:36,455][1651669] Updated weights for policy 0, policy_version 12595 (0.0012) [2024-06-15 11:41:38,511][1651669] Updated weights for policy 0, policy_version 12656 (0.0013) [2024-06-15 11:41:40,271][1651669] Updated weights for policy 0, policy_version 12710 (0.0013) [2024-06-15 11:41:40,774][1648981] Fps is (10 sec: 36143.1, 60 sec: 43141.2, 300 sec: 44318.9). Total num frames: 26050560. Throughput: 0: 11125.5. Samples: 6564352. Policy #0 lag: (min: 7.0, avg: 89.0, max: 263.0) [2024-06-15 11:41:40,775][1648981] Avg episode reward: [(0, '30.680')] [2024-06-15 11:41:41,928][1651669] Updated weights for policy 0, policy_version 12768 (0.0030) [2024-06-15 11:41:45,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 43690.7, 300 sec: 44320.1). Total num frames: 26214400. Throughput: 0: 10581.3. Samples: 6620672. Policy #0 lag: (min: 7.0, avg: 89.0, max: 263.0) [2024-06-15 11:41:45,767][1648981] Avg episode reward: [(0, '30.560')] [2024-06-15 11:41:47,402][1651669] Updated weights for policy 0, policy_version 12801 (0.0016) [2024-06-15 11:41:48,695][1651669] Updated weights for policy 0, policy_version 12856 (0.0077) [2024-06-15 11:41:50,767][1648981] Fps is (10 sec: 42632.2, 60 sec: 43690.6, 300 sec: 44431.2). Total num frames: 26476544. Throughput: 0: 11047.8. Samples: 6693888. Policy #0 lag: (min: 15.0, avg: 96.1, max: 271.0) [2024-06-15 11:41:50,767][1648981] Avg episode reward: [(0, '29.320')] [2024-06-15 11:41:51,681][1651669] Updated weights for policy 0, policy_version 12963 (0.0014) [2024-06-15 11:41:52,912][1651669] Updated weights for policy 0, policy_version 13024 (0.0014) [2024-06-15 11:41:55,767][1648981] Fps is (10 sec: 52425.8, 60 sec: 43690.4, 300 sec: 44764.3). Total num frames: 26738688. Throughput: 0: 10797.4. Samples: 6721536. 
Policy #0 lag: (min: 15.0, avg: 96.1, max: 271.0) [2024-06-15 11:41:55,768][1648981] Avg episode reward: [(0, '30.680')] [2024-06-15 11:41:59,209][1651669] Updated weights for policy 0, policy_version 13076 (0.0012) [2024-06-15 11:42:00,777][1648981] Fps is (10 sec: 39281.1, 60 sec: 43683.1, 300 sec: 44429.7). Total num frames: 26869760. Throughput: 0: 11045.3. Samples: 6802944. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 11:42:00,777][1648981] Avg episode reward: [(0, '30.270')] [2024-06-15 11:42:01,391][1651669] Updated weights for policy 0, policy_version 13152 (0.0011) [2024-06-15 11:42:03,212][1651669] Updated weights for policy 0, policy_version 13232 (0.0119) [2024-06-15 11:42:04,427][1651669] Updated weights for policy 0, policy_version 13286 (0.0013) [2024-06-15 11:42:05,767][1648981] Fps is (10 sec: 52430.7, 60 sec: 43690.5, 300 sec: 44875.5). Total num frames: 27262976. Throughput: 0: 11070.5. Samples: 6860288. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 11:42:05,767][1648981] Avg episode reward: [(0, '31.090')] [2024-06-15 11:42:10,766][1648981] Fps is (10 sec: 42642.7, 60 sec: 44236.8, 300 sec: 44209.0). Total num frames: 27295744. Throughput: 0: 11161.6. Samples: 6900736. Policy #0 lag: (min: 1.0, avg: 72.7, max: 257.0) [2024-06-15 11:42:10,767][1648981] Avg episode reward: [(0, '30.140')] [2024-06-15 11:42:11,047][1651669] Updated weights for policy 0, policy_version 13344 (0.0014) [2024-06-15 11:42:12,423][1651669] Updated weights for policy 0, policy_version 13379 (0.0036) [2024-06-15 11:42:13,696][1651274] Signal inference workers to stop experience collection... (700 times) [2024-06-15 11:42:13,734][1651669] InferenceWorker_p0-w0: stopping experience collection (700 times) [2024-06-15 11:42:13,902][1651274] Signal inference workers to resume experience collection... 
(700 times) [2024-06-15 11:42:13,903][1651669] InferenceWorker_p0-w0: resuming experience collection (700 times) [2024-06-15 11:42:14,120][1651669] Updated weights for policy 0, policy_version 13444 (0.0011) [2024-06-15 11:42:15,507][1651669] Updated weights for policy 0, policy_version 13507 (0.0012) [2024-06-15 11:42:15,785][1648981] Fps is (10 sec: 42521.3, 60 sec: 45315.3, 300 sec: 44983.8). Total num frames: 27688960. Throughput: 0: 11427.6. Samples: 6972416. Policy #0 lag: (min: 1.0, avg: 72.7, max: 257.0) [2024-06-15 11:42:15,785][1648981] Avg episode reward: [(0, '32.020')] [2024-06-15 11:42:16,304][1651274] Saving new best policy, reward=32.020! [2024-06-15 11:42:16,465][1651669] Updated weights for policy 0, policy_version 13556 (0.0042) [2024-06-15 11:42:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 44764.4). Total num frames: 27787264. Throughput: 0: 11434.7. Samples: 7044096. Policy #0 lag: (min: 1.0, avg: 72.7, max: 257.0) [2024-06-15 11:42:20,767][1648981] Avg episode reward: [(0, '33.530')] [2024-06-15 11:42:20,798][1651274] Saving new best policy, reward=33.530! [2024-06-15 11:42:22,430][1651669] Updated weights for policy 0, policy_version 13586 (0.0012) [2024-06-15 11:42:23,973][1651669] Updated weights for policy 0, policy_version 13648 (0.0012) [2024-06-15 11:42:25,625][1651669] Updated weights for policy 0, policy_version 13712 (0.0011) [2024-06-15 11:42:25,767][1648981] Fps is (10 sec: 39393.1, 60 sec: 46421.2, 300 sec: 44986.5). Total num frames: 28082176. Throughput: 0: 11448.0. Samples: 7079424. Policy #0 lag: (min: 11.0, avg: 86.7, max: 267.0) [2024-06-15 11:42:25,767][1648981] Avg episode reward: [(0, '34.800')] [2024-06-15 11:42:26,440][1651274] Saving new best policy, reward=34.800! [2024-06-15 11:42:26,942][1651669] Updated weights for policy 0, policy_version 13766 (0.0013) [2024-06-15 11:42:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 43716.3, 300 sec: 44875.5). Total num frames: 28311552. 
Throughput: 0: 11525.7. Samples: 7139328. Policy #0 lag: (min: 11.0, avg: 86.7, max: 267.0) [2024-06-15 11:42:30,767][1648981] Avg episode reward: [(0, '35.250')] [2024-06-15 11:42:30,768][1651274] Saving new best policy, reward=35.250! [2024-06-15 11:42:34,111][1651669] Updated weights for policy 0, policy_version 13825 (0.0014) [2024-06-15 11:42:35,269][1651669] Updated weights for policy 0, policy_version 13874 (0.0011) [2024-06-15 11:42:35,766][1648981] Fps is (10 sec: 36045.3, 60 sec: 45329.1, 300 sec: 44431.2). Total num frames: 28442624. Throughput: 0: 11559.8. Samples: 7214080. Policy #0 lag: (min: 15.0, avg: 76.2, max: 271.0) [2024-06-15 11:42:35,767][1648981] Avg episode reward: [(0, '35.060')] [2024-06-15 11:42:37,236][1651669] Updated weights for policy 0, policy_version 13952 (0.0080) [2024-06-15 11:42:39,086][1651669] Updated weights for policy 0, policy_version 14032 (0.0065) [2024-06-15 11:42:40,449][1651669] Updated weights for policy 0, policy_version 14080 (0.0019) [2024-06-15 11:42:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46427.5, 300 sec: 44875.5). Total num frames: 28835840. Throughput: 0: 11446.2. Samples: 7236608. Policy #0 lag: (min: 15.0, avg: 76.2, max: 271.0) [2024-06-15 11:42:40,767][1648981] Avg episode reward: [(0, '34.470')] [2024-06-15 11:42:45,790][1648981] Fps is (10 sec: 39228.5, 60 sec: 43673.3, 300 sec: 44205.5). Total num frames: 28835840. Throughput: 0: 11067.3. Samples: 7301120. Policy #0 lag: (min: 15.0, avg: 76.2, max: 271.0) [2024-06-15 11:42:45,791][1648981] Avg episode reward: [(0, '35.180')] [2024-06-15 11:42:47,321][1651669] Updated weights for policy 0, policy_version 14138 (0.0011) [2024-06-15 11:42:48,193][1651669] Updated weights for policy 0, policy_version 14164 (0.0010) [2024-06-15 11:42:49,705][1651669] Updated weights for policy 0, policy_version 14227 (0.0011) [2024-06-15 11:42:50,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 45875.3, 300 sec: 44653.4). Total num frames: 29229056. 
Throughput: 0: 11298.2. Samples: 7368704. Policy #0 lag: (min: 2.0, avg: 64.0, max: 258.0) [2024-06-15 11:42:50,767][1648981] Avg episode reward: [(0, '37.330')] [2024-06-15 11:42:50,822][1651274] Signal inference workers to stop experience collection... (750 times) [2024-06-15 11:42:50,888][1651669] InferenceWorker_p0-w0: stopping experience collection (750 times) [2024-06-15 11:42:51,010][1651274] Saving new best policy, reward=37.330! [2024-06-15 11:42:51,011][1651274] Signal inference workers to resume experience collection... (750 times) [2024-06-15 11:42:51,017][1651669] InferenceWorker_p0-w0: resuming experience collection (750 times) [2024-06-15 11:42:51,521][1651669] Updated weights for policy 0, policy_version 14304 (0.0113) [2024-06-15 11:42:55,770][1648981] Fps is (10 sec: 52534.0, 60 sec: 43688.3, 300 sec: 44542.1). Total num frames: 29360128. Throughput: 0: 11126.5. Samples: 7401472. Policy #0 lag: (min: 2.0, avg: 64.0, max: 258.0) [2024-06-15 11:42:55,771][1648981] Avg episode reward: [(0, '36.390')] [2024-06-15 11:42:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000014336_29360128.pth... [2024-06-15 11:42:55,828][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000009216_18874368.pth [2024-06-15 11:42:57,955][1651669] Updated weights for policy 0, policy_version 14368 (0.0091) [2024-06-15 11:42:59,331][1651669] Updated weights for policy 0, policy_version 14416 (0.0014) [2024-06-15 11:43:00,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 45883.1, 300 sec: 44654.5). Total num frames: 29622272. Throughput: 0: 11325.5. Samples: 7481856. 
Policy #0 lag: (min: 10.0, avg: 73.6, max: 266.0) [2024-06-15 11:43:00,778][1648981] Avg episode reward: [(0, '36.200')] [2024-06-15 11:43:01,089][1651669] Updated weights for policy 0, policy_version 14480 (0.0011) [2024-06-15 11:43:03,037][1651669] Updated weights for policy 0, policy_version 14560 (0.0014) [2024-06-15 11:43:05,766][1648981] Fps is (10 sec: 52448.4, 60 sec: 43690.8, 300 sec: 44875.5). Total num frames: 29884416. Throughput: 0: 10854.4. Samples: 7532544. Policy #0 lag: (min: 10.0, avg: 73.6, max: 266.0) [2024-06-15 11:43:05,767][1648981] Avg episode reward: [(0, '38.880')] [2024-06-15 11:43:05,768][1651274] Saving new best policy, reward=38.880! [2024-06-15 11:43:09,348][1651669] Updated weights for policy 0, policy_version 14608 (0.0014) [2024-06-15 11:43:10,767][1648981] Fps is (10 sec: 39321.0, 60 sec: 45328.9, 300 sec: 44431.2). Total num frames: 30015488. Throughput: 0: 11047.8. Samples: 7576576. Policy #0 lag: (min: 15.0, avg: 78.1, max: 271.0) [2024-06-15 11:43:10,767][1648981] Avg episode reward: [(0, '40.710')] [2024-06-15 11:43:11,091][1651274] Saving new best policy, reward=40.710! [2024-06-15 11:43:11,095][1651669] Updated weights for policy 0, policy_version 14672 (0.0013) [2024-06-15 11:43:12,676][1651669] Updated weights for policy 0, policy_version 14742 (0.0013) [2024-06-15 11:43:14,631][1651669] Updated weights for policy 0, policy_version 14843 (0.0102) [2024-06-15 11:43:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45342.9, 300 sec: 44875.5). Total num frames: 30408704. Throughput: 0: 11127.5. Samples: 7640064. Policy #0 lag: (min: 15.0, avg: 78.1, max: 271.0) [2024-06-15 11:43:15,767][1648981] Avg episode reward: [(0, '41.430')] [2024-06-15 11:43:15,768][1651274] Saving new best policy, reward=41.430! [2024-06-15 11:43:20,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 44782.9, 300 sec: 44320.1). Total num frames: 30474240. Throughput: 0: 11309.5. Samples: 7723008. 
Policy #0 lag: (min: 31.0, avg: 104.6, max: 287.0) [2024-06-15 11:43:20,767][1648981] Avg episode reward: [(0, '39.940')] [2024-06-15 11:43:21,028][1651669] Updated weights for policy 0, policy_version 14905 (0.0014) [2024-06-15 11:43:22,807][1651669] Updated weights for policy 0, policy_version 14964 (0.0013) [2024-06-15 11:43:24,050][1651669] Updated weights for policy 0, policy_version 15024 (0.0012) [2024-06-15 11:43:25,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 46967.6, 300 sec: 44875.5). Total num frames: 30900224. Throughput: 0: 11457.4. Samples: 7752192. Policy #0 lag: (min: 31.0, avg: 104.6, max: 287.0) [2024-06-15 11:43:25,767][1648981] Avg episode reward: [(0, '41.420')] [2024-06-15 11:43:25,849][1651669] Updated weights for policy 0, policy_version 15093 (0.0013) [2024-06-15 11:43:30,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 43690.8, 300 sec: 44431.2). Total num frames: 30932992. Throughput: 0: 11543.2. Samples: 7820288. Policy #0 lag: (min: 31.0, avg: 104.6, max: 287.0) [2024-06-15 11:43:30,767][1648981] Avg episode reward: [(0, '42.840')] [2024-06-15 11:43:30,767][1651274] Saving new best policy, reward=42.840! [2024-06-15 11:43:32,093][1651669] Updated weights for policy 0, policy_version 15159 (0.0013) [2024-06-15 11:43:33,647][1651274] Signal inference workers to stop experience collection... (800 times) [2024-06-15 11:43:33,738][1651669] InferenceWorker_p0-w0: stopping experience collection (800 times) [2024-06-15 11:43:33,767][1651669] Updated weights for policy 0, policy_version 15206 (0.0013) [2024-06-15 11:43:33,885][1651274] Signal inference workers to resume experience collection... (800 times) [2024-06-15 11:43:33,894][1651669] InferenceWorker_p0-w0: resuming experience collection (800 times) [2024-06-15 11:43:34,812][1651669] Updated weights for policy 0, policy_version 15251 (0.0011) [2024-06-15 11:43:35,767][1648981] Fps is (10 sec: 39320.8, 60 sec: 47513.5, 300 sec: 44764.4). Total num frames: 31293440. Throughput: 0: 11582.5. 
Samples: 7889920. Policy #0 lag: (min: 31.0, avg: 94.6, max: 287.0) [2024-06-15 11:43:35,767][1648981] Avg episode reward: [(0, '41.160')] [2024-06-15 11:43:36,434][1651669] Updated weights for policy 0, policy_version 15317 (0.0013) [2024-06-15 11:43:37,378][1651669] Updated weights for policy 0, policy_version 15357 (0.0013) [2024-06-15 11:43:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 44431.2). Total num frames: 31457280. Throughput: 0: 11594.9. Samples: 7923200. Policy #0 lag: (min: 31.0, avg: 94.6, max: 287.0) [2024-06-15 11:43:40,767][1648981] Avg episode reward: [(0, '42.060')] [2024-06-15 11:43:43,515][1651669] Updated weights for policy 0, policy_version 15408 (0.0012) [2024-06-15 11:43:45,368][1651669] Updated weights for policy 0, policy_version 15472 (0.0041) [2024-06-15 11:43:45,766][1648981] Fps is (10 sec: 39322.3, 60 sec: 47532.4, 300 sec: 44653.3). Total num frames: 31686656. Throughput: 0: 11457.4. Samples: 7997440. Policy #0 lag: (min: 26.0, avg: 96.7, max: 282.0) [2024-06-15 11:43:45,767][1648981] Avg episode reward: [(0, '42.680')] [2024-06-15 11:43:46,205][1651669] Updated weights for policy 0, policy_version 15504 (0.0026) [2024-06-15 11:43:47,755][1651669] Updated weights for policy 0, policy_version 15568 (0.0013) [2024-06-15 11:43:48,863][1651669] Updated weights for policy 0, policy_version 15613 (0.0012) [2024-06-15 11:43:50,782][1648981] Fps is (10 sec: 52346.2, 60 sec: 45863.1, 300 sec: 44651.0). Total num frames: 31981568. Throughput: 0: 11760.5. Samples: 8061952. Policy #0 lag: (min: 26.0, avg: 96.7, max: 282.0) [2024-06-15 11:43:50,783][1648981] Avg episode reward: [(0, '43.070')] [2024-06-15 11:43:50,784][1651274] Saving new best policy, reward=43.070! [2024-06-15 11:43:54,525][1651669] Updated weights for policy 0, policy_version 15680 (0.0013) [2024-06-15 11:43:55,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46424.2, 300 sec: 44767.2). Total num frames: 32145408. Throughput: 0: 11753.3. Samples: 8105472. 
Policy #0 lag: (min: 44.0, avg: 122.9, max: 300.0) [2024-06-15 11:43:55,767][1648981] Avg episode reward: [(0, '44.490')] [2024-06-15 11:43:56,172][1651274] Saving new best policy, reward=44.490! [2024-06-15 11:43:57,511][1651669] Updated weights for policy 0, policy_version 15761 (0.0014) [2024-06-15 11:43:59,067][1651669] Updated weights for policy 0, policy_version 15827 (0.0027) [2024-06-15 11:44:00,026][1651669] Updated weights for policy 0, policy_version 15870 (0.0015) [2024-06-15 11:44:00,767][1648981] Fps is (10 sec: 52510.2, 60 sec: 48059.5, 300 sec: 44986.5). Total num frames: 32505856. Throughput: 0: 11571.1. Samples: 8160768. Policy #0 lag: (min: 44.0, avg: 122.9, max: 300.0) [2024-06-15 11:44:00,767][1648981] Avg episode reward: [(0, '42.930')] [2024-06-15 11:44:05,695][1651669] Updated weights for policy 0, policy_version 15936 (0.0078) [2024-06-15 11:44:05,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 44875.5). Total num frames: 32636928. Throughput: 0: 11537.1. Samples: 8242176. Policy #0 lag: (min: 63.0, avg: 130.0, max: 319.0) [2024-06-15 11:44:05,767][1648981] Avg episode reward: [(0, '45.740')] [2024-06-15 11:44:05,768][1651274] Saving new best policy, reward=45.740! [2024-06-15 11:44:07,424][1651669] Updated weights for policy 0, policy_version 15993 (0.0013) [2024-06-15 11:44:09,359][1651669] Updated weights for policy 0, policy_version 16064 (0.0013) [2024-06-15 11:44:10,689][1651669] Updated weights for policy 0, policy_version 16123 (0.0014) [2024-06-15 11:44:10,766][1648981] Fps is (10 sec: 49152.9, 60 sec: 49698.2, 300 sec: 45209.9). Total num frames: 32997376. Throughput: 0: 11616.7. Samples: 8274944. Policy #0 lag: (min: 63.0, avg: 130.0, max: 319.0) [2024-06-15 11:44:10,767][1648981] Avg episode reward: [(0, '46.900')] [2024-06-15 11:44:10,822][1651274] Saving new best policy, reward=46.900! [2024-06-15 11:44:15,296][1651274] Signal inference workers to stop experience collection... 
(850 times) [2024-06-15 11:44:15,334][1651669] InferenceWorker_p0-w0: stopping experience collection (850 times) [2024-06-15 11:44:15,528][1651274] Signal inference workers to resume experience collection... (850 times) [2024-06-15 11:44:15,529][1651669] InferenceWorker_p0-w0: resuming experience collection (850 times) [2024-06-15 11:44:15,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 44236.6, 300 sec: 44543.3). Total num frames: 33062912. Throughput: 0: 11821.4. Samples: 8352256. Policy #0 lag: (min: 63.0, avg: 130.0, max: 319.0) [2024-06-15 11:44:15,768][1648981] Avg episode reward: [(0, '47.150')] [2024-06-15 11:44:16,155][1651669] Updated weights for policy 0, policy_version 16164 (0.0013) [2024-06-15 11:44:16,299][1651274] Saving new best policy, reward=47.150! [2024-06-15 11:44:17,504][1651669] Updated weights for policy 0, policy_version 16212 (0.0024) [2024-06-15 11:44:18,260][1651669] Updated weights for policy 0, policy_version 16254 (0.0012) [2024-06-15 11:44:20,418][1651669] Updated weights for policy 0, policy_version 16320 (0.0013) [2024-06-15 11:44:20,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 49152.0, 300 sec: 45208.7). Total num frames: 33423360. Throughput: 0: 11764.7. Samples: 8419328. Policy #0 lag: (min: 31.0, avg: 128.4, max: 287.0) [2024-06-15 11:44:20,767][1648981] Avg episode reward: [(0, '49.660')] [2024-06-15 11:44:21,169][1651274] Saving new best policy, reward=49.660! [2024-06-15 11:44:21,803][1651669] Updated weights for policy 0, policy_version 16379 (0.0011) [2024-06-15 11:44:25,766][1648981] Fps is (10 sec: 49153.2, 60 sec: 44236.8, 300 sec: 44654.7). Total num frames: 33554432. Throughput: 0: 11810.1. Samples: 8454656. 
Policy #0 lag: (min: 31.0, avg: 128.4, max: 287.0) [2024-06-15 11:44:25,767][1648981] Avg episode reward: [(0, '49.060')] [2024-06-15 11:44:27,306][1651669] Updated weights for policy 0, policy_version 16443 (0.0092) [2024-06-15 11:44:29,335][1651669] Updated weights for policy 0, policy_version 16503 (0.0023) [2024-06-15 11:44:30,709][1651669] Updated weights for policy 0, policy_version 16569 (0.0012) [2024-06-15 11:44:30,778][1648981] Fps is (10 sec: 49093.9, 60 sec: 49688.3, 300 sec: 45651.2). Total num frames: 33914880. Throughput: 0: 11807.0. Samples: 8528896. Policy #0 lag: (min: 63.0, avg: 165.6, max: 319.0) [2024-06-15 11:44:30,779][1648981] Avg episode reward: [(0, '50.360')] [2024-06-15 11:44:30,842][1651274] Saving new best policy, reward=50.360! [2024-06-15 11:44:32,873][1651669] Updated weights for policy 0, policy_version 16628 (0.0018) [2024-06-15 11:44:35,767][1648981] Fps is (10 sec: 52426.1, 60 sec: 46421.1, 300 sec: 45321.5). Total num frames: 34078720. Throughput: 0: 11905.2. Samples: 8597504. Policy #0 lag: (min: 63.0, avg: 165.6, max: 319.0) [2024-06-15 11:44:35,768][1648981] Avg episode reward: [(0, '50.030')] [2024-06-15 11:44:37,523][1651669] Updated weights for policy 0, policy_version 16658 (0.0012) [2024-06-15 11:44:39,289][1651669] Updated weights for policy 0, policy_version 16720 (0.0012) [2024-06-15 11:44:40,145][1651669] Updated weights for policy 0, policy_version 16757 (0.0012) [2024-06-15 11:44:40,766][1648981] Fps is (10 sec: 45929.2, 60 sec: 48605.8, 300 sec: 45431.6). Total num frames: 34373632. Throughput: 0: 11787.4. Samples: 8635904. Policy #0 lag: (min: 63.0, avg: 165.6, max: 319.0) [2024-06-15 11:44:40,767][1648981] Avg episode reward: [(0, '52.030')] [2024-06-15 11:44:40,861][1651669] Updated weights for policy 0, policy_version 16787 (0.0012) [2024-06-15 11:44:40,984][1651274] Saving new best policy, reward=52.030! 
[2024-06-15 11:44:43,124][1651669] Updated weights for policy 0, policy_version 16864 (0.0012) [2024-06-15 11:44:45,766][1648981] Fps is (10 sec: 52431.9, 60 sec: 48605.9, 300 sec: 45319.8). Total num frames: 34603008. Throughput: 0: 11992.3. Samples: 8700416. Policy #0 lag: (min: 74.0, avg: 190.8, max: 335.0) [2024-06-15 11:44:45,767][1648981] Avg episode reward: [(0, '52.690')] [2024-06-15 11:44:45,768][1651274] Saving new best policy, reward=52.690! [2024-06-15 11:44:48,172][1651669] Updated weights for policy 0, policy_version 16912 (0.0016) [2024-06-15 11:44:50,079][1651669] Updated weights for policy 0, policy_version 16961 (0.0012) [2024-06-15 11:44:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46979.7, 300 sec: 45097.7). Total num frames: 34799616. Throughput: 0: 11980.8. Samples: 8781312. Policy #0 lag: (min: 74.0, avg: 190.8, max: 335.0) [2024-06-15 11:44:50,767][1648981] Avg episode reward: [(0, '52.780')] [2024-06-15 11:44:51,109][1651274] Saving new best policy, reward=52.780! [2024-06-15 11:44:51,484][1651669] Updated weights for policy 0, policy_version 17024 (0.0012) [2024-06-15 11:44:53,965][1651669] Updated weights for policy 0, policy_version 17089 (0.0013) [2024-06-15 11:44:54,830][1651274] Signal inference workers to stop experience collection... (900 times) [2024-06-15 11:44:54,892][1651669] InferenceWorker_p0-w0: stopping experience collection (900 times) [2024-06-15 11:44:55,072][1651274] Signal inference workers to resume experience collection... (900 times) [2024-06-15 11:44:55,074][1651669] InferenceWorker_p0-w0: resuming experience collection (900 times) [2024-06-15 11:44:55,246][1651669] Updated weights for policy 0, policy_version 17144 (0.0013) [2024-06-15 11:44:55,770][1648981] Fps is (10 sec: 52408.3, 60 sec: 49694.9, 300 sec: 46096.7). Total num frames: 35127296. Throughput: 0: 11866.0. Samples: 8808960. 
Policy #0 lag: (min: 12.0, avg: 168.9, max: 268.0) [2024-06-15 11:44:55,771][1648981] Avg episode reward: [(0, '49.700')] [2024-06-15 11:44:55,806][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000017152_35127296.pth... [2024-06-15 11:44:55,850][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000011776_24117248.pth [2024-06-15 11:45:00,256][1651669] Updated weights for policy 0, policy_version 17186 (0.0017) [2024-06-15 11:45:00,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 45329.3, 300 sec: 45542.0). Total num frames: 35225600. Throughput: 0: 11844.4. Samples: 8885248. Policy #0 lag: (min: 12.0, avg: 168.9, max: 268.0) [2024-06-15 11:45:00,767][1648981] Avg episode reward: [(0, '49.930')] [2024-06-15 11:45:00,962][1651669] Updated weights for policy 0, policy_version 17216 (0.0012) [2024-06-15 11:45:03,373][1651669] Updated weights for policy 0, policy_version 17281 (0.0013) [2024-06-15 11:45:05,067][1651669] Updated weights for policy 0, policy_version 17348 (0.0015) [2024-06-15 11:45:05,766][1648981] Fps is (10 sec: 45893.3, 60 sec: 49152.0, 300 sec: 45986.9). Total num frames: 35586048. Throughput: 0: 11764.6. Samples: 8948736. Policy #0 lag: (min: 12.0, avg: 168.9, max: 268.0) [2024-06-15 11:45:05,767][1648981] Avg episode reward: [(0, '51.590')] [2024-06-15 11:45:06,447][1651669] Updated weights for policy 0, policy_version 17408 (0.0013) [2024-06-15 11:45:10,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44236.9, 300 sec: 45322.9). Total num frames: 35651584. Throughput: 0: 11730.5. Samples: 8982528. Policy #0 lag: (min: 12.0, avg: 168.9, max: 268.0) [2024-06-15 11:45:10,767][1648981] Avg episode reward: [(0, '58.050')] [2024-06-15 11:45:11,247][1651274] Saving new best policy, reward=58.050! 
[2024-06-15 11:45:11,782][1651669] Updated weights for policy 0, policy_version 17466 (0.0012) [2024-06-15 11:45:13,655][1651669] Updated weights for policy 0, policy_version 17525 (0.0014) [2024-06-15 11:45:15,525][1651669] Updated weights for policy 0, policy_version 17570 (0.0013) [2024-06-15 11:45:15,767][1648981] Fps is (10 sec: 42596.5, 60 sec: 49151.9, 300 sec: 45659.2). Total num frames: 36012032. Throughput: 0: 11813.1. Samples: 9060352. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 11:45:15,768][1648981] Avg episode reward: [(0, '58.490')] [2024-06-15 11:45:16,052][1651274] Saving new best policy, reward=58.490! [2024-06-15 11:45:17,082][1651669] Updated weights for policy 0, policy_version 17637 (0.0013) [2024-06-15 11:45:20,766][1648981] Fps is (10 sec: 52427.9, 60 sec: 45875.1, 300 sec: 45764.1). Total num frames: 36175872. Throughput: 0: 11935.4. Samples: 9134592. Policy #0 lag: (min: 15.0, avg: 100.6, max: 271.0) [2024-06-15 11:45:20,767][1648981] Avg episode reward: [(0, '58.780')] [2024-06-15 11:45:20,768][1651274] Saving new best policy, reward=58.780! [2024-06-15 11:45:21,620][1651669] Updated weights for policy 0, policy_version 17680 (0.0103) [2024-06-15 11:45:24,051][1651669] Updated weights for policy 0, policy_version 17744 (0.0014) [2024-06-15 11:45:24,928][1651669] Updated weights for policy 0, policy_version 17792 (0.0013) [2024-06-15 11:45:25,766][1648981] Fps is (10 sec: 45876.8, 60 sec: 48605.9, 300 sec: 45653.0). Total num frames: 36470784. Throughput: 0: 11867.0. Samples: 9169920. Policy #0 lag: (min: 15.0, avg: 115.5, max: 271.0) [2024-06-15 11:45:25,767][1648981] Avg episode reward: [(0, '55.140')] [2024-06-15 11:45:26,595][1651669] Updated weights for policy 0, policy_version 17856 (0.0013) [2024-06-15 11:45:27,936][1651669] Updated weights for policy 0, policy_version 17916 (0.0012) [2024-06-15 11:45:30,767][1648981] Fps is (10 sec: 52428.8, 60 sec: 46430.4, 300 sec: 46212.2). Total num frames: 36700160. 
Throughput: 0: 11935.2. Samples: 9237504. Policy #0 lag: (min: 15.0, avg: 115.5, max: 271.0) [2024-06-15 11:45:30,767][1648981] Avg episode reward: [(0, '55.870')] [2024-06-15 11:45:33,526][1651669] Updated weights for policy 0, policy_version 17984 (0.0015) [2024-06-15 11:45:35,774][1648981] Fps is (10 sec: 42565.5, 60 sec: 46961.8, 300 sec: 45541.3). Total num frames: 36896768. Throughput: 0: 11751.2. Samples: 9310208. Policy #0 lag: (min: 15.0, avg: 115.5, max: 271.0) [2024-06-15 11:45:35,775][1648981] Avg episode reward: [(0, '58.870')] [2024-06-15 11:45:36,356][1651669] Updated weights for policy 0, policy_version 18045 (0.0014) [2024-06-15 11:45:36,390][1651274] Saving new best policy, reward=58.870! [2024-06-15 11:45:38,107][1651669] Updated weights for policy 0, policy_version 18112 (0.0013) [2024-06-15 11:45:38,191][1651274] Signal inference workers to stop experience collection... (950 times) [2024-06-15 11:45:38,220][1651669] InferenceWorker_p0-w0: stopping experience collection (950 times) [2024-06-15 11:45:38,432][1651274] Signal inference workers to resume experience collection... (950 times) [2024-06-15 11:45:38,438][1651669] InferenceWorker_p0-w0: resuming experience collection (950 times) [2024-06-15 11:45:39,583][1651669] Updated weights for policy 0, policy_version 18176 (0.0089) [2024-06-15 11:45:40,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 47513.6, 300 sec: 46208.4). Total num frames: 37224448. Throughput: 0: 11731.5. Samples: 9336832. Policy #0 lag: (min: 9.0, avg: 129.0, max: 265.0) [2024-06-15 11:45:40,767][1648981] Avg episode reward: [(0, '58.620')] [2024-06-15 11:45:44,812][1651669] Updated weights for policy 0, policy_version 18232 (0.0012) [2024-06-15 11:45:45,766][1648981] Fps is (10 sec: 45910.9, 60 sec: 45875.2, 300 sec: 45764.1). Total num frames: 37355520. Throughput: 0: 11662.2. Samples: 9410048. 
Policy #0 lag: (min: 9.0, avg: 129.0, max: 265.0) [2024-06-15 11:45:45,767][1648981] Avg episode reward: [(0, '61.870')] [2024-06-15 11:45:45,768][1651274] Saving new best policy, reward=61.870! [2024-06-15 11:45:47,698][1651669] Updated weights for policy 0, policy_version 18288 (0.0011) [2024-06-15 11:45:49,360][1651669] Updated weights for policy 0, policy_version 18352 (0.0012) [2024-06-15 11:45:50,822][1648981] Fps is (10 sec: 45620.3, 60 sec: 48015.1, 300 sec: 45977.6). Total num frames: 37683200. Throughput: 0: 11704.6. Samples: 9476096. Policy #0 lag: (min: 64.0, avg: 180.1, max: 320.0) [2024-06-15 11:45:50,823][1648981] Avg episode reward: [(0, '60.740')] [2024-06-15 11:45:51,109][1651669] Updated weights for policy 0, policy_version 18421 (0.0013) [2024-06-15 11:45:55,662][1651669] Updated weights for policy 0, policy_version 18464 (0.0118) [2024-06-15 11:45:55,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 44785.8, 300 sec: 45986.3). Total num frames: 37814272. Throughput: 0: 11764.6. Samples: 9511936. Policy #0 lag: (min: 64.0, avg: 180.1, max: 320.0) [2024-06-15 11:45:55,767][1648981] Avg episode reward: [(0, '62.580')] [2024-06-15 11:45:56,022][1651274] Saving new best policy, reward=62.580! [2024-06-15 11:45:57,453][1651669] Updated weights for policy 0, policy_version 18497 (0.0013) [2024-06-15 11:45:58,896][1651669] Updated weights for policy 0, policy_version 18560 (0.0020) [2024-06-15 11:46:00,766][1648981] Fps is (10 sec: 39542.9, 60 sec: 47513.6, 300 sec: 45542.0). Total num frames: 38076416. Throughput: 0: 11628.2. Samples: 9583616. Policy #0 lag: (min: 64.0, avg: 180.1, max: 320.0) [2024-06-15 11:46:00,767][1648981] Avg episode reward: [(0, '64.780')] [2024-06-15 11:46:01,082][1651274] Saving new best policy, reward=64.780! [2024-06-15 11:46:02,643][1651669] Updated weights for policy 0, policy_version 18672 (0.0141) [2024-06-15 11:46:05,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 46208.4). Total num frames: 38273024. 
Throughput: 0: 11366.4. Samples: 9646080. Policy #0 lag: (min: 64.0, avg: 180.1, max: 320.0) [2024-06-15 11:46:05,767][1648981] Avg episode reward: [(0, '65.860')] [2024-06-15 11:46:05,768][1651274] Saving new best policy, reward=65.860! [2024-06-15 11:46:08,021][1651669] Updated weights for policy 0, policy_version 18740 (0.0038) [2024-06-15 11:46:10,220][1651669] Updated weights for policy 0, policy_version 18784 (0.0012) [2024-06-15 11:46:10,787][1648981] Fps is (10 sec: 39242.5, 60 sec: 46951.7, 300 sec: 45761.0). Total num frames: 38469632. Throughput: 0: 11315.8. Samples: 9679360. Policy #0 lag: (min: 10.0, avg: 103.9, max: 266.0) [2024-06-15 11:46:10,787][1648981] Avg episode reward: [(0, '66.780')] [2024-06-15 11:46:11,227][1651274] Saving new best policy, reward=66.780! [2024-06-15 11:46:11,253][1651669] Updated weights for policy 0, policy_version 18816 (0.0011) [2024-06-15 11:46:13,744][1651669] Updated weights for policy 0, policy_version 18896 (0.0014) [2024-06-15 11:46:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46421.6, 300 sec: 46208.4). Total num frames: 38797312. Throughput: 0: 11161.6. Samples: 9739776. Policy #0 lag: (min: 10.0, avg: 103.9, max: 266.0) [2024-06-15 11:46:15,767][1648981] Avg episode reward: [(0, '63.190')] [2024-06-15 11:46:18,351][1651669] Updated weights for policy 0, policy_version 18960 (0.0012) [2024-06-15 11:46:19,400][1651669] Updated weights for policy 0, policy_version 19006 (0.0022) [2024-06-15 11:46:20,766][1648981] Fps is (10 sec: 49251.3, 60 sec: 46421.5, 300 sec: 46319.5). Total num frames: 38961152. Throughput: 0: 11436.7. Samples: 9824768. 
Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 11:46:20,767][1648981] Avg episode reward: [(0, '57.470')] [2024-06-15 11:46:21,762][1651669] Updated weights for policy 0, policy_version 19060 (0.0012) [2024-06-15 11:46:23,844][1651669] Updated weights for policy 0, policy_version 19104 (0.0013) [2024-06-15 11:46:23,933][1651274] Signal inference workers to stop experience collection... (1000 times) [2024-06-15 11:46:23,966][1651669] InferenceWorker_p0-w0: stopping experience collection (1000 times) [2024-06-15 11:46:24,144][1651274] Signal inference workers to resume experience collection... (1000 times) [2024-06-15 11:46:24,145][1651669] InferenceWorker_p0-w0: resuming experience collection (1000 times) [2024-06-15 11:46:25,208][1651669] Updated weights for policy 0, policy_version 19168 (0.0087) [2024-06-15 11:46:25,799][1648981] Fps is (10 sec: 48993.0, 60 sec: 46942.1, 300 sec: 46097.8). Total num frames: 39288832. Throughput: 0: 11574.2. Samples: 9858048. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 11:46:25,799][1648981] Avg episode reward: [(0, '56.280')] [2024-06-15 11:46:29,573][1651669] Updated weights for policy 0, policy_version 19216 (0.0012) [2024-06-15 11:46:30,767][1648981] Fps is (10 sec: 49150.3, 60 sec: 45875.1, 300 sec: 46541.6). Total num frames: 39452672. Throughput: 0: 11593.9. Samples: 9931776. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 11:46:30,768][1648981] Avg episode reward: [(0, '58.900')] [2024-06-15 11:46:31,713][1651669] Updated weights for policy 0, policy_version 19283 (0.0083) [2024-06-15 11:46:34,723][1651669] Updated weights for policy 0, policy_version 19347 (0.0013) [2024-06-15 11:46:35,766][1648981] Fps is (10 sec: 39449.5, 60 sec: 46427.3, 300 sec: 46209.7). Total num frames: 39682048. Throughput: 0: 11505.8. Samples: 9993216. 
Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 11:46:35,767][1648981] Avg episode reward: [(0, '60.220')] [2024-06-15 11:46:36,367][1651669] Updated weights for policy 0, policy_version 19410 (0.0012) [2024-06-15 11:46:40,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 39845888. Throughput: 0: 11434.7. Samples: 10026496. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 11:46:40,767][1648981] Avg episode reward: [(0, '63.990')] [2024-06-15 11:46:40,926][1651669] Updated weights for policy 0, policy_version 19472 (0.0012) [2024-06-15 11:46:41,739][1651669] Updated weights for policy 0, policy_version 19513 (0.0030) [2024-06-15 11:46:44,049][1651669] Updated weights for policy 0, policy_version 19568 (0.0027) [2024-06-15 11:46:45,522][1651669] Updated weights for policy 0, policy_version 19603 (0.0012) [2024-06-15 11:46:45,767][1648981] Fps is (10 sec: 45874.2, 60 sec: 46421.1, 300 sec: 46319.5). Total num frames: 40140800. Throughput: 0: 11559.7. Samples: 10103808. Policy #0 lag: (min: 11.0, avg: 124.8, max: 267.0) [2024-06-15 11:46:45,767][1648981] Avg episode reward: [(0, '62.300')] [2024-06-15 11:46:47,049][1651669] Updated weights for policy 0, policy_version 19649 (0.0011) [2024-06-15 11:46:48,184][1651669] Updated weights for policy 0, policy_version 19712 (0.0013) [2024-06-15 11:46:50,802][1648981] Fps is (10 sec: 52242.1, 60 sec: 44797.9, 300 sec: 46202.9). Total num frames: 40370176. Throughput: 0: 11766.6. Samples: 10176000. Policy #0 lag: (min: 68.0, avg: 219.4, max: 351.0) [2024-06-15 11:46:50,803][1648981] Avg episode reward: [(0, '62.280')] [2024-06-15 11:46:52,382][1651669] Updated weights for policy 0, policy_version 19769 (0.0013) [2024-06-15 11:46:54,904][1651669] Updated weights for policy 0, policy_version 19824 (0.0012) [2024-06-15 11:46:55,767][1648981] Fps is (10 sec: 49152.1, 60 sec: 46967.3, 300 sec: 46654.3). Total num frames: 40632320. Throughput: 0: 11963.3. 
Samples: 10217472. Policy #0 lag: (min: 68.0, avg: 219.4, max: 351.0) [2024-06-15 11:46:55,767][1648981] Avg episode reward: [(0, '62.990')] [2024-06-15 11:46:55,787][1651669] Updated weights for policy 0, policy_version 19843 (0.0011) [2024-06-15 11:46:56,452][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000019872_40697856.pth... [2024-06-15 11:46:56,611][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000014336_29360128.pth [2024-06-15 11:46:58,013][1651669] Updated weights for policy 0, policy_version 19925 (0.0012) [2024-06-15 11:47:00,767][1648981] Fps is (10 sec: 52616.0, 60 sec: 46967.2, 300 sec: 46208.4). Total num frames: 40894464. Throughput: 0: 11912.5. Samples: 10275840. Policy #0 lag: (min: 68.0, avg: 219.4, max: 351.0) [2024-06-15 11:47:00,767][1648981] Avg episode reward: [(0, '64.770')] [2024-06-15 11:47:02,201][1651669] Updated weights for policy 0, policy_version 19969 (0.0097) [2024-06-15 11:47:03,357][1651669] Updated weights for policy 0, policy_version 20028 (0.0012) [2024-06-15 11:47:05,767][1648981] Fps is (10 sec: 45875.1, 60 sec: 46967.3, 300 sec: 46763.8). Total num frames: 41091072. Throughput: 0: 11832.8. Samples: 10357248. Policy #0 lag: (min: 9.0, avg: 108.2, max: 265.0) [2024-06-15 11:47:05,769][1648981] Avg episode reward: [(0, '68.200')] [2024-06-15 11:47:06,009][1651274] Saving new best policy, reward=68.200! [2024-06-15 11:47:06,147][1651669] Updated weights for policy 0, policy_version 20082 (0.0013) [2024-06-15 11:47:07,542][1651274] Signal inference workers to stop experience collection... (1050 times) [2024-06-15 11:47:07,626][1651669] InferenceWorker_p0-w0: stopping experience collection (1050 times) [2024-06-15 11:47:07,636][1651669] Updated weights for policy 0, policy_version 20116 (0.0020) [2024-06-15 11:47:07,814][1651274] Signal inference workers to resume experience collection... 
(1050 times) [2024-06-15 11:47:07,815][1651669] InferenceWorker_p0-w0: resuming experience collection (1050 times) [2024-06-15 11:47:09,996][1651669] Updated weights for policy 0, policy_version 20208 (0.0119) [2024-06-15 11:47:10,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 49168.5, 300 sec: 46544.6). Total num frames: 41418752. Throughput: 0: 11784.5. Samples: 10387968. Policy #0 lag: (min: 9.0, avg: 108.2, max: 265.0) [2024-06-15 11:47:10,767][1648981] Avg episode reward: [(0, '69.010')] [2024-06-15 11:47:10,812][1651274] Saving new best policy, reward=69.010! [2024-06-15 11:47:13,711][1651669] Updated weights for policy 0, policy_version 20240 (0.0014) [2024-06-15 11:47:14,565][1651669] Updated weights for policy 0, policy_version 20282 (0.0014) [2024-06-15 11:47:15,769][1648981] Fps is (10 sec: 45863.5, 60 sec: 45873.1, 300 sec: 46652.3). Total num frames: 41549824. Throughput: 0: 11707.1. Samples: 10458624. Policy #0 lag: (min: 9.0, avg: 108.2, max: 265.0) [2024-06-15 11:47:15,770][1648981] Avg episode reward: [(0, '69.230')] [2024-06-15 11:47:15,771][1651274] Saving new best policy, reward=69.230! [2024-06-15 11:47:18,630][1651669] Updated weights for policy 0, policy_version 20353 (0.0014) [2024-06-15 11:47:20,173][1651669] Updated weights for policy 0, policy_version 20416 (0.0012) [2024-06-15 11:47:20,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 41844736. Throughput: 0: 11764.6. Samples: 10522624. Policy #0 lag: (min: 5.0, avg: 103.1, max: 261.0) [2024-06-15 11:47:20,767][1648981] Avg episode reward: [(0, '71.220')] [2024-06-15 11:47:21,132][1651274] Saving new best policy, reward=71.220! [2024-06-15 11:47:21,823][1651669] Updated weights for policy 0, policy_version 20472 (0.0015) [2024-06-15 11:47:25,577][1651669] Updated weights for policy 0, policy_version 20519 (0.0012) [2024-06-15 11:47:25,767][1648981] Fps is (10 sec: 49165.3, 60 sec: 45900.0, 300 sec: 46541.7). Total num frames: 42041344. 
Throughput: 0: 11764.6. Samples: 10555904. Policy #0 lag: (min: 5.0, avg: 103.1, max: 261.0) [2024-06-15 11:47:25,767][1648981] Avg episode reward: [(0, '73.850')] [2024-06-15 11:47:25,958][1651274] Saving new best policy, reward=73.850! [2024-06-15 11:47:28,530][1651669] Updated weights for policy 0, policy_version 20560 (0.0014) [2024-06-15 11:47:30,122][1651669] Updated weights for policy 0, policy_version 20625 (0.0014) [2024-06-15 11:47:30,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46967.7, 300 sec: 46874.9). Total num frames: 42270720. Throughput: 0: 11764.7. Samples: 10633216. Policy #0 lag: (min: 5.0, avg: 103.1, max: 261.0) [2024-06-15 11:47:30,767][1648981] Avg episode reward: [(0, '74.100')] [2024-06-15 11:47:31,262][1651274] Saving new best policy, reward=74.100! [2024-06-15 11:47:31,727][1651669] Updated weights for policy 0, policy_version 20688 (0.0013) [2024-06-15 11:47:32,808][1651669] Updated weights for policy 0, policy_version 20730 (0.0012) [2024-06-15 11:47:35,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 42467328. Throughput: 0: 11717.0. Samples: 10702848. Policy #0 lag: (min: 72.0, avg: 164.2, max: 299.0) [2024-06-15 11:47:35,767][1648981] Avg episode reward: [(0, '74.160')] [2024-06-15 11:47:36,171][1651274] Saving new best policy, reward=74.160! [2024-06-15 11:47:36,693][1651669] Updated weights for policy 0, policy_version 20795 (0.0022) [2024-06-15 11:47:40,523][1651669] Updated weights for policy 0, policy_version 20852 (0.0013) [2024-06-15 11:47:40,797][1648981] Fps is (10 sec: 45787.5, 60 sec: 48044.4, 300 sec: 47097.8). Total num frames: 42729472. Throughput: 0: 11657.3. Samples: 10742272. 
Policy #0 lag: (min: 72.0, avg: 164.2, max: 299.0) [2024-06-15 11:47:40,803][1648981] Avg episode reward: [(0, '68.100')] [2024-06-15 11:47:41,838][1651669] Updated weights for policy 0, policy_version 20898 (0.0012) [2024-06-15 11:47:44,262][1651669] Updated weights for policy 0, policy_version 20981 (0.0012) [2024-06-15 11:47:45,781][1648981] Fps is (10 sec: 52351.9, 60 sec: 47502.1, 300 sec: 46650.4). Total num frames: 42991616. Throughput: 0: 11647.1. Samples: 10800128. Policy #0 lag: (min: 72.0, avg: 164.2, max: 299.0) [2024-06-15 11:47:45,782][1648981] Avg episode reward: [(0, '68.060')] [2024-06-15 11:47:46,787][1651669] Updated weights for policy 0, policy_version 21026 (0.0010) [2024-06-15 11:47:50,766][1648981] Fps is (10 sec: 39397.2, 60 sec: 45902.6, 300 sec: 46653.3). Total num frames: 43122688. Throughput: 0: 11730.6. Samples: 10885120. Policy #0 lag: (min: 1.0, avg: 126.4, max: 257.0) [2024-06-15 11:47:50,767][1648981] Avg episode reward: [(0, '70.940')] [2024-06-15 11:47:51,083][1651669] Updated weights for policy 0, policy_version 21072 (0.0013) [2024-06-15 11:47:51,179][1651274] Signal inference workers to stop experience collection... (1100 times) [2024-06-15 11:47:51,210][1651669] InferenceWorker_p0-w0: stopping experience collection (1100 times) [2024-06-15 11:47:51,502][1651274] Signal inference workers to resume experience collection... (1100 times) [2024-06-15 11:47:51,503][1651669] InferenceWorker_p0-w0: resuming experience collection (1100 times) [2024-06-15 11:47:52,575][1651669] Updated weights for policy 0, policy_version 21128 (0.0016) [2024-06-15 11:47:53,627][1651669] Updated weights for policy 0, policy_version 21174 (0.0013) [2024-06-15 11:47:54,986][1651669] Updated weights for policy 0, policy_version 21232 (0.0017) [2024-06-15 11:47:55,775][1648981] Fps is (10 sec: 52464.3, 60 sec: 48053.5, 300 sec: 47095.8). Total num frames: 43515904. Throughput: 0: 11660.2. Samples: 10912768. 
Policy #0 lag: (min: 1.0, avg: 126.4, max: 257.0) [2024-06-15 11:47:55,777][1648981] Avg episode reward: [(0, '74.050')] [2024-06-15 11:47:57,857][1651669] Updated weights for policy 0, policy_version 21283 (0.0017) [2024-06-15 11:48:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 45875.4, 300 sec: 46652.7). Total num frames: 43646976. Throughput: 0: 11810.9. Samples: 10990080. Policy #0 lag: (min: 1.0, avg: 126.4, max: 257.0) [2024-06-15 11:48:00,767][1648981] Avg episode reward: [(0, '74.980')] [2024-06-15 11:48:00,768][1651274] Saving new best policy, reward=74.980! [2024-06-15 11:48:01,943][1651669] Updated weights for policy 0, policy_version 21328 (0.0023) [2024-06-15 11:48:03,672][1651669] Updated weights for policy 0, policy_version 21393 (0.0111) [2024-06-15 11:48:04,898][1651669] Updated weights for policy 0, policy_version 21444 (0.0024) [2024-06-15 11:48:05,770][1648981] Fps is (10 sec: 45894.0, 60 sec: 48056.9, 300 sec: 47318.6). Total num frames: 43974656. Throughput: 0: 11945.6. Samples: 11060224. Policy #0 lag: (min: 15.0, avg: 78.2, max: 271.0) [2024-06-15 11:48:05,771][1648981] Avg episode reward: [(0, '77.520')] [2024-06-15 11:48:06,126][1651669] Updated weights for policy 0, policy_version 21500 (0.0012) [2024-06-15 11:48:06,162][1651274] Saving new best policy, reward=77.520! [2024-06-15 11:48:08,424][1651669] Updated weights for policy 0, policy_version 21565 (0.0022) [2024-06-15 11:48:10,778][1648981] Fps is (10 sec: 52366.8, 60 sec: 45866.2, 300 sec: 46650.9). Total num frames: 44171264. Throughput: 0: 11943.5. Samples: 11093504. Policy #0 lag: (min: 15.0, avg: 78.2, max: 271.0) [2024-06-15 11:48:10,779][1648981] Avg episode reward: [(0, '77.550')] [2024-06-15 11:48:10,783][1651274] Saving new best policy, reward=77.550! [2024-06-15 11:48:14,423][1651669] Updated weights for policy 0, policy_version 21648 (0.0013) [2024-06-15 11:48:15,768][1648981] Fps is (10 sec: 42608.6, 60 sec: 47514.7, 300 sec: 47207.9). Total num frames: 44400640. 
Throughput: 0: 12082.8. Samples: 11176960. Policy #0 lag: (min: 15.0, avg: 78.2, max: 271.0) [2024-06-15 11:48:15,769][1648981] Avg episode reward: [(0, '77.100')] [2024-06-15 11:48:16,193][1651669] Updated weights for policy 0, policy_version 21712 (0.0012) [2024-06-15 11:48:18,369][1651669] Updated weights for policy 0, policy_version 21777 (0.0013) [2024-06-15 11:48:20,766][1648981] Fps is (10 sec: 52491.0, 60 sec: 47513.6, 300 sec: 46763.8). Total num frames: 44695552. Throughput: 0: 11946.7. Samples: 11240448. Policy #0 lag: (min: 157.0, avg: 217.9, max: 393.0) [2024-06-15 11:48:20,767][1648981] Avg episode reward: [(0, '76.520')] [2024-06-15 11:48:24,177][1651669] Updated weights for policy 0, policy_version 21844 (0.0012) [2024-06-15 11:48:25,772][1648981] Fps is (10 sec: 45854.6, 60 sec: 46962.9, 300 sec: 47207.2). Total num frames: 44859392. Throughput: 0: 12109.5. Samples: 11287040. Policy #0 lag: (min: 157.0, avg: 217.9, max: 393.0) [2024-06-15 11:48:25,773][1648981] Avg episode reward: [(0, '77.360')] [2024-06-15 11:48:26,020][1651669] Updated weights for policy 0, policy_version 21910 (0.0012) [2024-06-15 11:48:27,506][1651669] Updated weights for policy 0, policy_version 21968 (0.0012) [2024-06-15 11:48:27,652][1651274] Signal inference workers to stop experience collection... (1150 times) [2024-06-15 11:48:27,748][1651669] InferenceWorker_p0-w0: stopping experience collection (1150 times) [2024-06-15 11:48:27,929][1651274] Signal inference workers to resume experience collection... (1150 times) [2024-06-15 11:48:27,942][1651669] InferenceWorker_p0-w0: resuming experience collection (1150 times) [2024-06-15 11:48:28,739][1651669] Updated weights for policy 0, policy_version 22016 (0.0012) [2024-06-15 11:48:30,437][1651669] Updated weights for policy 0, policy_version 22080 (0.0013) [2024-06-15 11:48:30,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 49151.9, 300 sec: 47208.1). Total num frames: 45219840. Throughput: 0: 12007.5. Samples: 11340288. 
Policy #0 lag: (min: 157.0, avg: 217.9, max: 393.0) [2024-06-15 11:48:30,767][1648981] Avg episode reward: [(0, '76.950')] [2024-06-15 11:48:35,780][1648981] Fps is (10 sec: 36016.0, 60 sec: 45864.6, 300 sec: 46650.5). Total num frames: 45219840. Throughput: 0: 11852.0. Samples: 11418624. Policy #0 lag: (min: 157.0, avg: 217.9, max: 393.0) [2024-06-15 11:48:35,781][1648981] Avg episode reward: [(0, '75.430')] [2024-06-15 11:48:37,345][1651669] Updated weights for policy 0, policy_version 22150 (0.0012) [2024-06-15 11:48:39,137][1651669] Updated weights for policy 0, policy_version 22209 (0.0012) [2024-06-15 11:48:40,768][1648981] Fps is (10 sec: 39317.0, 60 sec: 48074.1, 300 sec: 47207.9). Total num frames: 45613056. Throughput: 0: 11891.5. Samples: 11447808. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 11:48:40,768][1648981] Avg episode reward: [(0, '73.240')] [2024-06-15 11:48:40,969][1651669] Updated weights for policy 0, policy_version 22275 (0.0014) [2024-06-15 11:48:45,766][1648981] Fps is (10 sec: 52502.0, 60 sec: 45886.5, 300 sec: 46655.2). Total num frames: 45744128. Throughput: 0: 11571.2. Samples: 11510784. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 11:48:45,767][1648981] Avg episode reward: [(0, '72.480')] [2024-06-15 11:48:47,552][1651669] Updated weights for policy 0, policy_version 22338 (0.0012) [2024-06-15 11:48:49,100][1651669] Updated weights for policy 0, policy_version 22401 (0.0019) [2024-06-15 11:48:50,597][1651669] Updated weights for policy 0, policy_version 22458 (0.0015) [2024-06-15 11:48:50,766][1648981] Fps is (10 sec: 39326.5, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 46006272. Throughput: 0: 11583.6. Samples: 11581440. 
Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 11:48:50,767][1648981] Avg episode reward: [(0, '72.150')] [2024-06-15 11:48:52,272][1651669] Updated weights for policy 0, policy_version 22516 (0.0014) [2024-06-15 11:48:53,833][1651669] Updated weights for policy 0, policy_version 22588 (0.0140) [2024-06-15 11:48:55,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45881.3, 300 sec: 46652.8). Total num frames: 46268416. Throughput: 0: 11346.6. Samples: 11603968. Policy #0 lag: (min: 88.0, avg: 185.8, max: 344.0) [2024-06-15 11:48:55,767][1648981] Avg episode reward: [(0, '72.160')] [2024-06-15 11:48:55,789][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000022592_46268416.pth... [2024-06-15 11:48:55,842][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000017152_35127296.pth [2024-06-15 11:49:00,503][1651669] Updated weights for policy 0, policy_version 22656 (0.0013) [2024-06-15 11:49:00,780][1648981] Fps is (10 sec: 39270.0, 60 sec: 45865.1, 300 sec: 46650.7). Total num frames: 46399488. Throughput: 0: 11352.1. Samples: 11687936. Policy #0 lag: (min: 88.0, avg: 185.8, max: 344.0) [2024-06-15 11:49:00,780][1648981] Avg episode reward: [(0, '74.790')] [2024-06-15 11:49:02,166][1651669] Updated weights for policy 0, policy_version 22708 (0.0012) [2024-06-15 11:49:04,140][1651669] Updated weights for policy 0, policy_version 22800 (0.0013) [2024-06-15 11:49:05,017][1651669] Updated weights for policy 0, policy_version 22844 (0.0010) [2024-06-15 11:49:05,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46970.5, 300 sec: 46763.8). Total num frames: 46792704. Throughput: 0: 11173.0. Samples: 11743232. Policy #0 lag: (min: 88.0, avg: 185.8, max: 344.0) [2024-06-15 11:49:05,767][1648981] Avg episode reward: [(0, '76.650')] [2024-06-15 11:49:10,575][1651274] Signal inference workers to stop experience collection... 
(1200 times) [2024-06-15 11:49:10,632][1651669] InferenceWorker_p0-w0: stopping experience collection (1200 times) [2024-06-15 11:49:10,767][1648981] Fps is (10 sec: 39372.4, 60 sec: 43699.1, 300 sec: 46541.7). Total num frames: 46792704. Throughput: 0: 11049.2. Samples: 11784192. Policy #0 lag: (min: 88.0, avg: 185.8, max: 344.0) [2024-06-15 11:49:10,768][1648981] Avg episode reward: [(0, '78.250')] [2024-06-15 11:49:10,868][1651274] Signal inference workers to resume experience collection... (1200 times) [2024-06-15 11:49:10,870][1651669] InferenceWorker_p0-w0: resuming experience collection (1200 times) [2024-06-15 11:49:11,230][1651274] Saving new best policy, reward=78.250! [2024-06-15 11:49:12,266][1651669] Updated weights for policy 0, policy_version 22912 (0.0011) [2024-06-15 11:49:14,060][1651669] Updated weights for policy 0, policy_version 22978 (0.0013) [2024-06-15 11:49:15,578][1651669] Updated weights for policy 0, policy_version 23041 (0.0013) [2024-06-15 11:49:15,767][1648981] Fps is (10 sec: 39319.2, 60 sec: 46422.0, 300 sec: 46652.7). Total num frames: 47185920. Throughput: 0: 11320.8. Samples: 11849728. Policy #0 lag: (min: 15.0, avg: 70.0, max: 271.0) [2024-06-15 11:49:15,767][1648981] Avg episode reward: [(0, '79.430')] [2024-06-15 11:49:16,177][1651274] Saving new best policy, reward=79.430! [2024-06-15 11:49:16,833][1651669] Updated weights for policy 0, policy_version 23101 (0.0012) [2024-06-15 11:49:20,781][1648981] Fps is (10 sec: 52355.7, 60 sec: 43680.3, 300 sec: 46650.5). Total num frames: 47316992. Throughput: 0: 11116.0. Samples: 11918848. Policy #0 lag: (min: 15.0, avg: 70.0, max: 271.0) [2024-06-15 11:49:20,781][1648981] Avg episode reward: [(0, '81.340')] [2024-06-15 11:49:20,782][1651274] Saving new best policy, reward=81.340! 
[2024-06-15 11:49:22,817][1651669] Updated weights for policy 0, policy_version 23152 (0.0012) [2024-06-15 11:49:23,995][1651669] Updated weights for policy 0, policy_version 23184 (0.0023) [2024-06-15 11:49:25,599][1651669] Updated weights for policy 0, policy_version 23249 (0.0012) [2024-06-15 11:49:25,766][1648981] Fps is (10 sec: 42601.0, 60 sec: 45879.7, 300 sec: 46432.5). Total num frames: 47611904. Throughput: 0: 11457.7. Samples: 11963392. Policy #0 lag: (min: 15.0, avg: 70.0, max: 271.0) [2024-06-15 11:49:25,767][1648981] Avg episode reward: [(0, '81.500')] [2024-06-15 11:49:26,246][1651274] Saving new best policy, reward=81.500! [2024-06-15 11:49:27,374][1651669] Updated weights for policy 0, policy_version 23315 (0.0055) [2024-06-15 11:49:30,778][1648981] Fps is (10 sec: 52441.6, 60 sec: 43682.1, 300 sec: 46651.0). Total num frames: 47841280. Throughput: 0: 11317.9. Samples: 12020224. Policy #0 lag: (min: 88.0, avg: 183.3, max: 310.0) [2024-06-15 11:49:30,779][1648981] Avg episode reward: [(0, '78.980')] [2024-06-15 11:49:33,410][1651669] Updated weights for policy 0, policy_version 23379 (0.0029) [2024-06-15 11:49:34,215][1651669] Updated weights for policy 0, policy_version 23420 (0.0016) [2024-06-15 11:49:35,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 46978.3, 300 sec: 46319.5). Total num frames: 48037888. Throughput: 0: 11491.5. Samples: 12098560. Policy #0 lag: (min: 88.0, avg: 183.3, max: 310.0) [2024-06-15 11:49:35,767][1648981] Avg episode reward: [(0, '80.480')] [2024-06-15 11:49:35,994][1651669] Updated weights for policy 0, policy_version 23474 (0.0013) [2024-06-15 11:49:37,634][1651669] Updated weights for policy 0, policy_version 23555 (0.0141) [2024-06-15 11:49:39,048][1651669] Updated weights for policy 0, policy_version 23614 (0.0014) [2024-06-15 11:49:40,766][1648981] Fps is (10 sec: 52491.0, 60 sec: 45876.2, 300 sec: 46652.7). Total num frames: 48365568. Throughput: 0: 11548.5. Samples: 12123648. 
Policy #0 lag: (min: 88.0, avg: 183.3, max: 310.0) [2024-06-15 11:49:40,767][1648981] Avg episode reward: [(0, '79.620')] [2024-06-15 11:49:45,422][1651669] Updated weights for policy 0, policy_version 23678 (0.0026) [2024-06-15 11:49:45,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 48496640. Throughput: 0: 11438.0. Samples: 12202496. Policy #0 lag: (min: 15.0, avg: 80.7, max: 271.0) [2024-06-15 11:49:45,767][1648981] Avg episode reward: [(0, '79.780')] [2024-06-15 11:49:47,516][1651669] Updated weights for policy 0, policy_version 23744 (0.0012) [2024-06-15 11:49:47,631][1651274] Signal inference workers to stop experience collection... (1250 times) [2024-06-15 11:49:47,701][1651669] InferenceWorker_p0-w0: stopping experience collection (1250 times) [2024-06-15 11:49:47,816][1651274] Signal inference workers to resume experience collection... (1250 times) [2024-06-15 11:49:47,817][1651669] InferenceWorker_p0-w0: resuming experience collection (1250 times) [2024-06-15 11:49:48,926][1651669] Updated weights for policy 0, policy_version 23808 (0.0012) [2024-06-15 11:49:50,569][1651669] Updated weights for policy 0, policy_version 23872 (0.0021) [2024-06-15 11:49:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 46653.4). Total num frames: 48889856. Throughput: 0: 11548.5. Samples: 12262912. Policy #0 lag: (min: 15.0, avg: 80.7, max: 271.0) [2024-06-15 11:49:50,767][1648981] Avg episode reward: [(0, '80.120')] [2024-06-15 11:49:55,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 44782.9, 300 sec: 46541.7). Total num frames: 48955392. Throughput: 0: 11594.0. Samples: 12305920. 
Policy #0 lag: (min: 15.0, avg: 80.7, max: 271.0) [2024-06-15 11:49:55,767][1648981] Avg episode reward: [(0, '81.060')] [2024-06-15 11:49:56,254][1651669] Updated weights for policy 0, policy_version 23924 (0.0013) [2024-06-15 11:49:58,019][1651669] Updated weights for policy 0, policy_version 23956 (0.0026) [2024-06-15 11:49:59,497][1651669] Updated weights for policy 0, policy_version 24032 (0.0012) [2024-06-15 11:50:00,788][1648981] Fps is (10 sec: 42504.9, 60 sec: 48598.8, 300 sec: 46538.2). Total num frames: 49315840. Throughput: 0: 11781.8. Samples: 12380160. Policy #0 lag: (min: 111.0, avg: 207.0, max: 348.0) [2024-06-15 11:50:00,789][1648981] Avg episode reward: [(0, '80.760')] [2024-06-15 11:50:01,616][1651669] Updated weights for policy 0, policy_version 24112 (0.0014) [2024-06-15 11:50:05,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 43690.5, 300 sec: 46652.7). Total num frames: 49414144. Throughput: 0: 11791.0. Samples: 12449280. Policy #0 lag: (min: 111.0, avg: 207.0, max: 348.0) [2024-06-15 11:50:05,768][1648981] Avg episode reward: [(0, '79.930')] [2024-06-15 11:50:06,432][1651669] Updated weights for policy 0, policy_version 24147 (0.0014) [2024-06-15 11:50:08,636][1651669] Updated weights for policy 0, policy_version 24208 (0.0013) [2024-06-15 11:50:09,685][1651669] Updated weights for policy 0, policy_version 24259 (0.0018) [2024-06-15 11:50:10,766][1648981] Fps is (10 sec: 45975.7, 60 sec: 49698.3, 300 sec: 46652.8). Total num frames: 49774592. Throughput: 0: 11707.7. Samples: 12490240. Policy #0 lag: (min: 111.0, avg: 207.0, max: 348.0) [2024-06-15 11:50:10,767][1648981] Avg episode reward: [(0, '81.690')] [2024-06-15 11:50:11,039][1651274] Saving new best policy, reward=81.690! 
[2024-06-15 11:50:11,715][1651669] Updated weights for policy 0, policy_version 24337 (0.0013) [2024-06-15 11:50:12,643][1651669] Updated weights for policy 0, policy_version 24382 (0.0033) [2024-06-15 11:50:15,778][1648981] Fps is (10 sec: 52368.2, 60 sec: 45866.6, 300 sec: 46650.9). Total num frames: 49938432. Throughput: 0: 11946.7. Samples: 12557824. Policy #0 lag: (min: 111.0, avg: 207.0, max: 348.0) [2024-06-15 11:50:15,779][1648981] Avg episode reward: [(0, '79.590')] [2024-06-15 11:50:17,865][1651669] Updated weights for policy 0, policy_version 24448 (0.0012) [2024-06-15 11:50:20,768][1648981] Fps is (10 sec: 36040.0, 60 sec: 46977.5, 300 sec: 46319.3). Total num frames: 50135040. Throughput: 0: 11787.0. Samples: 12628992. Policy #0 lag: (min: 25.0, avg: 117.8, max: 281.0) [2024-06-15 11:50:20,768][1648981] Avg episode reward: [(0, '81.570')] [2024-06-15 11:50:21,494][1651669] Updated weights for policy 0, policy_version 24513 (0.0017) [2024-06-15 11:50:22,992][1651669] Updated weights for policy 0, policy_version 24581 (0.0136) [2024-06-15 11:50:25,769][1648981] Fps is (10 sec: 52478.5, 60 sec: 47511.7, 300 sec: 46652.4). Total num frames: 50462720. Throughput: 0: 11798.1. Samples: 12654592. Policy #0 lag: (min: 25.0, avg: 117.8, max: 281.0) [2024-06-15 11:50:25,769][1648981] Avg episode reward: [(0, '83.220')] [2024-06-15 11:50:25,833][1651274] Saving new best policy, reward=83.220! [2024-06-15 11:50:28,148][1651669] Updated weights for policy 0, policy_version 24641 (0.0013) [2024-06-15 11:50:28,575][1651274] Signal inference workers to stop experience collection... (1300 times) [2024-06-15 11:50:28,632][1651669] InferenceWorker_p0-w0: stopping experience collection (1300 times) [2024-06-15 11:50:28,864][1651274] Signal inference workers to resume experience collection... 
(1300 times) [2024-06-15 11:50:28,865][1651669] InferenceWorker_p0-w0: resuming experience collection (1300 times) [2024-06-15 11:50:29,488][1651669] Updated weights for policy 0, policy_version 24700 (0.0013) [2024-06-15 11:50:30,766][1648981] Fps is (10 sec: 45881.7, 60 sec: 45884.2, 300 sec: 46431.8). Total num frames: 50593792. Throughput: 0: 11662.2. Samples: 12727296. Policy #0 lag: (min: 25.0, avg: 117.8, max: 281.0) [2024-06-15 11:50:30,767][1648981] Avg episode reward: [(0, '84.770')] [2024-06-15 11:50:30,771][1651274] Saving new best policy, reward=84.770! [2024-06-15 11:50:32,501][1651669] Updated weights for policy 0, policy_version 24767 (0.0019) [2024-06-15 11:50:33,907][1651669] Updated weights for policy 0, policy_version 24817 (0.0013) [2024-06-15 11:50:35,463][1651669] Updated weights for policy 0, policy_version 24884 (0.0014) [2024-06-15 11:50:35,766][1648981] Fps is (10 sec: 52441.2, 60 sec: 49152.1, 300 sec: 46652.8). Total num frames: 50987008. Throughput: 0: 11650.8. Samples: 12787200. Policy #0 lag: (min: 50.0, avg: 181.3, max: 351.0) [2024-06-15 11:50:35,767][1648981] Avg episode reward: [(0, '83.880')] [2024-06-15 11:50:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 51052544. Throughput: 0: 11662.2. Samples: 12830720. Policy #0 lag: (min: 50.0, avg: 181.3, max: 351.0) [2024-06-15 11:50:40,767][1648981] Avg episode reward: [(0, '81.300')] [2024-06-15 11:50:40,922][1651669] Updated weights for policy 0, policy_version 24944 (0.0013) [2024-06-15 11:50:42,925][1651669] Updated weights for policy 0, policy_version 24994 (0.0013) [2024-06-15 11:50:44,698][1651669] Updated weights for policy 0, policy_version 25072 (0.0129) [2024-06-15 11:50:45,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48605.9, 300 sec: 46550.5). Total num frames: 51412992. Throughput: 0: 11576.8. Samples: 12900864. 
Policy #0 lag: (min: 50.0, avg: 181.3, max: 351.0) [2024-06-15 11:50:45,767][1648981] Avg episode reward: [(0, '80.210')] [2024-06-15 11:50:46,510][1651669] Updated weights for policy 0, policy_version 25144 (0.0038) [2024-06-15 11:50:50,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 43690.6, 300 sec: 46430.6). Total num frames: 51511296. Throughput: 0: 11696.4. Samples: 12975616. Policy #0 lag: (min: 50.0, avg: 181.3, max: 351.0) [2024-06-15 11:50:50,767][1648981] Avg episode reward: [(0, '79.120')] [2024-06-15 11:50:51,813][1651669] Updated weights for policy 0, policy_version 25187 (0.0012) [2024-06-15 11:50:54,571][1651669] Updated weights for policy 0, policy_version 25249 (0.0012) [2024-06-15 11:50:55,767][1648981] Fps is (10 sec: 39320.7, 60 sec: 47513.4, 300 sec: 46541.6). Total num frames: 51806208. Throughput: 0: 11582.5. Samples: 13011456. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 11:50:55,768][1648981] Avg episode reward: [(0, '79.300')] [2024-06-15 11:50:56,130][1651669] Updated weights for policy 0, policy_version 25323 (0.0162) [2024-06-15 11:50:56,181][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000025328_51871744.pth... [2024-06-15 11:50:56,350][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000019872_40697856.pth [2024-06-15 11:50:56,355][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000025328_51871744.pth [2024-06-15 11:50:57,848][1651669] Updated weights for policy 0, policy_version 25392 (0.0022) [2024-06-15 11:51:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45345.6, 300 sec: 46652.7). Total num frames: 52035584. Throughput: 0: 11358.0. Samples: 13068800. 
Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 11:51:00,767][1648981] Avg episode reward: [(0, '80.260')] [2024-06-15 11:51:03,188][1651669] Updated weights for policy 0, policy_version 25426 (0.0013) [2024-06-15 11:51:05,479][1651669] Updated weights for policy 0, policy_version 25478 (0.0015) [2024-06-15 11:51:05,767][1648981] Fps is (10 sec: 39320.7, 60 sec: 46421.2, 300 sec: 46544.8). Total num frames: 52199424. Throughput: 0: 11503.2. Samples: 13146624. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 11:51:05,768][1648981] Avg episode reward: [(0, '81.420')] [2024-06-15 11:51:07,669][1651669] Updated weights for policy 0, policy_version 25584 (0.0105) [2024-06-15 11:51:07,808][1651274] Signal inference workers to stop experience collection... (1350 times) [2024-06-15 11:51:07,847][1651669] InferenceWorker_p0-w0: stopping experience collection (1350 times) [2024-06-15 11:51:08,022][1651274] Signal inference workers to resume experience collection... (1350 times) [2024-06-15 11:51:08,023][1651669] InferenceWorker_p0-w0: resuming experience collection (1350 times) [2024-06-15 11:51:08,870][1651669] Updated weights for policy 0, policy_version 25636 (0.0013) [2024-06-15 11:51:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46421.4, 300 sec: 46652.7). Total num frames: 52559872. Throughput: 0: 11435.3. Samples: 13169152. Policy #0 lag: (min: 98.0, avg: 200.7, max: 358.0) [2024-06-15 11:51:10,767][1648981] Avg episode reward: [(0, '81.280')] [2024-06-15 11:51:15,105][1651669] Updated weights for policy 0, policy_version 25696 (0.0023) [2024-06-15 11:51:15,767][1648981] Fps is (10 sec: 49152.6, 60 sec: 45884.0, 300 sec: 46541.6). Total num frames: 52690944. Throughput: 0: 11707.6. Samples: 13254144. 
Policy #0 lag: (min: 98.0, avg: 200.7, max: 358.0) [2024-06-15 11:51:15,768][1648981] Avg episode reward: [(0, '80.070')] [2024-06-15 11:51:16,581][1651669] Updated weights for policy 0, policy_version 25744 (0.0016) [2024-06-15 11:51:18,282][1651669] Updated weights for policy 0, policy_version 25813 (0.0076) [2024-06-15 11:51:19,596][1651669] Updated weights for policy 0, policy_version 25873 (0.0013) [2024-06-15 11:51:20,424][1651669] Updated weights for policy 0, policy_version 25920 (0.0017) [2024-06-15 11:51:20,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49153.2, 300 sec: 46769.0). Total num frames: 53084160. Throughput: 0: 11628.1. Samples: 13310464. Policy #0 lag: (min: 98.0, avg: 200.7, max: 358.0) [2024-06-15 11:51:20,767][1648981] Avg episode reward: [(0, '78.300')] [2024-06-15 11:51:25,766][1648981] Fps is (10 sec: 49153.6, 60 sec: 45330.8, 300 sec: 46541.7). Total num frames: 53182464. Throughput: 0: 11810.1. Samples: 13362176. Policy #0 lag: (min: 98.0, avg: 200.7, max: 358.0) [2024-06-15 11:51:25,767][1648981] Avg episode reward: [(0, '75.030')] [2024-06-15 11:51:25,808][1651669] Updated weights for policy 0, policy_version 25977 (0.0014) [2024-06-15 11:51:28,436][1651669] Updated weights for policy 0, policy_version 26033 (0.0119) [2024-06-15 11:51:30,104][1651669] Updated weights for policy 0, policy_version 26105 (0.0013) [2024-06-15 11:51:30,767][1648981] Fps is (10 sec: 42596.9, 60 sec: 48605.6, 300 sec: 46874.9). Total num frames: 53510144. Throughput: 0: 11673.5. Samples: 13426176. Policy #0 lag: (min: 63.0, avg: 129.5, max: 319.0) [2024-06-15 11:51:30,767][1648981] Avg episode reward: [(0, '77.910')] [2024-06-15 11:51:31,722][1651669] Updated weights for policy 0, policy_version 26176 (0.0033) [2024-06-15 11:51:35,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 53608448. Throughput: 0: 11787.4. Samples: 13506048. 
Policy #0 lag: (min: 63.0, avg: 129.5, max: 319.0) [2024-06-15 11:51:35,767][1648981] Avg episode reward: [(0, '82.570')] [2024-06-15 11:51:36,611][1651669] Updated weights for policy 0, policy_version 26229 (0.0013) [2024-06-15 11:51:38,433][1651669] Updated weights for policy 0, policy_version 26274 (0.0013) [2024-06-15 11:51:40,620][1651669] Updated weights for policy 0, policy_version 26368 (0.0013) [2024-06-15 11:51:40,770][1648981] Fps is (10 sec: 49134.8, 60 sec: 49148.9, 300 sec: 46985.4). Total num frames: 54001664. Throughput: 0: 11877.5. Samples: 13545984. Policy #0 lag: (min: 63.0, avg: 129.5, max: 319.0) [2024-06-15 11:51:40,771][1648981] Avg episode reward: [(0, '84.270')] [2024-06-15 11:51:42,136][1651669] Updated weights for policy 0, policy_version 26432 (0.0013) [2024-06-15 11:51:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45329.1, 300 sec: 46658.4). Total num frames: 54132736. Throughput: 0: 11855.6. Samples: 13602304. Policy #0 lag: (min: 63.0, avg: 129.5, max: 319.0) [2024-06-15 11:51:45,767][1648981] Avg episode reward: [(0, '84.940')] [2024-06-15 11:51:45,768][1651274] Saving new best policy, reward=84.940! [2024-06-15 11:51:48,397][1651669] Updated weights for policy 0, policy_version 26484 (0.0013) [2024-06-15 11:51:48,746][1651669] Updated weights for policy 0, policy_version 26496 (0.0010) [2024-06-15 11:51:48,862][1651274] Signal inference workers to stop experience collection... (1400 times) [2024-06-15 11:51:48,900][1651669] InferenceWorker_p0-w0: stopping experience collection (1400 times) [2024-06-15 11:51:49,137][1651274] Signal inference workers to resume experience collection... (1400 times) [2024-06-15 11:51:49,137][1651669] InferenceWorker_p0-w0: resuming experience collection (1400 times) [2024-06-15 11:51:50,559][1651669] Updated weights for policy 0, policy_version 26565 (0.0017) [2024-06-15 11:51:50,777][1648981] Fps is (10 sec: 42570.7, 60 sec: 48597.5, 300 sec: 46762.2). Total num frames: 54427648. 
Throughput: 0: 11875.8. Samples: 13681152. Policy #0 lag: (min: 10.0, avg: 83.0, max: 266.0) [2024-06-15 11:51:50,778][1648981] Avg episode reward: [(0, '83.700')] [2024-06-15 11:51:52,038][1651669] Updated weights for policy 0, policy_version 26627 (0.0015) [2024-06-15 11:51:53,322][1651669] Updated weights for policy 0, policy_version 26688 (0.0012) [2024-06-15 11:51:55,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 47513.6, 300 sec: 46652.7). Total num frames: 54657024. Throughput: 0: 11980.8. Samples: 13708288. Policy #0 lag: (min: 10.0, avg: 83.0, max: 266.0) [2024-06-15 11:51:55,767][1648981] Avg episode reward: [(0, '82.480')] [2024-06-15 11:52:00,016][1651669] Updated weights for policy 0, policy_version 26738 (0.0012) [2024-06-15 11:52:00,766][1648981] Fps is (10 sec: 39362.7, 60 sec: 46421.4, 300 sec: 46541.7). Total num frames: 54820864. Throughput: 0: 11912.7. Samples: 13790208. Policy #0 lag: (min: 10.0, avg: 83.0, max: 266.0) [2024-06-15 11:52:00,767][1648981] Avg episode reward: [(0, '80.750')] [2024-06-15 11:52:01,532][1651669] Updated weights for policy 0, policy_version 26802 (0.0012) [2024-06-15 11:52:03,568][1651669] Updated weights for policy 0, policy_version 26882 (0.0030) [2024-06-15 11:52:04,905][1651669] Updated weights for policy 0, policy_version 26944 (0.0039) [2024-06-15 11:52:05,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 49698.5, 300 sec: 46652.7). Total num frames: 55181312. Throughput: 0: 11798.7. Samples: 13841408. Policy #0 lag: (min: 10.0, avg: 83.0, max: 266.0) [2024-06-15 11:52:05,767][1648981] Avg episode reward: [(0, '82.300')] [2024-06-15 11:52:10,766][1648981] Fps is (10 sec: 39321.1, 60 sec: 44236.8, 300 sec: 46320.0). Total num frames: 55214080. Throughput: 0: 11537.1. Samples: 13881344. 
Policy #0 lag: (min: 7.0, avg: 66.7, max: 263.0) [2024-06-15 11:52:10,767][1648981] Avg episode reward: [(0, '84.690')] [2024-06-15 11:52:11,460][1651669] Updated weights for policy 0, policy_version 27000 (0.0012) [2024-06-15 11:52:13,228][1651669] Updated weights for policy 0, policy_version 27060 (0.0015) [2024-06-15 11:52:15,449][1651669] Updated weights for policy 0, policy_version 27152 (0.0108) [2024-06-15 11:52:15,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 48605.9, 300 sec: 46652.7). Total num frames: 55607296. Throughput: 0: 11582.6. Samples: 13947392. Policy #0 lag: (min: 7.0, avg: 66.7, max: 263.0) [2024-06-15 11:52:15,767][1648981] Avg episode reward: [(0, '83.000')] [2024-06-15 11:52:16,524][1651669] Updated weights for policy 0, policy_version 27200 (0.0015) [2024-06-15 11:52:20,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 55705600. Throughput: 0: 11400.5. Samples: 14019072. Policy #0 lag: (min: 7.0, avg: 66.7, max: 263.0) [2024-06-15 11:52:20,767][1648981] Avg episode reward: [(0, '84.880')] [2024-06-15 11:52:22,544][1651669] Updated weights for policy 0, policy_version 27256 (0.0013) [2024-06-15 11:52:24,148][1651669] Updated weights for policy 0, policy_version 27284 (0.0011) [2024-06-15 11:52:25,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 46967.4, 300 sec: 46541.7). Total num frames: 56000512. Throughput: 0: 11390.1. Samples: 14058496. Policy #0 lag: (min: 7.0, avg: 66.7, max: 263.0) [2024-06-15 11:52:25,767][1648981] Avg episode reward: [(0, '82.750')] [2024-06-15 11:52:25,848][1651669] Updated weights for policy 0, policy_version 27345 (0.0015) [2024-06-15 11:52:26,636][1651274] Signal inference workers to stop experience collection... (1450 times) [2024-06-15 11:52:26,695][1651669] InferenceWorker_p0-w0: stopping experience collection (1450 times) [2024-06-15 11:52:26,855][1651274] Signal inference workers to resume experience collection... 
(1450 times) [2024-06-15 11:52:26,855][1651669] InferenceWorker_p0-w0: resuming experience collection (1450 times) [2024-06-15 11:52:27,401][1651669] Updated weights for policy 0, policy_version 27409 (0.0014) [2024-06-15 11:52:28,022][1651669] Updated weights for policy 0, policy_version 27453 (0.0012) [2024-06-15 11:52:30,782][1648981] Fps is (10 sec: 52345.1, 60 sec: 45317.3, 300 sec: 46650.2). Total num frames: 56229888. Throughput: 0: 11567.1. Samples: 14123008. Policy #0 lag: (min: 7.0, avg: 66.7, max: 263.0) [2024-06-15 11:52:30,783][1648981] Avg episode reward: [(0, '82.400')] [2024-06-15 11:52:33,971][1651669] Updated weights for policy 0, policy_version 27512 (0.0013) [2024-06-15 11:52:35,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 46421.3, 300 sec: 46322.5). Total num frames: 56393728. Throughput: 0: 11528.3. Samples: 14199808. Policy #0 lag: (min: 2.0, avg: 91.0, max: 258.0) [2024-06-15 11:52:35,767][1648981] Avg episode reward: [(0, '85.470')] [2024-06-15 11:52:35,973][1651669] Updated weights for policy 0, policy_version 27552 (0.0013) [2024-06-15 11:52:36,395][1651274] Saving new best policy, reward=85.470! [2024-06-15 11:52:37,634][1651669] Updated weights for policy 0, policy_version 27602 (0.0014) [2024-06-15 11:52:39,147][1651669] Updated weights for policy 0, policy_version 27665 (0.0016) [2024-06-15 11:52:40,766][1648981] Fps is (10 sec: 52512.5, 60 sec: 45878.1, 300 sec: 46655.1). Total num frames: 56754176. Throughput: 0: 11457.5. Samples: 14223872. Policy #0 lag: (min: 2.0, avg: 91.0, max: 258.0) [2024-06-15 11:52:40,767][1648981] Avg episode reward: [(0, '85.810')] [2024-06-15 11:52:40,771][1651274] Saving new best policy, reward=85.810! [2024-06-15 11:52:43,516][1651669] Updated weights for policy 0, policy_version 27716 (0.0035) [2024-06-15 11:52:44,938][1651669] Updated weights for policy 0, policy_version 27776 (0.0013) [2024-06-15 11:52:45,771][1648981] Fps is (10 sec: 49127.9, 60 sec: 45871.4, 300 sec: 46652.0). 
Total num frames: 56885248. Throughput: 0: 11194.5. Samples: 14294016. Policy #0 lag: (min: 2.0, avg: 91.0, max: 258.0) [2024-06-15 11:52:45,772][1648981] Avg episode reward: [(0, '86.950')] [2024-06-15 11:52:45,773][1651274] Saving new best policy, reward=86.950! [2024-06-15 11:52:48,393][1651669] Updated weights for policy 0, policy_version 27840 (0.0012) [2024-06-15 11:52:49,725][1651669] Updated weights for policy 0, policy_version 27891 (0.0014) [2024-06-15 11:52:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46429.3, 300 sec: 46431.8). Total num frames: 57212928. Throughput: 0: 11730.5. Samples: 14369280. Policy #0 lag: (min: 2.0, avg: 91.0, max: 258.0) [2024-06-15 11:52:50,767][1648981] Avg episode reward: [(0, '84.820')] [2024-06-15 11:52:51,008][1651669] Updated weights for policy 0, policy_version 27952 (0.0098) [2024-06-15 11:52:54,388][1651669] Updated weights for policy 0, policy_version 27986 (0.0013) [2024-06-15 11:52:55,767][1648981] Fps is (10 sec: 52453.6, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 57409536. Throughput: 0: 11628.0. Samples: 14404608. Policy #0 lag: (min: 175.0, avg: 245.9, max: 415.0) [2024-06-15 11:52:55,767][1648981] Avg episode reward: [(0, '86.420')] [2024-06-15 11:52:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000028032_57409536.pth... [2024-06-15 11:52:55,848][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000022592_46268416.pth [2024-06-15 11:52:58,233][1651669] Updated weights for policy 0, policy_version 28048 (0.0014) [2024-06-15 11:53:00,146][1651669] Updated weights for policy 0, policy_version 28114 (0.0012) [2024-06-15 11:53:00,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 46967.3, 300 sec: 46320.1). Total num frames: 57638912. Throughput: 0: 11719.2. Samples: 14474752. 
Policy #0 lag: (min: 175.0, avg: 245.9, max: 415.0) [2024-06-15 11:53:00,767][1648981] Avg episode reward: [(0, '87.880')] [2024-06-15 11:53:01,103][1651274] Saving new best policy, reward=87.880! [2024-06-15 11:53:01,915][1651669] Updated weights for policy 0, policy_version 28192 (0.0013) [2024-06-15 11:53:05,109][1651669] Updated weights for policy 0, policy_version 28228 (0.0012) [2024-06-15 11:53:05,777][1648981] Fps is (10 sec: 45825.5, 60 sec: 44774.7, 300 sec: 46430.7). Total num frames: 57868288. Throughput: 0: 11693.5. Samples: 14545408. Policy #0 lag: (min: 175.0, avg: 245.9, max: 415.0) [2024-06-15 11:53:05,778][1648981] Avg episode reward: [(0, '87.170')] [2024-06-15 11:53:09,293][1651669] Updated weights for policy 0, policy_version 28292 (0.0013) [2024-06-15 11:53:09,597][1651274] Signal inference workers to stop experience collection... (1500 times) [2024-06-15 11:53:09,660][1651669] InferenceWorker_p0-w0: stopping experience collection (1500 times) [2024-06-15 11:53:09,883][1651274] Signal inference workers to resume experience collection... (1500 times) [2024-06-15 11:53:09,884][1651669] InferenceWorker_p0-w0: resuming experience collection (1500 times) [2024-06-15 11:53:10,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 46967.4, 300 sec: 46208.6). Total num frames: 58032128. Throughput: 0: 11548.4. Samples: 14578176. Policy #0 lag: (min: 31.0, avg: 113.9, max: 287.0) [2024-06-15 11:53:10,767][1648981] Avg episode reward: [(0, '87.970')] [2024-06-15 11:53:11,282][1651274] Saving new best policy, reward=87.970! [2024-06-15 11:53:11,284][1651669] Updated weights for policy 0, policy_version 28368 (0.0144) [2024-06-15 11:53:12,978][1651669] Updated weights for policy 0, policy_version 28437 (0.0012) [2024-06-15 11:53:13,796][1651669] Updated weights for policy 0, policy_version 28478 (0.0013) [2024-06-15 11:53:15,766][1648981] Fps is (10 sec: 45925.9, 60 sec: 45329.3, 300 sec: 46208.4). Total num frames: 58327040. Throughput: 0: 11370.4. 
Samples: 14634496. Policy #0 lag: (min: 31.0, avg: 113.9, max: 287.0) [2024-06-15 11:53:15,767][1648981] Avg episode reward: [(0, '86.700')] [2024-06-15 11:53:18,444][1651669] Updated weights for policy 0, policy_version 28528 (0.0014) [2024-06-15 11:53:20,768][1648981] Fps is (10 sec: 42591.8, 60 sec: 45873.9, 300 sec: 46098.0). Total num frames: 58458112. Throughput: 0: 11252.2. Samples: 14706176. Policy #0 lag: (min: 31.0, avg: 113.9, max: 287.0) [2024-06-15 11:53:20,768][1648981] Avg episode reward: [(0, '83.670')] [2024-06-15 11:53:22,273][1651669] Updated weights for policy 0, policy_version 28577 (0.0013) [2024-06-15 11:53:23,979][1651669] Updated weights for policy 0, policy_version 28656 (0.0012) [2024-06-15 11:53:25,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46967.5, 300 sec: 46097.4). Total num frames: 58818560. Throughput: 0: 11434.7. Samples: 14738432. Policy #0 lag: (min: 31.0, avg: 113.9, max: 287.0) [2024-06-15 11:53:25,767][1648981] Avg episode reward: [(0, '84.620')] [2024-06-15 11:53:25,854][1651669] Updated weights for policy 0, policy_version 28726 (0.0013) [2024-06-15 11:53:30,774][1648981] Fps is (10 sec: 45846.7, 60 sec: 44789.0, 300 sec: 46431.5). Total num frames: 58916864. Throughput: 0: 11377.0. Samples: 14806016. Policy #0 lag: (min: 14.0, avg: 107.6, max: 270.0) [2024-06-15 11:53:30,775][1648981] Avg episode reward: [(0, '85.780')] [2024-06-15 11:53:31,343][1651669] Updated weights for policy 0, policy_version 28792 (0.0015) [2024-06-15 11:53:34,294][1651669] Updated weights for policy 0, policy_version 28852 (0.0018) [2024-06-15 11:53:35,690][1651669] Updated weights for policy 0, policy_version 28912 (0.0012) [2024-06-15 11:53:35,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 46967.5, 300 sec: 46097.6). Total num frames: 59211776. Throughput: 0: 11104.7. Samples: 14868992. 
Policy #0 lag: (min: 14.0, avg: 107.6, max: 270.0) [2024-06-15 11:53:35,767][1648981] Avg episode reward: [(0, '84.060')] [2024-06-15 11:53:37,177][1651669] Updated weights for policy 0, policy_version 28976 (0.0159) [2024-06-15 11:53:40,767][1648981] Fps is (10 sec: 45909.7, 60 sec: 43690.4, 300 sec: 46208.4). Total num frames: 59375616. Throughput: 0: 11093.3. Samples: 14903808. Policy #0 lag: (min: 14.0, avg: 107.6, max: 270.0) [2024-06-15 11:53:40,768][1648981] Avg episode reward: [(0, '83.140')] [2024-06-15 11:53:41,944][1651669] Updated weights for policy 0, policy_version 29026 (0.0012) [2024-06-15 11:53:44,017][1651669] Updated weights for policy 0, policy_version 29075 (0.0013) [2024-06-15 11:53:44,932][1651669] Updated weights for policy 0, policy_version 29120 (0.0127) [2024-06-15 11:53:45,766][1648981] Fps is (10 sec: 42597.8, 60 sec: 45878.9, 300 sec: 46208.4). Total num frames: 59637760. Throughput: 0: 11229.9. Samples: 14980096. Policy #0 lag: (min: 14.0, avg: 107.6, max: 270.0) [2024-06-15 11:53:45,767][1648981] Avg episode reward: [(0, '83.070')] [2024-06-15 11:53:46,888][1651669] Updated weights for policy 0, policy_version 29170 (0.0015) [2024-06-15 11:53:47,894][1651274] Signal inference workers to stop experience collection... (1550 times) [2024-06-15 11:53:47,959][1651669] InferenceWorker_p0-w0: stopping experience collection (1550 times) [2024-06-15 11:53:47,961][1651669] Updated weights for policy 0, policy_version 29220 (0.0011) [2024-06-15 11:53:48,087][1651274] Signal inference workers to resume experience collection... (1550 times) [2024-06-15 11:53:48,088][1651669] InferenceWorker_p0-w0: resuming experience collection (1550 times) [2024-06-15 11:53:48,416][1651669] Updated weights for policy 0, policy_version 29248 (0.0012) [2024-06-15 11:53:50,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 44782.8, 300 sec: 46208.4). Total num frames: 59899904. Throughput: 0: 11483.0. Samples: 15062016. 
Policy #0 lag: (min: 14.0, avg: 107.6, max: 270.0) [2024-06-15 11:53:50,767][1648981] Avg episode reward: [(0, '83.800')] [2024-06-15 11:53:52,839][1651669] Updated weights for policy 0, policy_version 29303 (0.0015) [2024-06-15 11:53:53,802][1651669] Updated weights for policy 0, policy_version 29344 (0.0012) [2024-06-15 11:53:55,592][1651669] Updated weights for policy 0, policy_version 29379 (0.0014) [2024-06-15 11:53:55,767][1648981] Fps is (10 sec: 55702.9, 60 sec: 46421.1, 300 sec: 46765.8). Total num frames: 60194816. Throughput: 0: 11571.1. Samples: 15098880. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 11:53:55,768][1648981] Avg episode reward: [(0, '81.840')] [2024-06-15 11:53:57,620][1651669] Updated weights for policy 0, policy_version 29456 (0.0012) [2024-06-15 11:54:00,772][1648981] Fps is (10 sec: 52398.6, 60 sec: 46416.8, 300 sec: 46207.5). Total num frames: 60424192. Throughput: 0: 11672.1. Samples: 15159808. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 11:54:00,773][1648981] Avg episode reward: [(0, '82.490')] [2024-06-15 11:54:02,351][1651669] Updated weights for policy 0, policy_version 29522 (0.0025) [2024-06-15 11:54:04,020][1651669] Updated weights for policy 0, policy_version 29570 (0.0012) [2024-06-15 11:54:05,766][1648981] Fps is (10 sec: 49154.4, 60 sec: 46976.1, 300 sec: 47097.1). Total num frames: 60686336. Throughput: 0: 11981.2. Samples: 15245312. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 11:54:05,767][1648981] Avg episode reward: [(0, '81.950')] [2024-06-15 11:54:06,403][1651669] Updated weights for policy 0, policy_version 29635 (0.0015) [2024-06-15 11:54:07,912][1651669] Updated weights for policy 0, policy_version 29712 (0.0014) [2024-06-15 11:54:09,018][1651669] Updated weights for policy 0, policy_version 29757 (0.0012) [2024-06-15 11:54:10,778][1648981] Fps is (10 sec: 52398.0, 60 sec: 48596.4, 300 sec: 46651.0). Total num frames: 60948480. Throughput: 0: 11943.6. 
Samples: 15276032. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 11:54:10,779][1648981] Avg episode reward: [(0, '80.370')] [2024-06-15 11:54:14,725][1651669] Updated weights for policy 0, policy_version 29827 (0.0014) [2024-06-15 11:54:15,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 47513.7, 300 sec: 46988.3). Total num frames: 61177856. Throughput: 0: 12301.5. Samples: 15359488. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 11:54:15,767][1648981] Avg episode reward: [(0, '82.500')] [2024-06-15 11:54:16,054][1651669] Updated weights for policy 0, policy_version 29882 (0.0015) [2024-06-15 11:54:17,642][1651669] Updated weights for policy 0, policy_version 29938 (0.0011) [2024-06-15 11:54:18,962][1651669] Updated weights for policy 0, policy_version 30008 (0.0026) [2024-06-15 11:54:20,766][1648981] Fps is (10 sec: 52490.2, 60 sec: 50245.6, 300 sec: 46986.0). Total num frames: 61472768. Throughput: 0: 12435.9. Samples: 15428608. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 11:54:20,767][1648981] Avg episode reward: [(0, '85.300')] [2024-06-15 11:54:25,234][1651669] Updated weights for policy 0, policy_version 30050 (0.0012) [2024-06-15 11:54:25,767][1648981] Fps is (10 sec: 42596.8, 60 sec: 46421.1, 300 sec: 46654.6). Total num frames: 61603840. Throughput: 0: 12526.9. Samples: 15467520. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 11:54:25,767][1648981] Avg episode reward: [(0, '84.290')] [2024-06-15 11:54:26,283][1651669] Updated weights for policy 0, policy_version 30112 (0.0012) [2024-06-15 11:54:27,274][1651274] Signal inference workers to stop experience collection... (1600 times) [2024-06-15 11:54:27,310][1651669] InferenceWorker_p0-w0: stopping experience collection (1600 times) [2024-06-15 11:54:27,553][1651274] Signal inference workers to resume experience collection... 
(1600 times) [2024-06-15 11:54:27,554][1651669] InferenceWorker_p0-w0: resuming experience collection (1600 times) [2024-06-15 11:54:28,109][1651669] Updated weights for policy 0, policy_version 30180 (0.0103) [2024-06-15 11:54:29,229][1651669] Updated weights for policy 0, policy_version 30230 (0.0016) [2024-06-15 11:54:29,921][1651669] Updated weights for policy 0, policy_version 30272 (0.0014) [2024-06-15 11:54:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 51343.2, 300 sec: 47319.2). Total num frames: 61997056. Throughput: 0: 12265.2. Samples: 15532032. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 11:54:30,767][1648981] Avg episode reward: [(0, '90.510')] [2024-06-15 11:54:30,768][1651274] Saving new best policy, reward=90.510! [2024-06-15 11:54:35,766][1648981] Fps is (10 sec: 49154.1, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 62095360. Throughput: 0: 12288.1. Samples: 15614976. Policy #0 lag: (min: 15.0, avg: 77.6, max: 271.0) [2024-06-15 11:54:35,767][1648981] Avg episode reward: [(0, '90.980')] [2024-06-15 11:54:35,968][1651669] Updated weights for policy 0, policy_version 30324 (0.0013) [2024-06-15 11:54:36,202][1651274] Saving new best policy, reward=90.980! [2024-06-15 11:54:37,602][1651669] Updated weights for policy 0, policy_version 30369 (0.0012) [2024-06-15 11:54:38,850][1651669] Updated weights for policy 0, policy_version 30432 (0.0103) [2024-06-15 11:54:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 51336.8, 300 sec: 47319.2). Total num frames: 62455808. Throughput: 0: 12208.5. Samples: 15648256. Policy #0 lag: (min: 15.0, avg: 77.6, max: 271.0) [2024-06-15 11:54:40,767][1648981] Avg episode reward: [(0, '89.220')] [2024-06-15 11:54:40,788][1651669] Updated weights for policy 0, policy_version 30512 (0.0141) [2024-06-15 11:54:45,767][1648981] Fps is (10 sec: 42596.8, 60 sec: 48059.6, 300 sec: 46208.4). Total num frames: 62521344. Throughput: 0: 12494.4. Samples: 15721984. 
Policy #0 lag: (min: 15.0, avg: 77.6, max: 271.0) [2024-06-15 11:54:45,768][1648981] Avg episode reward: [(0, '92.040')] [2024-06-15 11:54:45,999][1651274] Saving new best policy, reward=92.040! [2024-06-15 11:54:47,194][1651669] Updated weights for policy 0, policy_version 30592 (0.0032) [2024-06-15 11:54:50,181][1651669] Updated weights for policy 0, policy_version 30688 (0.0129) [2024-06-15 11:54:50,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 49698.2, 300 sec: 47208.1). Total num frames: 62881792. Throughput: 0: 11844.3. Samples: 15778304. Policy #0 lag: (min: 15.0, avg: 77.6, max: 271.0) [2024-06-15 11:54:50,767][1648981] Avg episode reward: [(0, '90.110')] [2024-06-15 11:54:52,501][1651669] Updated weights for policy 0, policy_version 30757 (0.0013) [2024-06-15 11:54:55,767][1648981] Fps is (10 sec: 52428.8, 60 sec: 47513.8, 300 sec: 46545.1). Total num frames: 63045632. Throughput: 0: 12018.0. Samples: 15816704. Policy #0 lag: (min: 15.0, avg: 77.6, max: 271.0) [2024-06-15 11:54:55,768][1648981] Avg episode reward: [(0, '89.470')] [2024-06-15 11:54:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000030784_63045632.pth... [2024-06-15 11:54:55,840][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000025328_51871744.pth [2024-06-15 11:54:56,880][1651669] Updated weights for policy 0, policy_version 30801 (0.0014) [2024-06-15 11:54:57,954][1651669] Updated weights for policy 0, policy_version 30845 (0.0016) [2024-06-15 11:54:59,876][1651669] Updated weights for policy 0, policy_version 30896 (0.0012) [2024-06-15 11:55:00,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48610.6, 300 sec: 47208.2). Total num frames: 63340544. Throughput: 0: 11969.4. Samples: 15898112. 
Policy #0 lag: (min: 12.0, avg: 82.1, max: 268.0) [2024-06-15 11:55:00,767][1648981] Avg episode reward: [(0, '89.260')] [2024-06-15 11:55:01,707][1651669] Updated weights for policy 0, policy_version 30969 (0.0015) [2024-06-15 11:55:03,160][1651669] Updated weights for policy 0, policy_version 31008 (0.0024) [2024-06-15 11:55:05,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 48059.8, 300 sec: 46763.8). Total num frames: 63569920. Throughput: 0: 12003.6. Samples: 15968768. Policy #0 lag: (min: 12.0, avg: 82.1, max: 268.0) [2024-06-15 11:55:05,767][1648981] Avg episode reward: [(0, '86.120')] [2024-06-15 11:55:08,109][1651669] Updated weights for policy 0, policy_version 31076 (0.0015) [2024-06-15 11:55:09,055][1651274] Signal inference workers to stop experience collection... (1650 times) [2024-06-15 11:55:09,116][1651669] InferenceWorker_p0-w0: stopping experience collection (1650 times) [2024-06-15 11:55:09,338][1651274] Signal inference workers to resume experience collection... (1650 times) [2024-06-15 11:55:09,339][1651669] InferenceWorker_p0-w0: resuming experience collection (1650 times) [2024-06-15 11:55:09,341][1651669] Updated weights for policy 0, policy_version 31120 (0.0012) [2024-06-15 11:55:10,770][1648981] Fps is (10 sec: 49133.6, 60 sec: 48066.1, 300 sec: 47098.3). Total num frames: 63832064. Throughput: 0: 12048.1. Samples: 16009728. Policy #0 lag: (min: 12.0, avg: 82.1, max: 268.0) [2024-06-15 11:55:10,771][1648981] Avg episode reward: [(0, '83.620')] [2024-06-15 11:55:11,055][1651669] Updated weights for policy 0, policy_version 31184 (0.0076) [2024-06-15 11:55:13,618][1651669] Updated weights for policy 0, policy_version 31236 (0.0013) [2024-06-15 11:55:14,826][1651669] Updated weights for policy 0, policy_version 31295 (0.0015) [2024-06-15 11:55:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.8, 300 sec: 47319.4). Total num frames: 64094208. Throughput: 0: 11969.4. Samples: 16070656. 
Policy #0 lag: (min: 12.0, avg: 82.1, max: 268.0) [2024-06-15 11:55:15,767][1648981] Avg episode reward: [(0, '87.610')] [2024-06-15 11:55:19,607][1651669] Updated weights for policy 0, policy_version 31350 (0.0093) [2024-06-15 11:55:20,768][1648981] Fps is (10 sec: 45887.4, 60 sec: 46966.6, 300 sec: 46875.1). Total num frames: 64290816. Throughput: 0: 11946.3. Samples: 16152576. Policy #0 lag: (min: 15.0, avg: 88.2, max: 271.0) [2024-06-15 11:55:20,768][1648981] Avg episode reward: [(0, '85.700')] [2024-06-15 11:55:20,900][1651669] Updated weights for policy 0, policy_version 31396 (0.0014) [2024-06-15 11:55:22,009][1651669] Updated weights for policy 0, policy_version 31441 (0.0012) [2024-06-15 11:55:22,853][1651669] Updated weights for policy 0, policy_version 31488 (0.0011) [2024-06-15 11:55:25,098][1651669] Updated weights for policy 0, policy_version 31543 (0.0019) [2024-06-15 11:55:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.5, 300 sec: 47541.4). Total num frames: 64618496. Throughput: 0: 11923.9. Samples: 16184832. Policy #0 lag: (min: 15.0, avg: 88.2, max: 271.0) [2024-06-15 11:55:25,767][1648981] Avg episode reward: [(0, '87.200')] [2024-06-15 11:55:30,373][1651669] Updated weights for policy 0, policy_version 31600 (0.0034) [2024-06-15 11:55:30,766][1648981] Fps is (10 sec: 42603.2, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 64716800. Throughput: 0: 12094.6. Samples: 16266240. Policy #0 lag: (min: 15.0, avg: 88.2, max: 271.0) [2024-06-15 11:55:30,767][1648981] Avg episode reward: [(0, '90.280')] [2024-06-15 11:55:31,418][1651669] Updated weights for policy 0, policy_version 31636 (0.0013) [2024-06-15 11:55:32,911][1651669] Updated weights for policy 0, policy_version 31696 (0.0012) [2024-06-15 11:55:34,075][1651669] Updated weights for policy 0, policy_version 31741 (0.0014) [2024-06-15 11:55:35,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 49151.9, 300 sec: 47430.3). Total num frames: 65044480. Throughput: 0: 12151.5. 
Samples: 16325120. Policy #0 lag: (min: 15.0, avg: 88.2, max: 271.0) [2024-06-15 11:55:35,767][1648981] Avg episode reward: [(0, '92.180')] [2024-06-15 11:55:36,596][1651274] Saving new best policy, reward=92.180! [2024-06-15 11:55:36,839][1651669] Updated weights for policy 0, policy_version 31798 (0.0012) [2024-06-15 11:55:40,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 44782.9, 300 sec: 46541.6). Total num frames: 65142784. Throughput: 0: 11992.2. Samples: 16356352. Policy #0 lag: (min: 15.0, avg: 88.2, max: 271.0) [2024-06-15 11:55:40,767][1648981] Avg episode reward: [(0, '91.380')] [2024-06-15 11:55:41,127][1651669] Updated weights for policy 0, policy_version 31832 (0.0079) [2024-06-15 11:55:43,022][1651669] Updated weights for policy 0, policy_version 31905 (0.0024) [2024-06-15 11:55:44,425][1651669] Updated weights for policy 0, policy_version 31954 (0.0015) [2024-06-15 11:55:45,782][1648981] Fps is (10 sec: 49075.3, 60 sec: 50231.4, 300 sec: 47538.9). Total num frames: 65536000. Throughput: 0: 11851.5. Samples: 16431616. Policy #0 lag: (min: 15.0, avg: 93.1, max: 271.0) [2024-06-15 11:55:45,783][1648981] Avg episode reward: [(0, '93.040')] [2024-06-15 11:55:45,784][1651274] Saving new best policy, reward=93.040! [2024-06-15 11:55:46,582][1651669] Updated weights for policy 0, policy_version 32003 (0.0010) [2024-06-15 11:55:50,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 65667072. Throughput: 0: 11855.6. Samples: 16502272. Policy #0 lag: (min: 15.0, avg: 93.1, max: 271.0) [2024-06-15 11:55:50,767][1648981] Avg episode reward: [(0, '88.310')] [2024-06-15 11:55:51,746][1651669] Updated weights for policy 0, policy_version 32080 (0.0013) [2024-06-15 11:55:51,898][1651274] Signal inference workers to stop experience collection... 
(1700 times) [2024-06-15 11:55:51,955][1651669] InferenceWorker_p0-w0: stopping experience collection (1700 times) [2024-06-15 11:55:52,103][1651274] Signal inference workers to resume experience collection... (1700 times) [2024-06-15 11:55:52,104][1651669] InferenceWorker_p0-w0: resuming experience collection (1700 times) [2024-06-15 11:55:52,828][1651669] Updated weights for policy 0, policy_version 32124 (0.0010) [2024-06-15 11:55:54,295][1651669] Updated weights for policy 0, policy_version 32176 (0.0021) [2024-06-15 11:55:55,451][1651669] Updated weights for policy 0, policy_version 32208 (0.0012) [2024-06-15 11:55:55,766][1648981] Fps is (10 sec: 42665.1, 60 sec: 48606.1, 300 sec: 47208.1). Total num frames: 65961984. Throughput: 0: 11811.1. Samples: 16541184. Policy #0 lag: (min: 15.0, avg: 93.1, max: 271.0) [2024-06-15 11:55:55,767][1648981] Avg episode reward: [(0, '87.770')] [2024-06-15 11:55:56,415][1651669] Updated weights for policy 0, policy_version 32251 (0.0040) [2024-06-15 11:55:58,628][1651669] Updated weights for policy 0, policy_version 32304 (0.0012) [2024-06-15 11:56:00,767][1648981] Fps is (10 sec: 52426.9, 60 sec: 47513.4, 300 sec: 47430.3). Total num frames: 66191360. Throughput: 0: 11912.4. Samples: 16606720. Policy #0 lag: (min: 15.0, avg: 93.1, max: 271.0) [2024-06-15 11:56:00,768][1648981] Avg episode reward: [(0, '86.090')] [2024-06-15 11:56:02,805][1651669] Updated weights for policy 0, policy_version 32339 (0.0013) [2024-06-15 11:56:04,451][1651669] Updated weights for policy 0, policy_version 32385 (0.0012) [2024-06-15 11:56:05,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 66420736. Throughput: 0: 11810.4. Samples: 16684032. 
Policy #0 lag: (min: 3.0, avg: 89.9, max: 259.0) [2024-06-15 11:56:05,767][1648981] Avg episode reward: [(0, '85.850')] [2024-06-15 11:56:05,889][1651669] Updated weights for policy 0, policy_version 32442 (0.0011) [2024-06-15 11:56:07,038][1651669] Updated weights for policy 0, policy_version 32482 (0.0011) [2024-06-15 11:56:08,898][1651669] Updated weights for policy 0, policy_version 32528 (0.0014) [2024-06-15 11:56:09,965][1651669] Updated weights for policy 0, policy_version 32574 (0.0012) [2024-06-15 11:56:10,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 48062.7, 300 sec: 47541.4). Total num frames: 66715648. Throughput: 0: 11901.1. Samples: 16720384. Policy #0 lag: (min: 3.0, avg: 89.9, max: 259.0) [2024-06-15 11:56:10,767][1648981] Avg episode reward: [(0, '83.570')] [2024-06-15 11:56:13,990][1651669] Updated weights for policy 0, policy_version 32638 (0.0032) [2024-06-15 11:56:15,785][1648981] Fps is (10 sec: 45791.2, 60 sec: 46407.1, 300 sec: 46760.9). Total num frames: 66879488. Throughput: 0: 11566.5. Samples: 16786944. Policy #0 lag: (min: 3.0, avg: 89.9, max: 259.0) [2024-06-15 11:56:15,785][1648981] Avg episode reward: [(0, '86.200')] [2024-06-15 11:56:16,339][1651669] Updated weights for policy 0, policy_version 32674 (0.0013) [2024-06-15 11:56:17,863][1651669] Updated weights for policy 0, policy_version 32721 (0.0014) [2024-06-15 11:56:19,915][1651669] Updated weights for policy 0, policy_version 32772 (0.0020) [2024-06-15 11:56:20,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 48060.7, 300 sec: 47430.3). Total num frames: 67174400. Throughput: 0: 11832.9. Samples: 16857600. 
Policy #0 lag: (min: 3.0, avg: 89.9, max: 259.0) [2024-06-15 11:56:20,767][1648981] Avg episode reward: [(0, '89.330')] [2024-06-15 11:56:21,203][1651669] Updated weights for policy 0, policy_version 32824 (0.0013) [2024-06-15 11:56:25,484][1651669] Updated weights for policy 0, policy_version 32888 (0.0030) [2024-06-15 11:56:25,767][1648981] Fps is (10 sec: 49241.7, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 67371008. Throughput: 0: 12014.9. Samples: 16897024. Policy #0 lag: (min: 1.0, avg: 98.5, max: 257.0) [2024-06-15 11:56:25,767][1648981] Avg episode reward: [(0, '91.980')] [2024-06-15 11:56:27,137][1651669] Updated weights for policy 0, policy_version 32932 (0.0012) [2024-06-15 11:56:29,509][1651669] Updated weights for policy 0, policy_version 32978 (0.0015) [2024-06-15 11:56:30,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 67633152. Throughput: 0: 11825.6. Samples: 16963584. Policy #0 lag: (min: 1.0, avg: 98.5, max: 257.0) [2024-06-15 11:56:30,767][1648981] Avg episode reward: [(0, '92.970')] [2024-06-15 11:56:31,866][1651669] Updated weights for policy 0, policy_version 33027 (0.0016) [2024-06-15 11:56:33,262][1651669] Updated weights for policy 0, policy_version 33086 (0.0066) [2024-06-15 11:56:35,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 45329.0, 300 sec: 46653.3). Total num frames: 67764224. Throughput: 0: 11810.1. Samples: 17033728. Policy #0 lag: (min: 1.0, avg: 98.5, max: 257.0) [2024-06-15 11:56:35,767][1648981] Avg episode reward: [(0, '91.810')] [2024-06-15 11:56:36,019][1651274] Signal inference workers to stop experience collection... (1750 times) [2024-06-15 11:56:36,072][1651669] InferenceWorker_p0-w0: stopping experience collection (1750 times) [2024-06-15 11:56:36,297][1651274] Signal inference workers to resume experience collection... 
(1750 times) [2024-06-15 11:56:36,298][1651669] InferenceWorker_p0-w0: resuming experience collection (1750 times) [2024-06-15 11:56:36,919][1651669] Updated weights for policy 0, policy_version 33143 (0.0012) [2024-06-15 11:56:38,954][1651669] Updated weights for policy 0, policy_version 33185 (0.0133) [2024-06-15 11:56:40,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48606.0, 300 sec: 47208.1). Total num frames: 68059136. Throughput: 0: 11628.1. Samples: 17064448. Policy #0 lag: (min: 1.0, avg: 98.5, max: 257.0) [2024-06-15 11:56:40,767][1648981] Avg episode reward: [(0, '89.160')] [2024-06-15 11:56:41,020][1651669] Updated weights for policy 0, policy_version 33241 (0.0045) [2024-06-15 11:56:41,857][1651669] Updated weights for policy 0, policy_version 33279 (0.0012) [2024-06-15 11:56:43,515][1651669] Updated weights for policy 0, policy_version 33314 (0.0013) [2024-06-15 11:56:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45887.1, 300 sec: 46987.6). Total num frames: 68288512. Throughput: 0: 11798.8. Samples: 17137664. Policy #0 lag: (min: 1.0, avg: 98.5, max: 257.0) [2024-06-15 11:56:45,767][1648981] Avg episode reward: [(0, '88.210')] [2024-06-15 11:56:46,542][1651669] Updated weights for policy 0, policy_version 33350 (0.0013) [2024-06-15 11:56:47,749][1651669] Updated weights for policy 0, policy_version 33401 (0.0013) [2024-06-15 11:56:49,600][1651669] Updated weights for policy 0, policy_version 33461 (0.0013) [2024-06-15 11:56:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 68550656. Throughput: 0: 11719.1. Samples: 17211392. 
Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 11:56:50,767][1648981] Avg episode reward: [(0, '84.730')] [2024-06-15 11:56:52,050][1651669] Updated weights for policy 0, policy_version 33520 (0.0101) [2024-06-15 11:56:54,526][1651669] Updated weights for policy 0, policy_version 33552 (0.0012) [2024-06-15 11:56:55,627][1651669] Updated weights for policy 0, policy_version 33598 (0.0011) [2024-06-15 11:56:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 68812800. Throughput: 0: 11639.5. Samples: 17244160. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 11:56:55,767][1648981] Avg episode reward: [(0, '83.950')] [2024-06-15 11:56:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000033600_68812800.pth... [2024-06-15 11:56:55,835][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000028032_57409536.pth [2024-06-15 11:56:58,576][1651669] Updated weights for policy 0, policy_version 33655 (0.0022) [2024-06-15 11:57:00,112][1651669] Updated weights for policy 0, policy_version 33700 (0.0013) [2024-06-15 11:57:00,769][1648981] Fps is (10 sec: 52416.0, 60 sec: 48058.1, 300 sec: 47096.7). Total num frames: 69074944. Throughput: 0: 11825.7. Samples: 17318912. Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 11:57:00,770][1648981] Avg episode reward: [(0, '87.430')] [2024-06-15 11:57:02,681][1651669] Updated weights for policy 0, policy_version 33760 (0.0017) [2024-06-15 11:57:05,778][1648981] Fps is (10 sec: 39275.2, 60 sec: 46412.1, 300 sec: 47428.4). Total num frames: 69206016. Throughput: 0: 11772.9. Samples: 17387520. 
Policy #0 lag: (min: 15.0, avg: 112.9, max: 271.0) [2024-06-15 11:57:05,779][1648981] Avg episode reward: [(0, '88.740')] [2024-06-15 11:57:06,499][1651669] Updated weights for policy 0, policy_version 33824 (0.0099) [2024-06-15 11:57:09,197][1651669] Updated weights for policy 0, policy_version 33876 (0.0013) [2024-06-15 11:57:10,669][1651669] Updated weights for policy 0, policy_version 33938 (0.0013) [2024-06-15 11:57:10,780][1648981] Fps is (10 sec: 42549.7, 60 sec: 46410.7, 300 sec: 47094.9). Total num frames: 69500928. Throughput: 0: 11726.9. Samples: 17424896. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 11:57:10,781][1648981] Avg episode reward: [(0, '89.310')] [2024-06-15 11:57:11,620][1651669] Updated weights for policy 0, policy_version 33983 (0.0012) [2024-06-15 11:57:14,258][1651669] Updated weights for policy 0, policy_version 34024 (0.0018) [2024-06-15 11:57:15,778][1648981] Fps is (10 sec: 52429.5, 60 sec: 47518.8, 300 sec: 47539.5). Total num frames: 69730304. Throughput: 0: 11750.2. Samples: 17492480. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 11:57:15,779][1648981] Avg episode reward: [(0, '88.970')] [2024-06-15 11:57:17,861][1651669] Updated weights for policy 0, policy_version 34082 (0.0083) [2024-06-15 11:57:18,583][1651669] Updated weights for policy 0, policy_version 34112 (0.0012) [2024-06-15 11:57:20,081][1651274] Signal inference workers to stop experience collection... (1800 times) [2024-06-15 11:57:20,140][1651669] InferenceWorker_p0-w0: stopping experience collection (1800 times) [2024-06-15 11:57:20,359][1651274] Signal inference workers to resume experience collection... (1800 times) [2024-06-15 11:57:20,370][1651669] InferenceWorker_p0-w0: resuming experience collection (1800 times) [2024-06-15 11:57:20,743][1651669] Updated weights for policy 0, policy_version 34176 (0.0012) [2024-06-15 11:57:20,766][1648981] Fps is (10 sec: 49220.3, 60 sec: 46967.4, 300 sec: 47430.3). 
Total num frames: 69992448. Throughput: 0: 11776.0. Samples: 17563648. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 11:57:20,767][1648981] Avg episode reward: [(0, '92.480')] [2024-06-15 11:57:22,085][1651669] Updated weights for policy 0, policy_version 34229 (0.0012) [2024-06-15 11:57:24,872][1651669] Updated weights for policy 0, policy_version 34272 (0.0013) [2024-06-15 11:57:25,767][1648981] Fps is (10 sec: 52489.0, 60 sec: 48059.6, 300 sec: 47543.9). Total num frames: 70254592. Throughput: 0: 12014.8. Samples: 17605120. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 11:57:25,768][1648981] Avg episode reward: [(0, '94.550')] [2024-06-15 11:57:25,772][1651274] Saving new best policy, reward=94.550! [2024-06-15 11:57:28,218][1651669] Updated weights for policy 0, policy_version 34309 (0.0055) [2024-06-15 11:57:29,149][1651669] Updated weights for policy 0, policy_version 34365 (0.0108) [2024-06-15 11:57:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 70483968. Throughput: 0: 11992.2. Samples: 17677312. Policy #0 lag: (min: 15.0, avg: 117.6, max: 271.0) [2024-06-15 11:57:30,767][1648981] Avg episode reward: [(0, '93.500')] [2024-06-15 11:57:30,898][1651669] Updated weights for policy 0, policy_version 34424 (0.0014) [2024-06-15 11:57:32,097][1651669] Updated weights for policy 0, policy_version 34464 (0.0012) [2024-06-15 11:57:34,928][1651669] Updated weights for policy 0, policy_version 34500 (0.0013) [2024-06-15 11:57:35,778][1648981] Fps is (10 sec: 49095.8, 60 sec: 49688.4, 300 sec: 47428.4). Total num frames: 70746112. Throughput: 0: 12091.4. Samples: 17755648. 
Policy #0 lag: (min: 15.0, avg: 163.5, max: 272.0) [2024-06-15 11:57:35,779][1648981] Avg episode reward: [(0, '93.750')] [2024-06-15 11:57:35,857][1651669] Updated weights for policy 0, policy_version 34554 (0.0020) [2024-06-15 11:57:39,249][1651669] Updated weights for policy 0, policy_version 34608 (0.0014) [2024-06-15 11:57:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48605.8, 300 sec: 47764.3). Total num frames: 70975488. Throughput: 0: 12310.8. Samples: 17798144. Policy #0 lag: (min: 15.0, avg: 163.5, max: 272.0) [2024-06-15 11:57:40,767][1648981] Avg episode reward: [(0, '95.720')] [2024-06-15 11:57:40,787][1651669] Updated weights for policy 0, policy_version 34672 (0.0023) [2024-06-15 11:57:41,131][1651274] Saving new best policy, reward=95.720! [2024-06-15 11:57:41,745][1651669] Updated weights for policy 0, policy_version 34697 (0.0066) [2024-06-15 11:57:42,937][1651669] Updated weights for policy 0, policy_version 34751 (0.0016) [2024-06-15 11:57:45,766][1648981] Fps is (10 sec: 45929.3, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 71204864. Throughput: 0: 12106.6. Samples: 17863680. Policy #0 lag: (min: 15.0, avg: 163.5, max: 272.0) [2024-06-15 11:57:45,767][1648981] Avg episode reward: [(0, '97.190')] [2024-06-15 11:57:46,017][1651274] Saving new best policy, reward=97.190! [2024-06-15 11:57:46,592][1651669] Updated weights for policy 0, policy_version 34807 (0.0015) [2024-06-15 11:57:50,191][1651669] Updated weights for policy 0, policy_version 34850 (0.0012) [2024-06-15 11:57:50,774][1648981] Fps is (10 sec: 42565.4, 60 sec: 47507.4, 300 sec: 47429.1). Total num frames: 71401472. Throughput: 0: 12346.0. Samples: 17943040. Policy #0 lag: (min: 15.0, avg: 163.5, max: 272.0) [2024-06-15 11:57:50,775][1648981] Avg episode reward: [(0, '98.570')] [2024-06-15 11:57:51,291][1651274] Saving new best policy, reward=98.570! 
[2024-06-15 11:57:51,700][1651669] Updated weights for policy 0, policy_version 34912 (0.0014) [2024-06-15 11:57:53,222][1651669] Updated weights for policy 0, policy_version 34976 (0.0015) [2024-06-15 11:57:55,767][1648981] Fps is (10 sec: 49149.9, 60 sec: 48059.5, 300 sec: 47652.4). Total num frames: 71696384. Throughput: 0: 12109.6. Samples: 17969664. Policy #0 lag: (min: 15.0, avg: 163.5, max: 272.0) [2024-06-15 11:57:55,767][1648981] Avg episode reward: [(0, '99.150')] [2024-06-15 11:57:55,773][1651274] Saving new best policy, reward=99.150! [2024-06-15 11:57:56,723][1651669] Updated weights for policy 0, policy_version 35040 (0.0014) [2024-06-15 11:58:00,307][1651669] Updated weights for policy 0, policy_version 35077 (0.0012) [2024-06-15 11:58:00,766][1648981] Fps is (10 sec: 45910.6, 60 sec: 46423.1, 300 sec: 47432.1). Total num frames: 71860224. Throughput: 0: 12461.9. Samples: 18053120. Policy #0 lag: (min: 11.0, avg: 131.8, max: 267.0) [2024-06-15 11:58:00,767][1648981] Avg episode reward: [(0, '98.180')] [2024-06-15 11:58:01,334][1651274] Signal inference workers to stop experience collection... (1850 times) [2024-06-15 11:58:01,389][1651669] InferenceWorker_p0-w0: stopping experience collection (1850 times) [2024-06-15 11:58:01,588][1651274] Signal inference workers to resume experience collection... (1850 times) [2024-06-15 11:58:01,588][1651669] InferenceWorker_p0-w0: resuming experience collection (1850 times) [2024-06-15 11:58:01,804][1651669] Updated weights for policy 0, policy_version 35138 (0.0013) [2024-06-15 11:58:03,401][1651669] Updated weights for policy 0, policy_version 35201 (0.0014) [2024-06-15 11:58:04,468][1651669] Updated weights for policy 0, policy_version 35263 (0.0013) [2024-06-15 11:58:05,766][1648981] Fps is (10 sec: 52430.9, 60 sec: 50254.2, 300 sec: 48096.8). Total num frames: 72220672. Throughput: 0: 12253.9. Samples: 18115072. 
Policy #0 lag: (min: 11.0, avg: 131.8, max: 267.0) [2024-06-15 11:58:05,767][1648981] Avg episode reward: [(0, '96.880')] [2024-06-15 11:58:09,039][1651669] Updated weights for policy 0, policy_version 35322 (0.0013) [2024-06-15 11:58:10,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 47524.5, 300 sec: 47541.4). Total num frames: 72351744. Throughput: 0: 12151.5. Samples: 18151936. Policy #0 lag: (min: 11.0, avg: 131.8, max: 267.0) [2024-06-15 11:58:10,767][1648981] Avg episode reward: [(0, '96.470')] [2024-06-15 11:58:11,711][1651669] Updated weights for policy 0, policy_version 35361 (0.0015) [2024-06-15 11:58:12,969][1651669] Updated weights for policy 0, policy_version 35409 (0.0012) [2024-06-15 11:58:15,047][1651669] Updated weights for policy 0, policy_version 35491 (0.0012) [2024-06-15 11:58:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50254.1, 300 sec: 48430.3). Total num frames: 72744960. Throughput: 0: 12208.3. Samples: 18226688. Policy #0 lag: (min: 11.0, avg: 131.8, max: 267.0) [2024-06-15 11:58:15,767][1648981] Avg episode reward: [(0, '93.490')] [2024-06-15 11:58:19,316][1651669] Updated weights for policy 0, policy_version 35552 (0.0014) [2024-06-15 11:58:20,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 72876032. Throughput: 0: 12086.4. Samples: 18299392. Policy #0 lag: (min: 31.0, avg: 139.5, max: 287.0) [2024-06-15 11:58:20,767][1648981] Avg episode reward: [(0, '96.590')] [2024-06-15 11:58:21,668][1651669] Updated weights for policy 0, policy_version 35600 (0.0034) [2024-06-15 11:58:23,716][1651669] Updated weights for policy 0, policy_version 35650 (0.0012) [2024-06-15 11:58:25,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 48606.0, 300 sec: 48320.2). Total num frames: 73170944. Throughput: 0: 11958.0. Samples: 18336256. 
Policy #0 lag: (min: 31.0, avg: 139.5, max: 287.0) [2024-06-15 11:58:25,767][1648981] Avg episode reward: [(0, '97.710')] [2024-06-15 11:58:25,804][1651669] Updated weights for policy 0, policy_version 35731 (0.0013) [2024-06-15 11:58:26,769][1651669] Updated weights for policy 0, policy_version 35774 (0.0011) [2024-06-15 11:58:30,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 73269248. Throughput: 0: 11889.8. Samples: 18398720. Policy #0 lag: (min: 31.0, avg: 139.5, max: 287.0) [2024-06-15 11:58:30,767][1648981] Avg episode reward: [(0, '98.480')] [2024-06-15 11:58:31,854][1651669] Updated weights for policy 0, policy_version 35829 (0.0013) [2024-06-15 11:58:34,850][1651669] Updated weights for policy 0, policy_version 35901 (0.0015) [2024-06-15 11:58:35,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 46976.7, 300 sec: 48096.8). Total num frames: 73564160. Throughput: 0: 11573.2. Samples: 18463744. Policy #0 lag: (min: 31.0, avg: 139.5, max: 287.0) [2024-06-15 11:58:35,767][1648981] Avg episode reward: [(0, '98.960')] [2024-06-15 11:58:36,488][1651669] Updated weights for policy 0, policy_version 35967 (0.0012) [2024-06-15 11:58:38,242][1651669] Updated weights for policy 0, policy_version 36032 (0.0011) [2024-06-15 11:58:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 73793536. Throughput: 0: 11616.8. Samples: 18492416. Policy #0 lag: (min: 31.0, avg: 139.5, max: 287.0) [2024-06-15 11:58:40,767][1648981] Avg episode reward: [(0, '97.600')] [2024-06-15 11:58:43,553][1651669] Updated weights for policy 0, policy_version 36080 (0.0013) [2024-06-15 11:58:45,610][1651274] Signal inference workers to stop experience collection... (1900 times) [2024-06-15 11:58:45,664][1651669] InferenceWorker_p0-w0: stopping experience collection (1900 times) [2024-06-15 11:58:45,767][1648981] Fps is (10 sec: 36043.9, 60 sec: 45328.9, 300 sec: 47541.3). Total num frames: 73924608. 
Throughput: 0: 11502.9. Samples: 18570752. Policy #0 lag: (min: 1.0, avg: 86.5, max: 257.0) [2024-06-15 11:58:45,768][1648981] Avg episode reward: [(0, '95.030')] [2024-06-15 11:58:45,961][1651274] Signal inference workers to resume experience collection... (1900 times) [2024-06-15 11:58:45,964][1651669] InferenceWorker_p0-w0: resuming experience collection (1900 times) [2024-06-15 11:58:46,046][1651669] Updated weights for policy 0, policy_version 36112 (0.0094) [2024-06-15 11:58:47,649][1651669] Updated weights for policy 0, policy_version 36177 (0.0121) [2024-06-15 11:58:49,037][1651669] Updated weights for policy 0, policy_version 36240 (0.0012) [2024-06-15 11:58:50,048][1651669] Updated weights for policy 0, policy_version 36284 (0.0012) [2024-06-15 11:58:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48612.2, 300 sec: 47874.7). Total num frames: 74317824. Throughput: 0: 11480.2. Samples: 18631680. Policy #0 lag: (min: 1.0, avg: 86.5, max: 257.0) [2024-06-15 11:58:50,767][1648981] Avg episode reward: [(0, '95.830')] [2024-06-15 11:58:55,040][1651669] Updated weights for policy 0, policy_version 36325 (0.0015) [2024-06-15 11:58:55,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 45875.5, 300 sec: 47542.3). Total num frames: 74448896. Throughput: 0: 11514.3. Samples: 18670080. Policy #0 lag: (min: 1.0, avg: 86.5, max: 257.0) [2024-06-15 11:58:55,767][1648981] Avg episode reward: [(0, '93.630')] [2024-06-15 11:58:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000036352_74448896.pth... 
[2024-06-15 11:58:55,852][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000030784_63045632.pth [2024-06-15 11:58:56,844][1651669] Updated weights for policy 0, policy_version 36368 (0.0011) [2024-06-15 11:58:58,276][1651669] Updated weights for policy 0, policy_version 36418 (0.0012) [2024-06-15 11:58:59,790][1651669] Updated weights for policy 0, policy_version 36480 (0.0017) [2024-06-15 11:59:00,769][1648981] Fps is (10 sec: 45861.4, 60 sec: 48603.5, 300 sec: 47763.0). Total num frames: 74776576. Throughput: 0: 11377.0. Samples: 18738688. Policy #0 lag: (min: 1.0, avg: 86.5, max: 257.0) [2024-06-15 11:59:00,770][1648981] Avg episode reward: [(0, '94.940')] [2024-06-15 11:59:01,166][1651669] Updated weights for policy 0, policy_version 36537 (0.0013) [2024-06-15 11:59:05,767][1648981] Fps is (10 sec: 42598.0, 60 sec: 44236.8, 300 sec: 47210.0). Total num frames: 74874880. Throughput: 0: 11434.6. Samples: 18813952. Policy #0 lag: (min: 1.0, avg: 86.5, max: 257.0) [2024-06-15 11:59:05,767][1648981] Avg episode reward: [(0, '96.700')] [2024-06-15 11:59:06,316][1651669] Updated weights for policy 0, policy_version 36579 (0.0013) [2024-06-15 11:59:07,694][1651669] Updated weights for policy 0, policy_version 36609 (0.0011) [2024-06-15 11:59:09,489][1651669] Updated weights for policy 0, policy_version 36677 (0.0013) [2024-06-15 11:59:10,766][1648981] Fps is (10 sec: 42611.1, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 75202560. Throughput: 0: 11377.8. Samples: 18848256. Policy #0 lag: (min: 31.0, avg: 123.7, max: 287.0) [2024-06-15 11:59:10,767][1648981] Avg episode reward: [(0, '99.980')] [2024-06-15 11:59:10,812][1651669] Updated weights for policy 0, policy_version 36735 (0.0013) [2024-06-15 11:59:10,823][1651274] Saving new best policy, reward=99.980! [2024-06-15 11:59:15,794][1648981] Fps is (10 sec: 49016.1, 60 sec: 43670.4, 300 sec: 47092.6). Total num frames: 75366400. Throughput: 0: 11473.1. Samples: 18915328. 
Policy #0 lag: (min: 31.0, avg: 123.7, max: 287.0) [2024-06-15 11:59:15,795][1648981] Avg episode reward: [(0, '101.860')] [2024-06-15 11:59:15,796][1651274] Saving new best policy, reward=101.860! [2024-06-15 11:59:17,309][1651669] Updated weights for policy 0, policy_version 36816 (0.0101) [2024-06-15 11:59:18,641][1651669] Updated weights for policy 0, policy_version 36866 (0.0014) [2024-06-15 11:59:20,622][1651669] Updated weights for policy 0, policy_version 36944 (0.0011) [2024-06-15 11:59:20,782][1648981] Fps is (10 sec: 45803.2, 60 sec: 46409.2, 300 sec: 47650.0). Total num frames: 75661312. Throughput: 0: 11464.8. Samples: 18979840. Policy #0 lag: (min: 31.0, avg: 123.7, max: 287.0) [2024-06-15 11:59:20,783][1648981] Avg episode reward: [(0, '100.990')] [2024-06-15 11:59:21,747][1651669] Updated weights for policy 0, policy_version 36990 (0.0016) [2024-06-15 11:59:24,478][1651274] Signal inference workers to stop experience collection... (1950 times) [2024-06-15 11:59:24,537][1651669] InferenceWorker_p0-w0: stopping experience collection (1950 times) [2024-06-15 11:59:24,722][1651274] Signal inference workers to resume experience collection... (1950 times) [2024-06-15 11:59:24,723][1651669] InferenceWorker_p0-w0: resuming experience collection (1950 times) [2024-06-15 11:59:24,897][1651669] Updated weights for policy 0, policy_version 37046 (0.0013) [2024-06-15 11:59:25,766][1648981] Fps is (10 sec: 52575.0, 60 sec: 45329.2, 300 sec: 47097.1). Total num frames: 75890688. Throughput: 0: 11537.1. Samples: 19011584. Policy #0 lag: (min: 31.0, avg: 123.7, max: 287.0) [2024-06-15 11:59:25,767][1648981] Avg episode reward: [(0, '104.980')] [2024-06-15 11:59:25,786][1651274] Saving new best policy, reward=104.980! [2024-06-15 11:59:30,656][1651669] Updated weights for policy 0, policy_version 37106 (0.0012) [2024-06-15 11:59:30,766][1648981] Fps is (10 sec: 32819.7, 60 sec: 45329.1, 300 sec: 47097.0). Total num frames: 75988992. Throughput: 0: 11537.1. 
Samples: 19089920. Policy #0 lag: (min: 31.0, avg: 92.8, max: 287.0) [2024-06-15 11:59:30,767][1648981] Avg episode reward: [(0, '101.090')] [2024-06-15 11:59:32,843][1651669] Updated weights for policy 0, policy_version 37185 (0.0014) [2024-06-15 11:59:34,256][1651669] Updated weights for policy 0, policy_version 37245 (0.0012) [2024-06-15 11:59:35,786][1648981] Fps is (10 sec: 39246.1, 60 sec: 45314.5, 300 sec: 46871.9). Total num frames: 76283904. Throughput: 0: 11350.2. Samples: 19142656. Policy #0 lag: (min: 31.0, avg: 92.8, max: 287.0) [2024-06-15 11:59:35,786][1648981] Avg episode reward: [(0, '100.450')] [2024-06-15 11:59:37,106][1651669] Updated weights for policy 0, policy_version 37300 (0.0023) [2024-06-15 11:59:40,723][1651669] Updated weights for policy 0, policy_version 37318 (0.0011) [2024-06-15 11:59:40,779][1648981] Fps is (10 sec: 42545.1, 60 sec: 43681.6, 300 sec: 47095.1). Total num frames: 76414976. Throughput: 0: 11488.4. Samples: 19187200. Policy #0 lag: (min: 31.0, avg: 92.8, max: 287.0) [2024-06-15 11:59:40,780][1648981] Avg episode reward: [(0, '99.480')] [2024-06-15 11:59:42,553][1651669] Updated weights for policy 0, policy_version 37392 (0.0010) [2024-06-15 11:59:44,465][1651669] Updated weights for policy 0, policy_version 37472 (0.0012) [2024-06-15 11:59:45,138][1651669] Updated weights for policy 0, policy_version 37503 (0.0012) [2024-06-15 11:59:45,766][1648981] Fps is (10 sec: 52530.1, 60 sec: 48060.0, 300 sec: 47208.1). Total num frames: 76808192. Throughput: 0: 11298.9. Samples: 19247104. Policy #0 lag: (min: 31.0, avg: 92.8, max: 287.0) [2024-06-15 11:59:45,767][1648981] Avg episode reward: [(0, '99.350')] [2024-06-15 11:59:48,679][1651669] Updated weights for policy 0, policy_version 37552 (0.0013) [2024-06-15 11:59:50,767][1648981] Fps is (10 sec: 52494.0, 60 sec: 43690.6, 300 sec: 47097.1). Total num frames: 76939264. Throughput: 0: 11366.4. Samples: 19325440. 
Policy #0 lag: (min: 31.0, avg: 92.8, max: 287.0) [2024-06-15 11:59:50,767][1648981] Avg episode reward: [(0, '96.500')] [2024-06-15 11:59:52,355][1651669] Updated weights for policy 0, policy_version 37588 (0.0012) [2024-06-15 11:59:54,416][1651669] Updated weights for policy 0, policy_version 37664 (0.0014) [2024-06-15 11:59:55,708][1651669] Updated weights for policy 0, policy_version 37713 (0.0014) [2024-06-15 11:59:55,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 77234176. Throughput: 0: 11264.0. Samples: 19355136. Policy #0 lag: (min: 52.0, avg: 139.5, max: 306.0) [2024-06-15 11:59:55,767][1648981] Avg episode reward: [(0, '98.570')] [2024-06-15 11:59:56,377][1651669] Updated weights for policy 0, policy_version 37758 (0.0012) [2024-06-15 11:59:59,987][1651669] Updated weights for policy 0, policy_version 37795 (0.0014) [2024-06-15 12:00:00,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 44785.2, 300 sec: 47097.1). Total num frames: 77463552. Throughput: 0: 11259.6. Samples: 19421696. Policy #0 lag: (min: 52.0, avg: 139.5, max: 306.0) [2024-06-15 12:00:00,767][1648981] Avg episode reward: [(0, '99.970')] [2024-06-15 12:00:02,412][1651669] Updated weights for policy 0, policy_version 37826 (0.0012) [2024-06-15 12:00:03,677][1651669] Updated weights for policy 0, policy_version 37887 (0.0142) [2024-06-15 12:00:05,498][1651669] Updated weights for policy 0, policy_version 37936 (0.0017) [2024-06-15 12:00:05,786][1648981] Fps is (10 sec: 45784.6, 60 sec: 46952.1, 300 sec: 46983.4). Total num frames: 77692928. Throughput: 0: 11524.7. Samples: 19498496. Policy #0 lag: (min: 52.0, avg: 139.5, max: 306.0) [2024-06-15 12:00:05,787][1648981] Avg episode reward: [(0, '103.040')] [2024-06-15 12:00:05,896][1651274] Signal inference workers to stop experience collection... 
(2000 times) [2024-06-15 12:00:05,946][1651669] InferenceWorker_p0-w0: stopping experience collection (2000 times) [2024-06-15 12:00:06,175][1651274] Signal inference workers to resume experience collection... (2000 times) [2024-06-15 12:00:06,176][1651669] InferenceWorker_p0-w0: resuming experience collection (2000 times) [2024-06-15 12:00:07,112][1651669] Updated weights for policy 0, policy_version 38011 (0.0096) [2024-06-15 12:00:10,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 45328.9, 300 sec: 46874.9). Total num frames: 77922304. Throughput: 0: 11502.9. Samples: 19529216. Policy #0 lag: (min: 52.0, avg: 139.5, max: 306.0) [2024-06-15 12:00:10,767][1648981] Avg episode reward: [(0, '103.040')] [2024-06-15 12:00:11,026][1651669] Updated weights for policy 0, policy_version 38064 (0.0012) [2024-06-15 12:00:14,452][1651669] Updated weights for policy 0, policy_version 38112 (0.0012) [2024-06-15 12:00:15,574][1651669] Updated weights for policy 0, policy_version 38160 (0.0018) [2024-06-15 12:00:15,766][1648981] Fps is (10 sec: 45966.2, 60 sec: 46442.9, 300 sec: 46986.2). Total num frames: 78151680. Throughput: 0: 11480.2. Samples: 19606528. Policy #0 lag: (min: 52.0, avg: 139.5, max: 306.0) [2024-06-15 12:00:15,767][1648981] Avg episode reward: [(0, '104.030')] [2024-06-15 12:00:16,882][1651669] Updated weights for policy 0, policy_version 38208 (0.0013) [2024-06-15 12:00:18,341][1651669] Updated weights for policy 0, policy_version 38272 (0.0040) [2024-06-15 12:00:20,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 45340.9, 300 sec: 46652.7). Total num frames: 78381056. Throughput: 0: 11781.0. Samples: 19672576. Policy #0 lag: (min: 52.0, avg: 139.5, max: 306.0) [2024-06-15 12:00:20,767][1648981] Avg episode reward: [(0, '103.920')] [2024-06-15 12:00:22,448][1651669] Updated weights for policy 0, policy_version 38329 (0.0109) [2024-06-15 12:00:25,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 44236.8, 300 sec: 46874.9). Total num frames: 78544896. 
Throughput: 0: 11483.4. Samples: 19703808. Policy #0 lag: (min: 3.0, avg: 120.4, max: 259.0) [2024-06-15 12:00:25,767][1648981] Avg episode reward: [(0, '102.520')] [2024-06-15 12:00:26,407][1651669] Updated weights for policy 0, policy_version 38388 (0.0013) [2024-06-15 12:00:27,889][1651669] Updated weights for policy 0, policy_version 38459 (0.0012) [2024-06-15 12:00:29,400][1651669] Updated weights for policy 0, policy_version 38520 (0.0014) [2024-06-15 12:00:30,798][1648981] Fps is (10 sec: 52266.5, 60 sec: 48580.7, 300 sec: 46981.0). Total num frames: 78905344. Throughput: 0: 11665.5. Samples: 19772416. Policy #0 lag: (min: 3.0, avg: 120.4, max: 259.0) [2024-06-15 12:00:30,798][1648981] Avg episode reward: [(0, '104.580')] [2024-06-15 12:00:33,240][1651669] Updated weights for policy 0, policy_version 38564 (0.0013) [2024-06-15 12:00:33,720][1651669] Updated weights for policy 0, policy_version 38592 (0.0021) [2024-06-15 12:00:35,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 45890.0, 300 sec: 47097.1). Total num frames: 79036416. Throughput: 0: 11741.9. Samples: 19853824. Policy #0 lag: (min: 3.0, avg: 120.4, max: 259.0) [2024-06-15 12:00:35,767][1648981] Avg episode reward: [(0, '103.680')] [2024-06-15 12:00:38,268][1651669] Updated weights for policy 0, policy_version 38660 (0.0012) [2024-06-15 12:00:39,396][1651669] Updated weights for policy 0, policy_version 38720 (0.0012) [2024-06-15 12:00:40,766][1648981] Fps is (10 sec: 52592.5, 60 sec: 50254.7, 300 sec: 47099.6). Total num frames: 79429632. Throughput: 0: 11844.3. Samples: 19888128. Policy #0 lag: (min: 3.0, avg: 120.4, max: 259.0) [2024-06-15 12:00:40,767][1648981] Avg episode reward: [(0, '104.880')] [2024-06-15 12:00:43,635][1651669] Updated weights for policy 0, policy_version 38790 (0.0013) [2024-06-15 12:00:45,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 79560704. Throughput: 0: 11810.1. Samples: 19953152. 
Policy #0 lag: (min: 3.0, avg: 120.4, max: 259.0) [2024-06-15 12:00:45,767][1648981] Avg episode reward: [(0, '104.200')] [2024-06-15 12:00:47,598][1651669] Updated weights for policy 0, policy_version 38851 (0.0012) [2024-06-15 12:00:49,019][1651669] Updated weights for policy 0, policy_version 38912 (0.0012) [2024-06-15 12:00:49,130][1651274] Signal inference workers to stop experience collection... (2050 times) [2024-06-15 12:00:49,149][1651669] InferenceWorker_p0-w0: stopping experience collection (2050 times) [2024-06-15 12:00:49,305][1651274] Signal inference workers to resume experience collection... (2050 times) [2024-06-15 12:00:49,306][1651669] InferenceWorker_p0-w0: resuming experience collection (2050 times) [2024-06-15 12:00:50,259][1651669] Updated weights for policy 0, policy_version 38964 (0.0012) [2024-06-15 12:00:50,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48605.9, 300 sec: 47097.1). Total num frames: 79855616. Throughput: 0: 11803.9. Samples: 20029440. Policy #0 lag: (min: 5.0, avg: 83.3, max: 261.0) [2024-06-15 12:00:50,767][1648981] Avg episode reward: [(0, '105.520')] [2024-06-15 12:00:51,116][1651274] Saving new best policy, reward=105.520! [2024-06-15 12:00:51,886][1651669] Updated weights for policy 0, policy_version 39040 (0.0077) [2024-06-15 12:00:55,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 80019456. Throughput: 0: 11878.4. Samples: 20063744. Policy #0 lag: (min: 5.0, avg: 83.3, max: 261.0) [2024-06-15 12:00:55,767][1648981] Avg episode reward: [(0, '104.430')] [2024-06-15 12:00:55,987][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000039088_80052224.pth... 
[2024-06-15 12:00:55,989][1651669] Updated weights for policy 0, policy_version 39088 (0.0056) [2024-06-15 12:00:56,030][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000033600_68812800.pth [2024-06-15 12:00:59,304][1651669] Updated weights for policy 0, policy_version 39137 (0.0013) [2024-06-15 12:01:00,276][1651669] Updated weights for policy 0, policy_version 39184 (0.0012) [2024-06-15 12:01:00,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 46967.3, 300 sec: 46986.0). Total num frames: 80281600. Throughput: 0: 11844.2. Samples: 20139520. Policy #0 lag: (min: 5.0, avg: 83.3, max: 261.0) [2024-06-15 12:01:00,767][1648981] Avg episode reward: [(0, '103.980')] [2024-06-15 12:01:02,041][1651669] Updated weights for policy 0, policy_version 39250 (0.0014) [2024-06-15 12:01:05,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46436.7, 300 sec: 46652.8). Total num frames: 80478208. Throughput: 0: 11832.9. Samples: 20205056. Policy #0 lag: (min: 5.0, avg: 83.3, max: 261.0) [2024-06-15 12:01:05,767][1648981] Avg episode reward: [(0, '103.220')] [2024-06-15 12:01:06,102][1651669] Updated weights for policy 0, policy_version 39301 (0.0013) [2024-06-15 12:01:07,269][1651669] Updated weights for policy 0, policy_version 39354 (0.0044) [2024-06-15 12:01:10,766][1648981] Fps is (10 sec: 39322.1, 60 sec: 45875.3, 300 sec: 46766.7). Total num frames: 80674816. Throughput: 0: 11969.4. Samples: 20242432. Policy #0 lag: (min: 5.0, avg: 83.3, max: 261.0) [2024-06-15 12:01:10,767][1648981] Avg episode reward: [(0, '103.530')] [2024-06-15 12:01:11,358][1651669] Updated weights for policy 0, policy_version 39424 (0.0108) [2024-06-15 12:01:13,421][1651669] Updated weights for policy 0, policy_version 39493 (0.0013) [2024-06-15 12:01:14,493][1651669] Updated weights for policy 0, policy_version 39549 (0.0012) [2024-06-15 12:01:15,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 81002496. Throughput: 0: 11772.7. 
Samples: 20301824. Policy #0 lag: (min: 127.0, avg: 197.6, max: 367.0) [2024-06-15 12:01:15,767][1648981] Avg episode reward: [(0, '106.690')] [2024-06-15 12:01:15,768][1651274] Saving new best policy, reward=106.690! [2024-06-15 12:01:18,791][1651669] Updated weights for policy 0, policy_version 39602 (0.0013) [2024-06-15 12:01:20,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 81133568. Throughput: 0: 11628.0. Samples: 20377088. Policy #0 lag: (min: 127.0, avg: 197.6, max: 367.0) [2024-06-15 12:01:20,767][1648981] Avg episode reward: [(0, '105.860')] [2024-06-15 12:01:22,410][1651669] Updated weights for policy 0, policy_version 39649 (0.0012) [2024-06-15 12:01:23,850][1651669] Updated weights for policy 0, policy_version 39712 (0.0014) [2024-06-15 12:01:25,256][1651669] Updated weights for policy 0, policy_version 39761 (0.0010) [2024-06-15 12:01:25,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 48605.9, 300 sec: 46874.9). Total num frames: 81461248. Throughput: 0: 11571.2. Samples: 20408832. Policy #0 lag: (min: 127.0, avg: 197.6, max: 367.0) [2024-06-15 12:01:25,767][1648981] Avg episode reward: [(0, '109.190')] [2024-06-15 12:01:26,296][1651274] Saving new best policy, reward=109.190! [2024-06-15 12:01:29,022][1651669] Updated weights for policy 0, policy_version 39811 (0.0013) [2024-06-15 12:01:29,337][1651274] Signal inference workers to stop experience collection... (2100 times) [2024-06-15 12:01:29,430][1651669] InferenceWorker_p0-w0: stopping experience collection (2100 times) [2024-06-15 12:01:29,585][1651274] Signal inference workers to resume experience collection... (2100 times) [2024-06-15 12:01:29,586][1651669] InferenceWorker_p0-w0: resuming experience collection (2100 times) [2024-06-15 12:01:30,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 45899.0, 300 sec: 47097.1). Total num frames: 81657856. Throughput: 0: 11685.0. Samples: 20478976. 
Policy #0 lag: (min: 127.0, avg: 197.6, max: 367.0) [2024-06-15 12:01:30,767][1648981] Avg episode reward: [(0, '109.520')] [2024-06-15 12:01:30,768][1651274] Saving new best policy, reward=109.520! [2024-06-15 12:01:33,146][1651669] Updated weights for policy 0, policy_version 39874 (0.0013) [2024-06-15 12:01:34,653][1651669] Updated weights for policy 0, policy_version 39937 (0.0128) [2024-06-15 12:01:35,774][1648981] Fps is (10 sec: 42565.1, 60 sec: 47507.3, 300 sec: 46873.7). Total num frames: 81887232. Throughput: 0: 11466.8. Samples: 20545536. Policy #0 lag: (min: 127.0, avg: 197.6, max: 367.0) [2024-06-15 12:01:35,775][1648981] Avg episode reward: [(0, '111.380')] [2024-06-15 12:01:35,860][1651669] Updated weights for policy 0, policy_version 39986 (0.0012) [2024-06-15 12:01:36,096][1651274] Saving new best policy, reward=111.380! [2024-06-15 12:01:37,800][1651669] Updated weights for policy 0, policy_version 40061 (0.0012) [2024-06-15 12:01:40,774][1648981] Fps is (10 sec: 42565.3, 60 sec: 44231.1, 300 sec: 46762.6). Total num frames: 82083840. Throughput: 0: 11387.2. Samples: 20576256. Policy #0 lag: (min: 3.0, avg: 125.0, max: 259.0) [2024-06-15 12:01:40,775][1648981] Avg episode reward: [(0, '110.120')] [2024-06-15 12:01:41,600][1651669] Updated weights for policy 0, policy_version 40128 (0.0012) [2024-06-15 12:01:45,767][1648981] Fps is (10 sec: 32793.0, 60 sec: 44236.6, 300 sec: 46319.5). Total num frames: 82214912. Throughput: 0: 11275.4. Samples: 20646912. Policy #0 lag: (min: 3.0, avg: 125.0, max: 259.0) [2024-06-15 12:01:45,768][1648981] Avg episode reward: [(0, '109.570')] [2024-06-15 12:01:46,892][1651669] Updated weights for policy 0, policy_version 40198 (0.0015) [2024-06-15 12:01:48,638][1651669] Updated weights for policy 0, policy_version 40262 (0.0015) [2024-06-15 12:01:50,067][1651669] Updated weights for policy 0, policy_version 40320 (0.0013) [2024-06-15 12:01:50,766][1648981] Fps is (10 sec: 49189.9, 60 sec: 45329.0, 300 sec: 46652.8). 
Total num frames: 82575360. Throughput: 0: 11138.8. Samples: 20706304. Policy #0 lag: (min: 3.0, avg: 125.0, max: 259.0) [2024-06-15 12:01:50,767][1648981] Avg episode reward: [(0, '107.410')] [2024-06-15 12:01:53,074][1651669] Updated weights for policy 0, policy_version 40376 (0.0014) [2024-06-15 12:01:55,767][1648981] Fps is (10 sec: 49147.7, 60 sec: 44782.2, 300 sec: 46208.7). Total num frames: 82706432. Throughput: 0: 11161.4. Samples: 20744704. Policy #0 lag: (min: 3.0, avg: 125.0, max: 259.0) [2024-06-15 12:01:55,768][1648981] Avg episode reward: [(0, '106.410')] [2024-06-15 12:01:57,348][1651669] Updated weights for policy 0, policy_version 40433 (0.0012) [2024-06-15 12:01:59,096][1651669] Updated weights for policy 0, policy_version 40497 (0.0120) [2024-06-15 12:02:00,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46421.5, 300 sec: 46987.9). Total num frames: 83066880. Throughput: 0: 11377.8. Samples: 20813824. Policy #0 lag: (min: 3.0, avg: 125.0, max: 259.0) [2024-06-15 12:02:00,767][1648981] Avg episode reward: [(0, '109.050')] [2024-06-15 12:02:00,828][1651669] Updated weights for policy 0, policy_version 40569 (0.0123) [2024-06-15 12:02:03,985][1651669] Updated weights for policy 0, policy_version 40599 (0.0020) [2024-06-15 12:02:05,766][1648981] Fps is (10 sec: 52433.6, 60 sec: 45875.1, 300 sec: 46543.8). Total num frames: 83230720. Throughput: 0: 11207.1. Samples: 20881408. Policy #0 lag: (min: 3.0, avg: 125.0, max: 259.0) [2024-06-15 12:02:05,767][1648981] Avg episode reward: [(0, '109.230')] [2024-06-15 12:02:08,959][1651669] Updated weights for policy 0, policy_version 40660 (0.0014) [2024-06-15 12:02:10,766][1648981] Fps is (10 sec: 32768.1, 60 sec: 45329.1, 300 sec: 46321.4). Total num frames: 83394560. Throughput: 0: 11423.3. Samples: 20922880. 
Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 12:02:10,767][1648981] Avg episode reward: [(0, '109.880')] [2024-06-15 12:02:10,854][1651669] Updated weights for policy 0, policy_version 40736 (0.0011) [2024-06-15 12:02:11,011][1651274] Signal inference workers to stop experience collection... (2150 times) [2024-06-15 12:02:11,066][1651669] InferenceWorker_p0-w0: stopping experience collection (2150 times) [2024-06-15 12:02:11,266][1651274] Signal inference workers to resume experience collection... (2150 times) [2024-06-15 12:02:11,267][1651669] InferenceWorker_p0-w0: resuming experience collection (2150 times) [2024-06-15 12:02:12,547][1651669] Updated weights for policy 0, policy_version 40793 (0.0160) [2024-06-15 12:02:15,767][1648981] Fps is (10 sec: 45875.4, 60 sec: 44783.0, 300 sec: 46430.6). Total num frames: 83689472. Throughput: 0: 11116.1. Samples: 20979200. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 12:02:15,767][1648981] Avg episode reward: [(0, '112.950')] [2024-06-15 12:02:15,945][1651669] Updated weights for policy 0, policy_version 40880 (0.0110) [2024-06-15 12:02:16,071][1651274] Saving new best policy, reward=112.950! [2024-06-15 12:02:20,774][1648981] Fps is (10 sec: 39291.1, 60 sec: 44231.1, 300 sec: 45874.0). Total num frames: 83787776. Throughput: 0: 11343.7. Samples: 21056000. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 12:02:20,775][1648981] Avg episode reward: [(0, '112.180')] [2024-06-15 12:02:21,314][1651669] Updated weights for policy 0, policy_version 40944 (0.0013) [2024-06-15 12:02:23,059][1651669] Updated weights for policy 0, policy_version 41008 (0.0014) [2024-06-15 12:02:24,795][1651669] Updated weights for policy 0, policy_version 41074 (0.0013) [2024-06-15 12:02:25,779][1648981] Fps is (10 sec: 45816.7, 60 sec: 44773.3, 300 sec: 46317.5). Total num frames: 84148224. Throughput: 0: 11171.7. Samples: 21079040. 
Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 12:02:25,780][1648981] Avg episode reward: [(0, '111.680')] [2024-06-15 12:02:28,002][1651669] Updated weights for policy 0, policy_version 41136 (0.0098) [2024-06-15 12:02:30,766][1648981] Fps is (10 sec: 49190.4, 60 sec: 43690.7, 300 sec: 45877.0). Total num frames: 84279296. Throughput: 0: 11013.7. Samples: 21142528. Policy #0 lag: (min: 8.0, avg: 76.7, max: 264.0) [2024-06-15 12:02:30,767][1648981] Avg episode reward: [(0, '113.150')] [2024-06-15 12:02:30,768][1651274] Saving new best policy, reward=113.150! [2024-06-15 12:02:32,938][1651669] Updated weights for policy 0, policy_version 41187 (0.0013) [2024-06-15 12:02:34,688][1651669] Updated weights for policy 0, policy_version 41255 (0.0013) [2024-06-15 12:02:35,767][1648981] Fps is (10 sec: 42652.8, 60 sec: 44788.7, 300 sec: 46097.3). Total num frames: 84574208. Throughput: 0: 11161.6. Samples: 21208576. Policy #0 lag: (min: 103.0, avg: 170.2, max: 362.0) [2024-06-15 12:02:35,767][1648981] Avg episode reward: [(0, '113.550')] [2024-06-15 12:02:36,089][1651274] Saving new best policy, reward=113.550! [2024-06-15 12:02:38,988][1651669] Updated weights for policy 0, policy_version 41361 (0.0014) [2024-06-15 12:02:40,793][1648981] Fps is (10 sec: 52287.3, 60 sec: 45314.5, 300 sec: 46093.1). Total num frames: 84803584. Throughput: 0: 11075.6. Samples: 21243392. Policy #0 lag: (min: 103.0, avg: 170.2, max: 362.0) [2024-06-15 12:02:40,794][1648981] Avg episode reward: [(0, '113.890')] [2024-06-15 12:02:40,799][1651274] Saving new best policy, reward=113.890! [2024-06-15 12:02:45,019][1651669] Updated weights for policy 0, policy_version 41456 (0.0013) [2024-06-15 12:02:45,766][1648981] Fps is (10 sec: 36045.5, 60 sec: 45329.3, 300 sec: 45876.4). Total num frames: 84934656. Throughput: 0: 11343.7. Samples: 21324288. 
Policy #0 lag: (min: 103.0, avg: 170.2, max: 362.0) [2024-06-15 12:02:45,767][1648981] Avg episode reward: [(0, '112.830')] [2024-06-15 12:02:46,825][1651669] Updated weights for policy 0, policy_version 41524 (0.0139) [2024-06-15 12:02:48,576][1651669] Updated weights for policy 0, policy_version 41594 (0.0012) [2024-06-15 12:02:50,771][1648981] Fps is (10 sec: 42692.6, 60 sec: 44233.2, 300 sec: 45874.5). Total num frames: 85229568. Throughput: 0: 11046.6. Samples: 21378560. Policy #0 lag: (min: 103.0, avg: 170.2, max: 362.0) [2024-06-15 12:02:50,772][1648981] Avg episode reward: [(0, '114.290')] [2024-06-15 12:02:50,931][1651669] Updated weights for policy 0, policy_version 41633 (0.0024) [2024-06-15 12:02:51,269][1651274] Saving new best policy, reward=114.290! [2024-06-15 12:02:55,413][1651274] Signal inference workers to stop experience collection... (2200 times) [2024-06-15 12:02:55,536][1651669] InferenceWorker_p0-w0: stopping experience collection (2200 times) [2024-06-15 12:02:55,658][1651274] Signal inference workers to resume experience collection... (2200 times) [2024-06-15 12:02:55,659][1651669] InferenceWorker_p0-w0: resuming experience collection (2200 times) [2024-06-15 12:02:55,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 44237.6, 300 sec: 45764.1). Total num frames: 85360640. Throughput: 0: 10945.4. Samples: 21415424. Policy #0 lag: (min: 103.0, avg: 170.2, max: 362.0) [2024-06-15 12:02:55,767][1648981] Avg episode reward: [(0, '112.740')] [2024-06-15 12:02:55,957][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000041696_85393408.pth... 
[2024-06-15 12:02:55,958][1651669] Updated weights for policy 0, policy_version 41696 (0.0012) [2024-06-15 12:02:56,142][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000036352_74448896.pth [2024-06-15 12:02:57,700][1651669] Updated weights for policy 0, policy_version 41765 (0.0022) [2024-06-15 12:02:59,399][1651669] Updated weights for policy 0, policy_version 41829 (0.0149) [2024-06-15 12:03:00,766][1648981] Fps is (10 sec: 49176.3, 60 sec: 44236.8, 300 sec: 45764.1). Total num frames: 85721088. Throughput: 0: 11161.6. Samples: 21481472. Policy #0 lag: (min: 103.0, avg: 170.2, max: 362.0) [2024-06-15 12:03:00,767][1648981] Avg episode reward: [(0, '110.790')] [2024-06-15 12:03:01,715][1651669] Updated weights for policy 0, policy_version 41872 (0.0012) [2024-06-15 12:03:05,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 45764.1). Total num frames: 85852160. Throughput: 0: 11152.1. Samples: 21557760. Policy #0 lag: (min: 15.0, avg: 155.9, max: 271.0) [2024-06-15 12:03:05,767][1648981] Avg episode reward: [(0, '111.400')] [2024-06-15 12:03:07,293][1651669] Updated weights for policy 0, policy_version 41952 (0.0016) [2024-06-15 12:03:08,678][1651669] Updated weights for policy 0, policy_version 42000 (0.0014) [2024-06-15 12:03:10,724][1651669] Updated weights for policy 0, policy_version 42066 (0.0027) [2024-06-15 12:03:10,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 45875.2, 300 sec: 45430.9). Total num frames: 86147072. Throughput: 0: 11392.4. Samples: 21591552. Policy #0 lag: (min: 15.0, avg: 155.9, max: 271.0) [2024-06-15 12:03:10,767][1648981] Avg episode reward: [(0, '110.030')] [2024-06-15 12:03:11,661][1651669] Updated weights for policy 0, policy_version 42112 (0.0013) [2024-06-15 12:03:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 44783.0, 300 sec: 45764.1). Total num frames: 86376448. Throughput: 0: 11275.4. Samples: 21649920. 
Policy #0 lag: (min: 15.0, avg: 155.9, max: 271.0) [2024-06-15 12:03:15,767][1648981] Avg episode reward: [(0, '107.780')] [2024-06-15 12:03:18,918][1651669] Updated weights for policy 0, policy_version 42192 (0.0126) [2024-06-15 12:03:20,767][1648981] Fps is (10 sec: 39320.5, 60 sec: 45881.0, 300 sec: 45319.8). Total num frames: 86540288. Throughput: 0: 11537.0. Samples: 21727744. Policy #0 lag: (min: 15.0, avg: 155.9, max: 271.0) [2024-06-15 12:03:20,767][1648981] Avg episode reward: [(0, '107.390')] [2024-06-15 12:03:20,803][1651669] Updated weights for policy 0, policy_version 42260 (0.0037) [2024-06-15 12:03:23,236][1651669] Updated weights for policy 0, policy_version 42358 (0.0142) [2024-06-15 12:03:25,025][1651669] Updated weights for policy 0, policy_version 42400 (0.0015) [2024-06-15 12:03:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45885.0, 300 sec: 46208.4). Total num frames: 86900736. Throughput: 0: 11202.5. Samples: 21747200. Policy #0 lag: (min: 15.0, avg: 155.9, max: 271.0) [2024-06-15 12:03:25,767][1648981] Avg episode reward: [(0, '108.500')] [2024-06-15 12:03:30,766][1648981] Fps is (10 sec: 36045.6, 60 sec: 43690.7, 300 sec: 45208.7). Total num frames: 86900736. Throughput: 0: 11195.7. Samples: 21828096. Policy #0 lag: (min: 15.0, avg: 155.9, max: 271.0) [2024-06-15 12:03:30,767][1648981] Avg episode reward: [(0, '112.100')] [2024-06-15 12:03:30,811][1651669] Updated weights for policy 0, policy_version 42448 (0.0013) [2024-06-15 12:03:32,508][1651669] Updated weights for policy 0, policy_version 42512 (0.0036) [2024-06-15 12:03:33,915][1651274] Signal inference workers to stop experience collection... (2250 times) [2024-06-15 12:03:33,944][1651669] InferenceWorker_p0-w0: stopping experience collection (2250 times) [2024-06-15 12:03:34,063][1651274] Signal inference workers to resume experience collection... 
(2250 times) [2024-06-15 12:03:34,064][1651669] InferenceWorker_p0-w0: resuming experience collection (2250 times) [2024-06-15 12:03:34,067][1651669] Updated weights for policy 0, policy_version 42576 (0.0012) [2024-06-15 12:03:35,135][1651669] Updated weights for policy 0, policy_version 42623 (0.0016) [2024-06-15 12:03:35,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 45875.2, 300 sec: 45875.2). Total num frames: 87326720. Throughput: 0: 11401.8. Samples: 21891584. Policy #0 lag: (min: 5.0, avg: 64.1, max: 261.0) [2024-06-15 12:03:35,767][1648981] Avg episode reward: [(0, '113.040')] [2024-06-15 12:03:36,792][1651669] Updated weights for policy 0, policy_version 42680 (0.0019) [2024-06-15 12:03:40,768][1648981] Fps is (10 sec: 52423.0, 60 sec: 43709.6, 300 sec: 45764.0). Total num frames: 87425024. Throughput: 0: 11320.6. Samples: 21924864. Policy #0 lag: (min: 5.0, avg: 64.1, max: 261.0) [2024-06-15 12:03:40,768][1648981] Avg episode reward: [(0, '115.170')] [2024-06-15 12:03:40,773][1651274] Saving new best policy, reward=115.170! [2024-06-15 12:03:43,185][1651669] Updated weights for policy 0, policy_version 42750 (0.0013) [2024-06-15 12:03:44,114][1651669] Updated weights for policy 0, policy_version 42787 (0.0015) [2024-06-15 12:03:45,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 46967.5, 300 sec: 45542.0). Total num frames: 87752704. Throughput: 0: 11468.8. Samples: 21997568. Policy #0 lag: (min: 5.0, avg: 64.1, max: 261.0) [2024-06-15 12:03:45,767][1648981] Avg episode reward: [(0, '117.550')] [2024-06-15 12:03:45,864][1651669] Updated weights for policy 0, policy_version 42852 (0.0012) [2024-06-15 12:03:46,085][1651274] Saving new best policy, reward=117.550! [2024-06-15 12:03:47,547][1651669] Updated weights for policy 0, policy_version 42914 (0.0017) [2024-06-15 12:03:50,798][1648981] Fps is (10 sec: 52270.9, 60 sec: 45309.2, 300 sec: 45759.3). Total num frames: 87949312. Throughput: 0: 11222.1. Samples: 22063104. 
Policy #0 lag: (min: 5.0, avg: 64.1, max: 261.0) [2024-06-15 12:03:50,798][1648981] Avg episode reward: [(0, '118.470')] [2024-06-15 12:03:50,799][1651274] Saving new best policy, reward=118.470! [2024-06-15 12:03:54,159][1651669] Updated weights for policy 0, policy_version 42976 (0.0051) [2024-06-15 12:03:55,768][1648981] Fps is (10 sec: 42589.4, 60 sec: 46965.9, 300 sec: 45431.0). Total num frames: 88178688. Throughput: 0: 11354.5. Samples: 22102528. Policy #0 lag: (min: 5.0, avg: 64.1, max: 261.0) [2024-06-15 12:03:55,769][1648981] Avg episode reward: [(0, '121.500')] [2024-06-15 12:03:56,142][1651274] Saving new best policy, reward=121.500! [2024-06-15 12:03:56,146][1651669] Updated weights for policy 0, policy_version 43072 (0.0014) [2024-06-15 12:03:58,114][1651669] Updated weights for policy 0, policy_version 43137 (0.0013) [2024-06-15 12:03:59,407][1651669] Updated weights for policy 0, policy_version 43195 (0.0014) [2024-06-15 12:04:00,767][1648981] Fps is (10 sec: 52588.8, 60 sec: 45874.5, 300 sec: 46097.2). Total num frames: 88473600. Throughput: 0: 11229.6. Samples: 22155264. Policy #0 lag: (min: 15.0, avg: 184.6, max: 287.0) [2024-06-15 12:04:00,768][1648981] Avg episode reward: [(0, '120.470')] [2024-06-15 12:04:05,689][1651669] Updated weights for policy 0, policy_version 43233 (0.0057) [2024-06-15 12:04:05,766][1648981] Fps is (10 sec: 36051.9, 60 sec: 44782.9, 300 sec: 45208.7). Total num frames: 88539136. Throughput: 0: 11389.2. Samples: 22240256. Policy #0 lag: (min: 15.0, avg: 184.6, max: 287.0) [2024-06-15 12:04:05,767][1648981] Avg episode reward: [(0, '119.880')] [2024-06-15 12:04:06,635][1651669] Updated weights for policy 0, policy_version 43281 (0.0014) [2024-06-15 12:04:08,093][1651669] Updated weights for policy 0, policy_version 43348 (0.0101) [2024-06-15 12:04:09,214][1651669] Updated weights for policy 0, policy_version 43395 (0.0013) [2024-06-15 12:04:10,774][1648981] Fps is (10 sec: 52392.9, 60 sec: 47507.4, 300 sec: 46211.6). 
Total num frames: 88997888. Throughput: 0: 11626.1. Samples: 22270464. Policy #0 lag: (min: 15.0, avg: 184.6, max: 287.0) [2024-06-15 12:04:10,775][1648981] Avg episode reward: [(0, '118.750')] [2024-06-15 12:04:15,770][1648981] Fps is (10 sec: 45858.1, 60 sec: 43687.9, 300 sec: 45210.6). Total num frames: 88997888. Throughput: 0: 11479.2. Samples: 22344704. Policy #0 lag: (min: 15.0, avg: 184.6, max: 287.0) [2024-06-15 12:04:15,771][1648981] Avg episode reward: [(0, '121.240')] [2024-06-15 12:04:15,969][1651274] Signal inference workers to stop experience collection... (2300 times) [2024-06-15 12:04:16,081][1651669] InferenceWorker_p0-w0: stopping experience collection (2300 times) [2024-06-15 12:04:16,083][1651669] Updated weights for policy 0, policy_version 43465 (0.0088) [2024-06-15 12:04:16,164][1651274] Signal inference workers to resume experience collection... (2300 times) [2024-06-15 12:04:16,165][1651669] InferenceWorker_p0-w0: resuming experience collection (2300 times) [2024-06-15 12:04:16,978][1651669] Updated weights for policy 0, policy_version 43520 (0.0015) [2024-06-15 12:04:18,384][1651669] Updated weights for policy 0, policy_version 43584 (0.0012) [2024-06-15 12:04:20,370][1651669] Updated weights for policy 0, policy_version 43665 (0.0099) [2024-06-15 12:04:20,766][1648981] Fps is (10 sec: 45911.0, 60 sec: 48606.1, 300 sec: 45986.3). Total num frames: 89456640. Throughput: 0: 11571.2. Samples: 22412288. Policy #0 lag: (min: 15.0, avg: 184.6, max: 287.0) [2024-06-15 12:04:20,767][1648981] Avg episode reward: [(0, '119.370')] [2024-06-15 12:04:21,437][1651669] Updated weights for policy 0, policy_version 43708 (0.0012) [2024-06-15 12:04:25,790][1648981] Fps is (10 sec: 52323.7, 60 sec: 43673.3, 300 sec: 45871.5). Total num frames: 89522176. Throughput: 0: 11656.3. Samples: 22449664. 
Policy #0 lag: (min: 15.0, avg: 184.6, max: 287.0) [2024-06-15 12:04:25,791][1648981] Avg episode reward: [(0, '119.680')] [2024-06-15 12:04:27,909][1651669] Updated weights for policy 0, policy_version 43760 (0.0013) [2024-06-15 12:04:29,080][1651669] Updated weights for policy 0, policy_version 43810 (0.0012) [2024-06-15 12:04:30,620][1651669] Updated weights for policy 0, policy_version 43888 (0.0013) [2024-06-15 12:04:30,798][1648981] Fps is (10 sec: 42463.1, 60 sec: 49671.8, 300 sec: 46095.4). Total num frames: 89882624. Throughput: 0: 11744.9. Samples: 22526464. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 12:04:30,799][1648981] Avg episode reward: [(0, '119.610')] [2024-06-15 12:04:32,230][1651669] Updated weights for policy 0, policy_version 43940 (0.0194) [2024-06-15 12:04:35,766][1648981] Fps is (10 sec: 52554.1, 60 sec: 45329.1, 300 sec: 46210.4). Total num frames: 90046464. Throughput: 0: 11920.8. Samples: 22599168. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 12:04:35,767][1648981] Avg episode reward: [(0, '118.630')] [2024-06-15 12:04:38,225][1651669] Updated weights for policy 0, policy_version 44005 (0.0014) [2024-06-15 12:04:40,005][1651669] Updated weights for policy 0, policy_version 44080 (0.0197) [2024-06-15 12:04:40,766][1648981] Fps is (10 sec: 46021.9, 60 sec: 48606.8, 300 sec: 45875.2). Total num frames: 90341376. Throughput: 0: 11844.8. Samples: 22635520. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 12:04:40,844][1648981] Avg episode reward: [(0, '117.730')] [2024-06-15 12:04:41,274][1651669] Updated weights for policy 0, policy_version 44154 (0.0013) [2024-06-15 12:04:44,077][1651669] Updated weights for policy 0, policy_version 44213 (0.0014) [2024-06-15 12:04:45,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 46967.2, 300 sec: 46208.4). Total num frames: 90570752. Throughput: 0: 12106.1. Samples: 22700032. 
Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 12:04:45,767][1648981] Avg episode reward: [(0, '118.160')] [2024-06-15 12:04:48,983][1651669] Updated weights for policy 0, policy_version 44256 (0.0013) [2024-06-15 12:04:50,257][1651669] Updated weights for policy 0, policy_version 44305 (0.0014) [2024-06-15 12:04:50,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46992.0, 300 sec: 45875.2). Total num frames: 90767360. Throughput: 0: 11889.8. Samples: 22775296. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 12:04:50,767][1648981] Avg episode reward: [(0, '117.240')] [2024-06-15 12:04:51,035][1651274] Signal inference workers to stop experience collection... (2350 times) [2024-06-15 12:04:51,084][1651669] InferenceWorker_p0-w0: stopping experience collection (2350 times) [2024-06-15 12:04:51,367][1651274] Signal inference workers to resume experience collection... (2350 times) [2024-06-15 12:04:51,378][1651669] InferenceWorker_p0-w0: resuming experience collection (2350 times) [2024-06-15 12:04:51,858][1651669] Updated weights for policy 0, policy_version 44380 (0.0013) [2024-06-15 12:04:52,365][1651669] Updated weights for policy 0, policy_version 44416 (0.0012) [2024-06-15 12:04:55,090][1651669] Updated weights for policy 0, policy_version 44475 (0.0013) [2024-06-15 12:04:55,767][1648981] Fps is (10 sec: 52429.3, 60 sec: 48607.4, 300 sec: 46208.4). Total num frames: 91095040. Throughput: 0: 11937.3. Samples: 22807552. Policy #0 lag: (min: 15.0, avg: 74.3, max: 271.0) [2024-06-15 12:04:55,767][1648981] Avg episode reward: [(0, '116.100')] [2024-06-15 12:04:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000044480_91095040.pth... 
[2024-06-15 12:04:55,832][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000039088_80052224.pth [2024-06-15 12:04:59,950][1651669] Updated weights for policy 0, policy_version 44528 (0.0023) [2024-06-15 12:05:00,767][1648981] Fps is (10 sec: 49147.5, 60 sec: 46421.3, 300 sec: 45989.2). Total num frames: 91258880. Throughput: 0: 12118.1. Samples: 22889984. Policy #0 lag: (min: 15.0, avg: 82.8, max: 271.0) [2024-06-15 12:05:00,769][1648981] Avg episode reward: [(0, '116.460')] [2024-06-15 12:05:01,445][1651669] Updated weights for policy 0, policy_version 44592 (0.0015) [2024-06-15 12:05:02,996][1651669] Updated weights for policy 0, policy_version 44661 (0.0051) [2024-06-15 12:05:05,098][1651669] Updated weights for policy 0, policy_version 44704 (0.0012) [2024-06-15 12:05:05,784][1648981] Fps is (10 sec: 52337.0, 60 sec: 51321.4, 300 sec: 46427.8). Total num frames: 91619328. Throughput: 0: 12078.4. Samples: 22956032. Policy #0 lag: (min: 15.0, avg: 82.8, max: 271.0) [2024-06-15 12:05:05,785][1648981] Avg episode reward: [(0, '118.180')] [2024-06-15 12:05:09,640][1651669] Updated weights for policy 0, policy_version 44737 (0.0014) [2024-06-15 12:05:10,767][1648981] Fps is (10 sec: 45878.7, 60 sec: 45334.8, 300 sec: 45986.3). Total num frames: 91717632. Throughput: 0: 12271.7. Samples: 23001600. Policy #0 lag: (min: 15.0, avg: 82.8, max: 271.0) [2024-06-15 12:05:10,767][1648981] Avg episode reward: [(0, '116.620')] [2024-06-15 12:05:11,747][1651669] Updated weights for policy 0, policy_version 44821 (0.0050) [2024-06-15 12:05:12,912][1651669] Updated weights for policy 0, policy_version 44881 (0.0012) [2024-06-15 12:05:13,808][1651669] Updated weights for policy 0, policy_version 44927 (0.0013) [2024-06-15 12:05:15,766][1648981] Fps is (10 sec: 42674.0, 60 sec: 50793.6, 300 sec: 46319.5). Total num frames: 92045312. Throughput: 0: 11932.3. Samples: 23063040. 
Policy #0 lag: (min: 15.0, avg: 82.8, max: 271.0) [2024-06-15 12:05:15,767][1648981] Avg episode reward: [(0, '116.210')] [2024-06-15 12:05:20,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 44782.7, 300 sec: 46097.3). Total num frames: 92143616. Throughput: 0: 12083.1. Samples: 23142912. Policy #0 lag: (min: 15.0, avg: 82.8, max: 271.0) [2024-06-15 12:05:20,768][1648981] Avg episode reward: [(0, '119.250')] [2024-06-15 12:05:22,281][1651669] Updated weights for policy 0, policy_version 45056 (0.0015) [2024-06-15 12:05:23,710][1651669] Updated weights for policy 0, policy_version 45125 (0.0015) [2024-06-15 12:05:24,915][1651669] Updated weights for policy 0, policy_version 45180 (0.0013) [2024-06-15 12:05:25,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50264.3, 300 sec: 46213.3). Total num frames: 92536832. Throughput: 0: 11810.1. Samples: 23166976. Policy #0 lag: (min: 15.0, avg: 82.8, max: 271.0) [2024-06-15 12:05:25,767][1648981] Avg episode reward: [(0, '119.670')] [2024-06-15 12:05:28,075][1651669] Updated weights for policy 0, policy_version 45218 (0.0012) [2024-06-15 12:05:30,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 46446.0, 300 sec: 46208.4). Total num frames: 92667904. Throughput: 0: 11958.1. Samples: 23238144. Policy #0 lag: (min: 31.0, avg: 163.2, max: 287.0) [2024-06-15 12:05:30,767][1648981] Avg episode reward: [(0, '120.260')] [2024-06-15 12:05:31,712][1651669] Updated weights for policy 0, policy_version 45251 (0.0037) [2024-06-15 12:05:32,064][1651274] Signal inference workers to stop experience collection... (2400 times) [2024-06-15 12:05:32,140][1651669] InferenceWorker_p0-w0: stopping experience collection (2400 times) [2024-06-15 12:05:32,401][1651274] Signal inference workers to resume experience collection... 
(2400 times) [2024-06-15 12:05:32,402][1651669] InferenceWorker_p0-w0: resuming experience collection (2400 times) [2024-06-15 12:05:33,432][1651669] Updated weights for policy 0, policy_version 45314 (0.0022) [2024-06-15 12:05:34,702][1651669] Updated weights for policy 0, policy_version 45375 (0.0033) [2024-06-15 12:05:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 45986.3). Total num frames: 92995584. Throughput: 0: 11764.6. Samples: 23304704. Policy #0 lag: (min: 31.0, avg: 163.2, max: 287.0) [2024-06-15 12:05:35,767][1648981] Avg episode reward: [(0, '120.880')] [2024-06-15 12:05:36,330][1651669] Updated weights for policy 0, policy_version 45428 (0.0013) [2024-06-15 12:05:39,592][1651669] Updated weights for policy 0, policy_version 45472 (0.0013) [2024-06-15 12:05:40,798][1648981] Fps is (10 sec: 52262.1, 60 sec: 47488.4, 300 sec: 46203.5). Total num frames: 93192192. Throughput: 0: 11881.4. Samples: 23342592. Policy #0 lag: (min: 31.0, avg: 163.2, max: 287.0) [2024-06-15 12:05:40,799][1648981] Avg episode reward: [(0, '124.320')] [2024-06-15 12:05:40,803][1651274] Saving new best policy, reward=124.320! [2024-06-15 12:05:42,994][1651669] Updated weights for policy 0, policy_version 45526 (0.0013) [2024-06-15 12:05:43,789][1651669] Updated weights for policy 0, policy_version 45565 (0.0014) [2024-06-15 12:05:45,767][1648981] Fps is (10 sec: 39321.0, 60 sec: 46967.6, 300 sec: 45875.2). Total num frames: 93388800. Throughput: 0: 11696.5. Samples: 23416320. Policy #0 lag: (min: 31.0, avg: 163.2, max: 287.0) [2024-06-15 12:05:45,767][1648981] Avg episode reward: [(0, '120.290')] [2024-06-15 12:05:46,073][1651669] Updated weights for policy 0, policy_version 45624 (0.0013) [2024-06-15 12:05:47,736][1651669] Updated weights for policy 0, policy_version 45692 (0.0014) [2024-06-15 12:05:50,766][1648981] Fps is (10 sec: 46021.5, 60 sec: 48059.7, 300 sec: 46208.4). Total num frames: 93650944. Throughput: 0: 11723.7. Samples: 23483392. 
Policy #0 lag: (min: 31.0, avg: 163.2, max: 287.0) [2024-06-15 12:05:50,767][1648981] Avg episode reward: [(0, '119.030')] [2024-06-15 12:05:50,776][1651669] Updated weights for policy 0, policy_version 45744 (0.0029) [2024-06-15 12:05:54,294][1651669] Updated weights for policy 0, policy_version 45794 (0.0017) [2024-06-15 12:05:55,786][1648981] Fps is (10 sec: 45788.1, 60 sec: 45860.7, 300 sec: 45983.3). Total num frames: 93847552. Throughput: 0: 11611.8. Samples: 23524352. Policy #0 lag: (min: 31.0, avg: 163.2, max: 287.0) [2024-06-15 12:05:55,786][1648981] Avg episode reward: [(0, '120.210')] [2024-06-15 12:05:56,205][1651669] Updated weights for policy 0, policy_version 45840 (0.0129) [2024-06-15 12:05:58,476][1651669] Updated weights for policy 0, policy_version 45907 (0.0014) [2024-06-15 12:06:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47514.3, 300 sec: 46208.4). Total num frames: 94109696. Throughput: 0: 11559.8. Samples: 23583232. Policy #0 lag: (min: 47.0, avg: 152.1, max: 303.0) [2024-06-15 12:06:00,767][1648981] Avg episode reward: [(0, '119.160')] [2024-06-15 12:06:01,425][1651669] Updated weights for policy 0, policy_version 45970 (0.0013) [2024-06-15 12:06:02,491][1651669] Updated weights for policy 0, policy_version 46013 (0.0013) [2024-06-15 12:06:05,766][1648981] Fps is (10 sec: 45963.2, 60 sec: 44796.1, 300 sec: 46208.4). Total num frames: 94306304. Throughput: 0: 11525.8. Samples: 23661568. Policy #0 lag: (min: 47.0, avg: 152.1, max: 303.0) [2024-06-15 12:06:05,767][1648981] Avg episode reward: [(0, '120.270')] [2024-06-15 12:06:06,192][1651669] Updated weights for policy 0, policy_version 46080 (0.0038) [2024-06-15 12:06:07,673][1651669] Updated weights for policy 0, policy_version 46141 (0.0103) [2024-06-15 12:06:10,335][1651669] Updated weights for policy 0, policy_version 46206 (0.0013) [2024-06-15 12:06:10,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48606.0, 300 sec: 46208.5). Total num frames: 94633984. Throughput: 0: 11707.7. 
Samples: 23693824. Policy #0 lag: (min: 47.0, avg: 152.1, max: 303.0) [2024-06-15 12:06:10,767][1648981] Avg episode reward: [(0, '123.330')] [2024-06-15 12:06:13,777][1651669] Updated weights for policy 0, policy_version 46263 (0.0013) [2024-06-15 12:06:15,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 94765056. Throughput: 0: 11730.4. Samples: 23766016. Policy #0 lag: (min: 47.0, avg: 152.1, max: 303.0) [2024-06-15 12:06:15,767][1648981] Avg episode reward: [(0, '123.360')] [2024-06-15 12:06:16,767][1651274] Signal inference workers to stop experience collection... (2450 times) [2024-06-15 12:06:16,824][1651669] InferenceWorker_p0-w0: stopping experience collection (2450 times) [2024-06-15 12:06:17,000][1651274] Signal inference workers to resume experience collection... (2450 times) [2024-06-15 12:06:17,001][1651669] InferenceWorker_p0-w0: resuming experience collection (2450 times) [2024-06-15 12:06:17,161][1651669] Updated weights for policy 0, policy_version 46322 (0.0016) [2024-06-15 12:06:18,461][1651669] Updated weights for policy 0, policy_version 46389 (0.0019) [2024-06-15 12:06:20,633][1651669] Updated weights for policy 0, policy_version 46422 (0.0013) [2024-06-15 12:06:20,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48606.1, 300 sec: 46097.4). Total num frames: 95059968. Throughput: 0: 11844.3. Samples: 23837696. Policy #0 lag: (min: 47.0, avg: 152.1, max: 303.0) [2024-06-15 12:06:20,767][1648981] Avg episode reward: [(0, '126.930')] [2024-06-15 12:06:21,296][1651274] Saving new best policy, reward=126.930! [2024-06-15 12:06:21,709][1651669] Updated weights for policy 0, policy_version 46464 (0.0015) [2024-06-15 12:06:25,453][1651669] Updated weights for policy 0, policy_version 46525 (0.0029) [2024-06-15 12:06:25,773][1648981] Fps is (10 sec: 52394.0, 60 sec: 45870.0, 300 sec: 46207.4). Total num frames: 95289344. Throughput: 0: 11771.2. Samples: 23872000. 
Policy #0 lag: (min: 47.0, avg: 152.1, max: 303.0) [2024-06-15 12:06:25,774][1648981] Avg episode reward: [(0, '126.360')] [2024-06-15 12:06:27,376][1651669] Updated weights for policy 0, policy_version 46592 (0.0013) [2024-06-15 12:06:29,555][1651669] Updated weights for policy 0, policy_version 46656 (0.0013) [2024-06-15 12:06:30,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48059.6, 300 sec: 46320.7). Total num frames: 95551488. Throughput: 0: 11685.0. Samples: 23942144. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 12:06:30,767][1648981] Avg episode reward: [(0, '129.300')] [2024-06-15 12:06:30,768][1651274] Saving new best policy, reward=129.300! [2024-06-15 12:06:32,770][1651669] Updated weights for policy 0, policy_version 46712 (0.0013) [2024-06-15 12:06:35,766][1648981] Fps is (10 sec: 39348.4, 60 sec: 44782.9, 300 sec: 46098.6). Total num frames: 95682560. Throughput: 0: 11889.8. Samples: 24018432. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 12:06:35,767][1648981] Avg episode reward: [(0, '130.590')] [2024-06-15 12:06:36,145][1651274] Saving new best policy, reward=130.590! [2024-06-15 12:06:36,972][1651669] Updated weights for policy 0, policy_version 46784 (0.0022) [2024-06-15 12:06:38,521][1651669] Updated weights for policy 0, policy_version 46834 (0.0025) [2024-06-15 12:06:40,052][1651669] Updated weights for policy 0, policy_version 46901 (0.0020) [2024-06-15 12:06:40,779][1648981] Fps is (10 sec: 52360.5, 60 sec: 48074.7, 300 sec: 46983.9). Total num frames: 96075776. Throughput: 0: 11720.7. Samples: 24051712. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 12:06:40,780][1648981] Avg episode reward: [(0, '132.100')] [2024-06-15 12:06:40,784][1651274] Saving new best policy, reward=132.100! 
[2024-06-15 12:06:43,453][1651669] Updated weights for policy 0, policy_version 46948 (0.0013) [2024-06-15 12:06:45,673][1651669] Updated weights for policy 0, policy_version 46981 (0.0013) [2024-06-15 12:06:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46967.6, 300 sec: 46208.4). Total num frames: 96206848. Throughput: 0: 12094.6. Samples: 24127488. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 12:06:45,767][1648981] Avg episode reward: [(0, '129.650')] [2024-06-15 12:06:46,964][1651669] Updated weights for policy 0, policy_version 47035 (0.0012) [2024-06-15 12:06:50,305][1651669] Updated weights for policy 0, policy_version 47104 (0.0129) [2024-06-15 12:06:50,766][1648981] Fps is (10 sec: 42654.0, 60 sec: 47513.6, 300 sec: 46764.0). Total num frames: 96501760. Throughput: 0: 11696.4. Samples: 24187904. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 12:06:50,767][1648981] Avg episode reward: [(0, '131.410')] [2024-06-15 12:06:51,390][1651669] Updated weights for policy 0, policy_version 47160 (0.0014) [2024-06-15 12:06:55,132][1651669] Updated weights for policy 0, policy_version 47201 (0.0012) [2024-06-15 12:06:55,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 48074.9, 300 sec: 46319.5). Total num frames: 96731136. Throughput: 0: 11832.8. Samples: 24226304. Policy #0 lag: (min: 15.0, avg: 119.4, max: 271.0) [2024-06-15 12:06:55,767][1648981] Avg episode reward: [(0, '131.580')] [2024-06-15 12:06:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000047232_96731136.pth... [2024-06-15 12:06:55,808][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000041696_85393408.pth [2024-06-15 12:06:57,042][1651669] Updated weights for policy 0, policy_version 47234 (0.0012) [2024-06-15 12:07:00,774][1648981] Fps is (10 sec: 36016.7, 60 sec: 45869.3, 300 sec: 46207.2). Total num frames: 96862208. Throughput: 0: 11717.1. Samples: 24293376. 
Policy #0 lag: (min: 15.0, avg: 122.5, max: 271.0) [2024-06-15 12:07:00,775][1648981] Avg episode reward: [(0, '126.970')] [2024-06-15 12:07:01,065][1651669] Updated weights for policy 0, policy_version 47312 (0.0138) [2024-06-15 12:07:01,543][1651274] Signal inference workers to stop experience collection... (2500 times) [2024-06-15 12:07:01,596][1651669] InferenceWorker_p0-w0: stopping experience collection (2500 times) [2024-06-15 12:07:01,753][1651274] Signal inference workers to resume experience collection... (2500 times) [2024-06-15 12:07:01,754][1651669] InferenceWorker_p0-w0: resuming experience collection (2500 times) [2024-06-15 12:07:02,728][1651669] Updated weights for policy 0, policy_version 47379 (0.0021) [2024-06-15 12:07:05,774][1648981] Fps is (10 sec: 39291.9, 60 sec: 46961.4, 300 sec: 46540.4). Total num frames: 97124352. Throughput: 0: 11694.3. Samples: 24364032. Policy #0 lag: (min: 15.0, avg: 122.5, max: 271.0) [2024-06-15 12:07:05,775][1648981] Avg episode reward: [(0, '125.730')] [2024-06-15 12:07:06,304][1651669] Updated weights for policy 0, policy_version 47444 (0.0016) [2024-06-15 12:07:07,241][1651669] Updated weights for policy 0, policy_version 47488 (0.0012) [2024-06-15 12:07:09,816][1651669] Updated weights for policy 0, policy_version 47551 (0.0040) [2024-06-15 12:07:10,766][1648981] Fps is (10 sec: 52470.0, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 97386496. Throughput: 0: 11811.9. Samples: 24403456. Policy #0 lag: (min: 15.0, avg: 122.5, max: 271.0) [2024-06-15 12:07:10,767][1648981] Avg episode reward: [(0, '124.560')] [2024-06-15 12:07:13,908][1651669] Updated weights for policy 0, policy_version 47647 (0.0153) [2024-06-15 12:07:15,766][1648981] Fps is (10 sec: 52469.5, 60 sec: 48059.8, 300 sec: 46987.2). Total num frames: 97648640. Throughput: 0: 11548.4. Samples: 24461824. 
Policy #0 lag: (min: 15.0, avg: 122.5, max: 271.0) [2024-06-15 12:07:15,767][1648981] Avg episode reward: [(0, '125.340')] [2024-06-15 12:07:17,670][1651669] Updated weights for policy 0, policy_version 47697 (0.0013) [2024-06-15 12:07:18,786][1651669] Updated weights for policy 0, policy_version 47744 (0.0013) [2024-06-15 12:07:20,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 46967.5, 300 sec: 46543.7). Total num frames: 97878016. Throughput: 0: 11616.7. Samples: 24541184. Policy #0 lag: (min: 15.0, avg: 122.5, max: 271.0) [2024-06-15 12:07:20,767][1648981] Avg episode reward: [(0, '127.480')] [2024-06-15 12:07:20,780][1651669] Updated weights for policy 0, policy_version 47803 (0.0012) [2024-06-15 12:07:23,126][1651669] Updated weights for policy 0, policy_version 47856 (0.0013) [2024-06-15 12:07:24,238][1651669] Updated weights for policy 0, policy_version 47890 (0.0019) [2024-06-15 12:07:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48065.1, 300 sec: 47097.0). Total num frames: 98172928. Throughput: 0: 11699.7. Samples: 24578048. Policy #0 lag: (min: 15.0, avg: 122.5, max: 271.0) [2024-06-15 12:07:25,767][1648981] Avg episode reward: [(0, '126.080')] [2024-06-15 12:07:29,070][1651669] Updated weights for policy 0, policy_version 47971 (0.0013) [2024-06-15 12:07:30,431][1651669] Updated weights for policy 0, policy_version 48001 (0.0015) [2024-06-15 12:07:30,790][1648981] Fps is (10 sec: 45766.1, 60 sec: 46402.9, 300 sec: 46649.0). Total num frames: 98336768. Throughput: 0: 11633.3. Samples: 24651264. Policy #0 lag: (min: 4.0, avg: 104.3, max: 260.0) [2024-06-15 12:07:30,791][1648981] Avg episode reward: [(0, '132.980')] [2024-06-15 12:07:31,049][1651274] Saving new best policy, reward=132.980! 
[2024-06-15 12:07:31,773][1651669] Updated weights for policy 0, policy_version 48062 (0.0013) [2024-06-15 12:07:34,047][1651669] Updated weights for policy 0, policy_version 48125 (0.0012) [2024-06-15 12:07:35,528][1651669] Updated weights for policy 0, policy_version 48176 (0.0016) [2024-06-15 12:07:35,767][1648981] Fps is (10 sec: 49151.6, 60 sec: 49698.0, 300 sec: 46990.3). Total num frames: 98664448. Throughput: 0: 11855.6. Samples: 24721408. Policy #0 lag: (min: 4.0, avg: 104.3, max: 260.0) [2024-06-15 12:07:35,767][1648981] Avg episode reward: [(0, '131.950')] [2024-06-15 12:07:40,215][1651669] Updated weights for policy 0, policy_version 48224 (0.0029) [2024-06-15 12:07:40,766][1648981] Fps is (10 sec: 45984.5, 60 sec: 45338.9, 300 sec: 46986.0). Total num frames: 98795520. Throughput: 0: 11832.9. Samples: 24758784. Policy #0 lag: (min: 4.0, avg: 104.3, max: 260.0) [2024-06-15 12:07:40,767][1648981] Avg episode reward: [(0, '131.280')] [2024-06-15 12:07:40,915][1651669] Updated weights for policy 0, policy_version 48256 (0.0014) [2024-06-15 12:07:43,169][1651669] Updated weights for policy 0, policy_version 48319 (0.0096) [2024-06-15 12:07:44,699][1651274] Signal inference workers to stop experience collection... (2550 times) [2024-06-15 12:07:44,787][1651669] InferenceWorker_p0-w0: stopping experience collection (2550 times) [2024-06-15 12:07:45,058][1651274] Signal inference workers to resume experience collection... (2550 times) [2024-06-15 12:07:45,059][1651669] InferenceWorker_p0-w0: resuming experience collection (2550 times) [2024-06-15 12:07:45,216][1651669] Updated weights for policy 0, policy_version 48374 (0.0013) [2024-06-15 12:07:45,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 48059.7, 300 sec: 46986.8). Total num frames: 99090432. Throughput: 0: 11926.0. Samples: 24829952. 
Policy #0 lag: (min: 4.0, avg: 104.3, max: 260.0) [2024-06-15 12:07:45,767][1648981] Avg episode reward: [(0, '129.800')] [2024-06-15 12:07:46,113][1651669] Updated weights for policy 0, policy_version 48416 (0.0070) [2024-06-15 12:07:50,530][1651669] Updated weights for policy 0, policy_version 48451 (0.0012) [2024-06-15 12:07:50,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 99254272. Throughput: 0: 11960.1. Samples: 24902144. Policy #0 lag: (min: 4.0, avg: 104.3, max: 260.0) [2024-06-15 12:07:50,767][1648981] Avg episode reward: [(0, '130.740')] [2024-06-15 12:07:53,324][1651669] Updated weights for policy 0, policy_version 48530 (0.0093) [2024-06-15 12:07:53,951][1651669] Updated weights for policy 0, policy_version 48567 (0.0014) [2024-06-15 12:07:55,235][1651669] Updated weights for policy 0, policy_version 48593 (0.0013) [2024-06-15 12:07:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 47513.8, 300 sec: 46986.0). Total num frames: 99581952. Throughput: 0: 11844.2. Samples: 24936448. Policy #0 lag: (min: 4.0, avg: 104.3, max: 260.0) [2024-06-15 12:07:55,767][1648981] Avg episode reward: [(0, '128.210')] [2024-06-15 12:07:56,712][1651669] Updated weights for policy 0, policy_version 48658 (0.0013) [2024-06-15 12:08:00,768][1648981] Fps is (10 sec: 49146.1, 60 sec: 48065.0, 300 sec: 47096.9). Total num frames: 99745792. Throughput: 0: 12310.4. Samples: 25015808. Policy #0 lag: (min: 4.0, avg: 104.3, max: 260.0) [2024-06-15 12:08:00,768][1648981] Avg episode reward: [(0, '130.110')] [2024-06-15 12:08:00,784][1651669] Updated weights for policy 0, policy_version 48705 (0.0011) [2024-06-15 12:08:01,988][1651669] Updated weights for policy 0, policy_version 48758 (0.0012) [2024-06-15 12:08:04,817][1651669] Updated weights for policy 0, policy_version 48802 (0.0014) [2024-06-15 12:08:05,773][1648981] Fps is (10 sec: 42570.6, 60 sec: 48060.7, 300 sec: 46984.9). Total num frames: 100007936. Throughput: 0: 12013.2. 
Samples: 25081856. Policy #0 lag: (min: 3.0, avg: 103.4, max: 259.0) [2024-06-15 12:08:05,772][1651669] Updated weights for policy 0, policy_version 48833 (0.0078) [2024-06-15 12:08:05,773][1648981] Avg episode reward: [(0, '135.380')] [2024-06-15 12:08:06,437][1651274] Saving new best policy, reward=135.380! [2024-06-15 12:08:07,160][1651669] Updated weights for policy 0, policy_version 48888 (0.0013) [2024-06-15 12:08:08,764][1651669] Updated weights for policy 0, policy_version 48950 (0.0071) [2024-06-15 12:08:10,766][1648981] Fps is (10 sec: 52435.2, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 100270080. Throughput: 0: 11832.9. Samples: 25110528. Policy #0 lag: (min: 3.0, avg: 103.4, max: 259.0) [2024-06-15 12:08:10,767][1648981] Avg episode reward: [(0, '132.530')] [2024-06-15 12:08:12,073][1651669] Updated weights for policy 0, policy_version 48978 (0.0012) [2024-06-15 12:08:15,189][1651669] Updated weights for policy 0, policy_version 49026 (0.0031) [2024-06-15 12:08:15,766][1648981] Fps is (10 sec: 42626.5, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 100433920. Throughput: 0: 11918.8. Samples: 25187328. Policy #0 lag: (min: 3.0, avg: 103.4, max: 259.0) [2024-06-15 12:08:15,767][1648981] Avg episode reward: [(0, '133.580')] [2024-06-15 12:08:16,592][1651669] Updated weights for policy 0, policy_version 49078 (0.0035) [2024-06-15 12:08:18,052][1651669] Updated weights for policy 0, policy_version 49120 (0.0014) [2024-06-15 12:08:19,847][1651669] Updated weights for policy 0, policy_version 49185 (0.0019) [2024-06-15 12:08:20,769][1648981] Fps is (10 sec: 52417.1, 60 sec: 48604.0, 300 sec: 47096.7). Total num frames: 100794368. Throughput: 0: 11661.7. Samples: 25246208. 
Policy #0 lag: (min: 3.0, avg: 103.4, max: 259.0) [2024-06-15 12:08:20,770][1648981] Avg episode reward: [(0, '133.850')] [2024-06-15 12:08:22,927][1651669] Updated weights for policy 0, policy_version 49232 (0.0018) [2024-06-15 12:08:23,927][1651669] Updated weights for policy 0, policy_version 49276 (0.0012) [2024-06-15 12:08:25,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 100925440. Throughput: 0: 11764.6. Samples: 25288192. Policy #0 lag: (min: 3.0, avg: 103.4, max: 259.0) [2024-06-15 12:08:25,767][1648981] Avg episode reward: [(0, '136.330')] [2024-06-15 12:08:25,773][1651274] Saving new best policy, reward=136.330! [2024-06-15 12:08:27,697][1651669] Updated weights for policy 0, policy_version 49333 (0.0014) [2024-06-15 12:08:29,288][1651274] Signal inference workers to stop experience collection... (2600 times) [2024-06-15 12:08:29,308][1651669] InferenceWorker_p0-w0: stopping experience collection (2600 times) [2024-06-15 12:08:29,455][1651274] Signal inference workers to resume experience collection... (2600 times) [2024-06-15 12:08:29,456][1651669] InferenceWorker_p0-w0: resuming experience collection (2600 times) [2024-06-15 12:08:29,931][1651669] Updated weights for policy 0, policy_version 49377 (0.0014) [2024-06-15 12:08:30,768][1648981] Fps is (10 sec: 39324.2, 60 sec: 47531.2, 300 sec: 46985.7). Total num frames: 101187584. Throughput: 0: 11798.3. Samples: 25360896. Policy #0 lag: (min: 3.0, avg: 103.4, max: 259.0) [2024-06-15 12:08:30,769][1648981] Avg episode reward: [(0, '134.720')] [2024-06-15 12:08:31,404][1651669] Updated weights for policy 0, policy_version 49441 (0.0012) [2024-06-15 12:08:34,014][1651669] Updated weights for policy 0, policy_version 49476 (0.0016) [2024-06-15 12:08:35,349][1651669] Updated weights for policy 0, policy_version 49536 (0.0031) [2024-06-15 12:08:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 46421.5, 300 sec: 47541.6). Total num frames: 101449728. 
Throughput: 0: 11832.9. Samples: 25434624. Policy #0 lag: (min: 116.0, avg: 201.9, max: 324.0) [2024-06-15 12:08:35,767][1648981] Avg episode reward: [(0, '132.720')] [2024-06-15 12:08:38,935][1651669] Updated weights for policy 0, policy_version 49589 (0.0014) [2024-06-15 12:08:39,925][1651669] Updated weights for policy 0, policy_version 49617 (0.0013) [2024-06-15 12:08:40,766][1648981] Fps is (10 sec: 52437.2, 60 sec: 48605.9, 300 sec: 47319.2). Total num frames: 101711872. Throughput: 0: 11867.0. Samples: 25470464. Policy #0 lag: (min: 116.0, avg: 201.9, max: 324.0) [2024-06-15 12:08:40,767][1648981] Avg episode reward: [(0, '134.790')] [2024-06-15 12:08:41,285][1651669] Updated weights for policy 0, policy_version 49696 (0.0016) [2024-06-15 12:08:44,981][1651669] Updated weights for policy 0, policy_version 49746 (0.0012) [2024-06-15 12:08:45,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 47513.6, 300 sec: 47435.3). Total num frames: 101941248. Throughput: 0: 11856.0. Samples: 25549312. Policy #0 lag: (min: 116.0, avg: 201.9, max: 324.0) [2024-06-15 12:08:45,767][1648981] Avg episode reward: [(0, '134.910')] [2024-06-15 12:08:48,353][1651669] Updated weights for policy 0, policy_version 49798 (0.0014) [2024-06-15 12:08:49,417][1651669] Updated weights for policy 0, policy_version 49855 (0.0014) [2024-06-15 12:08:50,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48605.9, 300 sec: 47430.6). Total num frames: 102170624. Throughput: 0: 12028.1. Samples: 25623040. Policy #0 lag: (min: 116.0, avg: 201.9, max: 324.0) [2024-06-15 12:08:50,767][1648981] Avg episode reward: [(0, '134.090')] [2024-06-15 12:08:51,300][1651669] Updated weights for policy 0, policy_version 49920 (0.0023) [2024-06-15 12:08:52,535][1651669] Updated weights for policy 0, policy_version 49984 (0.0013) [2024-06-15 12:08:55,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 46967.3, 300 sec: 47208.2). Total num frames: 102400000. Throughput: 0: 12071.7. Samples: 25653760. 
Policy #0 lag: (min: 116.0, avg: 201.9, max: 324.0) [2024-06-15 12:08:55,767][1648981] Avg episode reward: [(0, '133.070')] [2024-06-15 12:08:56,343][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000050032_102465536.pth... [2024-06-15 12:08:56,392][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000044480_91095040.pth [2024-06-15 12:08:56,668][1651669] Updated weights for policy 0, policy_version 50045 (0.0012) [2024-06-15 12:08:59,822][1651669] Updated weights for policy 0, policy_version 50082 (0.0013) [2024-06-15 12:09:00,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48060.7, 300 sec: 47763.5). Total num frames: 102629376. Throughput: 0: 11980.8. Samples: 25726464. Policy #0 lag: (min: 116.0, avg: 201.9, max: 324.0) [2024-06-15 12:09:00,767][1648981] Avg episode reward: [(0, '132.680')] [2024-06-15 12:09:00,863][1651669] Updated weights for policy 0, policy_version 50113 (0.0013) [2024-06-15 12:09:01,932][1651669] Updated weights for policy 0, policy_version 50162 (0.0012) [2024-06-15 12:09:03,459][1651669] Updated weights for policy 0, policy_version 50233 (0.0163) [2024-06-15 12:09:05,767][1648981] Fps is (10 sec: 49151.8, 60 sec: 48064.7, 300 sec: 47098.2). Total num frames: 102891520. Throughput: 0: 12288.5. Samples: 25799168. Policy #0 lag: (min: 101.0, avg: 203.8, max: 357.0) [2024-06-15 12:09:05,767][1648981] Avg episode reward: [(0, '133.620')] [2024-06-15 12:09:07,826][1651669] Updated weights for policy 0, policy_version 50301 (0.0015) [2024-06-15 12:09:10,514][1651274] Signal inference workers to stop experience collection... (2650 times) [2024-06-15 12:09:10,581][1651669] InferenceWorker_p0-w0: stopping experience collection (2650 times) [2024-06-15 12:09:10,768][1648981] Fps is (10 sec: 42592.0, 60 sec: 46420.2, 300 sec: 47652.8). Total num frames: 103055360. Throughput: 0: 12048.7. Samples: 25830400. 
Policy #0 lag: (min: 101.0, avg: 203.8, max: 357.0) [2024-06-15 12:09:10,768][1648981] Avg episode reward: [(0, '131.040')] [2024-06-15 12:09:10,773][1651274] Signal inference workers to resume experience collection... (2650 times) [2024-06-15 12:09:10,774][1651669] InferenceWorker_p0-w0: resuming experience collection (2650 times) [2024-06-15 12:09:11,213][1651669] Updated weights for policy 0, policy_version 50352 (0.0017) [2024-06-15 12:09:11,570][1651669] Updated weights for policy 0, policy_version 50366 (0.0011) [2024-06-15 12:09:13,593][1651669] Updated weights for policy 0, policy_version 50448 (0.0090) [2024-06-15 12:09:14,591][1651669] Updated weights for policy 0, policy_version 50492 (0.0018) [2024-06-15 12:09:15,770][1648981] Fps is (10 sec: 52410.7, 60 sec: 49695.0, 300 sec: 47318.6). Total num frames: 103415808. Throughput: 0: 12003.0. Samples: 25901056. Policy #0 lag: (min: 101.0, avg: 203.8, max: 357.0) [2024-06-15 12:09:15,771][1648981] Avg episode reward: [(0, '131.140')] [2024-06-15 12:09:19,051][1651669] Updated weights for policy 0, policy_version 50544 (0.0012) [2024-06-15 12:09:20,766][1648981] Fps is (10 sec: 49159.5, 60 sec: 45877.0, 300 sec: 47545.2). Total num frames: 103546880. Throughput: 0: 12049.1. Samples: 25976832. Policy #0 lag: (min: 101.0, avg: 203.8, max: 357.0) [2024-06-15 12:09:20,767][1648981] Avg episode reward: [(0, '127.950')] [2024-06-15 12:09:22,048][1651669] Updated weights for policy 0, policy_version 50598 (0.0023) [2024-06-15 12:09:22,597][1651669] Updated weights for policy 0, policy_version 50620 (0.0011) [2024-06-15 12:09:23,896][1651669] Updated weights for policy 0, policy_version 50672 (0.0026) [2024-06-15 12:09:25,150][1651669] Updated weights for policy 0, policy_version 50736 (0.0013) [2024-06-15 12:09:25,766][1648981] Fps is (10 sec: 52448.8, 60 sec: 50244.3, 300 sec: 47657.6). Total num frames: 103940096. Throughput: 0: 12174.2. Samples: 26018304. 
Policy #0 lag: (min: 101.0, avg: 203.8, max: 357.0) [2024-06-15 12:09:25,767][1648981] Avg episode reward: [(0, '132.800')] [2024-06-15 12:09:28,880][1651669] Updated weights for policy 0, policy_version 50787 (0.0113) [2024-06-15 12:09:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48061.0, 300 sec: 47541.4). Total num frames: 104071168. Throughput: 0: 11992.2. Samples: 26088960. Policy #0 lag: (min: 101.0, avg: 203.8, max: 357.0) [2024-06-15 12:09:30,767][1648981] Avg episode reward: [(0, '135.690')] [2024-06-15 12:09:32,795][1651669] Updated weights for policy 0, policy_version 50848 (0.0014) [2024-06-15 12:09:34,123][1651669] Updated weights for policy 0, policy_version 50903 (0.0037) [2024-06-15 12:09:35,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 48605.7, 300 sec: 47541.3). Total num frames: 104366080. Throughput: 0: 11810.1. Samples: 26154496. Policy #0 lag: (min: 93.0, avg: 186.2, max: 325.0) [2024-06-15 12:09:35,767][1648981] Avg episode reward: [(0, '134.900')] [2024-06-15 12:09:36,082][1651669] Updated weights for policy 0, policy_version 50977 (0.0013) [2024-06-15 12:09:38,824][1651669] Updated weights for policy 0, policy_version 51024 (0.0012) [2024-06-15 12:09:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 104595456. Throughput: 0: 12185.7. Samples: 26202112. Policy #0 lag: (min: 93.0, avg: 186.2, max: 325.0) [2024-06-15 12:09:40,767][1648981] Avg episode reward: [(0, '139.070')] [2024-06-15 12:09:40,770][1651274] Saving new best policy, reward=139.070! [2024-06-15 12:09:44,594][1651669] Updated weights for policy 0, policy_version 51120 (0.0017) [2024-06-15 12:09:45,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 104792064. Throughput: 0: 12128.7. Samples: 26272256. 
Policy #0 lag: (min: 93.0, avg: 186.2, max: 325.0) [2024-06-15 12:09:45,767][1648981] Avg episode reward: [(0, '142.800')] [2024-06-15 12:09:46,035][1651274] Saving new best policy, reward=142.800! [2024-06-15 12:09:46,629][1651669] Updated weights for policy 0, policy_version 51203 (0.0012) [2024-06-15 12:09:47,303][1651274] Signal inference workers to stop experience collection... (2700 times) [2024-06-15 12:09:47,334][1651669] InferenceWorker_p0-w0: stopping experience collection (2700 times) [2024-06-15 12:09:47,465][1651274] Signal inference workers to resume experience collection... (2700 times) [2024-06-15 12:09:47,466][1651669] InferenceWorker_p0-w0: resuming experience collection (2700 times) [2024-06-15 12:09:47,664][1651669] Updated weights for policy 0, policy_version 51264 (0.0082) [2024-06-15 12:09:50,774][1648981] Fps is (10 sec: 45839.6, 60 sec: 48053.5, 300 sec: 47318.0). Total num frames: 105054208. Throughput: 0: 12160.8. Samples: 26346496. Policy #0 lag: (min: 93.0, avg: 186.2, max: 325.0) [2024-06-15 12:09:50,775][1648981] Avg episode reward: [(0, '141.230')] [2024-06-15 12:09:51,136][1651669] Updated weights for policy 0, policy_version 51325 (0.0013) [2024-06-15 12:09:54,639][1651669] Updated weights for policy 0, policy_version 51376 (0.0013) [2024-06-15 12:09:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48606.1, 300 sec: 47652.6). Total num frames: 105316352. Throughput: 0: 12322.5. Samples: 26384896. Policy #0 lag: (min: 93.0, avg: 186.2, max: 325.0) [2024-06-15 12:09:55,767][1648981] Avg episode reward: [(0, '140.590')] [2024-06-15 12:09:55,917][1651669] Updated weights for policy 0, policy_version 51427 (0.0011) [2024-06-15 12:09:57,239][1651669] Updated weights for policy 0, policy_version 51490 (0.0033) [2024-06-15 12:10:00,766][1648981] Fps is (10 sec: 45911.0, 60 sec: 48059.7, 300 sec: 47099.9). Total num frames: 105512960. Throughput: 0: 12254.9. Samples: 26452480. 
Policy #0 lag: (min: 93.0, avg: 186.2, max: 325.0) [2024-06-15 12:10:00,767][1648981] Avg episode reward: [(0, '141.160')] [2024-06-15 12:10:01,684][1651669] Updated weights for policy 0, policy_version 51554 (0.0030) [2024-06-15 12:10:04,743][1651669] Updated weights for policy 0, policy_version 51616 (0.0082) [2024-06-15 12:10:05,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48060.0, 300 sec: 47652.5). Total num frames: 105775104. Throughput: 0: 12231.1. Samples: 26527232. Policy #0 lag: (min: 93.0, avg: 186.2, max: 325.0) [2024-06-15 12:10:05,767][1648981] Avg episode reward: [(0, '141.850')] [2024-06-15 12:10:06,463][1651669] Updated weights for policy 0, policy_version 51681 (0.0028) [2024-06-15 12:10:08,090][1651669] Updated weights for policy 0, policy_version 51772 (0.0109) [2024-06-15 12:10:10,767][1648981] Fps is (10 sec: 52423.3, 60 sec: 49698.5, 300 sec: 47430.1). Total num frames: 106037248. Throughput: 0: 11935.0. Samples: 26555392. Policy #0 lag: (min: 79.0, avg: 200.6, max: 335.0) [2024-06-15 12:10:10,768][1648981] Avg episode reward: [(0, '139.410')] [2024-06-15 12:10:13,193][1651669] Updated weights for policy 0, policy_version 51827 (0.0014) [2024-06-15 12:10:15,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46424.3, 300 sec: 47652.5). Total num frames: 106201088. Throughput: 0: 12174.2. Samples: 26636800. Policy #0 lag: (min: 79.0, avg: 200.6, max: 335.0) [2024-06-15 12:10:15,767][1648981] Avg episode reward: [(0, '138.240')] [2024-06-15 12:10:16,110][1651669] Updated weights for policy 0, policy_version 51872 (0.0103) [2024-06-15 12:10:17,983][1651669] Updated weights for policy 0, policy_version 51937 (0.0046) [2024-06-15 12:10:18,953][1651669] Updated weights for policy 0, policy_version 51987 (0.0013) [2024-06-15 12:10:20,766][1648981] Fps is (10 sec: 52433.4, 60 sec: 50244.1, 300 sec: 47541.3). Total num frames: 106561536. Throughput: 0: 12049.1. Samples: 26696704. 
Policy #0 lag: (min: 79.0, avg: 200.6, max: 335.0) [2024-06-15 12:10:20,767][1648981] Avg episode reward: [(0, '140.880')] [2024-06-15 12:10:23,359][1651669] Updated weights for policy 0, policy_version 52048 (0.0015) [2024-06-15 12:10:24,380][1651669] Updated weights for policy 0, policy_version 52093 (0.0021) [2024-06-15 12:10:25,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 45875.0, 300 sec: 47541.3). Total num frames: 106692608. Throughput: 0: 11912.5. Samples: 26738176. Policy #0 lag: (min: 79.0, avg: 200.6, max: 335.0) [2024-06-15 12:10:25,767][1648981] Avg episode reward: [(0, '143.630')] [2024-06-15 12:10:25,772][1651274] Saving new best policy, reward=143.630! [2024-06-15 12:10:27,258][1651669] Updated weights for policy 0, policy_version 52152 (0.0015) [2024-06-15 12:10:27,444][1651274] Signal inference workers to stop experience collection... (2750 times) [2024-06-15 12:10:27,527][1651669] InferenceWorker_p0-w0: stopping experience collection (2750 times) [2024-06-15 12:10:27,680][1651274] Signal inference workers to resume experience collection... (2750 times) [2024-06-15 12:10:27,682][1651669] InferenceWorker_p0-w0: resuming experience collection (2750 times) [2024-06-15 12:10:28,697][1651669] Updated weights for policy 0, policy_version 52224 (0.0083) [2024-06-15 12:10:30,515][1651669] Updated weights for policy 0, policy_version 52282 (0.0018) [2024-06-15 12:10:30,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50244.2, 300 sec: 47763.5). Total num frames: 107085824. Throughput: 0: 11855.6. Samples: 26805760. Policy #0 lag: (min: 79.0, avg: 200.6, max: 335.0) [2024-06-15 12:10:30,767][1648981] Avg episode reward: [(0, '140.760')] [2024-06-15 12:10:34,852][1651669] Updated weights for policy 0, policy_version 52327 (0.0012) [2024-06-15 12:10:35,775][1648981] Fps is (10 sec: 52386.5, 60 sec: 47507.1, 300 sec: 47545.2). Total num frames: 107216896. Throughput: 0: 11935.2. Samples: 26883584. 
Policy #0 lag: (min: 79.0, avg: 200.6, max: 335.0) [2024-06-15 12:10:35,775][1648981] Avg episode reward: [(0, '143.620')] [2024-06-15 12:10:36,770][1651669] Updated weights for policy 0, policy_version 52354 (0.0012) [2024-06-15 12:10:39,271][1651669] Updated weights for policy 0, policy_version 52464 (0.0069) [2024-06-15 12:10:40,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 107511808. Throughput: 0: 11855.6. Samples: 26918400. Policy #0 lag: (min: 15.0, avg: 163.9, max: 271.0) [2024-06-15 12:10:40,767][1648981] Avg episode reward: [(0, '143.350')] [2024-06-15 12:10:41,628][1651669] Updated weights for policy 0, policy_version 52528 (0.0179) [2024-06-15 12:10:45,597][1651669] Updated weights for policy 0, policy_version 52576 (0.0015) [2024-06-15 12:10:45,766][1648981] Fps is (10 sec: 45913.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 107675648. Throughput: 0: 11958.0. Samples: 26990592. Policy #0 lag: (min: 15.0, avg: 163.9, max: 271.0) [2024-06-15 12:10:45,767][1648981] Avg episode reward: [(0, '145.300')] [2024-06-15 12:10:46,351][1651669] Updated weights for policy 0, policy_version 52605 (0.0020) [2024-06-15 12:10:46,398][1651274] Saving new best policy, reward=145.300! [2024-06-15 12:10:49,074][1651669] Updated weights for policy 0, policy_version 52663 (0.0131) [2024-06-15 12:10:50,305][1651669] Updated weights for policy 0, policy_version 52708 (0.0013) [2024-06-15 12:10:50,772][1648981] Fps is (10 sec: 49126.1, 60 sec: 49154.0, 300 sec: 47987.9). Total num frames: 108003328. Throughput: 0: 11751.9. Samples: 27056128. Policy #0 lag: (min: 15.0, avg: 163.9, max: 271.0) [2024-06-15 12:10:50,772][1648981] Avg episode reward: [(0, '140.240')] [2024-06-15 12:10:52,211][1651669] Updated weights for policy 0, policy_version 52753 (0.0014) [2024-06-15 12:10:55,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 108167168. Throughput: 0: 11901.5. Samples: 27090944. 
Policy #0 lag: (min: 15.0, avg: 163.9, max: 271.0) [2024-06-15 12:10:55,767][1648981] Avg episode reward: [(0, '143.270')] [2024-06-15 12:10:55,937][1651669] Updated weights for policy 0, policy_version 52818 (0.0014) [2024-06-15 12:10:56,147][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000052832_108199936.pth... [2024-06-15 12:10:56,334][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000047232_96731136.pth [2024-06-15 12:10:56,340][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000052832_108199936.pth [2024-06-15 12:10:59,082][1651669] Updated weights for policy 0, policy_version 52867 (0.0014) [2024-06-15 12:11:00,766][1648981] Fps is (10 sec: 39342.2, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 108396544. Throughput: 0: 11753.2. Samples: 27165696. Policy #0 lag: (min: 15.0, avg: 163.9, max: 271.0) [2024-06-15 12:11:00,767][1648981] Avg episode reward: [(0, '143.900')] [2024-06-15 12:11:00,941][1651669] Updated weights for policy 0, policy_version 52944 (0.0013) [2024-06-15 12:11:02,022][1651669] Updated weights for policy 0, policy_version 52992 (0.0010) [2024-06-15 12:11:05,766][1648981] Fps is (10 sec: 49151.3, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 108658688. Throughput: 0: 11946.7. Samples: 27234304. Policy #0 lag: (min: 15.0, avg: 163.9, max: 271.0) [2024-06-15 12:11:05,767][1648981] Avg episode reward: [(0, '143.410')] [2024-06-15 12:11:07,157][1651669] Updated weights for policy 0, policy_version 53058 (0.0012) [2024-06-15 12:11:08,262][1651669] Updated weights for policy 0, policy_version 53117 (0.0014) [2024-06-15 12:11:10,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46968.3, 300 sec: 47763.5). Total num frames: 108855296. Throughput: 0: 11844.3. Samples: 27271168. 
Policy #0 lag: (min: 15.0, avg: 163.9, max: 271.0) [2024-06-15 12:11:10,767][1648981] Avg episode reward: [(0, '144.490')] [2024-06-15 12:11:10,932][1651669] Updated weights for policy 0, policy_version 53168 (0.0019) [2024-06-15 12:11:11,013][1651274] Signal inference workers to stop experience collection... (2800 times) [2024-06-15 12:11:11,070][1651669] InferenceWorker_p0-w0: stopping experience collection (2800 times) [2024-06-15 12:11:11,264][1651274] Signal inference workers to resume experience collection... (2800 times) [2024-06-15 12:11:11,265][1651669] InferenceWorker_p0-w0: resuming experience collection (2800 times) [2024-06-15 12:11:12,426][1651669] Updated weights for policy 0, policy_version 53232 (0.0104) [2024-06-15 12:11:15,126][1651669] Updated weights for policy 0, policy_version 53267 (0.0012) [2024-06-15 12:11:15,774][1648981] Fps is (10 sec: 49114.3, 60 sec: 49145.6, 300 sec: 47762.3). Total num frames: 109150208. Throughput: 0: 12058.4. Samples: 27348480. Policy #0 lag: (min: 15.0, avg: 131.8, max: 271.0) [2024-06-15 12:11:15,775][1648981] Avg episode reward: [(0, '145.080')] [2024-06-15 12:11:18,150][1651669] Updated weights for policy 0, policy_version 53331 (0.0012) [2024-06-15 12:11:20,232][1651669] Updated weights for policy 0, policy_version 53380 (0.0014) [2024-06-15 12:11:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46967.6, 300 sec: 47764.6). Total num frames: 109379584. Throughput: 0: 11937.5. Samples: 27420672. Policy #0 lag: (min: 15.0, avg: 131.8, max: 271.0) [2024-06-15 12:11:20,767][1648981] Avg episode reward: [(0, '142.520')] [2024-06-15 12:11:22,404][1651669] Updated weights for policy 0, policy_version 53462 (0.0136) [2024-06-15 12:11:25,766][1648981] Fps is (10 sec: 42631.3, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 109576192. Throughput: 0: 11650.8. Samples: 27442688. 
Policy #0 lag: (min: 15.0, avg: 131.8, max: 271.0) [2024-06-15 12:11:25,767][1648981] Avg episode reward: [(0, '143.220')] [2024-06-15 12:11:26,559][1651669] Updated weights for policy 0, policy_version 53520 (0.0013) [2024-06-15 12:11:29,015][1651669] Updated weights for policy 0, policy_version 53584 (0.0021) [2024-06-15 12:11:29,876][1651669] Updated weights for policy 0, policy_version 53632 (0.0014) [2024-06-15 12:11:30,770][1648981] Fps is (10 sec: 45857.5, 60 sec: 45872.3, 300 sec: 47985.1). Total num frames: 109838336. Throughput: 0: 11797.7. Samples: 27521536. Policy #0 lag: (min: 15.0, avg: 131.8, max: 271.0) [2024-06-15 12:11:30,771][1648981] Avg episode reward: [(0, '139.430')] [2024-06-15 12:11:32,564][1651669] Updated weights for policy 0, policy_version 53681 (0.0016) [2024-06-15 12:11:34,166][1651669] Updated weights for policy 0, policy_version 53744 (0.0014) [2024-06-15 12:11:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48066.4, 300 sec: 47543.5). Total num frames: 110100480. Throughput: 0: 11879.8. Samples: 27590656. Policy #0 lag: (min: 15.0, avg: 131.8, max: 271.0) [2024-06-15 12:11:35,767][1648981] Avg episode reward: [(0, '138.650')] [2024-06-15 12:11:38,752][1651669] Updated weights for policy 0, policy_version 53817 (0.0017) [2024-06-15 12:11:40,766][1648981] Fps is (10 sec: 42615.1, 60 sec: 45875.3, 300 sec: 47652.5). Total num frames: 110264320. Throughput: 0: 11992.2. Samples: 27630592. 
Policy #0 lag: (min: 15.0, avg: 131.8, max: 271.0) [2024-06-15 12:11:40,767][1648981] Avg episode reward: [(0, '137.280')] [2024-06-15 12:11:41,345][1651669] Updated weights for policy 0, policy_version 53883 (0.0018) [2024-06-15 12:11:43,031][1651669] Updated weights for policy 0, policy_version 53922 (0.0013) [2024-06-15 12:11:44,347][1651669] Updated weights for policy 0, policy_version 53984 (0.0019) [2024-06-15 12:11:45,027][1651669] Updated weights for policy 0, policy_version 54016 (0.0013) [2024-06-15 12:11:45,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 49151.8, 300 sec: 47874.6). Total num frames: 110624768. Throughput: 0: 11867.0. Samples: 27699712. Policy #0 lag: (min: 15.0, avg: 131.8, max: 271.0) [2024-06-15 12:11:45,768][1648981] Avg episode reward: [(0, '136.710')] [2024-06-15 12:11:49,549][1651669] Updated weights for policy 0, policy_version 54080 (0.0013) [2024-06-15 12:11:50,767][1648981] Fps is (10 sec: 49150.2, 60 sec: 45879.0, 300 sec: 47541.4). Total num frames: 110755840. Throughput: 0: 12117.3. Samples: 27779584. Policy #0 lag: (min: 31.0, avg: 132.1, max: 287.0) [2024-06-15 12:11:50,767][1648981] Avg episode reward: [(0, '137.580')] [2024-06-15 12:11:52,027][1651669] Updated weights for policy 0, policy_version 54144 (0.0013) [2024-06-15 12:11:53,183][1651274] Signal inference workers to stop experience collection... (2850 times) [2024-06-15 12:11:53,235][1651669] InferenceWorker_p0-w0: stopping experience collection (2850 times) [2024-06-15 12:11:53,410][1651274] Signal inference workers to resume experience collection... (2850 times) [2024-06-15 12:11:53,411][1651669] InferenceWorker_p0-w0: resuming experience collection (2850 times) [2024-06-15 12:11:53,910][1651669] Updated weights for policy 0, policy_version 54194 (0.0024) [2024-06-15 12:11:55,429][1651669] Updated weights for policy 0, policy_version 54268 (0.0191) [2024-06-15 12:11:55,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 49698.0, 300 sec: 48431.3). 
Total num frames: 111149056. Throughput: 0: 12026.3. Samples: 27812352. Policy #0 lag: (min: 31.0, avg: 132.1, max: 287.0) [2024-06-15 12:11:55,767][1648981] Avg episode reward: [(0, '138.260')] [2024-06-15 12:12:00,110][1651669] Updated weights for policy 0, policy_version 54330 (0.0013) [2024-06-15 12:12:00,766][1648981] Fps is (10 sec: 52430.5, 60 sec: 48059.8, 300 sec: 47987.0). Total num frames: 111280128. Throughput: 0: 11994.3. Samples: 27888128. Policy #0 lag: (min: 31.0, avg: 132.1, max: 287.0) [2024-06-15 12:12:00,767][1648981] Avg episode reward: [(0, '140.920')] [2024-06-15 12:12:02,608][1651669] Updated weights for policy 0, policy_version 54384 (0.0013) [2024-06-15 12:12:04,327][1651669] Updated weights for policy 0, policy_version 54436 (0.0013) [2024-06-15 12:12:05,775][1648981] Fps is (10 sec: 45836.2, 60 sec: 49145.1, 300 sec: 48206.4). Total num frames: 111607808. Throughput: 0: 11864.8. Samples: 27954688. Policy #0 lag: (min: 31.0, avg: 132.1, max: 287.0) [2024-06-15 12:12:05,775][1648981] Avg episode reward: [(0, '140.040')] [2024-06-15 12:12:05,828][1651669] Updated weights for policy 0, policy_version 54503 (0.0012) [2024-06-15 12:12:10,484][1651669] Updated weights for policy 0, policy_version 54548 (0.0014) [2024-06-15 12:12:10,767][1648981] Fps is (10 sec: 45872.2, 60 sec: 48059.2, 300 sec: 47763.4). Total num frames: 111738880. Throughput: 0: 12162.7. Samples: 27990016. Policy #0 lag: (min: 31.0, avg: 132.1, max: 287.0) [2024-06-15 12:12:10,768][1648981] Avg episode reward: [(0, '141.630')] [2024-06-15 12:12:11,320][1651669] Updated weights for policy 0, policy_version 54586 (0.0013) [2024-06-15 12:12:13,762][1651669] Updated weights for policy 0, policy_version 54640 (0.0012) [2024-06-15 12:12:15,707][1651669] Updated weights for policy 0, policy_version 54688 (0.0053) [2024-06-15 12:12:15,766][1648981] Fps is (10 sec: 39355.2, 60 sec: 47519.7, 300 sec: 47874.6). Total num frames: 112001024. Throughput: 0: 12129.7. Samples: 28067328. 
Policy #0 lag: (min: 31.0, avg: 132.1, max: 287.0) [2024-06-15 12:12:15,767][1648981] Avg episode reward: [(0, '143.890')] [2024-06-15 12:12:17,430][1651669] Updated weights for policy 0, policy_version 54761 (0.0017) [2024-06-15 12:12:20,638][1651669] Updated weights for policy 0, policy_version 54800 (0.0022) [2024-06-15 12:12:20,766][1648981] Fps is (10 sec: 49155.4, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 112230400. Throughput: 0: 12253.9. Samples: 28142080. Policy #0 lag: (min: 31.0, avg: 132.1, max: 287.0) [2024-06-15 12:12:20,767][1648981] Avg episode reward: [(0, '145.420')] [2024-06-15 12:12:21,308][1651274] Saving new best policy, reward=145.420! [2024-06-15 12:12:23,682][1651669] Updated weights for policy 0, policy_version 54852 (0.0013) [2024-06-15 12:12:25,089][1651669] Updated weights for policy 0, policy_version 54907 (0.0013) [2024-06-15 12:12:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 47878.5). Total num frames: 112459776. Throughput: 0: 12151.4. Samples: 28177408. Policy #0 lag: (min: 6.0, avg: 106.3, max: 262.0) [2024-06-15 12:12:25,767][1648981] Avg episode reward: [(0, '148.320')] [2024-06-15 12:12:25,771][1651274] Saving new best policy, reward=148.320! [2024-06-15 12:12:27,726][1651669] Updated weights for policy 0, policy_version 54976 (0.0012) [2024-06-15 12:12:30,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48062.8, 300 sec: 47652.5). Total num frames: 112721920. Throughput: 0: 11787.4. Samples: 28230144. Policy #0 lag: (min: 6.0, avg: 106.3, max: 262.0) [2024-06-15 12:12:30,767][1648981] Avg episode reward: [(0, '149.010')] [2024-06-15 12:12:30,768][1651274] Saving new best policy, reward=149.010! [2024-06-15 12:12:32,786][1651669] Updated weights for policy 0, policy_version 55042 (0.0015) [2024-06-15 12:12:33,709][1651669] Updated weights for policy 0, policy_version 55101 (0.0014) [2024-06-15 12:12:35,786][1648981] Fps is (10 sec: 42514.3, 60 sec: 46406.0, 300 sec: 47760.3). 
Total num frames: 112885760. Throughput: 0: 11805.0. Samples: 28311040. Policy #0 lag: (min: 6.0, avg: 106.3, max: 262.0) [2024-06-15 12:12:35,787][1648981] Avg episode reward: [(0, '148.020')] [2024-06-15 12:12:36,228][1651274] Signal inference workers to stop experience collection... (2900 times) [2024-06-15 12:12:36,280][1651669] InferenceWorker_p0-w0: stopping experience collection (2900 times) [2024-06-15 12:12:36,478][1651274] Signal inference workers to resume experience collection... (2900 times) [2024-06-15 12:12:36,479][1651669] InferenceWorker_p0-w0: resuming experience collection (2900 times) [2024-06-15 12:12:36,482][1651669] Updated weights for policy 0, policy_version 55152 (0.0014) [2024-06-15 12:12:38,421][1651669] Updated weights for policy 0, policy_version 55202 (0.0012) [2024-06-15 12:12:40,245][1651669] Updated weights for policy 0, policy_version 55270 (0.0013) [2024-06-15 12:12:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 113246208. Throughput: 0: 11719.1. Samples: 28339712. Policy #0 lag: (min: 6.0, avg: 106.3, max: 262.0) [2024-06-15 12:12:40,767][1648981] Avg episode reward: [(0, '146.130')] [2024-06-15 12:12:44,383][1651669] Updated weights for policy 0, policy_version 55320 (0.0016) [2024-06-15 12:12:45,766][1648981] Fps is (10 sec: 49249.2, 60 sec: 45875.4, 300 sec: 47874.6). Total num frames: 113377280. Throughput: 0: 11696.3. Samples: 28414464. Policy #0 lag: (min: 6.0, avg: 106.3, max: 262.0) [2024-06-15 12:12:45,767][1648981] Avg episode reward: [(0, '142.850')] [2024-06-15 12:12:47,427][1651669] Updated weights for policy 0, policy_version 55392 (0.0014) [2024-06-15 12:12:48,272][1651669] Updated weights for policy 0, policy_version 55423 (0.0015) [2024-06-15 12:12:50,767][1648981] Fps is (10 sec: 36044.1, 60 sec: 47513.7, 300 sec: 47541.3). Total num frames: 113606656. Throughput: 0: 11493.7. Samples: 28471808. 
Policy #0 lag: (min: 6.0, avg: 106.3, max: 262.0) [2024-06-15 12:12:50,767][1648981] Avg episode reward: [(0, '141.650')] [2024-06-15 12:12:50,805][1651669] Updated weights for policy 0, policy_version 55488 (0.0013) [2024-06-15 12:12:51,862][1651669] Updated weights for policy 0, policy_version 55537 (0.0013) [2024-06-15 12:12:55,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 47541.6). Total num frames: 113770496. Throughput: 0: 11594.1. Samples: 28511744. Policy #0 lag: (min: 6.0, avg: 106.3, max: 262.0) [2024-06-15 12:12:55,767][1648981] Avg episode reward: [(0, '140.740')] [2024-06-15 12:12:55,768][1651669] Updated weights for policy 0, policy_version 55556 (0.0013) [2024-06-15 12:12:56,194][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000055584_113836032.pth... [2024-06-15 12:12:56,358][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000050032_102465536.pth [2024-06-15 12:12:58,029][1651669] Updated weights for policy 0, policy_version 55617 (0.0014) [2024-06-15 12:12:59,380][1651669] Updated weights for policy 0, policy_version 55676 (0.0138) [2024-06-15 12:13:00,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 46421.3, 300 sec: 47653.5). Total num frames: 114065408. Throughput: 0: 11468.8. Samples: 28583424. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:13:00,767][1648981] Avg episode reward: [(0, '142.910')] [2024-06-15 12:13:01,459][1651669] Updated weights for policy 0, policy_version 55734 (0.0012) [2024-06-15 12:13:02,851][1651669] Updated weights for policy 0, policy_version 55800 (0.0013) [2024-06-15 12:13:05,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 44789.2, 300 sec: 47541.4). Total num frames: 114294784. Throughput: 0: 11457.4. Samples: 28657664. 
Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:13:05,767][1648981] Avg episode reward: [(0, '144.210')] [2024-06-15 12:13:06,949][1651669] Updated weights for policy 0, policy_version 55840 (0.0018) [2024-06-15 12:13:08,905][1651669] Updated weights for policy 0, policy_version 55888 (0.0014) [2024-06-15 12:13:10,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 46967.9, 300 sec: 47874.6). Total num frames: 114556928. Throughput: 0: 11605.3. Samples: 28699648. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:13:10,767][1648981] Avg episode reward: [(0, '146.880')] [2024-06-15 12:13:11,253][1651669] Updated weights for policy 0, policy_version 55952 (0.0014) [2024-06-15 12:13:12,754][1651669] Updated weights for policy 0, policy_version 56016 (0.0011) [2024-06-15 12:13:13,726][1651669] Updated weights for policy 0, policy_version 56057 (0.0012) [2024-06-15 12:13:15,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 46967.5, 300 sec: 47541.7). Total num frames: 114819072. Throughput: 0: 11980.8. Samples: 28769280. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:13:15,767][1648981] Avg episode reward: [(0, '146.730')] [2024-06-15 12:13:17,660][1651669] Updated weights for policy 0, policy_version 56097 (0.0036) [2024-06-15 12:13:18,295][1651669] Updated weights for policy 0, policy_version 56128 (0.0031) [2024-06-15 12:13:19,111][1651274] Signal inference workers to stop experience collection... (2950 times) [2024-06-15 12:13:19,134][1651669] InferenceWorker_p0-w0: stopping experience collection (2950 times) [2024-06-15 12:13:19,314][1651274] Signal inference workers to resume experience collection... (2950 times) [2024-06-15 12:13:19,315][1651669] InferenceWorker_p0-w0: resuming experience collection (2950 times) [2024-06-15 12:13:20,059][1651669] Updated weights for policy 0, policy_version 56178 (0.0020) [2024-06-15 12:13:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 47513.5, 300 sec: 47985.7). 
Total num frames: 115081216. Throughput: 0: 11803.9. Samples: 28841984. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:13:20,767][1648981] Avg episode reward: [(0, '147.990')] [2024-06-15 12:13:22,925][1651669] Updated weights for policy 0, policy_version 56230 (0.0012) [2024-06-15 12:13:24,620][1651669] Updated weights for policy 0, policy_version 56304 (0.0015) [2024-06-15 12:13:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47985.9). Total num frames: 115343360. Throughput: 0: 11969.4. Samples: 28878336. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:13:25,767][1648981] Avg episode reward: [(0, '144.700')] [2024-06-15 12:13:28,040][1651669] Updated weights for policy 0, policy_version 56352 (0.0014) [2024-06-15 12:13:30,740][1651669] Updated weights for policy 0, policy_version 56416 (0.0012) [2024-06-15 12:13:30,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 46967.3, 300 sec: 47763.5). Total num frames: 115539968. Throughput: 0: 11992.1. Samples: 28954112. Policy #0 lag: (min: 15.0, avg: 113.7, max: 271.0) [2024-06-15 12:13:30,768][1648981] Avg episode reward: [(0, '141.990')] [2024-06-15 12:13:32,575][1651669] Updated weights for policy 0, policy_version 56464 (0.0013) [2024-06-15 12:13:33,836][1651669] Updated weights for policy 0, policy_version 56528 (0.0013) [2024-06-15 12:13:34,997][1651669] Updated weights for policy 0, policy_version 56576 (0.0013) [2024-06-15 12:13:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49714.5, 300 sec: 47985.7). Total num frames: 115867648. Throughput: 0: 12276.7. Samples: 29024256. Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:13:35,767][1648981] Avg episode reward: [(0, '138.960')] [2024-06-15 12:13:39,499][1651669] Updated weights for policy 0, policy_version 56634 (0.0019) [2024-06-15 12:13:40,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 46421.3, 300 sec: 47763.5). Total num frames: 116031488. Throughput: 0: 12322.1. Samples: 29066240. 
Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:13:40,767][1648981] Avg episode reward: [(0, '134.100')] [2024-06-15 12:13:41,229][1651669] Updated weights for policy 0, policy_version 56688 (0.0011) [2024-06-15 12:13:43,505][1651669] Updated weights for policy 0, policy_version 56736 (0.0013) [2024-06-15 12:13:44,699][1651669] Updated weights for policy 0, policy_version 56788 (0.0014) [2024-06-15 12:13:45,593][1651669] Updated weights for policy 0, policy_version 56830 (0.0016) [2024-06-15 12:13:45,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.3, 300 sec: 48207.9). Total num frames: 116391936. Throughput: 0: 12231.1. Samples: 29133824. Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:13:45,767][1648981] Avg episode reward: [(0, '133.850')] [2024-06-15 12:13:50,305][1651669] Updated weights for policy 0, policy_version 56883 (0.0105) [2024-06-15 12:13:50,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48606.0, 300 sec: 47874.7). Total num frames: 116523008. Throughput: 0: 12276.7. Samples: 29210112. Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:13:50,767][1648981] Avg episode reward: [(0, '139.420')] [2024-06-15 12:13:52,152][1651669] Updated weights for policy 0, policy_version 56929 (0.0016) [2024-06-15 12:13:54,085][1651669] Updated weights for policy 0, policy_version 56976 (0.0016) [2024-06-15 12:13:55,435][1651669] Updated weights for policy 0, policy_version 57029 (0.0012) [2024-06-15 12:13:55,768][1648981] Fps is (10 sec: 42591.0, 60 sec: 50788.9, 300 sec: 48096.5). Total num frames: 116817920. Throughput: 0: 12173.8. Samples: 29247488. Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:13:55,774][1648981] Avg episode reward: [(0, '145.020')] [2024-06-15 12:13:56,547][1651669] Updated weights for policy 0, policy_version 57088 (0.0075) [2024-06-15 12:14:00,744][1651274] Signal inference workers to stop experience collection... 
(3000 times) [2024-06-15 12:14:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 116948992. Throughput: 0: 12265.2. Samples: 29321216. Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:14:00,767][1648981] Avg episode reward: [(0, '141.540')] [2024-06-15 12:14:00,796][1651669] InferenceWorker_p0-w0: stopping experience collection (3000 times) [2024-06-15 12:14:01,017][1651274] Signal inference workers to resume experience collection... (3000 times) [2024-06-15 12:14:01,018][1651669] InferenceWorker_p0-w0: resuming experience collection (3000 times) [2024-06-15 12:14:01,748][1651669] Updated weights for policy 0, policy_version 57150 (0.0112) [2024-06-15 12:14:03,623][1651669] Updated weights for policy 0, policy_version 57207 (0.0014) [2024-06-15 12:14:05,766][1648981] Fps is (10 sec: 42605.3, 60 sec: 49152.0, 300 sec: 48097.0). Total num frames: 117243904. Throughput: 0: 12071.8. Samples: 29385216. Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:14:05,767][1648981] Avg episode reward: [(0, '144.690')] [2024-06-15 12:14:06,269][1651669] Updated weights for policy 0, policy_version 57265 (0.0014) [2024-06-15 12:14:07,974][1651669] Updated weights for policy 0, policy_version 57344 (0.0013) [2024-06-15 12:14:10,779][1648981] Fps is (10 sec: 49092.9, 60 sec: 48050.2, 300 sec: 47540.0). Total num frames: 117440512. Throughput: 0: 11886.6. Samples: 29413376. Policy #0 lag: (min: 56.0, avg: 165.2, max: 312.0) [2024-06-15 12:14:10,779][1648981] Avg episode reward: [(0, '151.910')] [2024-06-15 12:14:10,783][1651274] Saving new best policy, reward=151.910! [2024-06-15 12:14:13,322][1651669] Updated weights for policy 0, policy_version 57405 (0.0123) [2024-06-15 12:14:14,821][1651669] Updated weights for policy 0, policy_version 57443 (0.0011) [2024-06-15 12:14:15,782][1648981] Fps is (10 sec: 45803.3, 60 sec: 48047.1, 300 sec: 47983.1). Total num frames: 117702656. Throughput: 0: 11806.0. 
Samples: 29485568. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 12:14:15,783][1648981] Avg episode reward: [(0, '150.780')] [2024-06-15 12:14:17,667][1651669] Updated weights for policy 0, policy_version 57490 (0.0013) [2024-06-15 12:14:18,933][1651669] Updated weights for policy 0, policy_version 57561 (0.0013) [2024-06-15 12:14:19,823][1651669] Updated weights for policy 0, policy_version 57600 (0.0013) [2024-06-15 12:14:20,786][1648981] Fps is (10 sec: 52387.6, 60 sec: 48043.8, 300 sec: 47538.2). Total num frames: 117964800. Throughput: 0: 11748.1. Samples: 29553152. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 12:14:20,787][1648981] Avg episode reward: [(0, '146.540')] [2024-06-15 12:14:24,362][1651669] Updated weights for policy 0, policy_version 57664 (0.0085) [2024-06-15 12:14:25,769][1648981] Fps is (10 sec: 49218.2, 60 sec: 47511.8, 300 sec: 47874.2). Total num frames: 118194176. Throughput: 0: 11741.3. Samples: 29594624. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 12:14:25,769][1648981] Avg episode reward: [(0, '145.590')] [2024-06-15 12:14:25,853][1651669] Updated weights for policy 0, policy_version 57724 (0.0019) [2024-06-15 12:14:29,023][1651669] Updated weights for policy 0, policy_version 57766 (0.0015) [2024-06-15 12:14:30,463][1651669] Updated weights for policy 0, policy_version 57826 (0.0014) [2024-06-15 12:14:30,766][1648981] Fps is (10 sec: 49250.0, 60 sec: 48606.0, 300 sec: 47763.5). Total num frames: 118456320. Throughput: 0: 11832.9. Samples: 29666304. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 12:14:30,767][1648981] Avg episode reward: [(0, '146.420')] [2024-06-15 12:14:34,505][1651669] Updated weights for policy 0, policy_version 57888 (0.0015) [2024-06-15 12:14:35,767][1648981] Fps is (10 sec: 45885.4, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 118652928. Throughput: 0: 11673.6. Samples: 29735424. 
Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 12:14:35,769][1648981] Avg episode reward: [(0, '146.530')] [2024-06-15 12:14:36,404][1651669] Updated weights for policy 0, policy_version 57968 (0.0012) [2024-06-15 12:14:40,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 46967.6, 300 sec: 47652.5). Total num frames: 118849536. Throughput: 0: 11605.8. Samples: 29769728. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 12:14:40,767][1648981] Avg episode reward: [(0, '143.420')] [2024-06-15 12:14:41,360][1651274] Signal inference workers to stop experience collection... (3050 times) [2024-06-15 12:14:41,448][1651669] Updated weights for policy 0, policy_version 58053 (0.0015) [2024-06-15 12:14:41,499][1651669] InferenceWorker_p0-w0: stopping experience collection (3050 times) [2024-06-15 12:14:41,602][1651274] Signal inference workers to resume experience collection... (3050 times) [2024-06-15 12:14:41,603][1651669] InferenceWorker_p0-w0: resuming experience collection (3050 times) [2024-06-15 12:14:42,428][1651669] Updated weights for policy 0, policy_version 58103 (0.0012) [2024-06-15 12:14:45,799][1648981] Fps is (10 sec: 39192.4, 60 sec: 44212.4, 300 sec: 47426.2). Total num frames: 119046144. Throughput: 0: 11517.2. Samples: 29839872. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 12:14:45,800][1648981] Avg episode reward: [(0, '145.760')] [2024-06-15 12:14:46,102][1651669] Updated weights for policy 0, policy_version 58160 (0.0027) [2024-06-15 12:14:47,474][1651669] Updated weights for policy 0, policy_version 58224 (0.0060) [2024-06-15 12:14:50,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 47430.3). Total num frames: 119308288. Throughput: 0: 11776.0. Samples: 29915136. 
Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:14:50,767][1648981] Avg episode reward: [(0, '145.820')] [2024-06-15 12:14:50,882][1651669] Updated weights for policy 0, policy_version 58272 (0.0024) [2024-06-15 12:14:53,122][1651669] Updated weights for policy 0, policy_version 58361 (0.0106) [2024-06-15 12:14:55,766][1648981] Fps is (10 sec: 49314.8, 60 sec: 45330.3, 300 sec: 47541.4). Total num frames: 119537664. Throughput: 0: 11688.1. Samples: 29939200. Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:14:55,767][1648981] Avg episode reward: [(0, '147.250')] [2024-06-15 12:14:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000058368_119537664.pth... [2024-06-15 12:14:55,850][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000052832_108199936.pth [2024-06-15 12:14:57,492][1651669] Updated weights for policy 0, policy_version 58416 (0.0014) [2024-06-15 12:14:58,619][1651669] Updated weights for policy 0, policy_version 58451 (0.0012) [2024-06-15 12:14:59,541][1651669] Updated weights for policy 0, policy_version 58494 (0.0037) [2024-06-15 12:15:00,778][1648981] Fps is (10 sec: 49093.7, 60 sec: 47504.3, 300 sec: 47539.5). Total num frames: 119799808. Throughput: 0: 11617.7. Samples: 30008320. Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:15:00,779][1648981] Avg episode reward: [(0, '148.500')] [2024-06-15 12:15:02,569][1651669] Updated weights for policy 0, policy_version 58536 (0.0014) [2024-06-15 12:15:04,050][1651669] Updated weights for policy 0, policy_version 58608 (0.0013) [2024-06-15 12:15:05,773][1648981] Fps is (10 sec: 52395.7, 60 sec: 46962.6, 300 sec: 47540.5). Total num frames: 120061952. Throughput: 0: 11927.5. Samples: 30089728. 
Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:15:05,774][1648981] Avg episode reward: [(0, '150.140')] [2024-06-15 12:15:07,485][1651669] Updated weights for policy 0, policy_version 58645 (0.0013) [2024-06-15 12:15:09,048][1651669] Updated weights for policy 0, policy_version 58690 (0.0015) [2024-06-15 12:15:10,265][1651669] Updated weights for policy 0, policy_version 58746 (0.0014) [2024-06-15 12:15:10,767][1648981] Fps is (10 sec: 52488.6, 60 sec: 48069.1, 300 sec: 47874.5). Total num frames: 120324096. Throughput: 0: 11810.6. Samples: 30126080. Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:15:10,767][1648981] Avg episode reward: [(0, '146.790')] [2024-06-15 12:15:12,663][1651669] Updated weights for policy 0, policy_version 58808 (0.0012) [2024-06-15 12:15:14,244][1651669] Updated weights for policy 0, policy_version 58850 (0.0017) [2024-06-15 12:15:15,766][1648981] Fps is (10 sec: 52462.8, 60 sec: 48072.5, 300 sec: 47541.4). Total num frames: 120586240. Throughput: 0: 11719.1. Samples: 30193664. Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:15:15,767][1648981] Avg episode reward: [(0, '149.480')] [2024-06-15 12:15:17,980][1651669] Updated weights for policy 0, policy_version 58900 (0.0014) [2024-06-15 12:15:20,052][1651669] Updated weights for policy 0, policy_version 58960 (0.0017) [2024-06-15 12:15:20,772][1648981] Fps is (10 sec: 45851.8, 60 sec: 46978.7, 300 sec: 47762.7). Total num frames: 120782848. Throughput: 0: 11933.8. Samples: 30272512. Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:15:20,775][1648981] Avg episode reward: [(0, '148.600')] [2024-06-15 12:15:22,664][1651669] Updated weights for policy 0, policy_version 59024 (0.0025) [2024-06-15 12:15:23,214][1651274] Signal inference workers to stop experience collection... 
(3100 times) [2024-06-15 12:15:23,262][1651669] InferenceWorker_p0-w0: stopping experience collection (3100 times) [2024-06-15 12:15:23,434][1651274] Signal inference workers to resume experience collection... (3100 times) [2024-06-15 12:15:23,436][1651669] InferenceWorker_p0-w0: resuming experience collection (3100 times) [2024-06-15 12:15:23,675][1651669] Updated weights for policy 0, policy_version 59072 (0.0013) [2024-06-15 12:15:25,388][1651669] Updated weights for policy 0, policy_version 59134 (0.0014) [2024-06-15 12:15:25,766][1648981] Fps is (10 sec: 52427.9, 60 sec: 48607.7, 300 sec: 47541.4). Total num frames: 121110528. Throughput: 0: 11878.4. Samples: 30304256. Policy #0 lag: (min: 15.0, avg: 116.0, max: 271.0) [2024-06-15 12:15:25,767][1648981] Avg episode reward: [(0, '148.210')] [2024-06-15 12:15:30,534][1651669] Updated weights for policy 0, policy_version 59192 (0.0012) [2024-06-15 12:15:30,767][1648981] Fps is (10 sec: 45897.9, 60 sec: 46420.9, 300 sec: 47542.6). Total num frames: 121241600. Throughput: 0: 12057.8. Samples: 30382080. Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:15:30,768][1648981] Avg episode reward: [(0, '149.890')] [2024-06-15 12:15:31,873][1651669] Updated weights for policy 0, policy_version 59259 (0.0015) [2024-06-15 12:15:34,412][1651669] Updated weights for policy 0, policy_version 59323 (0.0014) [2024-06-15 12:15:35,767][1648981] Fps is (10 sec: 45874.8, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 121569280. Throughput: 0: 11787.3. Samples: 30445568. Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:15:35,767][1648981] Avg episode reward: [(0, '153.180')] [2024-06-15 12:15:36,009][1651274] Saving new best policy, reward=153.180! 
[2024-06-15 12:15:36,049][1651669] Updated weights for policy 0, policy_version 59376 (0.0015) [2024-06-15 12:15:40,764][1651669] Updated weights for policy 0, policy_version 59408 (0.0013) [2024-06-15 12:15:40,794][1648981] Fps is (10 sec: 42482.7, 60 sec: 46945.7, 300 sec: 47425.8). Total num frames: 121667584. Throughput: 0: 12132.6. Samples: 30485504. Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:15:40,795][1648981] Avg episode reward: [(0, '151.810')] [2024-06-15 12:15:42,046][1651669] Updated weights for policy 0, policy_version 59458 (0.0054) [2024-06-15 12:15:44,577][1651669] Updated weights for policy 0, policy_version 59521 (0.0012) [2024-06-15 12:15:45,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 49179.1, 300 sec: 47431.1). Total num frames: 121995264. Throughput: 0: 12222.9. Samples: 30558208. Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:15:45,767][1648981] Avg episode reward: [(0, '154.460')] [2024-06-15 12:15:46,110][1651274] Saving new best policy, reward=154.460! [2024-06-15 12:15:46,113][1651669] Updated weights for policy 0, policy_version 59584 (0.0089) [2024-06-15 12:15:50,766][1648981] Fps is (10 sec: 49288.9, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 122159104. Throughput: 0: 12016.6. Samples: 30630400. Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:15:50,767][1648981] Avg episode reward: [(0, '149.750')] [2024-06-15 12:15:51,499][1651669] Updated weights for policy 0, policy_version 59651 (0.0013) [2024-06-15 12:15:53,109][1651669] Updated weights for policy 0, policy_version 59716 (0.0011) [2024-06-15 12:15:54,066][1651669] Updated weights for policy 0, policy_version 59770 (0.0113) [2024-06-15 12:15:55,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 48605.7, 300 sec: 47652.4). Total num frames: 122454016. Throughput: 0: 11935.3. Samples: 30663168. 
Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:15:55,768][1648981] Avg episode reward: [(0, '150.540')] [2024-06-15 12:15:56,657][1651669] Updated weights for policy 0, policy_version 59836 (0.0013) [2024-06-15 12:15:58,209][1651669] Updated weights for policy 0, policy_version 59888 (0.0040) [2024-06-15 12:16:00,790][1648981] Fps is (10 sec: 52304.3, 60 sec: 48050.1, 300 sec: 47537.5). Total num frames: 122683392. Throughput: 0: 11883.5. Samples: 30728704. Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:16:00,791][1648981] Avg episode reward: [(0, '146.140')] [2024-06-15 12:16:03,795][1651669] Updated weights for policy 0, policy_version 59952 (0.0039) [2024-06-15 12:16:04,993][1651669] Updated weights for policy 0, policy_version 60000 (0.0011) [2024-06-15 12:16:05,092][1651274] Signal inference workers to stop experience collection... (3150 times) [2024-06-15 12:16:05,138][1651669] InferenceWorker_p0-w0: stopping experience collection (3150 times) [2024-06-15 12:16:05,345][1651274] Signal inference workers to resume experience collection... (3150 times) [2024-06-15 12:16:05,348][1651669] InferenceWorker_p0-w0: resuming experience collection (3150 times) [2024-06-15 12:16:05,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 48064.9, 300 sec: 47763.5). Total num frames: 122945536. Throughput: 0: 11697.8. Samples: 30798848. Policy #0 lag: (min: 15.0, avg: 100.1, max: 271.0) [2024-06-15 12:16:05,767][1648981] Avg episode reward: [(0, '147.900')] [2024-06-15 12:16:07,163][1651669] Updated weights for policy 0, policy_version 60064 (0.0010) [2024-06-15 12:16:08,880][1651669] Updated weights for policy 0, policy_version 60115 (0.0017) [2024-06-15 12:16:10,766][1648981] Fps is (10 sec: 52553.8, 60 sec: 48060.0, 300 sec: 47653.7). Total num frames: 123207680. Throughput: 0: 11810.1. Samples: 30835712. 
Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:10,767][1648981] Avg episode reward: [(0, '151.240')] [2024-06-15 12:16:14,234][1651669] Updated weights for policy 0, policy_version 60161 (0.0014) [2024-06-15 12:16:15,766][1648981] Fps is (10 sec: 39320.9, 60 sec: 45875.0, 300 sec: 47319.2). Total num frames: 123338752. Throughput: 0: 11730.6. Samples: 30909952. Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:15,767][1648981] Avg episode reward: [(0, '150.620')] [2024-06-15 12:16:15,798][1651669] Updated weights for policy 0, policy_version 60225 (0.0011) [2024-06-15 12:16:17,012][1651669] Updated weights for policy 0, policy_version 60277 (0.0011) [2024-06-15 12:16:18,179][1651669] Updated weights for policy 0, policy_version 60310 (0.0021) [2024-06-15 12:16:18,861][1651669] Updated weights for policy 0, policy_version 60352 (0.0035) [2024-06-15 12:16:20,299][1651669] Updated weights for policy 0, policy_version 60413 (0.0014) [2024-06-15 12:16:20,793][1648981] Fps is (10 sec: 52290.8, 60 sec: 49134.9, 300 sec: 47981.4). Total num frames: 123731968. Throughput: 0: 11735.0. Samples: 30973952. Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:20,793][1648981] Avg episode reward: [(0, '152.040')] [2024-06-15 12:16:25,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 47097.7). Total num frames: 123731968. Throughput: 0: 11771.9. Samples: 31014912. Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:25,767][1648981] Avg episode reward: [(0, '153.140')] [2024-06-15 12:16:26,769][1651669] Updated weights for policy 0, policy_version 60467 (0.0013) [2024-06-15 12:16:28,068][1651669] Updated weights for policy 0, policy_version 60514 (0.0083) [2024-06-15 12:16:29,495][1651669] Updated weights for policy 0, policy_version 60581 (0.0011) [2024-06-15 12:16:30,766][1648981] Fps is (10 sec: 42711.1, 60 sec: 48606.3, 300 sec: 47652.4). Total num frames: 124157952. Throughput: 0: 11662.2. 
Samples: 31083008. Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:30,767][1648981] Avg episode reward: [(0, '152.300')] [2024-06-15 12:16:30,826][1651669] Updated weights for policy 0, policy_version 60640 (0.0013) [2024-06-15 12:16:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 44783.0, 300 sec: 47430.3). Total num frames: 124256256. Throughput: 0: 11685.0. Samples: 31156224. Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:35,767][1648981] Avg episode reward: [(0, '150.770')] [2024-06-15 12:16:37,493][1651669] Updated weights for policy 0, policy_version 60709 (0.0032) [2024-06-15 12:16:38,047][1651669] Updated weights for policy 0, policy_version 60736 (0.0012) [2024-06-15 12:16:40,177][1651669] Updated weights for policy 0, policy_version 60789 (0.0014) [2024-06-15 12:16:40,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 48081.7, 300 sec: 47208.1). Total num frames: 124551168. Throughput: 0: 11764.6. Samples: 31192576. Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:40,768][1648981] Avg episode reward: [(0, '148.620')] [2024-06-15 12:16:41,785][1651669] Updated weights for policy 0, policy_version 60864 (0.0013) [2024-06-15 12:16:43,013][1651669] Updated weights for policy 0, policy_version 60912 (0.0081) [2024-06-15 12:16:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 124780544. Throughput: 0: 11736.7. Samples: 31256576. Policy #0 lag: (min: 26.0, avg: 144.6, max: 282.0) [2024-06-15 12:16:45,767][1648981] Avg episode reward: [(0, '151.270')] [2024-06-15 12:16:47,916][1651274] Signal inference workers to stop experience collection... (3200 times) [2024-06-15 12:16:47,975][1651669] InferenceWorker_p0-w0: stopping experience collection (3200 times) [2024-06-15 12:16:47,977][1651669] Updated weights for policy 0, policy_version 60945 (0.0012) [2024-06-15 12:16:48,250][1651274] Signal inference workers to resume experience collection... 
(3200 times) [2024-06-15 12:16:48,252][1651669] InferenceWorker_p0-w0: resuming experience collection (3200 times) [2024-06-15 12:16:48,967][1651669] Updated weights for policy 0, policy_version 60989 (0.0012) [2024-06-15 12:16:50,768][1648981] Fps is (10 sec: 39318.1, 60 sec: 46420.4, 300 sec: 46763.6). Total num frames: 124944384. Throughput: 0: 11946.3. Samples: 31336448. Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:16:50,768][1648981] Avg episode reward: [(0, '156.470')] [2024-06-15 12:16:51,296][1651274] Saving new best policy, reward=156.470! [2024-06-15 12:16:51,748][1651669] Updated weights for policy 0, policy_version 61056 (0.0012) [2024-06-15 12:16:53,726][1651669] Updated weights for policy 0, policy_version 61136 (0.0013) [2024-06-15 12:16:54,807][1651669] Updated weights for policy 0, policy_version 61184 (0.0012) [2024-06-15 12:16:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 47513.7, 300 sec: 47541.3). Total num frames: 125304832. Throughput: 0: 11673.6. Samples: 31361024. Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:16:55,767][1648981] Avg episode reward: [(0, '154.440')] [2024-06-15 12:16:55,799][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000061184_125304832.pth... [2024-06-15 12:16:55,912][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000055584_113836032.pth [2024-06-15 12:17:00,543][1651669] Updated weights for policy 0, policy_version 61243 (0.0013) [2024-06-15 12:17:00,768][1648981] Fps is (10 sec: 49149.3, 60 sec: 45892.0, 300 sec: 46876.0). Total num frames: 125435904. Throughput: 0: 11570.8. Samples: 31430656. 
Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:17:00,769][1648981] Avg episode reward: [(0, '154.600')] [2024-06-15 12:17:03,626][1651669] Updated weights for policy 0, policy_version 61298 (0.0010) [2024-06-15 12:17:05,133][1651669] Updated weights for policy 0, policy_version 61361 (0.0011) [2024-06-15 12:17:05,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 46421.3, 300 sec: 47430.4). Total num frames: 125730816. Throughput: 0: 11475.5. Samples: 31490048. Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:17:05,767][1648981] Avg episode reward: [(0, '157.320')] [2024-06-15 12:17:06,026][1651274] Saving new best policy, reward=157.320! [2024-06-15 12:17:06,709][1651669] Updated weights for policy 0, policy_version 61436 (0.0014) [2024-06-15 12:17:10,766][1648981] Fps is (10 sec: 39328.4, 60 sec: 43690.6, 300 sec: 46874.9). Total num frames: 125829120. Throughput: 0: 11343.6. Samples: 31525376. Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:17:10,767][1648981] Avg episode reward: [(0, '158.210')] [2024-06-15 12:17:11,275][1651274] Saving new best policy, reward=158.210! [2024-06-15 12:17:11,953][1651669] Updated weights for policy 0, policy_version 61501 (0.0017) [2024-06-15 12:17:14,551][1651669] Updated weights for policy 0, policy_version 61563 (0.0038) [2024-06-15 12:17:15,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46421.4, 300 sec: 47097.0). Total num frames: 126124032. Throughput: 0: 11343.6. Samples: 31593472. Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:17:15,767][1648981] Avg episode reward: [(0, '159.740')] [2024-06-15 12:17:16,133][1651274] Saving new best policy, reward=159.740! [2024-06-15 12:17:16,963][1651669] Updated weights for policy 0, policy_version 61649 (0.0043) [2024-06-15 12:17:20,783][1648981] Fps is (10 sec: 52343.0, 60 sec: 43697.9, 300 sec: 47094.4). Total num frames: 126353408. Throughput: 0: 11294.0. Samples: 31664640. 
Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:17:20,783][1648981] Avg episode reward: [(0, '151.470')] [2024-06-15 12:17:22,872][1651669] Updated weights for policy 0, policy_version 61714 (0.0016) [2024-06-15 12:17:24,144][1651669] Updated weights for policy 0, policy_version 61761 (0.0019) [2024-06-15 12:17:25,554][1651669] Updated weights for policy 0, policy_version 61820 (0.0015) [2024-06-15 12:17:25,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 126615552. Throughput: 0: 11355.1. Samples: 31703552. Policy #0 lag: (min: 2.0, avg: 91.6, max: 258.0) [2024-06-15 12:17:25,767][1648981] Avg episode reward: [(0, '153.500')] [2024-06-15 12:17:27,469][1651274] Signal inference workers to stop experience collection... (3250 times) [2024-06-15 12:17:27,515][1651669] InferenceWorker_p0-w0: stopping experience collection (3250 times) [2024-06-15 12:17:27,527][1651669] Updated weights for policy 0, policy_version 61876 (0.0013) [2024-06-15 12:17:27,656][1651274] Signal inference workers to resume experience collection... (3250 times) [2024-06-15 12:17:27,656][1651669] InferenceWorker_p0-w0: resuming experience collection (3250 times) [2024-06-15 12:17:28,591][1651669] Updated weights for policy 0, policy_version 61943 (0.0014) [2024-06-15 12:17:30,766][1648981] Fps is (10 sec: 52515.3, 60 sec: 45329.1, 300 sec: 47433.5). Total num frames: 126877696. Throughput: 0: 11502.9. Samples: 31774208. Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:17:30,767][1648981] Avg episode reward: [(0, '158.290')] [2024-06-15 12:17:33,806][1651669] Updated weights for policy 0, policy_version 61985 (0.0016) [2024-06-15 12:17:34,775][1651669] Updated weights for policy 0, policy_version 62019 (0.0012) [2024-06-15 12:17:35,774][1648981] Fps is (10 sec: 49116.7, 60 sec: 47507.9, 300 sec: 46984.8). Total num frames: 127107072. Throughput: 0: 11467.3. Samples: 31852544. 
Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:17:35,774][1648981] Avg episode reward: [(0, '162.050')] [2024-06-15 12:17:36,073][1651669] Updated weights for policy 0, policy_version 62077 (0.0014) [2024-06-15 12:17:36,122][1651274] Saving new best policy, reward=162.050! [2024-06-15 12:17:37,991][1651669] Updated weights for policy 0, policy_version 62133 (0.0012) [2024-06-15 12:17:39,481][1651669] Updated weights for policy 0, policy_version 62203 (0.0012) [2024-06-15 12:17:40,770][1648981] Fps is (10 sec: 52408.1, 60 sec: 47510.8, 300 sec: 47540.7). Total num frames: 127401984. Throughput: 0: 11638.5. Samples: 31884800. Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:17:40,771][1648981] Avg episode reward: [(0, '162.140')] [2024-06-15 12:17:40,774][1651274] Saving new best policy, reward=162.140! [2024-06-15 12:17:45,766][1648981] Fps is (10 sec: 39350.1, 60 sec: 45329.1, 300 sec: 47097.1). Total num frames: 127500288. Throughput: 0: 11833.4. Samples: 31963136. Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:17:45,767][1648981] Avg episode reward: [(0, '163.830')] [2024-06-15 12:17:45,956][1651669] Updated weights for policy 0, policy_version 62274 (0.0121) [2024-06-15 12:17:46,311][1651274] Saving new best policy, reward=163.830! [2024-06-15 12:17:47,475][1651669] Updated weights for policy 0, policy_version 62336 (0.0027) [2024-06-15 12:17:49,836][1651669] Updated weights for policy 0, policy_version 62400 (0.0015) [2024-06-15 12:17:50,774][1648981] Fps is (10 sec: 45857.3, 60 sec: 48600.5, 300 sec: 47762.3). Total num frames: 127860736. Throughput: 0: 11682.9. Samples: 32015872. Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:17:50,775][1648981] Avg episode reward: [(0, '162.460')] [2024-06-15 12:17:51,460][1651669] Updated weights for policy 0, policy_version 62462 (0.0013) [2024-06-15 12:17:55,768][1648981] Fps is (10 sec: 42591.9, 60 sec: 43689.6, 300 sec: 46985.7). 
Total num frames: 127926272. Throughput: 0: 11673.2. Samples: 32050688. Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:17:55,768][1648981] Avg episode reward: [(0, '164.590')] [2024-06-15 12:17:55,772][1651274] Saving new best policy, reward=164.590! [2024-06-15 12:17:57,319][1651669] Updated weights for policy 0, policy_version 62512 (0.0070) [2024-06-15 12:17:59,149][1651669] Updated weights for policy 0, policy_version 62578 (0.0145) [2024-06-15 12:18:00,770][1648981] Fps is (10 sec: 32781.2, 60 sec: 45873.7, 300 sec: 47096.5). Total num frames: 128188416. Throughput: 0: 11649.9. Samples: 32117760. Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:18:00,771][1648981] Avg episode reward: [(0, '159.380')] [2024-06-15 12:18:02,632][1651669] Updated weights for policy 0, policy_version 62672 (0.0016) [2024-06-15 12:18:03,995][1651669] Updated weights for policy 0, policy_version 62720 (0.0013) [2024-06-15 12:18:05,767][1648981] Fps is (10 sec: 52436.4, 60 sec: 45329.0, 300 sec: 47097.1). Total num frames: 128450560. Throughput: 0: 11495.7. Samples: 32181760. Policy #0 lag: (min: 14.0, avg: 124.7, max: 254.0) [2024-06-15 12:18:05,767][1648981] Avg episode reward: [(0, '161.010')] [2024-06-15 12:18:10,232][1651274] Signal inference workers to stop experience collection... (3300 times) [2024-06-15 12:18:10,265][1651669] InferenceWorker_p0-w0: stopping experience collection (3300 times) [2024-06-15 12:18:10,300][1651669] Updated weights for policy 0, policy_version 62789 (0.0014) [2024-06-15 12:18:10,446][1651274] Signal inference workers to resume experience collection... (3300 times) [2024-06-15 12:18:10,446][1651669] InferenceWorker_p0-w0: resuming experience collection (3300 times) [2024-06-15 12:18:10,766][1648981] Fps is (10 sec: 42614.7, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 128614400. Throughput: 0: 11594.0. Samples: 32225280. 
Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:10,767][1648981] Avg episode reward: [(0, '159.220')] [2024-06-15 12:18:11,517][1651669] Updated weights for policy 0, policy_version 62842 (0.0013) [2024-06-15 12:18:13,715][1651669] Updated weights for policy 0, policy_version 62896 (0.0116) [2024-06-15 12:18:15,399][1651669] Updated weights for policy 0, policy_version 62960 (0.0014) [2024-06-15 12:18:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 128974848. Throughput: 0: 11241.2. Samples: 32280064. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:15,767][1648981] Avg episode reward: [(0, '157.320')] [2024-06-15 12:18:20,299][1651669] Updated weights for policy 0, policy_version 62996 (0.0015) [2024-06-15 12:18:20,772][1648981] Fps is (10 sec: 42574.9, 60 sec: 44791.1, 300 sec: 46429.7). Total num frames: 129040384. Throughput: 0: 11366.8. Samples: 32364032. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:20,772][1648981] Avg episode reward: [(0, '159.420')] [2024-06-15 12:18:22,359][1651669] Updated weights for policy 0, policy_version 63077 (0.0041) [2024-06-15 12:18:23,527][1651669] Updated weights for policy 0, policy_version 63124 (0.0016) [2024-06-15 12:18:25,271][1651669] Updated weights for policy 0, policy_version 63187 (0.0013) [2024-06-15 12:18:25,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 129433600. Throughput: 0: 11276.4. Samples: 32392192. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:25,767][1648981] Avg episode reward: [(0, '160.390')] [2024-06-15 12:18:30,770][1648981] Fps is (10 sec: 45882.7, 60 sec: 43687.8, 300 sec: 46207.8). Total num frames: 129499136. Throughput: 0: 11206.1. Samples: 32467456. 
Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:30,771][1648981] Avg episode reward: [(0, '159.790')] [2024-06-15 12:18:31,490][1651669] Updated weights for policy 0, policy_version 63252 (0.0015) [2024-06-15 12:18:33,445][1651669] Updated weights for policy 0, policy_version 63330 (0.0012) [2024-06-15 12:18:35,325][1651669] Updated weights for policy 0, policy_version 63395 (0.0013) [2024-06-15 12:18:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46427.0, 300 sec: 46986.0). Total num frames: 129892352. Throughput: 0: 11334.2. Samples: 32525824. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:35,767][1648981] Avg episode reward: [(0, '157.640')] [2024-06-15 12:18:36,545][1651669] Updated weights for policy 0, policy_version 63451 (0.0013) [2024-06-15 12:18:37,372][1651669] Updated weights for policy 0, policy_version 63488 (0.0012) [2024-06-15 12:18:40,766][1648981] Fps is (10 sec: 52449.4, 60 sec: 43693.5, 300 sec: 46208.4). Total num frames: 130023424. Throughput: 0: 11355.4. Samples: 32561664. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:40,767][1648981] Avg episode reward: [(0, '157.970')] [2024-06-15 12:18:42,982][1651669] Updated weights for policy 0, policy_version 63545 (0.0012) [2024-06-15 12:18:44,311][1651669] Updated weights for policy 0, policy_version 63572 (0.0013) [2024-06-15 12:18:45,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46421.3, 300 sec: 46652.7). Total num frames: 130285568. Throughput: 0: 11708.7. Samples: 32644608. Policy #0 lag: (min: 15.0, avg: 93.6, max: 271.0) [2024-06-15 12:18:45,767][1648981] Avg episode reward: [(0, '165.370')] [2024-06-15 12:18:46,247][1651274] Saving new best policy, reward=165.370! [2024-06-15 12:18:46,248][1651669] Updated weights for policy 0, policy_version 63648 (0.0011) [2024-06-15 12:18:47,696][1651274] Signal inference workers to stop experience collection... 
(3350 times) [2024-06-15 12:18:47,741][1651669] InferenceWorker_p0-w0: stopping experience collection (3350 times) [2024-06-15 12:18:47,934][1651274] Signal inference workers to resume experience collection... (3350 times) [2024-06-15 12:18:47,935][1651669] InferenceWorker_p0-w0: resuming experience collection (3350 times) [2024-06-15 12:18:48,054][1651669] Updated weights for policy 0, policy_version 63713 (0.0014) [2024-06-15 12:18:48,628][1651669] Updated weights for policy 0, policy_version 63744 (0.0015) [2024-06-15 12:18:50,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 44788.8, 300 sec: 46541.9). Total num frames: 130547712. Throughput: 0: 11650.9. Samples: 32706048. Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:18:50,767][1648981] Avg episode reward: [(0, '164.740')] [2024-06-15 12:18:54,090][1651669] Updated weights for policy 0, policy_version 63799 (0.0039) [2024-06-15 12:18:55,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46968.6, 300 sec: 46763.8). Total num frames: 130744320. Throughput: 0: 11616.7. Samples: 32748032. Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:18:55,767][1648981] Avg episode reward: [(0, '168.330')] [2024-06-15 12:18:55,768][1651669] Updated weights for policy 0, policy_version 63841 (0.0012) [2024-06-15 12:18:56,521][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000063872_130809856.pth... [2024-06-15 12:18:56,660][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000058368_119537664.pth [2024-06-15 12:18:56,665][1651274] Saving new best policy, reward=168.330! [2024-06-15 12:18:57,162][1651669] Updated weights for policy 0, policy_version 63890 (0.0022) [2024-06-15 12:18:58,843][1651669] Updated weights for policy 0, policy_version 63956 (0.0016) [2024-06-15 12:19:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48062.8, 300 sec: 46874.9). Total num frames: 131072000. Throughput: 0: 11662.2. Samples: 32804864. 
Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:19:00,767][1648981] Avg episode reward: [(0, '164.580')] [2024-06-15 12:19:04,132][1651669] Updated weights for policy 0, policy_version 64003 (0.0013) [2024-06-15 12:19:05,273][1651669] Updated weights for policy 0, policy_version 64057 (0.0013) [2024-06-15 12:19:05,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 46654.7). Total num frames: 131203072. Throughput: 0: 11640.9. Samples: 32887808. Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:19:05,767][1648981] Avg episode reward: [(0, '165.500')] [2024-06-15 12:19:07,178][1651669] Updated weights for policy 0, policy_version 64097 (0.0021) [2024-06-15 12:19:08,466][1651669] Updated weights for policy 0, policy_version 64160 (0.0014) [2024-06-15 12:19:10,289][1651669] Updated weights for policy 0, policy_version 64228 (0.0033) [2024-06-15 12:19:10,771][1648981] Fps is (10 sec: 49127.4, 60 sec: 49147.9, 300 sec: 46987.7). Total num frames: 131563520. Throughput: 0: 11706.4. Samples: 32919040. Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:19:10,772][1648981] Avg episode reward: [(0, '166.680')] [2024-06-15 12:19:15,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 46322.6). Total num frames: 131629056. Throughput: 0: 11526.7. Samples: 32986112. Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:19:15,767][1648981] Avg episode reward: [(0, '169.050')] [2024-06-15 12:19:16,003][1651669] Updated weights for policy 0, policy_version 64288 (0.0013) [2024-06-15 12:19:16,405][1651274] Saving new best policy, reward=169.050! 
[2024-06-15 12:19:16,837][1651669] Updated weights for policy 0, policy_version 64318 (0.0011) [2024-06-15 12:19:18,425][1651669] Updated weights for policy 0, policy_version 64380 (0.0013) [2024-06-15 12:19:20,667][1651669] Updated weights for policy 0, policy_version 64433 (0.0014) [2024-06-15 12:19:20,766][1648981] Fps is (10 sec: 39341.4, 60 sec: 48610.4, 300 sec: 46653.1). Total num frames: 131956736. Throughput: 0: 11832.9. Samples: 33058304. Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:19:20,767][1648981] Avg episode reward: [(0, '168.340')] [2024-06-15 12:19:22,118][1651669] Updated weights for policy 0, policy_version 64503 (0.0011) [2024-06-15 12:19:25,774][1648981] Fps is (10 sec: 49113.6, 60 sec: 44777.1, 300 sec: 46318.3). Total num frames: 132120576. Throughput: 0: 11717.1. Samples: 33089024. Policy #0 lag: (min: 49.0, avg: 212.8, max: 303.0) [2024-06-15 12:19:25,775][1648981] Avg episode reward: [(0, '166.270')] [2024-06-15 12:19:27,192][1651669] Updated weights for policy 0, policy_version 64560 (0.0021) [2024-06-15 12:19:29,726][1651669] Updated weights for policy 0, policy_version 64609 (0.0017) [2024-06-15 12:19:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48062.9, 300 sec: 46541.7). Total num frames: 132382720. Throughput: 0: 11571.2. Samples: 33165312. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:19:30,767][1648981] Avg episode reward: [(0, '165.460')] [2024-06-15 12:19:31,198][1651669] Updated weights for policy 0, policy_version 64643 (0.0012) [2024-06-15 12:19:31,911][1651274] Signal inference workers to stop experience collection... (3400 times) [2024-06-15 12:19:32,015][1651669] InferenceWorker_p0-w0: stopping experience collection (3400 times) [2024-06-15 12:19:32,146][1651274] Signal inference workers to resume experience collection... 
(3400 times) [2024-06-15 12:19:32,147][1651669] InferenceWorker_p0-w0: resuming experience collection (3400 times) [2024-06-15 12:19:32,360][1651669] Updated weights for policy 0, policy_version 64690 (0.0013) [2024-06-15 12:19:34,077][1651669] Updated weights for policy 0, policy_version 64761 (0.0012) [2024-06-15 12:19:35,769][1648981] Fps is (10 sec: 52468.8, 60 sec: 45875.0, 300 sec: 46763.8). Total num frames: 132644864. Throughput: 0: 11582.5. Samples: 33227264. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:19:35,769][1648981] Avg episode reward: [(0, '168.500')] [2024-06-15 12:19:38,204][1651669] Updated weights for policy 0, policy_version 64800 (0.0067) [2024-06-15 12:19:40,734][1651669] Updated weights for policy 0, policy_version 64848 (0.0023) [2024-06-15 12:19:40,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 46658.0). Total num frames: 132808704. Throughput: 0: 11537.1. Samples: 33267200. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:19:40,767][1648981] Avg episode reward: [(0, '170.570')] [2024-06-15 12:19:41,070][1651274] Saving new best policy, reward=170.570! [2024-06-15 12:19:42,680][1651669] Updated weights for policy 0, policy_version 64912 (0.0014) [2024-06-15 12:19:44,822][1651669] Updated weights for policy 0, policy_version 64992 (0.0041) [2024-06-15 12:19:45,490][1651669] Updated weights for policy 0, policy_version 65020 (0.0017) [2024-06-15 12:19:45,793][1648981] Fps is (10 sec: 52291.2, 60 sec: 48038.5, 300 sec: 46981.8). Total num frames: 133169152. Throughput: 0: 11689.5. Samples: 33331200. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:19:45,796][1648981] Avg episode reward: [(0, '171.570')] [2024-06-15 12:19:45,797][1651274] Saving new best policy, reward=171.570! [2024-06-15 12:19:49,229][1651669] Updated weights for policy 0, policy_version 65081 (0.0013) [2024-06-15 12:19:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 46652.7). 
Total num frames: 133300224. Throughput: 0: 11673.6. Samples: 33413120. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:19:50,767][1648981] Avg episode reward: [(0, '170.540')] [2024-06-15 12:19:52,601][1651669] Updated weights for policy 0, policy_version 65136 (0.0014) [2024-06-15 12:19:54,289][1651669] Updated weights for policy 0, policy_version 65200 (0.0014) [2024-06-15 12:19:55,393][1651669] Updated weights for policy 0, policy_version 65248 (0.0013) [2024-06-15 12:19:55,786][1648981] Fps is (10 sec: 49187.0, 60 sec: 48590.2, 300 sec: 46984.8). Total num frames: 133660672. Throughput: 0: 11738.1. Samples: 33447424. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:19:55,786][1648981] Avg episode reward: [(0, '171.260')] [2024-06-15 12:19:59,293][1651669] Updated weights for policy 0, policy_version 65285 (0.0014) [2024-06-15 12:20:00,567][1651669] Updated weights for policy 0, policy_version 65344 (0.0012) [2024-06-15 12:20:00,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 46653.7). Total num frames: 133824512. Throughput: 0: 11741.8. Samples: 33514496. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:20:00,767][1648981] Avg episode reward: [(0, '168.920')] [2024-06-15 12:20:03,764][1651669] Updated weights for policy 0, policy_version 65395 (0.0015) [2024-06-15 12:20:05,626][1651669] Updated weights for policy 0, policy_version 65472 (0.0014) [2024-06-15 12:20:05,766][1648981] Fps is (10 sec: 42681.0, 60 sec: 48059.7, 300 sec: 46652.8). Total num frames: 134086656. Throughput: 0: 11844.3. Samples: 33591296. Policy #0 lag: (min: 5.0, avg: 100.5, max: 261.0) [2024-06-15 12:20:05,767][1648981] Avg episode reward: [(0, '166.130')] [2024-06-15 12:20:07,159][1651669] Updated weights for policy 0, policy_version 65531 (0.0013) [2024-06-15 12:20:10,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 45332.8, 300 sec: 46430.6). Total num frames: 134283264. Throughput: 0: 11812.2. Samples: 33620480. 
Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:10,767][1648981] Avg episode reward: [(0, '165.380')] [2024-06-15 12:20:10,988][1651669] Updated weights for policy 0, policy_version 65592 (0.0014) [2024-06-15 12:20:13,874][1651274] Signal inference workers to stop experience collection... (3450 times) [2024-06-15 12:20:13,915][1651669] InferenceWorker_p0-w0: stopping experience collection (3450 times) [2024-06-15 12:20:14,045][1651274] Signal inference workers to resume experience collection... (3450 times) [2024-06-15 12:20:14,046][1651669] InferenceWorker_p0-w0: resuming experience collection (3450 times) [2024-06-15 12:20:14,637][1651669] Updated weights for policy 0, policy_version 65658 (0.0085) [2024-06-15 12:20:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 46653.6). Total num frames: 134545408. Throughput: 0: 11889.8. Samples: 33700352. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:15,767][1648981] Avg episode reward: [(0, '157.490')] [2024-06-15 12:20:15,883][1651669] Updated weights for policy 0, policy_version 65712 (0.0105) [2024-06-15 12:20:17,306][1651669] Updated weights for policy 0, policy_version 65764 (0.0016) [2024-06-15 12:20:20,768][1648981] Fps is (10 sec: 45867.8, 60 sec: 46420.1, 300 sec: 46208.2). Total num frames: 134742016. Throughput: 0: 11991.8. Samples: 33766912. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:20,769][1648981] Avg episode reward: [(0, '159.140')] [2024-06-15 12:20:21,390][1651669] Updated weights for policy 0, policy_version 65810 (0.0015) [2024-06-15 12:20:25,634][1651669] Updated weights for policy 0, policy_version 65875 (0.0014) [2024-06-15 12:20:25,766][1648981] Fps is (10 sec: 36044.9, 60 sec: 46427.4, 300 sec: 46319.6). Total num frames: 134905856. Throughput: 0: 11912.6. Samples: 33803264. 
Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:25,767][1648981] Avg episode reward: [(0, '156.410')] [2024-06-15 12:20:26,998][1651669] Updated weights for policy 0, policy_version 65940 (0.0015) [2024-06-15 12:20:28,949][1651669] Updated weights for policy 0, policy_version 66016 (0.0013) [2024-06-15 12:20:29,730][1651669] Updated weights for policy 0, policy_version 66047 (0.0013) [2024-06-15 12:20:30,767][1648981] Fps is (10 sec: 52435.2, 60 sec: 48059.4, 300 sec: 46430.5). Total num frames: 135266304. Throughput: 0: 11817.0. Samples: 33862656. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:30,768][1648981] Avg episode reward: [(0, '153.860')] [2024-06-15 12:20:33,026][1651669] Updated weights for policy 0, policy_version 66097 (0.0014) [2024-06-15 12:20:35,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45875.4, 300 sec: 46546.1). Total num frames: 135397376. Throughput: 0: 11776.0. Samples: 33943040. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:35,767][1648981] Avg episode reward: [(0, '152.360')] [2024-06-15 12:20:37,268][1651669] Updated weights for policy 0, policy_version 66160 (0.0014) [2024-06-15 12:20:39,075][1651669] Updated weights for policy 0, policy_version 66224 (0.0012) [2024-06-15 12:20:40,766][1648981] Fps is (10 sec: 45877.1, 60 sec: 48605.9, 300 sec: 46541.7). Total num frames: 135725056. Throughput: 0: 11803.8. Samples: 33978368. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:40,767][1648981] Avg episode reward: [(0, '152.820')] [2024-06-15 12:20:40,984][1651669] Updated weights for policy 0, policy_version 66294 (0.0013) [2024-06-15 12:20:43,833][1651669] Updated weights for policy 0, policy_version 66336 (0.0012) [2024-06-15 12:20:45,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 45895.3, 300 sec: 46652.7). Total num frames: 135921664. Throughput: 0: 11628.1. Samples: 34037760. 
Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:45,769][1648981] Avg episode reward: [(0, '155.800')] [2024-06-15 12:20:47,471][1651669] Updated weights for policy 0, policy_version 66373 (0.0017) [2024-06-15 12:20:48,758][1651669] Updated weights for policy 0, policy_version 66425 (0.0012) [2024-06-15 12:20:50,073][1651669] Updated weights for policy 0, policy_version 66483 (0.0015) [2024-06-15 12:20:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 46541.7). Total num frames: 136183808. Throughput: 0: 11696.4. Samples: 34117632. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:20:50,767][1648981] Avg episode reward: [(0, '157.040')] [2024-06-15 12:20:51,617][1651669] Updated weights for policy 0, policy_version 66530 (0.0012) [2024-06-15 12:20:54,374][1651274] Signal inference workers to stop experience collection... (3500 times) [2024-06-15 12:20:54,460][1651669] InferenceWorker_p0-w0: stopping experience collection (3500 times) [2024-06-15 12:20:54,708][1651274] Signal inference workers to resume experience collection... (3500 times) [2024-06-15 12:20:54,709][1651669] InferenceWorker_p0-w0: resuming experience collection (3500 times) [2024-06-15 12:20:55,319][1651669] Updated weights for policy 0, policy_version 66594 (0.0033) [2024-06-15 12:20:55,767][1648981] Fps is (10 sec: 49149.3, 60 sec: 45889.4, 300 sec: 46545.3). Total num frames: 136413184. Throughput: 0: 11684.8. Samples: 34146304. Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:20:55,768][1648981] Avg episode reward: [(0, '163.750')] [2024-06-15 12:20:55,787][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000066624_136445952.pth... 
[2024-06-15 12:20:55,871][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000061184_125304832.pth [2024-06-15 12:20:58,902][1651669] Updated weights for policy 0, policy_version 66640 (0.0015) [2024-06-15 12:20:59,925][1651669] Updated weights for policy 0, policy_version 66679 (0.0015) [2024-06-15 12:21:00,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 46421.4, 300 sec: 46319.5). Total num frames: 136609792. Throughput: 0: 11628.1. Samples: 34223616. Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:21:00,767][1648981] Avg episode reward: [(0, '164.210')] [2024-06-15 12:21:00,995][1651669] Updated weights for policy 0, policy_version 66720 (0.0018) [2024-06-15 12:21:02,497][1651669] Updated weights for policy 0, policy_version 66772 (0.0014) [2024-06-15 12:21:03,476][1651669] Updated weights for policy 0, policy_version 66816 (0.0012) [2024-06-15 12:21:05,525][1651669] Updated weights for policy 0, policy_version 66869 (0.0013) [2024-06-15 12:21:05,767][1648981] Fps is (10 sec: 55708.5, 60 sec: 48059.5, 300 sec: 46652.7). Total num frames: 136970240. Throughput: 0: 11639.8. Samples: 34290688. Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:21:05,767][1648981] Avg episode reward: [(0, '164.870')] [2024-06-15 12:21:10,128][1651669] Updated weights for policy 0, policy_version 66918 (0.0135) [2024-06-15 12:21:10,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 46967.5, 300 sec: 46652.8). Total num frames: 137101312. Throughput: 0: 11798.8. Samples: 34334208. 
Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:21:10,767][1648981] Avg episode reward: [(0, '165.100')] [2024-06-15 12:21:11,736][1651669] Updated weights for policy 0, policy_version 66962 (0.0013) [2024-06-15 12:21:13,143][1651669] Updated weights for policy 0, policy_version 67024 (0.0011) [2024-06-15 12:21:14,585][1651669] Updated weights for policy 0, policy_version 67072 (0.0012) [2024-06-15 12:21:15,769][1648981] Fps is (10 sec: 39313.7, 60 sec: 46965.7, 300 sec: 46212.2). Total num frames: 137363456. Throughput: 0: 11821.0. Samples: 34394624. Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:21:15,769][1648981] Avg episode reward: [(0, '168.250')] [2024-06-15 12:21:16,816][1651669] Updated weights for policy 0, policy_version 67122 (0.0107) [2024-06-15 12:21:20,766][1648981] Fps is (10 sec: 39321.2, 60 sec: 45876.4, 300 sec: 46652.7). Total num frames: 137494528. Throughput: 0: 11798.7. Samples: 34473984. Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:21:20,767][1648981] Avg episode reward: [(0, '167.870')] [2024-06-15 12:21:20,982][1651669] Updated weights for policy 0, policy_version 67153 (0.0011) [2024-06-15 12:21:23,115][1651669] Updated weights for policy 0, policy_version 67216 (0.0028) [2024-06-15 12:21:24,903][1651669] Updated weights for policy 0, policy_version 67284 (0.0013) [2024-06-15 12:21:25,782][1648981] Fps is (10 sec: 49085.0, 60 sec: 49138.9, 300 sec: 46428.1). Total num frames: 137854976. Throughput: 0: 11783.2. Samples: 34508800. Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:21:25,783][1648981] Avg episode reward: [(0, '167.810')] [2024-06-15 12:21:27,570][1651669] Updated weights for policy 0, policy_version 67346 (0.0015) [2024-06-15 12:21:28,416][1651669] Updated weights for policy 0, policy_version 67392 (0.0013) [2024-06-15 12:21:30,774][1648981] Fps is (10 sec: 52387.1, 60 sec: 45869.4, 300 sec: 46651.5). Total num frames: 138018816. Throughput: 0: 11910.5. 
Samples: 34573824. Policy #0 lag: (min: 79.0, avg: 193.2, max: 350.0) [2024-06-15 12:21:30,775][1648981] Avg episode reward: [(0, '162.170')] [2024-06-15 12:21:32,825][1651669] Updated weights for policy 0, policy_version 67456 (0.0015) [2024-06-15 12:21:35,765][1651669] Updated weights for policy 0, policy_version 67520 (0.0015) [2024-06-15 12:21:35,766][1648981] Fps is (10 sec: 42666.1, 60 sec: 48059.6, 300 sec: 46541.7). Total num frames: 138280960. Throughput: 0: 11753.2. Samples: 34646528. Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:21:35,767][1648981] Avg episode reward: [(0, '160.450')] [2024-06-15 12:21:36,297][1651274] Signal inference workers to stop experience collection... (3550 times) [2024-06-15 12:21:36,414][1651669] InferenceWorker_p0-w0: stopping experience collection (3550 times) [2024-06-15 12:21:36,597][1651274] Signal inference workers to resume experience collection... (3550 times) [2024-06-15 12:21:36,598][1651669] InferenceWorker_p0-w0: resuming experience collection (3550 times) [2024-06-15 12:21:37,327][1651669] Updated weights for policy 0, policy_version 67581 (0.0014) [2024-06-15 12:21:39,641][1651669] Updated weights for policy 0, policy_version 67621 (0.0014) [2024-06-15 12:21:40,776][1648981] Fps is (10 sec: 52420.6, 60 sec: 46960.0, 300 sec: 46651.2). Total num frames: 138543104. Throughput: 0: 11762.3. Samples: 34675712. Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:21:40,777][1648981] Avg episode reward: [(0, '161.410')] [2024-06-15 12:21:43,305][1651669] Updated weights for policy 0, policy_version 67666 (0.0016) [2024-06-15 12:21:44,070][1651669] Updated weights for policy 0, policy_version 67712 (0.0013) [2024-06-15 12:21:45,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 46421.6, 300 sec: 46653.0). Total num frames: 138706944. Throughput: 0: 11753.3. Samples: 34752512. 
Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:21:45,767][1648981] Avg episode reward: [(0, '159.890')] [2024-06-15 12:21:46,889][1651669] Updated weights for policy 0, policy_version 67786 (0.0066) [2024-06-15 12:21:49,986][1651669] Updated weights for policy 0, policy_version 67842 (0.0012) [2024-06-15 12:21:50,766][1648981] Fps is (10 sec: 49199.1, 60 sec: 47513.6, 300 sec: 46541.7). Total num frames: 139034624. Throughput: 0: 11741.9. Samples: 34819072. Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:21:50,767][1648981] Avg episode reward: [(0, '162.270')] [2024-06-15 12:21:53,612][1651669] Updated weights for policy 0, policy_version 67920 (0.0016) [2024-06-15 12:21:55,767][1648981] Fps is (10 sec: 49151.7, 60 sec: 46422.0, 300 sec: 46653.0). Total num frames: 139198464. Throughput: 0: 11787.4. Samples: 34864640. Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:21:55,768][1648981] Avg episode reward: [(0, '158.640')] [2024-06-15 12:21:56,416][1651669] Updated weights for policy 0, policy_version 67984 (0.0104) [2024-06-15 12:21:58,056][1651669] Updated weights for policy 0, policy_version 68051 (0.0013) [2024-06-15 12:22:00,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 139460608. Throughput: 0: 11879.0. Samples: 34929152. Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:22:00,767][1648981] Avg episode reward: [(0, '162.430')] [2024-06-15 12:22:00,861][1651669] Updated weights for policy 0, policy_version 68112 (0.0013) [2024-06-15 12:22:04,562][1651669] Updated weights for policy 0, policy_version 68176 (0.0013) [2024-06-15 12:22:05,727][1651669] Updated weights for policy 0, policy_version 68224 (0.0015) [2024-06-15 12:22:05,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45875.4, 300 sec: 47097.1). Total num frames: 139722752. Throughput: 0: 11776.0. Samples: 35003904. 
Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:22:05,767][1648981] Avg episode reward: [(0, '161.180')] [2024-06-15 12:22:08,638][1651669] Updated weights for policy 0, policy_version 68288 (0.0013) [2024-06-15 12:22:09,805][1651669] Updated weights for policy 0, policy_version 68342 (0.0015) [2024-06-15 12:22:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 139984896. Throughput: 0: 11791.6. Samples: 35039232. Policy #0 lag: (min: 59.0, avg: 149.6, max: 315.0) [2024-06-15 12:22:10,767][1648981] Avg episode reward: [(0, '160.100')] [2024-06-15 12:22:13,036][1651669] Updated weights for policy 0, policy_version 68409 (0.0013) [2024-06-15 12:22:15,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46423.2, 300 sec: 46766.4). Total num frames: 140148736. Throughput: 0: 11846.4. Samples: 35106816. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:15,767][1648981] Avg episode reward: [(0, '162.240')] [2024-06-15 12:22:16,652][1651669] Updated weights for policy 0, policy_version 68474 (0.0012) [2024-06-15 12:22:18,565][1651669] Updated weights for policy 0, policy_version 68512 (0.0013) [2024-06-15 12:22:19,837][1651274] Signal inference workers to stop experience collection... (3600 times) [2024-06-15 12:22:19,870][1651669] InferenceWorker_p0-w0: stopping experience collection (3600 times) [2024-06-15 12:22:19,872][1651669] Updated weights for policy 0, policy_version 68547 (0.0014) [2024-06-15 12:22:20,066][1651274] Signal inference workers to resume experience collection... (3600 times) [2024-06-15 12:22:20,067][1651669] InferenceWorker_p0-w0: resuming experience collection (3600 times) [2024-06-15 12:22:20,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 140443648. Throughput: 0: 11685.0. Samples: 35172352. 
Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:20,767][1648981] Avg episode reward: [(0, '161.120')] [2024-06-15 12:22:23,544][1651669] Updated weights for policy 0, policy_version 68629 (0.0012) [2024-06-15 12:22:25,794][1648981] Fps is (10 sec: 49015.6, 60 sec: 46412.2, 300 sec: 46648.4). Total num frames: 140640256. Throughput: 0: 11850.9. Samples: 35209216. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:25,795][1648981] Avg episode reward: [(0, '162.370')] [2024-06-15 12:22:27,832][1651669] Updated weights for policy 0, policy_version 68688 (0.0013) [2024-06-15 12:22:29,147][1651669] Updated weights for policy 0, policy_version 68739 (0.0011) [2024-06-15 12:22:30,396][1651669] Updated weights for policy 0, policy_version 68793 (0.0017) [2024-06-15 12:22:30,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48066.2, 300 sec: 46765.0). Total num frames: 140902400. Throughput: 0: 11844.3. Samples: 35285504. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:30,767][1648981] Avg episode reward: [(0, '162.880')] [2024-06-15 12:22:31,687][1651669] Updated weights for policy 0, policy_version 68852 (0.0012) [2024-06-15 12:22:34,346][1651669] Updated weights for policy 0, policy_version 68896 (0.0076) [2024-06-15 12:22:35,774][1648981] Fps is (10 sec: 52532.7, 60 sec: 48053.4, 300 sec: 46652.1). Total num frames: 141164544. Throughput: 0: 11955.9. Samples: 35357184. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:35,775][1648981] Avg episode reward: [(0, '163.510')] [2024-06-15 12:22:38,297][1651669] Updated weights for policy 0, policy_version 68948 (0.0036) [2024-06-15 12:22:39,397][1651669] Updated weights for policy 0, policy_version 68991 (0.0012) [2024-06-15 12:22:40,765][1651669] Updated weights for policy 0, policy_version 69042 (0.0016) [2024-06-15 12:22:40,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 47521.3, 300 sec: 47097.1). Total num frames: 141393920. Throughput: 0: 11798.8. 
Samples: 35395584. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:40,767][1648981] Avg episode reward: [(0, '165.250')] [2024-06-15 12:22:42,093][1651669] Updated weights for policy 0, policy_version 69073 (0.0012) [2024-06-15 12:22:45,344][1651669] Updated weights for policy 0, policy_version 69129 (0.0014) [2024-06-15 12:22:45,766][1648981] Fps is (10 sec: 45911.8, 60 sec: 48605.8, 300 sec: 46654.0). Total num frames: 141623296. Throughput: 0: 11935.3. Samples: 35466240. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:45,767][1648981] Avg episode reward: [(0, '161.490')] [2024-06-15 12:22:46,403][1651669] Updated weights for policy 0, policy_version 69184 (0.0012) [2024-06-15 12:22:49,583][1651669] Updated weights for policy 0, policy_version 69238 (0.0015) [2024-06-15 12:22:50,544][1651669] Updated weights for policy 0, policy_version 69280 (0.0012) [2024-06-15 12:22:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 47513.6, 300 sec: 47319.5). Total num frames: 141885440. Throughput: 0: 11958.1. Samples: 35542016. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:50,767][1648981] Avg episode reward: [(0, '168.370')] [2024-06-15 12:22:53,068][1651669] Updated weights for policy 0, policy_version 69344 (0.0032) [2024-06-15 12:22:55,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 48059.6, 300 sec: 47097.6). Total num frames: 142082048. Throughput: 0: 11844.2. Samples: 35572224. Policy #0 lag: (min: 4.0, avg: 106.8, max: 260.0) [2024-06-15 12:22:55,767][1648981] Avg episode reward: [(0, '167.570')] [2024-06-15 12:22:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000069376_142082048.pth... 
[2024-06-15 12:22:55,827][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000063872_130809856.pth [2024-06-15 12:22:56,711][1651669] Updated weights for policy 0, policy_version 69395 (0.0036) [2024-06-15 12:22:59,445][1651669] Updated weights for policy 0, policy_version 69442 (0.0014) [2024-06-15 12:23:00,799][1648981] Fps is (10 sec: 42459.9, 60 sec: 47487.8, 300 sec: 46980.8). Total num frames: 142311424. Throughput: 0: 12108.6. Samples: 35652096. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:00,800][1648981] Avg episode reward: [(0, '166.390')] [2024-06-15 12:23:01,051][1651274] Signal inference workers to stop experience collection... (3650 times) [2024-06-15 12:23:01,051][1651669] Updated weights for policy 0, policy_version 69505 (0.0026) [2024-06-15 12:23:01,130][1651669] InferenceWorker_p0-w0: stopping experience collection (3650 times) [2024-06-15 12:23:01,356][1651274] Signal inference workers to resume experience collection... (3650 times) [2024-06-15 12:23:01,358][1651669] InferenceWorker_p0-w0: resuming experience collection (3650 times) [2024-06-15 12:23:02,280][1651669] Updated weights for policy 0, policy_version 69555 (0.0028) [2024-06-15 12:23:03,556][1651669] Updated weights for policy 0, policy_version 69589 (0.0016) [2024-06-15 12:23:05,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 142606336. Throughput: 0: 12197.0. Samples: 35721216. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:05,767][1648981] Avg episode reward: [(0, '169.120')] [2024-06-15 12:23:07,104][1651669] Updated weights for policy 0, policy_version 69634 (0.0012) [2024-06-15 12:23:08,448][1651669] Updated weights for policy 0, policy_version 69690 (0.0164) [2024-06-15 12:23:10,766][1648981] Fps is (10 sec: 46025.0, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 142770176. Throughput: 0: 12124.8. Samples: 35754496. 
Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:10,767][1648981] Avg episode reward: [(0, '167.250')] [2024-06-15 12:23:10,932][1651669] Updated weights for policy 0, policy_version 69728 (0.0047) [2024-06-15 12:23:12,329][1651669] Updated weights for policy 0, policy_version 69776 (0.0013) [2024-06-15 12:23:13,439][1651669] Updated weights for policy 0, policy_version 69820 (0.0011) [2024-06-15 12:23:14,939][1651669] Updated weights for policy 0, policy_version 69872 (0.0014) [2024-06-15 12:23:15,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 49698.0, 300 sec: 47764.4). Total num frames: 143130624. Throughput: 0: 12003.5. Samples: 35825664. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:15,767][1648981] Avg episode reward: [(0, '167.980')] [2024-06-15 12:23:18,571][1651669] Updated weights for policy 0, policy_version 69906 (0.0012) [2024-06-15 12:23:20,767][1648981] Fps is (10 sec: 49150.7, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 143261696. Throughput: 0: 12017.0. Samples: 35897856. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:20,767][1648981] Avg episode reward: [(0, '164.120')] [2024-06-15 12:23:21,728][1651669] Updated weights for policy 0, policy_version 69968 (0.0013) [2024-06-15 12:23:22,928][1651669] Updated weights for policy 0, policy_version 70021 (0.0013) [2024-06-15 12:23:24,143][1651669] Updated weights for policy 0, policy_version 70080 (0.0012) [2024-06-15 12:23:25,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 48081.9, 300 sec: 47542.0). Total num frames: 143523840. Throughput: 0: 11867.0. Samples: 35929600. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:25,767][1648981] Avg episode reward: [(0, '162.390')] [2024-06-15 12:23:26,848][1651669] Updated weights for policy 0, policy_version 70144 (0.0011) [2024-06-15 12:23:30,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 143785984. Throughput: 0: 12014.9. 
Samples: 36006912. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:30,767][1648981] Avg episode reward: [(0, '165.840')] [2024-06-15 12:23:32,370][1651669] Updated weights for policy 0, policy_version 70214 (0.0014) [2024-06-15 12:23:34,286][1651669] Updated weights for policy 0, policy_version 70291 (0.0095) [2024-06-15 12:23:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48066.1, 300 sec: 47541.4). Total num frames: 144048128. Throughput: 0: 11776.0. Samples: 36071936. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:35,767][1648981] Avg episode reward: [(0, '162.520')] [2024-06-15 12:23:37,391][1651669] Updated weights for policy 0, policy_version 70352 (0.0012) [2024-06-15 12:23:38,441][1651669] Updated weights for policy 0, policy_version 70400 (0.0015) [2024-06-15 12:23:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.5, 300 sec: 47319.2). Total num frames: 144244736. Throughput: 0: 12003.6. Samples: 36112384. Policy #0 lag: (min: 2.0, avg: 95.6, max: 258.0) [2024-06-15 12:23:40,767][1648981] Avg episode reward: [(0, '163.750')] [2024-06-15 12:23:41,265][1651669] Updated weights for policy 0, policy_version 70454 (0.0017) [2024-06-15 12:23:43,935][1651669] Updated weights for policy 0, policy_version 70497 (0.0013) [2024-06-15 12:23:44,296][1651274] Signal inference workers to stop experience collection... (3700 times) [2024-06-15 12:23:44,378][1651669] InferenceWorker_p0-w0: stopping experience collection (3700 times) [2024-06-15 12:23:44,558][1651274] Signal inference workers to resume experience collection... (3700 times) [2024-06-15 12:23:44,559][1651669] InferenceWorker_p0-w0: resuming experience collection (3700 times) [2024-06-15 12:23:45,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 144539648. Throughput: 0: 11773.1. Samples: 36181504. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:23:45,767][1648981] Avg episode reward: [(0, '165.200')] [2024-06-15 12:23:45,856][1651669] Updated weights for policy 0, policy_version 70580 (0.0120) [2024-06-15 12:23:49,315][1651669] Updated weights for policy 0, policy_version 70626 (0.0012) [2024-06-15 12:23:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 144703488. Throughput: 0: 11946.7. Samples: 36258816. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:23:50,767][1648981] Avg episode reward: [(0, '163.160')] [2024-06-15 12:23:51,157][1651669] Updated weights for policy 0, policy_version 70661 (0.0011) [2024-06-15 12:23:52,325][1651669] Updated weights for policy 0, policy_version 70714 (0.0090) [2024-06-15 12:23:54,629][1651669] Updated weights for policy 0, policy_version 70776 (0.0013) [2024-06-15 12:23:55,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 48605.7, 300 sec: 47208.1). Total num frames: 144998400. Throughput: 0: 11969.3. Samples: 36293120. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:23:55,768][1648981] Avg episode reward: [(0, '165.360')] [2024-06-15 12:23:56,432][1651669] Updated weights for policy 0, policy_version 70842 (0.0018) [2024-06-15 12:24:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47539.4, 300 sec: 47319.2). Total num frames: 145162240. Throughput: 0: 11946.7. Samples: 36363264. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:24:00,767][1648981] Avg episode reward: [(0, '165.640')] [2024-06-15 12:24:01,236][1651669] Updated weights for policy 0, policy_version 70901 (0.0256) [2024-06-15 12:24:03,350][1651669] Updated weights for policy 0, policy_version 70930 (0.0011) [2024-06-15 12:24:04,999][1651669] Updated weights for policy 0, policy_version 70996 (0.0165) [2024-06-15 12:24:05,675][1651669] Updated weights for policy 0, policy_version 71040 (0.0049) [2024-06-15 12:24:05,766][1648981] Fps is (10 sec: 49153.8, 60 sec: 48059.7, 300 sec: 47208.9). Total num frames: 145489920. Throughput: 0: 11719.2. Samples: 36425216. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:24:05,767][1648981] Avg episode reward: [(0, '165.300')] [2024-06-15 12:24:08,078][1651669] Updated weights for policy 0, policy_version 71102 (0.0013) [2024-06-15 12:24:10,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 145620992. Throughput: 0: 11798.8. Samples: 36460544. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:24:10,767][1648981] Avg episode reward: [(0, '165.790')] [2024-06-15 12:24:11,850][1651669] Updated weights for policy 0, policy_version 71156 (0.0016) [2024-06-15 12:24:14,557][1651669] Updated weights for policy 0, policy_version 71200 (0.0012) [2024-06-15 12:24:15,769][1648981] Fps is (10 sec: 39311.9, 60 sec: 45873.4, 300 sec: 47207.7). Total num frames: 145883136. Throughput: 0: 11923.3. Samples: 36543488. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:24:15,769][1648981] Avg episode reward: [(0, '175.980')] [2024-06-15 12:24:16,204][1651274] Saving new best policy, reward=175.980! 
[2024-06-15 12:24:16,615][1651669] Updated weights for policy 0, policy_version 71280 (0.0023) [2024-06-15 12:24:18,774][1651669] Updated weights for policy 0, policy_version 71328 (0.0013) [2024-06-15 12:24:19,611][1651669] Updated weights for policy 0, policy_version 71360 (0.0019) [2024-06-15 12:24:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48060.0, 300 sec: 47542.6). Total num frames: 146145280. Throughput: 0: 11810.1. Samples: 36603392. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:24:20,767][1648981] Avg episode reward: [(0, '174.970')] [2024-06-15 12:24:22,938][1651669] Updated weights for policy 0, policy_version 71421 (0.0024) [2024-06-15 12:24:25,766][1648981] Fps is (10 sec: 42609.2, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 146309120. Throughput: 0: 11753.3. Samples: 36641280. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 12:24:25,767][1648981] Avg episode reward: [(0, '172.440')] [2024-06-15 12:24:27,058][1651669] Updated weights for policy 0, policy_version 71489 (0.0012) [2024-06-15 12:24:27,397][1651274] Signal inference workers to stop experience collection... (3750 times) [2024-06-15 12:24:27,505][1651669] InferenceWorker_p0-w0: stopping experience collection (3750 times) [2024-06-15 12:24:27,676][1651274] Signal inference workers to resume experience collection... (3750 times) [2024-06-15 12:24:27,677][1651669] InferenceWorker_p0-w0: resuming experience collection (3750 times) [2024-06-15 12:24:29,904][1651669] Updated weights for policy 0, policy_version 71573 (0.0108) [2024-06-15 12:24:30,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 146636800. Throughput: 0: 11776.0. Samples: 36711424. 
Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:24:30,767][1648981] Avg episode reward: [(0, '172.100')] [2024-06-15 12:24:30,803][1651669] Updated weights for policy 0, policy_version 71616 (0.0012) [2024-06-15 12:24:33,598][1651669] Updated weights for policy 0, policy_version 71676 (0.0013) [2024-06-15 12:24:35,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 146800640. Throughput: 0: 11832.9. Samples: 36791296. Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:24:35,767][1648981] Avg episode reward: [(0, '175.020')] [2024-06-15 12:24:37,075][1651669] Updated weights for policy 0, policy_version 71714 (0.0013) [2024-06-15 12:24:38,572][1651669] Updated weights for policy 0, policy_version 71780 (0.0033) [2024-06-15 12:24:40,235][1651669] Updated weights for policy 0, policy_version 71812 (0.0013) [2024-06-15 12:24:40,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48059.7, 300 sec: 47323.5). Total num frames: 147128320. Throughput: 0: 11730.6. Samples: 36820992. Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:24:40,767][1648981] Avg episode reward: [(0, '174.610')] [2024-06-15 12:24:41,225][1651669] Updated weights for policy 0, policy_version 71863 (0.0014) [2024-06-15 12:24:43,495][1651669] Updated weights for policy 0, policy_version 71893 (0.0013) [2024-06-15 12:24:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 147324928. Throughput: 0: 11855.6. Samples: 36896768. 
Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:24:45,767][1648981] Avg episode reward: [(0, '175.930')] [2024-06-15 12:24:47,188][1651669] Updated weights for policy 0, policy_version 71952 (0.0014) [2024-06-15 12:24:48,766][1651669] Updated weights for policy 0, policy_version 72008 (0.0012) [2024-06-15 12:24:49,809][1651669] Updated weights for policy 0, policy_version 72063 (0.0014) [2024-06-15 12:24:50,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47211.2). Total num frames: 147587072. Throughput: 0: 12128.7. Samples: 36971008. Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:24:50,767][1648981] Avg episode reward: [(0, '167.720')] [2024-06-15 12:24:52,065][1651669] Updated weights for policy 0, policy_version 72128 (0.0015) [2024-06-15 12:24:55,500][1651669] Updated weights for policy 0, policy_version 72184 (0.0013) [2024-06-15 12:24:55,767][1648981] Fps is (10 sec: 52427.4, 60 sec: 47513.7, 300 sec: 47541.3). Total num frames: 147849216. Throughput: 0: 12037.6. Samples: 37002240. Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:24:55,767][1648981] Avg episode reward: [(0, '166.750')] [2024-06-15 12:24:55,785][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000072192_147849216.pth... [2024-06-15 12:24:55,887][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000066624_136445952.pth [2024-06-15 12:24:59,479][1651669] Updated weights for policy 0, policy_version 72241 (0.0012) [2024-06-15 12:25:00,471][1651669] Updated weights for policy 0, policy_version 72288 (0.0019) [2024-06-15 12:25:00,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 148045824. Throughput: 0: 11788.0. Samples: 37073920. 
Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:25:00,767][1648981] Avg episode reward: [(0, '168.570')] [2024-06-15 12:25:01,706][1651669] Updated weights for policy 0, policy_version 72322 (0.0013) [2024-06-15 12:25:02,952][1651669] Updated weights for policy 0, policy_version 72381 (0.0013) [2024-06-15 12:25:05,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 148307968. Throughput: 0: 12094.6. Samples: 37147648. Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:25:05,767][1648981] Avg episode reward: [(0, '171.600')] [2024-06-15 12:25:06,078][1651669] Updated weights for policy 0, policy_version 72432 (0.0014) [2024-06-15 12:25:09,578][1651669] Updated weights for policy 0, policy_version 72464 (0.0013) [2024-06-15 12:25:10,337][1651274] Signal inference workers to stop experience collection... (3800 times) [2024-06-15 12:25:10,374][1651669] InferenceWorker_p0-w0: stopping experience collection (3800 times) [2024-06-15 12:25:10,546][1651274] Signal inference workers to resume experience collection... (3800 times) [2024-06-15 12:25:10,547][1651669] InferenceWorker_p0-w0: resuming experience collection (3800 times) [2024-06-15 12:25:10,799][1648981] Fps is (10 sec: 45726.4, 60 sec: 48033.5, 300 sec: 47314.0). Total num frames: 148504576. Throughput: 0: 12245.0. Samples: 37192704. Policy #0 lag: (min: 63.0, avg: 140.3, max: 283.0) [2024-06-15 12:25:10,800][1648981] Avg episode reward: [(0, '169.290')] [2024-06-15 12:25:10,997][1651669] Updated weights for policy 0, policy_version 72535 (0.0015) [2024-06-15 12:25:13,125][1651669] Updated weights for policy 0, policy_version 72611 (0.0026) [2024-06-15 12:25:13,691][1651669] Updated weights for policy 0, policy_version 72638 (0.0025) [2024-06-15 12:25:15,776][1648981] Fps is (10 sec: 45831.1, 60 sec: 48054.0, 300 sec: 47540.1). Total num frames: 148766720. Throughput: 0: 12103.4. Samples: 37256192. 
Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:15,777][1648981] Avg episode reward: [(0, '171.240')] [2024-06-15 12:25:17,033][1651669] Updated weights for policy 0, policy_version 72688 (0.0095) [2024-06-15 12:25:19,352][1651669] Updated weights for policy 0, policy_version 72720 (0.0013) [2024-06-15 12:25:20,766][1648981] Fps is (10 sec: 52601.0, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 149028864. Throughput: 0: 12128.7. Samples: 37337088. Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:20,767][1648981] Avg episode reward: [(0, '171.270')] [2024-06-15 12:25:21,633][1651669] Updated weights for policy 0, policy_version 72804 (0.0159) [2024-06-15 12:25:24,345][1651669] Updated weights for policy 0, policy_version 72890 (0.0061) [2024-06-15 12:25:25,767][1648981] Fps is (10 sec: 52477.7, 60 sec: 49697.9, 300 sec: 47541.4). Total num frames: 149291008. Throughput: 0: 12083.1. Samples: 37364736. Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:25,768][1648981] Avg episode reward: [(0, '173.570')] [2024-06-15 12:25:27,943][1651669] Updated weights for policy 0, policy_version 72945 (0.0012) [2024-06-15 12:25:30,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 149454848. Throughput: 0: 12106.0. Samples: 37441536. Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:30,767][1648981] Avg episode reward: [(0, '170.900')] [2024-06-15 12:25:30,812][1651669] Updated weights for policy 0, policy_version 72992 (0.0012) [2024-06-15 12:25:32,951][1651669] Updated weights for policy 0, policy_version 73072 (0.0015) [2024-06-15 12:25:35,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 50244.3, 300 sec: 47763.5). Total num frames: 149815296. Throughput: 0: 11844.3. Samples: 37504000. 
Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:35,767][1648981] Avg episode reward: [(0, '171.680')] [2024-06-15 12:25:37,796][1651669] Updated weights for policy 0, policy_version 73153 (0.0015) [2024-06-15 12:25:39,081][1651669] Updated weights for policy 0, policy_version 73210 (0.0015) [2024-06-15 12:25:40,766][1648981] Fps is (10 sec: 49151.3, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 149946368. Throughput: 0: 12071.9. Samples: 37545472. Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:40,767][1648981] Avg episode reward: [(0, '169.120')] [2024-06-15 12:25:42,837][1651669] Updated weights for policy 0, policy_version 73264 (0.0014) [2024-06-15 12:25:44,695][1651669] Updated weights for policy 0, policy_version 73334 (0.0015) [2024-06-15 12:25:45,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.1, 300 sec: 47763.5). Total num frames: 150274048. Throughput: 0: 11992.3. Samples: 37613568. Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:45,767][1648981] Avg episode reward: [(0, '171.970')] [2024-06-15 12:25:45,970][1651669] Updated weights for policy 0, policy_version 73401 (0.0026) [2024-06-15 12:25:50,750][1651669] Updated weights for policy 0, policy_version 73456 (0.0025) [2024-06-15 12:25:50,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 47430.4). Total num frames: 150405120. Throughput: 0: 11946.7. Samples: 37685248. Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:50,767][1648981] Avg episode reward: [(0, '162.910')] [2024-06-15 12:25:52,783][1651274] Signal inference workers to stop experience collection... (3850 times) [2024-06-15 12:25:52,832][1651669] InferenceWorker_p0-w0: stopping experience collection (3850 times) [2024-06-15 12:25:53,027][1651274] Signal inference workers to resume experience collection... 
(3850 times) [2024-06-15 12:25:53,033][1651669] InferenceWorker_p0-w0: resuming experience collection (3850 times) [2024-06-15 12:25:53,035][1651669] Updated weights for policy 0, policy_version 73488 (0.0012) [2024-06-15 12:25:54,823][1651669] Updated weights for policy 0, policy_version 73554 (0.0012) [2024-06-15 12:25:55,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 47513.7, 300 sec: 47763.5). Total num frames: 150700032. Throughput: 0: 11830.1. Samples: 37724672. Policy #0 lag: (min: 127.0, avg: 200.2, max: 383.0) [2024-06-15 12:25:55,767][1648981] Avg episode reward: [(0, '163.800')] [2024-06-15 12:25:56,548][1651669] Updated weights for policy 0, policy_version 73601 (0.0013) [2024-06-15 12:26:00,619][1651669] Updated weights for policy 0, policy_version 73668 (0.0015) [2024-06-15 12:26:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.6, 300 sec: 47097.1). Total num frames: 150863872. Throughput: 0: 11778.5. Samples: 37786112. Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:00,767][1648981] Avg episode reward: [(0, '167.620')] [2024-06-15 12:26:01,885][1651669] Updated weights for policy 0, policy_version 73722 (0.0011) [2024-06-15 12:26:05,583][1651669] Updated weights for policy 0, policy_version 73792 (0.0014) [2024-06-15 12:26:05,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 151126016. Throughput: 0: 11639.5. Samples: 37860864. Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:05,767][1648981] Avg episode reward: [(0, '163.710')] [2024-06-15 12:26:08,447][1651669] Updated weights for policy 0, policy_version 73857 (0.0015) [2024-06-15 12:26:09,646][1651669] Updated weights for policy 0, policy_version 73911 (0.0012) [2024-06-15 12:26:10,777][1648981] Fps is (10 sec: 52373.3, 60 sec: 48077.5, 300 sec: 47540.0). Total num frames: 151388160. Throughput: 0: 11636.8. Samples: 37888512. 
Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:10,778][1648981] Avg episode reward: [(0, '164.640')] [2024-06-15 12:26:13,396][1651669] Updated weights for policy 0, policy_version 73973 (0.0020) [2024-06-15 12:26:15,716][1651669] Updated weights for policy 0, policy_version 74000 (0.0024) [2024-06-15 12:26:15,770][1648981] Fps is (10 sec: 42581.8, 60 sec: 46425.8, 300 sec: 47651.8). Total num frames: 151552000. Throughput: 0: 11661.2. Samples: 37966336. Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:15,771][1648981] Avg episode reward: [(0, '165.860')] [2024-06-15 12:26:17,197][1651669] Updated weights for policy 0, policy_version 74052 (0.0014) [2024-06-15 12:26:19,725][1651669] Updated weights for policy 0, policy_version 74120 (0.0013) [2024-06-15 12:26:20,766][1648981] Fps is (10 sec: 49204.2, 60 sec: 47513.6, 300 sec: 47544.0). Total num frames: 151879680. Throughput: 0: 11639.5. Samples: 38027776. Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:20,767][1648981] Avg episode reward: [(0, '162.810')] [2024-06-15 12:26:20,893][1651669] Updated weights for policy 0, policy_version 74176 (0.0021) [2024-06-15 12:26:24,407][1651669] Updated weights for policy 0, policy_version 74224 (0.0013) [2024-06-15 12:26:25,766][1648981] Fps is (10 sec: 49170.8, 60 sec: 45875.4, 300 sec: 47542.7). Total num frames: 152043520. Throughput: 0: 11594.0. Samples: 38067200. Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:25,767][1648981] Avg episode reward: [(0, '167.970')] [2024-06-15 12:26:27,263][1651669] Updated weights for policy 0, policy_version 74288 (0.0012) [2024-06-15 12:26:29,139][1651669] Updated weights for policy 0, policy_version 74352 (0.0010) [2024-06-15 12:26:30,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 152338432. Throughput: 0: 11639.4. Samples: 38137344. 
Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:30,767][1648981] Avg episode reward: [(0, '173.950')] [2024-06-15 12:26:30,828][1651669] Updated weights for policy 0, policy_version 74387 (0.0014) [2024-06-15 12:26:34,644][1651669] Updated weights for policy 0, policy_version 74448 (0.0013) [2024-06-15 12:26:34,724][1651274] Signal inference workers to stop experience collection... (3900 times) [2024-06-15 12:26:34,762][1651669] InferenceWorker_p0-w0: stopping experience collection (3900 times) [2024-06-15 12:26:34,962][1651274] Signal inference workers to resume experience collection... (3900 times) [2024-06-15 12:26:34,963][1651669] InferenceWorker_p0-w0: resuming experience collection (3900 times) [2024-06-15 12:26:35,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 45875.0, 300 sec: 47542.9). Total num frames: 152567808. Throughput: 0: 11730.4. Samples: 38213120. Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:35,767][1648981] Avg episode reward: [(0, '170.200')] [2024-06-15 12:26:38,009][1651669] Updated weights for policy 0, policy_version 74512 (0.0048) [2024-06-15 12:26:40,306][1651669] Updated weights for policy 0, policy_version 74593 (0.0017) [2024-06-15 12:26:40,772][1648981] Fps is (10 sec: 45850.5, 60 sec: 47509.4, 300 sec: 47762.6). Total num frames: 152797184. Throughput: 0: 11831.5. Samples: 38257152. Policy #0 lag: (min: 15.0, avg: 165.4, max: 274.0) [2024-06-15 12:26:40,772][1648981] Avg episode reward: [(0, '171.860')] [2024-06-15 12:26:41,992][1651669] Updated weights for policy 0, policy_version 74657 (0.0014) [2024-06-15 12:26:45,688][1651669] Updated weights for policy 0, policy_version 74708 (0.0013) [2024-06-15 12:26:45,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 45329.0, 300 sec: 47319.2). Total num frames: 152993792. Throughput: 0: 11753.3. Samples: 38315008. 
Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:26:45,767][1648981] Avg episode reward: [(0, '165.870')] [2024-06-15 12:26:48,550][1651669] Updated weights for policy 0, policy_version 74756 (0.0013) [2024-06-15 12:26:49,903][1651669] Updated weights for policy 0, policy_version 74816 (0.0019) [2024-06-15 12:26:50,766][1648981] Fps is (10 sec: 45899.9, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 153255936. Throughput: 0: 11798.7. Samples: 38391808. Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:26:50,767][1648981] Avg episode reward: [(0, '171.610')] [2024-06-15 12:26:51,449][1651669] Updated weights for policy 0, policy_version 74877 (0.0025) [2024-06-15 12:26:52,932][1651669] Updated weights for policy 0, policy_version 74929 (0.0012) [2024-06-15 12:26:55,767][1648981] Fps is (10 sec: 49150.2, 60 sec: 46421.2, 300 sec: 47541.3). Total num frames: 153485312. Throughput: 0: 11847.0. Samples: 38421504. Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:26:55,767][1648981] Avg episode reward: [(0, '170.410')] [2024-06-15 12:26:55,782][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000074944_153485312.pth... [2024-06-15 12:26:55,850][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000069376_142082048.pth [2024-06-15 12:26:57,290][1651669] Updated weights for policy 0, policy_version 74976 (0.0013) [2024-06-15 12:26:59,656][1651669] Updated weights for policy 0, policy_version 75010 (0.0023) [2024-06-15 12:27:00,719][1651669] Updated weights for policy 0, policy_version 75058 (0.0013) [2024-06-15 12:27:00,767][1648981] Fps is (10 sec: 45872.9, 60 sec: 47513.2, 300 sec: 47430.2). Total num frames: 153714688. Throughput: 0: 11947.6. Samples: 38503936. 
Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:27:00,768][1648981] Avg episode reward: [(0, '168.000')] [2024-06-15 12:27:02,506][1651669] Updated weights for policy 0, policy_version 75136 (0.0082) [2024-06-15 12:27:03,864][1651669] Updated weights for policy 0, policy_version 75196 (0.0013) [2024-06-15 12:27:05,774][1648981] Fps is (10 sec: 52388.7, 60 sec: 48053.3, 300 sec: 47540.1). Total num frames: 154009600. Throughput: 0: 11967.3. Samples: 38566400. Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:27:05,775][1648981] Avg episode reward: [(0, '172.000')] [2024-06-15 12:27:09,233][1651669] Updated weights for policy 0, policy_version 75259 (0.0014) [2024-06-15 12:27:10,766][1648981] Fps is (10 sec: 42600.3, 60 sec: 45883.3, 300 sec: 47430.3). Total num frames: 154140672. Throughput: 0: 11935.3. Samples: 38604288. Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:27:10,767][1648981] Avg episode reward: [(0, '173.190')] [2024-06-15 12:27:11,673][1651669] Updated weights for policy 0, policy_version 75314 (0.0013) [2024-06-15 12:27:13,713][1651669] Updated weights for policy 0, policy_version 75376 (0.0013) [2024-06-15 12:27:14,668][1651274] Signal inference workers to stop experience collection... (3950 times) [2024-06-15 12:27:14,764][1651669] InferenceWorker_p0-w0: stopping experience collection (3950 times) [2024-06-15 12:27:14,862][1651274] Signal inference workers to resume experience collection... (3950 times) [2024-06-15 12:27:14,863][1651669] InferenceWorker_p0-w0: resuming experience collection (3950 times) [2024-06-15 12:27:15,078][1651669] Updated weights for policy 0, policy_version 75429 (0.0013) [2024-06-15 12:27:15,782][1648981] Fps is (10 sec: 52387.8, 60 sec: 49688.2, 300 sec: 47761.0). Total num frames: 154533888. Throughput: 0: 11771.9. Samples: 38667264. 
Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:27:15,783][1648981] Avg episode reward: [(0, '173.090')] [2024-06-15 12:27:19,931][1651669] Updated weights for policy 0, policy_version 75488 (0.0017) [2024-06-15 12:27:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 46421.4, 300 sec: 47545.9). Total num frames: 154664960. Throughput: 0: 11719.2. Samples: 38740480. Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:27:20,767][1648981] Avg episode reward: [(0, '177.220')] [2024-06-15 12:27:20,767][1651274] Saving new best policy, reward=177.220! [2024-06-15 12:27:23,558][1651669] Updated weights for policy 0, policy_version 75556 (0.0013) [2024-06-15 12:27:24,822][1651669] Updated weights for policy 0, policy_version 75607 (0.0011) [2024-06-15 12:27:25,767][1648981] Fps is (10 sec: 39383.0, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 154927104. Throughput: 0: 11515.6. Samples: 38775296. Policy #0 lag: (min: 56.0, avg: 209.2, max: 312.0) [2024-06-15 12:27:25,767][1648981] Avg episode reward: [(0, '177.250')] [2024-06-15 12:27:26,199][1651274] Saving new best policy, reward=177.250! [2024-06-15 12:27:26,531][1651669] Updated weights for policy 0, policy_version 75680 (0.0011) [2024-06-15 12:27:30,776][1648981] Fps is (10 sec: 42555.7, 60 sec: 45867.6, 300 sec: 47207.8). Total num frames: 155090944. Throughput: 0: 11807.5. Samples: 38846464. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:27:30,777][1648981] Avg episode reward: [(0, '179.300')] [2024-06-15 12:27:30,915][1651669] Updated weights for policy 0, policy_version 75729 (0.0026) [2024-06-15 12:27:31,076][1651274] Saving new best policy, reward=179.300! [2024-06-15 12:27:33,740][1651669] Updated weights for policy 0, policy_version 75785 (0.0011) [2024-06-15 12:27:35,521][1651669] Updated weights for policy 0, policy_version 75856 (0.0171) [2024-06-15 12:27:35,773][1648981] Fps is (10 sec: 45845.5, 60 sec: 46962.4, 300 sec: 47429.2). 
Total num frames: 155385856. Throughput: 0: 11671.9. Samples: 38917120. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:27:35,774][1648981] Avg episode reward: [(0, '172.250')] [2024-06-15 12:27:37,041][1651669] Updated weights for policy 0, policy_version 75920 (0.0013) [2024-06-15 12:27:40,766][1648981] Fps is (10 sec: 49200.6, 60 sec: 46425.5, 300 sec: 47319.2). Total num frames: 155582464. Throughput: 0: 11537.1. Samples: 38940672. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:27:40,767][1648981] Avg episode reward: [(0, '171.850')] [2024-06-15 12:27:41,704][1651669] Updated weights for policy 0, policy_version 75969 (0.0012) [2024-06-15 12:27:42,891][1651669] Updated weights for policy 0, policy_version 76022 (0.0014) [2024-06-15 12:27:45,635][1651669] Updated weights for policy 0, policy_version 76064 (0.0013) [2024-06-15 12:27:45,766][1648981] Fps is (10 sec: 39347.5, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 155779072. Throughput: 0: 11503.0. Samples: 39021568. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:27:45,767][1648981] Avg episode reward: [(0, '172.030')] [2024-06-15 12:27:47,018][1651669] Updated weights for policy 0, policy_version 76114 (0.0017) [2024-06-15 12:27:48,268][1651669] Updated weights for policy 0, policy_version 76181 (0.0014) [2024-06-15 12:27:48,947][1651669] Updated weights for policy 0, policy_version 76219 (0.0011) [2024-06-15 12:27:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 156106752. Throughput: 0: 11766.7. Samples: 39095808. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:27:50,767][1648981] Avg episode reward: [(0, '167.500')] [2024-06-15 12:27:53,188][1651669] Updated weights for policy 0, policy_version 76260 (0.0014) [2024-06-15 12:27:55,770][1648981] Fps is (10 sec: 45858.0, 60 sec: 45872.5, 300 sec: 47212.7). Total num frames: 156237824. Throughput: 0: 11661.2. Samples: 39129088. 
Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:27:55,771][1648981] Avg episode reward: [(0, '167.250')] [2024-06-15 12:27:56,256][1651669] Updated weights for policy 0, policy_version 76290 (0.0013) [2024-06-15 12:27:57,902][1651274] Signal inference workers to stop experience collection... (4000 times) [2024-06-15 12:27:57,986][1651669] InferenceWorker_p0-w0: stopping experience collection (4000 times) [2024-06-15 12:27:57,995][1651669] Updated weights for policy 0, policy_version 76373 (0.0013) [2024-06-15 12:27:58,135][1651274] Signal inference workers to resume experience collection... (4000 times) [2024-06-15 12:27:58,136][1651669] InferenceWorker_p0-w0: resuming experience collection (4000 times) [2024-06-15 12:27:59,565][1651669] Updated weights for policy 0, policy_version 76448 (0.0014) [2024-06-15 12:28:00,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 48606.1, 300 sec: 47541.3). Total num frames: 156631040. Throughput: 0: 11928.1. Samples: 39203840. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:28:00,768][1648981] Avg episode reward: [(0, '168.330')] [2024-06-15 12:28:03,843][1651669] Updated weights for policy 0, policy_version 76513 (0.0101) [2024-06-15 12:28:05,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 45881.2, 300 sec: 47430.3). Total num frames: 156762112. Throughput: 0: 11958.0. Samples: 39278592. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:28:05,767][1648981] Avg episode reward: [(0, '168.350')] [2024-06-15 12:28:07,338][1651669] Updated weights for policy 0, policy_version 76560 (0.0013) [2024-06-15 12:28:09,056][1651669] Updated weights for policy 0, policy_version 76629 (0.0077) [2024-06-15 12:28:10,273][1651669] Updated weights for policy 0, policy_version 76692 (0.0013) [2024-06-15 12:28:10,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 157122560. Throughput: 0: 11923.9. Samples: 39311872. 
Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:28:10,767][1648981] Avg episode reward: [(0, '168.750')] [2024-06-15 12:28:10,921][1651669] Updated weights for policy 0, policy_version 76734 (0.0017) [2024-06-15 12:28:15,714][1651669] Updated weights for policy 0, policy_version 76787 (0.0061) [2024-06-15 12:28:15,782][1648981] Fps is (10 sec: 49074.7, 60 sec: 45329.1, 300 sec: 47427.8). Total num frames: 157253632. Throughput: 0: 12058.9. Samples: 39389184. Policy #0 lag: (min: 15.0, avg: 113.2, max: 271.0) [2024-06-15 12:28:15,783][1648981] Avg episode reward: [(0, '169.610')] [2024-06-15 12:28:18,732][1651669] Updated weights for policy 0, policy_version 76832 (0.0011) [2024-06-15 12:28:20,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 157515776. Throughput: 0: 11720.9. Samples: 39444480. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:20,767][1648981] Avg episode reward: [(0, '175.090')] [2024-06-15 12:28:21,194][1651669] Updated weights for policy 0, policy_version 76944 (0.0015) [2024-06-15 12:28:25,766][1648981] Fps is (10 sec: 42665.9, 60 sec: 45875.4, 300 sec: 47097.1). Total num frames: 157679616. Throughput: 0: 11901.2. Samples: 39476224. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:25,767][1648981] Avg episode reward: [(0, '172.510')] [2024-06-15 12:28:26,419][1651669] Updated weights for policy 0, policy_version 76999 (0.0023) [2024-06-15 12:28:29,718][1651669] Updated weights for policy 0, policy_version 77064 (0.0011) [2024-06-15 12:28:30,767][1648981] Fps is (10 sec: 39320.2, 60 sec: 46975.0, 300 sec: 46985.9). Total num frames: 157908992. Throughput: 0: 11912.5. Samples: 39557632. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:30,767][1648981] Avg episode reward: [(0, '182.190')] [2024-06-15 12:28:31,193][1651274] Saving new best policy, reward=182.190! 
[2024-06-15 12:28:31,562][1651669] Updated weights for policy 0, policy_version 77152 (0.0150) [2024-06-15 12:28:32,998][1651669] Updated weights for policy 0, policy_version 77220 (0.0014) [2024-06-15 12:28:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46972.7, 300 sec: 47319.2). Total num frames: 158203904. Throughput: 0: 11685.0. Samples: 39621632. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:35,767][1648981] Avg episode reward: [(0, '184.600')] [2024-06-15 12:28:35,768][1651274] Saving new best policy, reward=184.600! [2024-06-15 12:28:37,773][1651669] Updated weights for policy 0, policy_version 77280 (0.0013) [2024-06-15 12:28:37,955][1651274] Signal inference workers to stop experience collection... (4050 times) [2024-06-15 12:28:37,992][1651669] InferenceWorker_p0-w0: stopping experience collection (4050 times) [2024-06-15 12:28:38,219][1651274] Signal inference workers to resume experience collection... (4050 times) [2024-06-15 12:28:38,220][1651669] InferenceWorker_p0-w0: resuming experience collection (4050 times) [2024-06-15 12:28:38,462][1651669] Updated weights for policy 0, policy_version 77308 (0.0012) [2024-06-15 12:28:40,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 158334976. Throughput: 0: 11788.4. Samples: 39659520. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:40,767][1648981] Avg episode reward: [(0, '189.300')] [2024-06-15 12:28:41,248][1651274] Saving new best policy, reward=189.300! [2024-06-15 12:28:41,836][1651669] Updated weights for policy 0, policy_version 77362 (0.0074) [2024-06-15 12:28:43,363][1651669] Updated weights for policy 0, policy_version 77440 (0.0014) [2024-06-15 12:28:44,619][1651669] Updated weights for policy 0, policy_version 77501 (0.0125) [2024-06-15 12:28:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 158728192. Throughput: 0: 11537.1. Samples: 39723008. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:45,767][1648981] Avg episode reward: [(0, '191.700')] [2024-06-15 12:28:45,768][1651274] Saving new best policy, reward=191.700! [2024-06-15 12:28:49,633][1651669] Updated weights for policy 0, policy_version 77552 (0.0013) [2024-06-15 12:28:50,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 158859264. Throughput: 0: 11616.7. Samples: 39801344. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:50,767][1648981] Avg episode reward: [(0, '190.980')] [2024-06-15 12:28:52,505][1651669] Updated weights for policy 0, policy_version 77587 (0.0018) [2024-06-15 12:28:54,097][1651669] Updated weights for policy 0, policy_version 77651 (0.0012) [2024-06-15 12:28:55,767][1648981] Fps is (10 sec: 42596.9, 60 sec: 48608.7, 300 sec: 47430.2). Total num frames: 159154176. Throughput: 0: 11559.7. Samples: 39832064. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:28:55,767][1648981] Avg episode reward: [(0, '193.030')] [2024-06-15 12:28:55,775][1651669] Updated weights for policy 0, policy_version 77714 (0.0014) [2024-06-15 12:28:56,226][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000077744_159219712.pth... [2024-06-15 12:28:56,309][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000072192_147849216.pth [2024-06-15 12:28:56,313][1651274] Saving new best policy, reward=193.030! [2024-06-15 12:28:56,522][1651669] Updated weights for policy 0, policy_version 77752 (0.0011) [2024-06-15 12:28:59,861][1651669] Updated weights for policy 0, policy_version 77776 (0.0011) [2024-06-15 12:29:00,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 45329.2, 300 sec: 46986.0). Total num frames: 159350784. Throughput: 0: 11518.4. Samples: 39907328. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:29:00,767][1648981] Avg episode reward: [(0, '190.300')] [2024-06-15 12:29:03,331][1651669] Updated weights for policy 0, policy_version 77840 (0.0014) [2024-06-15 12:29:05,311][1651669] Updated weights for policy 0, policy_version 77920 (0.0113) [2024-06-15 12:29:05,766][1648981] Fps is (10 sec: 42600.2, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 159580160. Throughput: 0: 11707.7. Samples: 39971328. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 12:29:05,767][1648981] Avg episode reward: [(0, '186.640')] [2024-06-15 12:29:06,764][1651669] Updated weights for policy 0, policy_version 77968 (0.0062) [2024-06-15 12:29:10,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 44236.8, 300 sec: 47097.4). Total num frames: 159776768. Throughput: 0: 11571.2. Samples: 39996928. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:10,767][1648981] Avg episode reward: [(0, '184.600')] [2024-06-15 12:29:12,147][1651669] Updated weights for policy 0, policy_version 78049 (0.0014) [2024-06-15 12:29:14,831][1651669] Updated weights for policy 0, policy_version 78096 (0.0029) [2024-06-15 12:29:15,770][1648981] Fps is (10 sec: 42581.9, 60 sec: 45884.3, 300 sec: 46985.4). Total num frames: 160006144. Throughput: 0: 11536.2. Samples: 40076800. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:15,771][1648981] Avg episode reward: [(0, '184.460')] [2024-06-15 12:29:16,046][1651669] Updated weights for policy 0, policy_version 78150 (0.0018) [2024-06-15 12:29:17,340][1651669] Updated weights for policy 0, policy_version 78204 (0.0013) [2024-06-15 12:29:17,341][1651274] Signal inference workers to stop experience collection... (4100 times) [2024-06-15 12:29:17,397][1651669] InferenceWorker_p0-w0: stopping experience collection (4100 times) [2024-06-15 12:29:17,407][1651274] Signal inference workers to resume experience collection... 
(4100 times) [2024-06-15 12:29:17,425][1651669] InferenceWorker_p0-w0: resuming experience collection (4100 times) [2024-06-15 12:29:18,562][1651669] Updated weights for policy 0, policy_version 78256 (0.0011) [2024-06-15 12:29:20,771][1648981] Fps is (10 sec: 52403.7, 60 sec: 46417.6, 300 sec: 47429.5). Total num frames: 160301056. Throughput: 0: 11661.0. Samples: 40146432. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:20,772][1648981] Avg episode reward: [(0, '186.090')] [2024-06-15 12:29:22,613][1651669] Updated weights for policy 0, policy_version 78277 (0.0011) [2024-06-15 12:29:23,572][1651669] Updated weights for policy 0, policy_version 78330 (0.0013) [2024-06-15 12:29:25,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 160464896. Throughput: 0: 11662.2. Samples: 40184320. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:25,767][1648981] Avg episode reward: [(0, '183.140')] [2024-06-15 12:29:26,134][1651669] Updated weights for policy 0, policy_version 78370 (0.0013) [2024-06-15 12:29:27,624][1651669] Updated weights for policy 0, policy_version 78433 (0.0014) [2024-06-15 12:29:29,346][1651669] Updated weights for policy 0, policy_version 78497 (0.0012) [2024-06-15 12:29:30,766][1648981] Fps is (10 sec: 52454.3, 60 sec: 48606.2, 300 sec: 47541.4). Total num frames: 160825344. Throughput: 0: 11673.6. Samples: 40248320. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:30,767][1648981] Avg episode reward: [(0, '184.090')] [2024-06-15 12:29:33,764][1651669] Updated weights for policy 0, policy_version 78536 (0.0011) [2024-06-15 12:29:34,737][1651669] Updated weights for policy 0, policy_version 78592 (0.0085) [2024-06-15 12:29:35,797][1648981] Fps is (10 sec: 49003.6, 60 sec: 45852.0, 300 sec: 46870.1). Total num frames: 160956416. Throughput: 0: 11756.7. Samples: 40330752. 
Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:35,797][1648981] Avg episode reward: [(0, '184.570')] [2024-06-15 12:29:37,873][1651669] Updated weights for policy 0, policy_version 78640 (0.0014) [2024-06-15 12:29:39,292][1651669] Updated weights for policy 0, policy_version 78690 (0.0011) [2024-06-15 12:29:40,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 161251328. Throughput: 0: 11844.4. Samples: 40365056. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:40,767][1648981] Avg episode reward: [(0, '187.580')] [2024-06-15 12:29:40,811][1651669] Updated weights for policy 0, policy_version 78752 (0.0034) [2024-06-15 12:29:44,926][1651669] Updated weights for policy 0, policy_version 78785 (0.0011) [2024-06-15 12:29:45,766][1648981] Fps is (10 sec: 49301.4, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 161447936. Throughput: 0: 11685.0. Samples: 40433152. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:45,767][1648981] Avg episode reward: [(0, '185.020')] [2024-06-15 12:29:48,264][1651669] Updated weights for policy 0, policy_version 78867 (0.0018) [2024-06-15 12:29:49,837][1651669] Updated weights for policy 0, policy_version 78929 (0.0011) [2024-06-15 12:29:50,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.7, 300 sec: 46986.0). Total num frames: 161710080. Throughput: 0: 11696.3. Samples: 40497664. Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:50,767][1648981] Avg episode reward: [(0, '184.950')] [2024-06-15 12:29:51,968][1651669] Updated weights for policy 0, policy_version 79009 (0.0013) [2024-06-15 12:29:55,776][1648981] Fps is (10 sec: 42559.1, 60 sec: 45322.3, 300 sec: 46873.5). Total num frames: 161873920. Throughput: 0: 11944.2. Samples: 40534528. 
Policy #0 lag: (min: 89.0, avg: 190.9, max: 361.0) [2024-06-15 12:29:55,776][1648981] Avg episode reward: [(0, '178.570')] [2024-06-15 12:29:56,531][1651669] Updated weights for policy 0, policy_version 79072 (0.0014) [2024-06-15 12:29:58,666][1651274] Signal inference workers to stop experience collection... (4150 times) [2024-06-15 12:29:58,693][1651669] InferenceWorker_p0-w0: stopping experience collection (4150 times) [2024-06-15 12:29:58,863][1651274] Signal inference workers to resume experience collection... (4150 times) [2024-06-15 12:29:58,864][1651669] InferenceWorker_p0-w0: resuming experience collection (4150 times) [2024-06-15 12:29:58,866][1651669] Updated weights for policy 0, policy_version 79136 (0.0014) [2024-06-15 12:30:00,610][1651669] Updated weights for policy 0, policy_version 79216 (0.0076) [2024-06-15 12:30:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 162234368. Throughput: 0: 11913.6. Samples: 40612864. Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:00,767][1648981] Avg episode reward: [(0, '174.800')] [2024-06-15 12:30:01,993][1651669] Updated weights for policy 0, policy_version 79268 (0.0012) [2024-06-15 12:30:05,605][1651669] Updated weights for policy 0, policy_version 79298 (0.0012) [2024-06-15 12:30:05,770][1648981] Fps is (10 sec: 52460.3, 60 sec: 46964.9, 300 sec: 47101.8). Total num frames: 162398208. Throughput: 0: 12208.8. Samples: 40695808. Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:05,771][1648981] Avg episode reward: [(0, '179.880')] [2024-06-15 12:30:06,955][1651669] Updated weights for policy 0, policy_version 79359 (0.0011) [2024-06-15 12:30:09,652][1651669] Updated weights for policy 0, policy_version 79423 (0.0135) [2024-06-15 12:30:10,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 49151.8, 300 sec: 47320.7). Total num frames: 162725888. Throughput: 0: 12185.5. Samples: 40732672. 
Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:10,767][1648981] Avg episode reward: [(0, '181.090')] [2024-06-15 12:30:11,550][1651669] Updated weights for policy 0, policy_version 79491 (0.0015) [2024-06-15 12:30:12,740][1651669] Updated weights for policy 0, policy_version 79544 (0.0013) [2024-06-15 12:30:15,766][1648981] Fps is (10 sec: 55723.7, 60 sec: 49155.1, 300 sec: 47208.1). Total num frames: 162955264. Throughput: 0: 12310.7. Samples: 40802304. Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:15,767][1648981] Avg episode reward: [(0, '176.210')] [2024-06-15 12:30:16,482][1651669] Updated weights for policy 0, policy_version 79600 (0.0012) [2024-06-15 12:30:19,146][1651669] Updated weights for policy 0, policy_version 79618 (0.0010) [2024-06-15 12:30:20,735][1651669] Updated weights for policy 0, policy_version 79696 (0.0013) [2024-06-15 12:30:20,766][1648981] Fps is (10 sec: 49153.4, 60 sec: 48609.8, 300 sec: 47208.2). Total num frames: 163217408. Throughput: 0: 12182.4. Samples: 40878592. Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:20,767][1648981] Avg episode reward: [(0, '177.460')] [2024-06-15 12:30:21,776][1651669] Updated weights for policy 0, policy_version 79748 (0.0014) [2024-06-15 12:30:22,939][1651669] Updated weights for policy 0, policy_version 79803 (0.0013) [2024-06-15 12:30:25,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 47430.3). Total num frames: 163446784. Throughput: 0: 12117.3. Samples: 40910336. Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:25,767][1648981] Avg episode reward: [(0, '175.150')] [2024-06-15 12:30:27,096][1651669] Updated weights for policy 0, policy_version 79861 (0.0024) [2024-06-15 12:30:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 163643392. Throughput: 0: 12504.2. Samples: 40995840. 
Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:30,767][1648981] Avg episode reward: [(0, '180.460')] [2024-06-15 12:30:31,678][1651669] Updated weights for policy 0, policy_version 79952 (0.0014) [2024-06-15 12:30:32,947][1651669] Updated weights for policy 0, policy_version 80016 (0.0014) [2024-06-15 12:30:33,330][1651274] Signal inference workers to stop experience collection... (4200 times) [2024-06-15 12:30:33,396][1651669] InferenceWorker_p0-w0: stopping experience collection (4200 times) [2024-06-15 12:30:33,616][1651274] Signal inference workers to resume experience collection... (4200 times) [2024-06-15 12:30:33,616][1651669] InferenceWorker_p0-w0: resuming experience collection (4200 times) [2024-06-15 12:30:35,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50269.7, 300 sec: 47541.4). Total num frames: 163971072. Throughput: 0: 12481.4. Samples: 41059328. Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:35,767][1648981] Avg episode reward: [(0, '183.210')] [2024-06-15 12:30:37,615][1651669] Updated weights for policy 0, policy_version 80080 (0.0014) [2024-06-15 12:30:38,698][1651669] Updated weights for policy 0, policy_version 80123 (0.0013) [2024-06-15 12:30:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 164102144. Throughput: 0: 12529.5. Samples: 41098240. Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:40,767][1648981] Avg episode reward: [(0, '184.590')] [2024-06-15 12:30:41,866][1651669] Updated weights for policy 0, policy_version 80181 (0.0014) [2024-06-15 12:30:43,356][1651669] Updated weights for policy 0, policy_version 80256 (0.0013) [2024-06-15 12:30:44,649][1651669] Updated weights for policy 0, policy_version 80309 (0.0038) [2024-06-15 12:30:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50790.4, 300 sec: 47763.5). Total num frames: 164495360. Throughput: 0: 12185.6. Samples: 41161216. 
Policy #0 lag: (min: 15.0, avg: 98.2, max: 271.0) [2024-06-15 12:30:45,767][1648981] Avg episode reward: [(0, '181.190')] [2024-06-15 12:30:49,483][1651669] Updated weights for policy 0, policy_version 80353 (0.0048) [2024-06-15 12:30:50,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48605.9, 300 sec: 47208.2). Total num frames: 164626432. Throughput: 0: 12141.0. Samples: 41242112. Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:30:50,767][1648981] Avg episode reward: [(0, '181.550')] [2024-06-15 12:30:51,913][1651669] Updated weights for policy 0, policy_version 80407 (0.0013) [2024-06-15 12:30:52,633][1651669] Updated weights for policy 0, policy_version 80448 (0.0010) [2024-06-15 12:30:53,998][1651669] Updated weights for policy 0, policy_version 80489 (0.0013) [2024-06-15 12:30:55,787][1648981] Fps is (10 sec: 49053.7, 60 sec: 51873.4, 300 sec: 47871.3). Total num frames: 164986880. Throughput: 0: 12214.4. Samples: 41282560. Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:30:55,788][1648981] Avg episode reward: [(0, '184.060')] [2024-06-15 12:30:55,929][1651669] Updated weights for policy 0, policy_version 80566 (0.0136) [2024-06-15 12:30:56,177][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000080576_165019648.pth... [2024-06-15 12:30:56,218][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000074944_153485312.pth [2024-06-15 12:30:56,221][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000080576_165019648.pth [2024-06-15 12:30:59,641][1651669] Updated weights for policy 0, policy_version 80608 (0.0045) [2024-06-15 12:31:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 165150720. Throughput: 0: 12174.2. Samples: 41350144. 
Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:00,767][1648981] Avg episode reward: [(0, '180.780')] [2024-06-15 12:31:02,376][1651669] Updated weights for policy 0, policy_version 80642 (0.0012) [2024-06-15 12:31:03,944][1651669] Updated weights for policy 0, policy_version 80705 (0.0013) [2024-06-15 12:31:05,217][1651669] Updated weights for policy 0, policy_version 80784 (0.0012) [2024-06-15 12:31:05,766][1648981] Fps is (10 sec: 49250.5, 60 sec: 51339.3, 300 sec: 47765.2). Total num frames: 165478400. Throughput: 0: 12174.2. Samples: 41426432. Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:05,767][1648981] Avg episode reward: [(0, '181.800')] [2024-06-15 12:31:06,428][1651669] Updated weights for policy 0, policy_version 80832 (0.0013) [2024-06-15 12:31:10,314][1651669] Updated weights for policy 0, policy_version 80896 (0.0015) [2024-06-15 12:31:10,774][1648981] Fps is (10 sec: 52387.9, 60 sec: 49145.8, 300 sec: 47874.0). Total num frames: 165675008. Throughput: 0: 12274.5. Samples: 41462784. Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:10,775][1648981] Avg episode reward: [(0, '183.110')] [2024-06-15 12:31:14,675][1651669] Updated weights for policy 0, policy_version 80960 (0.0014) [2024-06-15 12:31:15,229][1651274] Signal inference workers to stop experience collection... (4250 times) [2024-06-15 12:31:15,261][1651669] InferenceWorker_p0-w0: stopping experience collection (4250 times) [2024-06-15 12:31:15,513][1651274] Signal inference workers to resume experience collection... (4250 times) [2024-06-15 12:31:15,514][1651669] InferenceWorker_p0-w0: resuming experience collection (4250 times) [2024-06-15 12:31:15,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 48606.0, 300 sec: 47430.3). Total num frames: 165871616. Throughput: 0: 12208.4. Samples: 41545216. 
Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:15,767][1648981] Avg episode reward: [(0, '182.850')] [2024-06-15 12:31:16,236][1651669] Updated weights for policy 0, policy_version 81025 (0.0014) [2024-06-15 12:31:17,291][1651669] Updated weights for policy 0, policy_version 81073 (0.0015) [2024-06-15 12:31:19,661][1651669] Updated weights for policy 0, policy_version 81104 (0.0026) [2024-06-15 12:31:20,768][1648981] Fps is (10 sec: 49189.3, 60 sec: 49151.8, 300 sec: 47874.6). Total num frames: 166166528. Throughput: 0: 12242.4. Samples: 41610240. Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:20,770][1648981] Avg episode reward: [(0, '185.040')] [2024-06-15 12:31:20,778][1651669] Updated weights for policy 0, policy_version 81150 (0.0014) [2024-06-15 12:31:25,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 166297600. Throughput: 0: 12265.2. Samples: 41650176. Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:25,767][1648981] Avg episode reward: [(0, '183.380')] [2024-06-15 12:31:25,992][1651669] Updated weights for policy 0, policy_version 81219 (0.0079) [2024-06-15 12:31:28,042][1651669] Updated weights for policy 0, policy_version 81312 (0.0087) [2024-06-15 12:31:30,677][1651669] Updated weights for policy 0, policy_version 81350 (0.0015) [2024-06-15 12:31:30,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 166592512. Throughput: 0: 12265.2. Samples: 41713152. Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:30,767][1648981] Avg episode reward: [(0, '186.140')] [2024-06-15 12:31:32,086][1651669] Updated weights for policy 0, policy_version 81408 (0.0019) [2024-06-15 12:31:35,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 45875.1, 300 sec: 47209.0). Total num frames: 166723584. Throughput: 0: 12128.7. Samples: 41787904. 
Policy #0 lag: (min: 15.0, avg: 108.6, max: 271.0) [2024-06-15 12:31:35,767][1648981] Avg episode reward: [(0, '185.720')] [2024-06-15 12:31:37,697][1651669] Updated weights for policy 0, policy_version 81488 (0.0014) [2024-06-15 12:31:39,085][1651669] Updated weights for policy 0, policy_version 81538 (0.0010) [2024-06-15 12:31:40,191][1651669] Updated weights for policy 0, policy_version 81597 (0.0013) [2024-06-15 12:31:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 47874.6). Total num frames: 167116800. Throughput: 0: 11974.7. Samples: 41821184. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:31:40,767][1648981] Avg episode reward: [(0, '185.270')] [2024-06-15 12:31:43,192][1651669] Updated weights for policy 0, policy_version 81653 (0.0025) [2024-06-15 12:31:45,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 167247872. Throughput: 0: 11958.1. Samples: 41888256. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:31:45,767][1648981] Avg episode reward: [(0, '183.380')] [2024-06-15 12:31:47,131][1651669] Updated weights for policy 0, policy_version 81682 (0.0013) [2024-06-15 12:31:48,231][1651669] Updated weights for policy 0, policy_version 81744 (0.0013) [2024-06-15 12:31:49,811][1651669] Updated weights for policy 0, policy_version 81808 (0.0025) [2024-06-15 12:31:50,782][1648981] Fps is (10 sec: 49075.9, 60 sec: 49685.3, 300 sec: 47872.1). Total num frames: 167608320. Throughput: 0: 11749.2. Samples: 41955328. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:31:50,782][1648981] Avg episode reward: [(0, '181.270')] [2024-06-15 12:31:50,903][1651669] Updated weights for policy 0, policy_version 81846 (0.0011) [2024-06-15 12:31:54,307][1651669] Updated weights for policy 0, policy_version 81888 (0.0012) [2024-06-15 12:31:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46436.8, 300 sec: 47652.5). Total num frames: 167772160. 
Throughput: 0: 11891.9. Samples: 41997824. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:31:55,767][1648981] Avg episode reward: [(0, '175.900')] [2024-06-15 12:31:57,694][1651274] Signal inference workers to stop experience collection... (4300 times) [2024-06-15 12:31:57,727][1651669] InferenceWorker_p0-w0: stopping experience collection (4300 times) [2024-06-15 12:31:57,890][1651274] Signal inference workers to resume experience collection... (4300 times) [2024-06-15 12:31:57,891][1651669] InferenceWorker_p0-w0: resuming experience collection (4300 times) [2024-06-15 12:31:57,893][1651669] Updated weights for policy 0, policy_version 81936 (0.0013) [2024-06-15 12:31:59,084][1651669] Updated weights for policy 0, policy_version 81986 (0.0015) [2024-06-15 12:32:00,768][1648981] Fps is (10 sec: 42663.9, 60 sec: 48059.6, 300 sec: 47542.6). Total num frames: 168034304. Throughput: 0: 11605.3. Samples: 42067456. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:32:00,770][1648981] Avg episode reward: [(0, '175.980')] [2024-06-15 12:32:01,099][1651669] Updated weights for policy 0, policy_version 82065 (0.0096) [2024-06-15 12:32:02,076][1651669] Updated weights for policy 0, policy_version 82110 (0.0013) [2024-06-15 12:32:05,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 47763.5). Total num frames: 168230912. Throughput: 0: 11673.7. Samples: 42135552. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:32:05,767][1648981] Avg episode reward: [(0, '179.630')] [2024-06-15 12:32:06,227][1651669] Updated weights for policy 0, policy_version 82164 (0.0017) [2024-06-15 12:32:10,141][1651669] Updated weights for policy 0, policy_version 82224 (0.0015) [2024-06-15 12:32:10,767][1648981] Fps is (10 sec: 39321.5, 60 sec: 45881.0, 300 sec: 47099.5). Total num frames: 168427520. Throughput: 0: 11514.3. Samples: 42168320. 
Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:32:10,767][1648981] Avg episode reward: [(0, '185.490')] [2024-06-15 12:32:12,028][1651669] Updated weights for policy 0, policy_version 82288 (0.0011) [2024-06-15 12:32:13,345][1651669] Updated weights for policy 0, policy_version 82336 (0.0011) [2024-06-15 12:32:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 168689664. Throughput: 0: 11525.7. Samples: 42231808. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:32:15,767][1648981] Avg episode reward: [(0, '182.470')] [2024-06-15 12:32:16,821][1651669] Updated weights for policy 0, policy_version 82385 (0.0013) [2024-06-15 12:32:20,767][1648981] Fps is (10 sec: 39320.4, 60 sec: 44236.6, 300 sec: 47097.0). Total num frames: 168820736. Throughput: 0: 11525.6. Samples: 42306560. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:32:20,768][1648981] Avg episode reward: [(0, '185.850')] [2024-06-15 12:32:21,236][1651669] Updated weights for policy 0, policy_version 82433 (0.0022) [2024-06-15 12:32:22,932][1651669] Updated weights for policy 0, policy_version 82500 (0.0012) [2024-06-15 12:32:24,568][1651669] Updated weights for policy 0, policy_version 82560 (0.0011) [2024-06-15 12:32:25,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47513.6, 300 sec: 47654.0). Total num frames: 169148416. Throughput: 0: 11434.7. Samples: 42335744. Policy #0 lag: (min: 124.0, avg: 189.6, max: 381.0) [2024-06-15 12:32:25,767][1648981] Avg episode reward: [(0, '185.190')] [2024-06-15 12:32:26,230][1651669] Updated weights for policy 0, policy_version 82613 (0.0012) [2024-06-15 12:32:28,790][1651669] Updated weights for policy 0, policy_version 82640 (0.0013) [2024-06-15 12:32:30,766][1648981] Fps is (10 sec: 52431.5, 60 sec: 45875.2, 300 sec: 47320.3). Total num frames: 169345024. Throughput: 0: 11411.9. Samples: 42401792. 
Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:32:30,767][1648981] Avg episode reward: [(0, '186.540')] [2024-06-15 12:32:32,807][1651669] Updated weights for policy 0, policy_version 82704 (0.0013) [2024-06-15 12:32:33,967][1651669] Updated weights for policy 0, policy_version 82754 (0.0014) [2024-06-15 12:32:35,402][1651669] Updated weights for policy 0, policy_version 82817 (0.0011) [2024-06-15 12:32:35,403][1651274] Signal inference workers to stop experience collection... (4350 times) [2024-06-15 12:32:35,440][1651669] InferenceWorker_p0-w0: stopping experience collection (4350 times) [2024-06-15 12:32:35,643][1651274] Signal inference workers to resume experience collection... (4350 times) [2024-06-15 12:32:35,644][1651669] InferenceWorker_p0-w0: resuming experience collection (4350 times) [2024-06-15 12:32:35,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48606.0, 300 sec: 47652.5). Total num frames: 169639936. Throughput: 0: 11495.5. Samples: 42472448. Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:32:35,767][1648981] Avg episode reward: [(0, '187.170')] [2024-06-15 12:32:36,751][1651669] Updated weights for policy 0, policy_version 82872 (0.0010) [2024-06-15 12:32:40,026][1651669] Updated weights for policy 0, policy_version 82916 (0.0018) [2024-06-15 12:32:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 47763.5). Total num frames: 169869312. Throughput: 0: 11423.3. Samples: 42511872. Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:32:40,767][1648981] Avg episode reward: [(0, '189.810')] [2024-06-15 12:32:43,944][1651669] Updated weights for policy 0, policy_version 82992 (0.0077) [2024-06-15 12:32:45,421][1651669] Updated weights for policy 0, policy_version 83068 (0.0013) [2024-06-15 12:32:45,782][1648981] Fps is (10 sec: 49074.1, 60 sec: 48047.0, 300 sec: 47538.8). Total num frames: 170131456. Throughput: 0: 11407.9. Samples: 42580992. 
Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:32:45,783][1648981] Avg episode reward: [(0, '185.820')] [2024-06-15 12:32:47,322][1651669] Updated weights for policy 0, policy_version 83125 (0.0012) [2024-06-15 12:32:50,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44794.5, 300 sec: 47653.1). Total num frames: 170295296. Throughput: 0: 11628.1. Samples: 42658816. Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:32:50,767][1648981] Avg episode reward: [(0, '185.650')] [2024-06-15 12:32:51,043][1651669] Updated weights for policy 0, policy_version 83169 (0.0012) [2024-06-15 12:32:54,332][1651669] Updated weights for policy 0, policy_version 83218 (0.0014) [2024-06-15 12:32:55,766][1648981] Fps is (10 sec: 39383.7, 60 sec: 45875.1, 300 sec: 47097.1). Total num frames: 170524672. Throughput: 0: 11673.6. Samples: 42693632. Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:32:55,767][1648981] Avg episode reward: [(0, '184.900')] [2024-06-15 12:32:56,126][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000083296_170590208.pth... [2024-06-15 12:32:56,300][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000077744_159219712.pth [2024-06-15 12:32:56,528][1651669] Updated weights for policy 0, policy_version 83312 (0.0056) [2024-06-15 12:32:58,865][1651669] Updated weights for policy 0, policy_version 83361 (0.0012) [2024-06-15 12:33:00,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 170786816. Throughput: 0: 11593.9. Samples: 42753536. 
Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:33:00,767][1648981] Avg episode reward: [(0, '187.060')] [2024-06-15 12:33:01,952][1651669] Updated weights for policy 0, policy_version 83411 (0.0012) [2024-06-15 12:33:02,770][1651669] Updated weights for policy 0, policy_version 83456 (0.0016) [2024-06-15 12:33:05,804][1648981] Fps is (10 sec: 48966.7, 60 sec: 46392.0, 300 sec: 47091.0). Total num frames: 171016192. Throughput: 0: 11845.8. Samples: 42840064. Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:33:05,805][1648981] Avg episode reward: [(0, '191.710')] [2024-06-15 12:33:06,020][1651669] Updated weights for policy 0, policy_version 83528 (0.0157) [2024-06-15 12:33:07,016][1651669] Updated weights for policy 0, policy_version 83579 (0.0014) [2024-06-15 12:33:09,429][1651669] Updated weights for policy 0, policy_version 83619 (0.0012) [2024-06-15 12:33:10,769][1648981] Fps is (10 sec: 52415.5, 60 sec: 48057.9, 300 sec: 47654.6). Total num frames: 171311104. Throughput: 0: 11911.9. Samples: 42871808. Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:33:10,769][1648981] Avg episode reward: [(0, '190.690')] [2024-06-15 12:33:12,659][1651669] Updated weights for policy 0, policy_version 83681 (0.0014) [2024-06-15 12:33:15,107][1651669] Updated weights for policy 0, policy_version 83715 (0.0013) [2024-06-15 12:33:15,778][1648981] Fps is (10 sec: 49280.9, 60 sec: 46958.2, 300 sec: 47428.4). Total num frames: 171507712. Throughput: 0: 12159.7. Samples: 42949120. Policy #0 lag: (min: 26.0, avg: 146.5, max: 282.0) [2024-06-15 12:33:15,779][1648981] Avg episode reward: [(0, '193.230')] [2024-06-15 12:33:16,016][1651274] Saving new best policy, reward=193.230! [2024-06-15 12:33:16,429][1651669] Updated weights for policy 0, policy_version 83776 (0.0014) [2024-06-15 12:33:16,891][1651274] Signal inference workers to stop experience collection... 
(4400 times) [2024-06-15 12:33:16,934][1651669] InferenceWorker_p0-w0: stopping experience collection (4400 times) [2024-06-15 12:33:17,096][1651274] Signal inference workers to resume experience collection... (4400 times) [2024-06-15 12:33:17,097][1651669] InferenceWorker_p0-w0: resuming experience collection (4400 times) [2024-06-15 12:33:17,784][1651669] Updated weights for policy 0, policy_version 83840 (0.0014) [2024-06-15 12:33:20,468][1651669] Updated weights for policy 0, policy_version 83894 (0.0016) [2024-06-15 12:33:20,774][1648981] Fps is (10 sec: 52400.6, 60 sec: 50238.1, 300 sec: 47984.4). Total num frames: 171835392. Throughput: 0: 12058.3. Samples: 43015168. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:20,775][1648981] Avg episode reward: [(0, '190.040')] [2024-06-15 12:33:24,090][1651669] Updated weights for policy 0, policy_version 83955 (0.0026) [2024-06-15 12:33:25,766][1648981] Fps is (10 sec: 45929.3, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 171966464. Throughput: 0: 12037.7. Samples: 43053568. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:25,767][1648981] Avg episode reward: [(0, '194.010')] [2024-06-15 12:33:25,774][1651274] Saving new best policy, reward=194.010! [2024-06-15 12:33:27,087][1651669] Updated weights for policy 0, policy_version 84016 (0.0013) [2024-06-15 12:33:28,297][1651669] Updated weights for policy 0, policy_version 84068 (0.0019) [2024-06-15 12:33:30,767][1648981] Fps is (10 sec: 42631.2, 60 sec: 48605.7, 300 sec: 47652.4). Total num frames: 172261376. Throughput: 0: 12110.2. Samples: 43125760. 
Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:30,767][1648981] Avg episode reward: [(0, '185.930')] [2024-06-15 12:33:31,222][1651669] Updated weights for policy 0, policy_version 84144 (0.0015) [2024-06-15 12:33:33,868][1651669] Updated weights for policy 0, policy_version 84163 (0.0014) [2024-06-15 12:33:35,074][1651669] Updated weights for policy 0, policy_version 84220 (0.0011) [2024-06-15 12:33:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 172490752. Throughput: 0: 12071.8. Samples: 43202048. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:35,767][1648981] Avg episode reward: [(0, '185.230')] [2024-06-15 12:33:37,529][1651669] Updated weights for policy 0, policy_version 84258 (0.0012) [2024-06-15 12:33:38,821][1651669] Updated weights for policy 0, policy_version 84321 (0.0013) [2024-06-15 12:33:40,786][1648981] Fps is (10 sec: 49055.8, 60 sec: 48043.9, 300 sec: 47538.2). Total num frames: 172752896. Throughput: 0: 12134.8. Samples: 43239936. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:40,787][1648981] Avg episode reward: [(0, '178.860')] [2024-06-15 12:33:41,495][1651669] Updated weights for policy 0, policy_version 84384 (0.0012) [2024-06-15 12:33:44,267][1651669] Updated weights for policy 0, policy_version 84455 (0.0015) [2024-06-15 12:33:45,786][1648981] Fps is (10 sec: 52323.9, 60 sec: 48056.4, 300 sec: 47982.4). Total num frames: 173015040. Throughput: 0: 12362.1. Samples: 43310080. 
Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:45,787][1648981] Avg episode reward: [(0, '180.540')] [2024-06-15 12:33:47,744][1651669] Updated weights for policy 0, policy_version 84497 (0.0013) [2024-06-15 12:33:48,729][1651669] Updated weights for policy 0, policy_version 84544 (0.0017) [2024-06-15 12:33:50,343][1651669] Updated weights for policy 0, policy_version 84600 (0.0124) [2024-06-15 12:33:50,766][1648981] Fps is (10 sec: 52532.7, 60 sec: 49698.1, 300 sec: 47874.7). Total num frames: 173277184. Throughput: 0: 12047.8. Samples: 43381760. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:50,767][1648981] Avg episode reward: [(0, '177.860')] [2024-06-15 12:33:52,960][1651669] Updated weights for policy 0, policy_version 84656 (0.0013) [2024-06-15 12:33:54,140][1651669] Updated weights for policy 0, policy_version 84688 (0.0013) [2024-06-15 12:33:55,187][1651669] Updated weights for policy 0, policy_version 84731 (0.0014) [2024-06-15 12:33:55,767][1648981] Fps is (10 sec: 52531.6, 60 sec: 50244.0, 300 sec: 48096.7). Total num frames: 173539328. Throughput: 0: 12095.1. Samples: 43416064. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:33:55,767][1648981] Avg episode reward: [(0, '178.950')] [2024-06-15 12:33:59,943][1651669] Updated weights for policy 0, policy_version 84784 (0.0029) [2024-06-15 12:34:00,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 173670400. Throughput: 0: 12075.0. Samples: 43492352. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:34:00,767][1648981] Avg episode reward: [(0, '185.610')] [2024-06-15 12:34:01,073][1651274] Signal inference workers to stop experience collection... (4450 times) [2024-06-15 12:34:01,142][1651669] InferenceWorker_p0-w0: stopping experience collection (4450 times) [2024-06-15 12:34:01,296][1651274] Signal inference workers to resume experience collection... 
(4450 times) [2024-06-15 12:34:01,297][1651669] InferenceWorker_p0-w0: resuming experience collection (4450 times) [2024-06-15 12:34:01,475][1651669] Updated weights for policy 0, policy_version 84834 (0.0027) [2024-06-15 12:34:03,337][1651669] Updated weights for policy 0, policy_version 84896 (0.0026) [2024-06-15 12:34:05,647][1651669] Updated weights for policy 0, policy_version 84960 (0.0014) [2024-06-15 12:34:05,767][1648981] Fps is (10 sec: 45875.5, 60 sec: 49729.2, 300 sec: 48207.8). Total num frames: 173998080. Throughput: 0: 12108.0. Samples: 43559936. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:34:05,767][1648981] Avg episode reward: [(0, '188.650')] [2024-06-15 12:34:10,572][1651669] Updated weights for policy 0, policy_version 85014 (0.0014) [2024-06-15 12:34:10,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 46969.3, 300 sec: 47875.2). Total num frames: 174129152. Throughput: 0: 12060.4. Samples: 43596288. Policy #0 lag: (min: 38.0, avg: 170.2, max: 294.0) [2024-06-15 12:34:10,769][1648981] Avg episode reward: [(0, '185.100')] [2024-06-15 12:34:11,341][1651669] Updated weights for policy 0, policy_version 85052 (0.0012) [2024-06-15 12:34:12,677][1651669] Updated weights for policy 0, policy_version 85110 (0.0012) [2024-06-15 12:34:13,646][1651669] Updated weights for policy 0, policy_version 85152 (0.0011) [2024-06-15 12:34:15,766][1648981] Fps is (10 sec: 45876.9, 60 sec: 49161.6, 300 sec: 47986.5). Total num frames: 174456832. Throughput: 0: 11946.7. Samples: 43663360. Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:15,767][1648981] Avg episode reward: [(0, '183.000')] [2024-06-15 12:34:16,903][1651669] Updated weights for policy 0, policy_version 85216 (0.0203) [2024-06-15 12:34:20,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 45881.2, 300 sec: 47874.6). Total num frames: 174587904. Throughput: 0: 12094.6. Samples: 43746304. 
Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:20,767][1648981] Avg episode reward: [(0, '183.320')] [2024-06-15 12:34:21,835][1651669] Updated weights for policy 0, policy_version 85288 (0.0021) [2024-06-15 12:34:22,637][1651669] Updated weights for policy 0, policy_version 85328 (0.0013) [2024-06-15 12:34:23,764][1651669] Updated weights for policy 0, policy_version 85376 (0.0013) [2024-06-15 12:34:25,349][1651669] Updated weights for policy 0, policy_version 85434 (0.0117) [2024-06-15 12:34:25,778][1648981] Fps is (10 sec: 52366.6, 60 sec: 50234.3, 300 sec: 47983.7). Total num frames: 174981120. Throughput: 0: 11971.5. Samples: 43778560. Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:25,779][1648981] Avg episode reward: [(0, '184.020')] [2024-06-15 12:34:28,077][1651669] Updated weights for policy 0, policy_version 85495 (0.0143) [2024-06-15 12:34:30,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 47513.7, 300 sec: 47990.6). Total num frames: 175112192. Throughput: 0: 12065.8. Samples: 43852800. Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:30,767][1648981] Avg episode reward: [(0, '185.950')] [2024-06-15 12:34:32,768][1651669] Updated weights for policy 0, policy_version 85562 (0.0139) [2024-06-15 12:34:34,390][1651669] Updated weights for policy 0, policy_version 85616 (0.0014) [2024-06-15 12:34:35,740][1651669] Updated weights for policy 0, policy_version 85669 (0.0014) [2024-06-15 12:34:35,766][1648981] Fps is (10 sec: 45929.6, 60 sec: 49151.9, 300 sec: 48096.8). Total num frames: 175439872. Throughput: 0: 12014.9. Samples: 43922432. 
Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:35,767][1648981] Avg episode reward: [(0, '181.780')] [2024-06-15 12:34:36,393][1651669] Updated weights for policy 0, policy_version 85696 (0.0011) [2024-06-15 12:34:39,020][1651669] Updated weights for policy 0, policy_version 85751 (0.0012) [2024-06-15 12:34:40,767][1648981] Fps is (10 sec: 52425.1, 60 sec: 48075.0, 300 sec: 48096.6). Total num frames: 175636480. Throughput: 0: 11980.7. Samples: 43955200. Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:40,768][1648981] Avg episode reward: [(0, '183.580')] [2024-06-15 12:34:43,309][1651669] Updated weights for policy 0, policy_version 85808 (0.0014) [2024-06-15 12:34:44,287][1651274] Signal inference workers to stop experience collection... (4500 times) [2024-06-15 12:34:44,336][1651669] InferenceWorker_p0-w0: stopping experience collection (4500 times) [2024-06-15 12:34:44,474][1651274] Signal inference workers to resume experience collection... (4500 times) [2024-06-15 12:34:44,475][1651669] InferenceWorker_p0-w0: resuming experience collection (4500 times) [2024-06-15 12:34:44,979][1651669] Updated weights for policy 0, policy_version 85857 (0.0015) [2024-06-15 12:34:45,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48075.8, 300 sec: 48096.8). Total num frames: 175898624. Throughput: 0: 12151.5. Samples: 44039168. Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:45,767][1648981] Avg episode reward: [(0, '185.290')] [2024-06-15 12:34:46,496][1651669] Updated weights for policy 0, policy_version 85928 (0.0014) [2024-06-15 12:34:48,509][1651669] Updated weights for policy 0, policy_version 85972 (0.0015) [2024-06-15 12:34:50,766][1648981] Fps is (10 sec: 52432.8, 60 sec: 48059.8, 300 sec: 48431.5). Total num frames: 176160768. Throughput: 0: 11992.3. Samples: 44099584. 
Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:50,767][1648981] Avg episode reward: [(0, '192.210')] [2024-06-15 12:34:54,211][1651669] Updated weights for policy 0, policy_version 86035 (0.0027) [2024-06-15 12:34:55,457][1651669] Updated weights for policy 0, policy_version 86098 (0.0011) [2024-06-15 12:34:55,767][1648981] Fps is (10 sec: 45871.1, 60 sec: 46967.1, 300 sec: 47874.5). Total num frames: 176357376. Throughput: 0: 12287.8. Samples: 44149248. Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:34:55,768][1648981] Avg episode reward: [(0, '194.520')] [2024-06-15 12:34:56,044][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000086128_176390144.pth... [2024-06-15 12:34:56,162][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000080576_165019648.pth [2024-06-15 12:34:56,167][1651274] Saving new best policy, reward=194.520! [2024-06-15 12:34:57,430][1651669] Updated weights for policy 0, policy_version 86179 (0.0017) [2024-06-15 12:34:57,907][1651669] Updated weights for policy 0, policy_version 86208 (0.0026) [2024-06-15 12:35:00,403][1651669] Updated weights for policy 0, policy_version 86269 (0.0132) [2024-06-15 12:35:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 48430.5). Total num frames: 176685056. Throughput: 0: 12162.8. Samples: 44210688. Policy #0 lag: (min: 25.0, avg: 125.9, max: 271.0) [2024-06-15 12:35:00,767][1648981] Avg episode reward: [(0, '197.160')] [2024-06-15 12:35:00,768][1651274] Saving new best policy, reward=197.160! [2024-06-15 12:35:05,766][1648981] Fps is (10 sec: 39324.8, 60 sec: 45875.5, 300 sec: 47541.4). Total num frames: 176750592. Throughput: 0: 12105.9. Samples: 44291072. 
Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:05,767][1648981] Avg episode reward: [(0, '196.070')] [2024-06-15 12:35:05,892][1651669] Updated weights for policy 0, policy_version 86320 (0.0011) [2024-06-15 12:35:07,100][1651669] Updated weights for policy 0, policy_version 86384 (0.0012) [2024-06-15 12:35:08,506][1651669] Updated weights for policy 0, policy_version 86434 (0.0016) [2024-06-15 12:35:10,313][1651669] Updated weights for policy 0, policy_version 86480 (0.0016) [2024-06-15 12:35:10,767][1648981] Fps is (10 sec: 45872.7, 60 sec: 50243.9, 300 sec: 48096.7). Total num frames: 177143808. Throughput: 0: 12052.1. Samples: 44320768. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:10,768][1648981] Avg episode reward: [(0, '193.380')] [2024-06-15 12:35:15,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 177209344. Throughput: 0: 11958.1. Samples: 44390912. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:15,767][1648981] Avg episode reward: [(0, '191.780')] [2024-06-15 12:35:16,677][1651669] Updated weights for policy 0, policy_version 86529 (0.0014) [2024-06-15 12:35:18,245][1651669] Updated weights for policy 0, policy_version 86613 (0.0012) [2024-06-15 12:35:20,051][1651669] Updated weights for policy 0, policy_version 86688 (0.0251) [2024-06-15 12:35:20,766][1648981] Fps is (10 sec: 42600.8, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 177569792. Throughput: 0: 11867.0. Samples: 44456448. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:20,767][1648981] Avg episode reward: [(0, '191.580')] [2024-06-15 12:35:21,719][1651274] Signal inference workers to stop experience collection... 
(4550 times) [2024-06-15 12:35:21,730][1651669] Updated weights for policy 0, policy_version 86738 (0.0012) [2024-06-15 12:35:21,769][1651669] InferenceWorker_p0-w0: stopping experience collection (4550 times) [2024-06-15 12:35:21,921][1651274] Signal inference workers to resume experience collection... (4550 times) [2024-06-15 12:35:21,922][1651669] InferenceWorker_p0-w0: resuming experience collection (4550 times) [2024-06-15 12:35:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45884.2, 300 sec: 47763.5). Total num frames: 177733632. Throughput: 0: 11844.5. Samples: 44488192. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:25,767][1648981] Avg episode reward: [(0, '191.440')] [2024-06-15 12:35:28,008][1651669] Updated weights for policy 0, policy_version 86787 (0.0013) [2024-06-15 12:35:29,160][1651669] Updated weights for policy 0, policy_version 86855 (0.0013) [2024-06-15 12:35:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 177995776. Throughput: 0: 11878.4. Samples: 44573696. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:30,767][1648981] Avg episode reward: [(0, '193.060')] [2024-06-15 12:35:31,281][1651669] Updated weights for policy 0, policy_version 86944 (0.0097) [2024-06-15 12:35:32,824][1651669] Updated weights for policy 0, policy_version 86994 (0.0012) [2024-06-15 12:35:35,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 178257920. Throughput: 0: 11935.3. Samples: 44636672. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:35,767][1648981] Avg episode reward: [(0, '185.940')] [2024-06-15 12:35:39,289][1651669] Updated weights for policy 0, policy_version 87058 (0.0012) [2024-06-15 12:35:40,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 46968.0, 300 sec: 47319.2). Total num frames: 178454528. Throughput: 0: 11696.6. Samples: 44675584. 
Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:40,767][1648981] Avg episode reward: [(0, '188.510')] [2024-06-15 12:35:40,918][1651669] Updated weights for policy 0, policy_version 87137 (0.0015) [2024-06-15 12:35:42,713][1651669] Updated weights for policy 0, policy_version 87221 (0.0210) [2024-06-15 12:35:44,263][1651669] Updated weights for policy 0, policy_version 87264 (0.0142) [2024-06-15 12:35:45,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 178782208. Throughput: 0: 11719.1. Samples: 44738048. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:45,767][1648981] Avg episode reward: [(0, '184.250')] [2024-06-15 12:35:50,323][1651669] Updated weights for policy 0, policy_version 87312 (0.0014) [2024-06-15 12:35:50,767][1648981] Fps is (10 sec: 39319.6, 60 sec: 44782.5, 300 sec: 46989.1). Total num frames: 178847744. Throughput: 0: 11730.3. Samples: 44818944. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:50,767][1648981] Avg episode reward: [(0, '186.650')] [2024-06-15 12:35:52,000][1651669] Updated weights for policy 0, policy_version 87382 (0.0024) [2024-06-15 12:35:54,088][1651669] Updated weights for policy 0, policy_version 87478 (0.0014) [2024-06-15 12:35:55,790][1648981] Fps is (10 sec: 42496.6, 60 sec: 47495.3, 300 sec: 47648.6). Total num frames: 179208192. Throughput: 0: 11633.4. Samples: 44844544. Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:35:55,791][1648981] Avg episode reward: [(0, '183.560')] [2024-06-15 12:35:55,828][1651669] Updated weights for policy 0, policy_version 87520 (0.0012) [2024-06-15 12:35:56,590][1651669] Updated weights for policy 0, policy_version 87552 (0.0012) [2024-06-15 12:36:00,766][1648981] Fps is (10 sec: 45878.0, 60 sec: 43690.7, 300 sec: 46874.9). Total num frames: 179306496. Throughput: 0: 11798.8. Samples: 44921856. 
Policy #0 lag: (min: 15.0, avg: 69.6, max: 239.0) [2024-06-15 12:36:00,767][1648981] Avg episode reward: [(0, '183.700')] [2024-06-15 12:36:02,741][1651669] Updated weights for policy 0, policy_version 87616 (0.0066) [2024-06-15 12:36:03,505][1651274] Signal inference workers to stop experience collection... (4600 times) [2024-06-15 12:36:03,544][1651669] InferenceWorker_p0-w0: stopping experience collection (4600 times) [2024-06-15 12:36:03,751][1651274] Signal inference workers to resume experience collection... (4600 times) [2024-06-15 12:36:03,751][1651669] InferenceWorker_p0-w0: resuming experience collection (4600 times) [2024-06-15 12:36:03,912][1651669] Updated weights for policy 0, policy_version 87669 (0.0013) [2024-06-15 12:36:05,086][1651669] Updated weights for policy 0, policy_version 87730 (0.0109) [2024-06-15 12:36:05,766][1648981] Fps is (10 sec: 49270.4, 60 sec: 49152.1, 300 sec: 47542.6). Total num frames: 179699712. Throughput: 0: 11867.0. Samples: 44990464. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:05,767][1648981] Avg episode reward: [(0, '183.300')] [2024-06-15 12:36:06,690][1651669] Updated weights for policy 0, policy_version 87776 (0.0014) [2024-06-15 12:36:10,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 44783.4, 300 sec: 47319.2). Total num frames: 179830784. Throughput: 0: 11935.3. Samples: 45025280. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:10,767][1648981] Avg episode reward: [(0, '187.000')] [2024-06-15 12:36:12,426][1651669] Updated weights for policy 0, policy_version 87809 (0.0030) [2024-06-15 12:36:13,880][1651669] Updated weights for policy 0, policy_version 87872 (0.0012) [2024-06-15 12:36:15,085][1651669] Updated weights for policy 0, policy_version 87925 (0.0011) [2024-06-15 12:36:15,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48605.8, 300 sec: 47319.2). Total num frames: 180125696. Throughput: 0: 11582.6. Samples: 45094912. 
Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:15,767][1648981] Avg episode reward: [(0, '186.090')] [2024-06-15 12:36:16,465][1651669] Updated weights for policy 0, policy_version 87997 (0.0100) [2024-06-15 12:36:18,525][1651669] Updated weights for policy 0, policy_version 88055 (0.0022) [2024-06-15 12:36:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 46421.5, 300 sec: 47652.5). Total num frames: 180355072. Throughput: 0: 11776.1. Samples: 45166592. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:20,767][1648981] Avg episode reward: [(0, '187.350')] [2024-06-15 12:36:24,466][1651669] Updated weights for policy 0, policy_version 88099 (0.0154) [2024-06-15 12:36:25,774][1648981] Fps is (10 sec: 39290.8, 60 sec: 46415.3, 300 sec: 47206.9). Total num frames: 180518912. Throughput: 0: 11830.8. Samples: 45208064. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:25,775][1648981] Avg episode reward: [(0, '194.310')] [2024-06-15 12:36:25,804][1651669] Updated weights for policy 0, policy_version 88160 (0.0012) [2024-06-15 12:36:27,066][1651669] Updated weights for policy 0, policy_version 88212 (0.0103) [2024-06-15 12:36:27,936][1651669] Updated weights for policy 0, policy_version 88256 (0.0015) [2024-06-15 12:36:30,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 180879360. Throughput: 0: 11730.5. Samples: 45265920. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:30,767][1648981] Avg episode reward: [(0, '182.870')] [2024-06-15 12:36:34,568][1651669] Updated weights for policy 0, policy_version 88321 (0.0052) [2024-06-15 12:36:35,766][1648981] Fps is (10 sec: 45911.2, 60 sec: 45329.1, 300 sec: 46986.0). Total num frames: 180977664. Throughput: 0: 11685.1. Samples: 45344768. 
Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:35,767][1648981] Avg episode reward: [(0, '194.140')] [2024-06-15 12:36:36,324][1651669] Updated weights for policy 0, policy_version 88400 (0.0143) [2024-06-15 12:36:37,341][1651669] Updated weights for policy 0, policy_version 88448 (0.0014) [2024-06-15 12:36:38,715][1651669] Updated weights for policy 0, policy_version 88510 (0.0013) [2024-06-15 12:36:40,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 181338112. Throughput: 0: 11679.8. Samples: 45369856. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:40,767][1648981] Avg episode reward: [(0, '191.060')] [2024-06-15 12:36:41,111][1651669] Updated weights for policy 0, policy_version 88567 (0.0012) [2024-06-15 12:36:45,538][1651274] Signal inference workers to stop experience collection... (4650 times) [2024-06-15 12:36:45,572][1651669] InferenceWorker_p0-w0: stopping experience collection (4650 times) [2024-06-15 12:36:45,767][1648981] Fps is (10 sec: 42598.2, 60 sec: 43690.6, 300 sec: 46766.3). Total num frames: 181403648. Throughput: 0: 11776.0. Samples: 45451776. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:45,767][1648981] Avg episode reward: [(0, '192.510')] [2024-06-15 12:36:45,954][1651274] Signal inference workers to resume experience collection... (4650 times) [2024-06-15 12:36:45,956][1651669] InferenceWorker_p0-w0: resuming experience collection (4650 times) [2024-06-15 12:36:46,912][1651669] Updated weights for policy 0, policy_version 88625 (0.0144) [2024-06-15 12:36:48,215][1651669] Updated weights for policy 0, policy_version 88675 (0.0013) [2024-06-15 12:36:50,006][1651669] Updated weights for policy 0, policy_version 88752 (0.0015) [2024-06-15 12:36:50,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49152.5, 300 sec: 47541.4). Total num frames: 181796864. Throughput: 0: 11537.1. Samples: 45509632. 
Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:50,767][1648981] Avg episode reward: [(0, '190.780')] [2024-06-15 12:36:51,518][1651669] Updated weights for policy 0, policy_version 88800 (0.0016) [2024-06-15 12:36:55,774][1648981] Fps is (10 sec: 52388.2, 60 sec: 45341.3, 300 sec: 47095.8). Total num frames: 181927936. Throughput: 0: 11580.6. Samples: 45546496. Policy #0 lag: (min: 15.0, avg: 76.0, max: 255.0) [2024-06-15 12:36:55,775][1648981] Avg episode reward: [(0, '199.390')] [2024-06-15 12:36:55,795][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000088832_181927936.pth... [2024-06-15 12:36:55,835][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000083296_170590208.pth [2024-06-15 12:36:55,838][1651274] Saving new best policy, reward=199.390! [2024-06-15 12:36:57,313][1651669] Updated weights for policy 0, policy_version 88867 (0.0013) [2024-06-15 12:36:58,914][1651669] Updated weights for policy 0, policy_version 88933 (0.0014) [2024-06-15 12:37:00,753][1651669] Updated weights for policy 0, policy_version 89015 (0.0014) [2024-06-15 12:37:00,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 47652.5). Total num frames: 182288384. Throughput: 0: 11707.8. Samples: 45621760. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:00,767][1648981] Avg episode reward: [(0, '203.850')] [2024-06-15 12:37:00,898][1651274] Saving new best policy, reward=203.850! [2024-06-15 12:37:03,042][1651669] Updated weights for policy 0, policy_version 89086 (0.0012) [2024-06-15 12:37:05,767][1648981] Fps is (10 sec: 52469.5, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 182452224. Throughput: 0: 11628.0. Samples: 45689856. 
Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:05,767][1648981] Avg episode reward: [(0, '199.570')] [2024-06-15 12:37:09,067][1651669] Updated weights for policy 0, policy_version 89138 (0.0094) [2024-06-15 12:37:10,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 182681600. Throughput: 0: 11652.9. Samples: 45732352. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:10,767][1648981] Avg episode reward: [(0, '204.980')] [2024-06-15 12:37:11,180][1651669] Updated weights for policy 0, policy_version 89217 (0.0243) [2024-06-15 12:37:11,366][1651274] Saving new best policy, reward=204.980! [2024-06-15 12:37:12,412][1651669] Updated weights for policy 0, policy_version 89278 (0.0013) [2024-06-15 12:37:14,293][1651669] Updated weights for policy 0, policy_version 89337 (0.0013) [2024-06-15 12:37:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 47513.6, 300 sec: 47985.8). Total num frames: 182976512. Throughput: 0: 11593.9. Samples: 45787648. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:15,767][1648981] Avg episode reward: [(0, '201.030')] [2024-06-15 12:37:20,512][1651669] Updated weights for policy 0, policy_version 89392 (0.0014) [2024-06-15 12:37:20,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45328.9, 300 sec: 47208.1). Total num frames: 183074816. Throughput: 0: 11696.4. Samples: 45871104. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:20,767][1648981] Avg episode reward: [(0, '204.110')] [2024-06-15 12:37:22,193][1651669] Updated weights for policy 0, policy_version 89445 (0.0013) [2024-06-15 12:37:23,013][1651274] Signal inference workers to stop experience collection... (4700 times) [2024-06-15 12:37:23,063][1651669] InferenceWorker_p0-w0: stopping experience collection (4700 times) [2024-06-15 12:37:23,318][1651274] Signal inference workers to resume experience collection... 
(4700 times) [2024-06-15 12:37:23,319][1651669] InferenceWorker_p0-w0: resuming experience collection (4700 times) [2024-06-15 12:37:23,931][1651669] Updated weights for policy 0, policy_version 89509 (0.0175) [2024-06-15 12:37:25,658][1651669] Updated weights for policy 0, policy_version 89593 (0.0076) [2024-06-15 12:37:25,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49704.7, 300 sec: 47985.7). Total num frames: 183500800. Throughput: 0: 11650.9. Samples: 45894144. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:25,767][1648981] Avg episode reward: [(0, '203.370')] [2024-06-15 12:37:30,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 43690.6, 300 sec: 46986.0). Total num frames: 183500800. Throughput: 0: 11389.2. Samples: 45964288. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:30,767][1648981] Avg episode reward: [(0, '202.210')] [2024-06-15 12:37:32,587][1651669] Updated weights for policy 0, policy_version 89632 (0.0013) [2024-06-15 12:37:34,692][1651669] Updated weights for policy 0, policy_version 89699 (0.0115) [2024-06-15 12:37:35,767][1648981] Fps is (10 sec: 29490.7, 60 sec: 46967.4, 300 sec: 47208.1). Total num frames: 183795712. Throughput: 0: 11446.0. Samples: 46024704. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:35,767][1648981] Avg episode reward: [(0, '197.900')] [2024-06-15 12:37:36,120][1651669] Updated weights for policy 0, policy_version 89760 (0.0012) [2024-06-15 12:37:37,539][1651669] Updated weights for policy 0, policy_version 89824 (0.0088) [2024-06-15 12:37:40,785][1648981] Fps is (10 sec: 52332.2, 60 sec: 44769.2, 300 sec: 47096.6). Total num frames: 184025088. Throughput: 0: 11295.4. Samples: 46054912. 
Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:40,788][1648981] Avg episode reward: [(0, '194.440')] [2024-06-15 12:37:43,588][1651669] Updated weights for policy 0, policy_version 89859 (0.0011) [2024-06-15 12:37:45,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 184188928. Throughput: 0: 11525.7. Samples: 46140416. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:45,767][1648981] Avg episode reward: [(0, '192.220')] [2024-06-15 12:37:46,025][1651669] Updated weights for policy 0, policy_version 89952 (0.0118) [2024-06-15 12:37:47,231][1651669] Updated weights for policy 0, policy_version 90000 (0.0010) [2024-06-15 12:37:48,543][1651669] Updated weights for policy 0, policy_version 90051 (0.0013) [2024-06-15 12:37:49,628][1651669] Updated weights for policy 0, policy_version 90108 (0.0013) [2024-06-15 12:37:50,768][1648981] Fps is (10 sec: 52524.5, 60 sec: 45874.9, 300 sec: 47541.3). Total num frames: 184549376. Throughput: 0: 11184.3. Samples: 46193152. Policy #0 lag: (min: 11.0, avg: 72.2, max: 267.0) [2024-06-15 12:37:50,769][1648981] Avg episode reward: [(0, '194.510')] [2024-06-15 12:37:55,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44788.8, 300 sec: 46874.9). Total num frames: 184614912. Throughput: 0: 11173.0. Samples: 46235136. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:37:55,767][1648981] Avg episode reward: [(0, '190.740')] [2024-06-15 12:37:56,827][1651669] Updated weights for policy 0, policy_version 90180 (0.0048) [2024-06-15 12:37:58,401][1651669] Updated weights for policy 0, policy_version 90242 (0.0011) [2024-06-15 12:37:59,499][1651669] Updated weights for policy 0, policy_version 90297 (0.0011) [2024-06-15 12:38:00,105][1651274] Signal inference workers to stop experience collection... 
(4750 times) [2024-06-15 12:38:00,144][1651669] InferenceWorker_p0-w0: stopping experience collection (4750 times) [2024-06-15 12:38:00,363][1651274] Signal inference workers to resume experience collection... (4750 times) [2024-06-15 12:38:00,364][1651669] InferenceWorker_p0-w0: resuming experience collection (4750 times) [2024-06-15 12:38:00,770][1648981] Fps is (10 sec: 49133.7, 60 sec: 45872.1, 300 sec: 47546.8). Total num frames: 185040896. Throughput: 0: 11365.4. Samples: 46299136. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:00,771][1648981] Avg episode reward: [(0, '188.560')] [2024-06-15 12:38:00,983][1651669] Updated weights for policy 0, policy_version 90363 (0.0097) [2024-06-15 12:38:05,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 46653.1). Total num frames: 185073664. Throughput: 0: 11082.0. Samples: 46369792. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:05,767][1648981] Avg episode reward: [(0, '192.350')] [2024-06-15 12:38:07,963][1651669] Updated weights for policy 0, policy_version 90404 (0.0012) [2024-06-15 12:38:09,653][1651669] Updated weights for policy 0, policy_version 90465 (0.0012) [2024-06-15 12:38:10,766][1648981] Fps is (10 sec: 32781.0, 60 sec: 44782.9, 300 sec: 46987.8). Total num frames: 185368576. Throughput: 0: 11389.1. Samples: 46406656. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:10,767][1648981] Avg episode reward: [(0, '191.840')] [2024-06-15 12:38:10,891][1651669] Updated weights for policy 0, policy_version 90517 (0.0016) [2024-06-15 12:38:12,606][1651669] Updated weights for policy 0, policy_version 90592 (0.0115) [2024-06-15 12:38:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 43690.7, 300 sec: 46654.0). Total num frames: 185597952. Throughput: 0: 11138.9. Samples: 46465536. 
Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:15,767][1648981] Avg episode reward: [(0, '190.570')] [2024-06-15 12:38:18,766][1651669] Updated weights for policy 0, policy_version 90640 (0.0019) [2024-06-15 12:38:20,766][1651669] Updated weights for policy 0, policy_version 90720 (0.0011) [2024-06-15 12:38:20,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 45329.1, 300 sec: 46874.9). Total num frames: 185794560. Throughput: 0: 11366.5. Samples: 46536192. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:20,767][1648981] Avg episode reward: [(0, '183.620')] [2024-06-15 12:38:23,680][1651669] Updated weights for policy 0, policy_version 90832 (0.0135) [2024-06-15 12:38:24,771][1651669] Updated weights for policy 0, policy_version 90880 (0.0017) [2024-06-15 12:38:25,769][1648981] Fps is (10 sec: 52427.0, 60 sec: 43690.4, 300 sec: 46986.0). Total num frames: 186122240. Throughput: 0: 11223.0. Samples: 46559744. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:25,771][1648981] Avg episode reward: [(0, '179.620')] [2024-06-15 12:38:30,771][1648981] Fps is (10 sec: 36029.3, 60 sec: 44233.7, 300 sec: 46318.8). Total num frames: 186155008. Throughput: 0: 11069.5. Samples: 46638592. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:30,771][1648981] Avg episode reward: [(0, '184.370')] [2024-06-15 12:38:31,601][1651669] Updated weights for policy 0, policy_version 90931 (0.0012) [2024-06-15 12:38:33,705][1651669] Updated weights for policy 0, policy_version 91024 (0.0013) [2024-06-15 12:38:35,766][1648981] Fps is (10 sec: 45876.9, 60 sec: 46421.4, 300 sec: 46878.1). Total num frames: 186580992. Throughput: 0: 11127.5. Samples: 46693888. 
Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:35,767][1648981] Avg episode reward: [(0, '187.340')] [2024-06-15 12:38:35,977][1651669] Updated weights for policy 0, policy_version 91120 (0.0012) [2024-06-15 12:38:40,766][1648981] Fps is (10 sec: 49172.8, 60 sec: 43704.2, 300 sec: 46211.6). Total num frames: 186646528. Throughput: 0: 11002.3. Samples: 46730240. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:40,767][1648981] Avg episode reward: [(0, '181.990')] [2024-06-15 12:38:42,775][1651669] Updated weights for policy 0, policy_version 91168 (0.0038) [2024-06-15 12:38:43,333][1651274] Signal inference workers to stop experience collection... (4800 times) [2024-06-15 12:38:43,367][1651669] InferenceWorker_p0-w0: stopping experience collection (4800 times) [2024-06-15 12:38:43,577][1651274] Signal inference workers to resume experience collection... (4800 times) [2024-06-15 12:38:43,579][1651669] InferenceWorker_p0-w0: resuming experience collection (4800 times) [2024-06-15 12:38:45,147][1651669] Updated weights for policy 0, policy_version 91249 (0.0015) [2024-06-15 12:38:45,767][1648981] Fps is (10 sec: 32767.0, 60 sec: 45328.8, 300 sec: 46208.4). Total num frames: 186908672. Throughput: 0: 11082.9. Samples: 46797824. Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:45,769][1648981] Avg episode reward: [(0, '184.870')] [2024-06-15 12:38:47,252][1651669] Updated weights for policy 0, policy_version 91333 (0.0126) [2024-06-15 12:38:48,684][1651669] Updated weights for policy 0, policy_version 91392 (0.0011) [2024-06-15 12:38:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 43690.9, 300 sec: 46208.5). Total num frames: 187170816. Throughput: 0: 10740.6. Samples: 46853120. 
Policy #0 lag: (min: 12.0, avg: 72.7, max: 268.0) [2024-06-15 12:38:50,767][1648981] Avg episode reward: [(0, '187.350')] [2024-06-15 12:38:55,721][1651669] Updated weights for policy 0, policy_version 91454 (0.0014) [2024-06-15 12:38:55,785][1648981] Fps is (10 sec: 39249.9, 60 sec: 44769.1, 300 sec: 46205.5). Total num frames: 187301888. Throughput: 0: 10861.3. Samples: 46895616. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:38:55,785][1648981] Avg episode reward: [(0, '193.480')] [2024-06-15 12:38:55,792][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000091456_187301888.pth... [2024-06-15 12:38:56,041][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000086128_176390144.pth [2024-06-15 12:38:58,358][1651669] Updated weights for policy 0, policy_version 91552 (0.0133) [2024-06-15 12:38:59,039][1651669] Updated weights for policy 0, policy_version 91580 (0.0010) [2024-06-15 12:39:00,422][1651669] Updated weights for policy 0, policy_version 91639 (0.0012) [2024-06-15 12:39:00,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 44239.6, 300 sec: 46430.6). Total num frames: 187695104. Throughput: 0: 10786.1. Samples: 46950912. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:00,767][1648981] Avg episode reward: [(0, '198.330')] [2024-06-15 12:39:05,767][1648981] Fps is (10 sec: 39393.1, 60 sec: 43690.3, 300 sec: 45986.2). Total num frames: 187695104. Throughput: 0: 11013.6. Samples: 47031808. 
Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:05,768][1648981] Avg episode reward: [(0, '196.200')] [2024-06-15 12:39:07,280][1651669] Updated weights for policy 0, policy_version 91696 (0.0013) [2024-06-15 12:39:08,764][1651669] Updated weights for policy 0, policy_version 91760 (0.0047) [2024-06-15 12:39:10,645][1651669] Updated weights for policy 0, policy_version 91830 (0.0095) [2024-06-15 12:39:10,766][1648981] Fps is (10 sec: 36045.9, 60 sec: 44783.0, 300 sec: 46097.4). Total num frames: 188055552. Throughput: 0: 11150.3. Samples: 47061504. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:10,767][1648981] Avg episode reward: [(0, '190.580')] [2024-06-15 12:39:15,766][1648981] Fps is (10 sec: 52431.5, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 188219392. Throughput: 0: 10787.2. Samples: 47123968. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:15,767][1648981] Avg episode reward: [(0, '187.250')] [2024-06-15 12:39:18,255][1651669] Updated weights for policy 0, policy_version 91920 (0.0014) [2024-06-15 12:39:19,914][1651669] Updated weights for policy 0, policy_version 91984 (0.0018) [2024-06-15 12:39:20,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 44236.8, 300 sec: 45654.9). Total num frames: 188448768. Throughput: 0: 11161.6. Samples: 47196160. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:20,767][1648981] Avg episode reward: [(0, '191.280')] [2024-06-15 12:39:21,148][1651669] Updated weights for policy 0, policy_version 92032 (0.0049) [2024-06-15 12:39:22,045][1651274] Signal inference workers to stop experience collection... (4850 times) [2024-06-15 12:39:22,113][1651669] InferenceWorker_p0-w0: stopping experience collection (4850 times) [2024-06-15 12:39:22,342][1651274] Signal inference workers to resume experience collection... 
(4850 times) [2024-06-15 12:39:22,343][1651669] InferenceWorker_p0-w0: resuming experience collection (4850 times) [2024-06-15 12:39:22,345][1651669] Updated weights for policy 0, policy_version 92080 (0.0012) [2024-06-15 12:39:23,368][1651669] Updated weights for policy 0, policy_version 92115 (0.0025) [2024-06-15 12:39:24,332][1651669] Updated weights for policy 0, policy_version 92160 (0.0012) [2024-06-15 12:39:25,778][1648981] Fps is (10 sec: 52366.5, 60 sec: 43682.3, 300 sec: 46206.6). Total num frames: 188743680. Throughput: 0: 11056.3. Samples: 47227904. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:25,779][1648981] Avg episode reward: [(0, '186.470')] [2024-06-15 12:39:30,340][1651669] Updated weights for policy 0, policy_version 92226 (0.0013) [2024-06-15 12:39:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 45878.5, 300 sec: 45653.1). Total num frames: 188907520. Throughput: 0: 11264.1. Samples: 47304704. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:30,767][1648981] Avg episode reward: [(0, '178.720')] [2024-06-15 12:39:31,497][1651669] Updated weights for policy 0, policy_version 92275 (0.0020) [2024-06-15 12:39:33,197][1651669] Updated weights for policy 0, policy_version 92341 (0.0013) [2024-06-15 12:39:34,810][1651669] Updated weights for policy 0, policy_version 92371 (0.0012) [2024-06-15 12:39:35,766][1648981] Fps is (10 sec: 52491.3, 60 sec: 44783.0, 300 sec: 46208.6). Total num frames: 189267968. Throughput: 0: 11411.9. Samples: 47366656. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:35,767][1648981] Avg episode reward: [(0, '175.460')] [2024-06-15 12:39:40,759][1651669] Updated weights for policy 0, policy_version 92452 (0.0015) [2024-06-15 12:39:40,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 45542.0). Total num frames: 189333504. Throughput: 0: 11314.2. Samples: 47404544. 
Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:40,767][1648981] Avg episode reward: [(0, '182.670')] [2024-06-15 12:39:41,516][1651669] Updated weights for policy 0, policy_version 92483 (0.0013) [2024-06-15 12:39:43,438][1651669] Updated weights for policy 0, policy_version 92563 (0.0013) [2024-06-15 12:39:45,766][1648981] Fps is (10 sec: 39321.1, 60 sec: 45875.4, 300 sec: 45764.1). Total num frames: 189661184. Throughput: 0: 11423.3. Samples: 47464960. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:45,767][1648981] Avg episode reward: [(0, '188.150')] [2024-06-15 12:39:46,436][1651669] Updated weights for policy 0, policy_version 92609 (0.0021) [2024-06-15 12:39:47,547][1651669] Updated weights for policy 0, policy_version 92664 (0.0016) [2024-06-15 12:39:50,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 43690.6, 300 sec: 45542.1). Total num frames: 189792256. Throughput: 0: 11434.8. Samples: 47546368. Policy #0 lag: (min: 8.0, avg: 57.1, max: 264.0) [2024-06-15 12:39:50,767][1648981] Avg episode reward: [(0, '195.210')] [2024-06-15 12:39:52,187][1651669] Updated weights for policy 0, policy_version 92708 (0.0015) [2024-06-15 12:39:53,641][1651669] Updated weights for policy 0, policy_version 92768 (0.0020) [2024-06-15 12:39:55,466][1651669] Updated weights for policy 0, policy_version 92832 (0.0105) [2024-06-15 12:39:55,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46982.0, 300 sec: 45542.0). Total num frames: 190119936. Throughput: 0: 11514.3. Samples: 47579648. Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:39:55,767][1648981] Avg episode reward: [(0, '194.860')] [2024-06-15 12:39:58,822][1651669] Updated weights for policy 0, policy_version 92896 (0.0012) [2024-06-15 12:40:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 43690.9, 300 sec: 45986.3). Total num frames: 190316544. Throughput: 0: 11525.7. Samples: 47642624. 
Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:00,767][1648981] Avg episode reward: [(0, '197.270')] [2024-06-15 12:40:02,828][1651669] Updated weights for policy 0, policy_version 92930 (0.0014) [2024-06-15 12:40:03,907][1651274] Signal inference workers to stop experience collection... (4900 times) [2024-06-15 12:40:03,967][1651669] InferenceWorker_p0-w0: stopping experience collection (4900 times) [2024-06-15 12:40:03,969][1651669] Updated weights for policy 0, policy_version 92980 (0.0128) [2024-06-15 12:40:04,267][1651274] Signal inference workers to resume experience collection... (4900 times) [2024-06-15 12:40:04,268][1651669] InferenceWorker_p0-w0: resuming experience collection (4900 times) [2024-06-15 12:40:05,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 47514.0, 300 sec: 45431.0). Total num frames: 190545920. Throughput: 0: 11480.2. Samples: 47712768. Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:05,767][1648981] Avg episode reward: [(0, '198.460')] [2024-06-15 12:40:06,010][1651669] Updated weights for policy 0, policy_version 93056 (0.0012) [2024-06-15 12:40:07,543][1651669] Updated weights for policy 0, policy_version 93119 (0.0016) [2024-06-15 12:40:10,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 44783.1, 300 sec: 45875.2). Total num frames: 190742528. Throughput: 0: 11437.7. Samples: 47742464. Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:10,766][1648981] Avg episode reward: [(0, '198.140')] [2024-06-15 12:40:11,470][1651669] Updated weights for policy 0, policy_version 93180 (0.0014) [2024-06-15 12:40:15,420][1651669] Updated weights for policy 0, policy_version 93232 (0.0014) [2024-06-15 12:40:15,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45329.0, 300 sec: 45319.8). Total num frames: 190939136. Throughput: 0: 11434.7. Samples: 47819264. 
Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:15,767][1648981] Avg episode reward: [(0, '194.260')] [2024-06-15 12:40:16,271][1651669] Updated weights for policy 0, policy_version 93264 (0.0014) [2024-06-15 12:40:18,317][1651669] Updated weights for policy 0, policy_version 93344 (0.0012) [2024-06-15 12:40:19,189][1651669] Updated weights for policy 0, policy_version 93374 (0.0013) [2024-06-15 12:40:20,766][1648981] Fps is (10 sec: 49150.6, 60 sec: 46421.2, 300 sec: 45764.1). Total num frames: 191234048. Throughput: 0: 11468.8. Samples: 47882752. Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:20,767][1648981] Avg episode reward: [(0, '186.890')] [2024-06-15 12:40:22,494][1651669] Updated weights for policy 0, policy_version 93424 (0.0013) [2024-06-15 12:40:25,648][1651669] Updated weights for policy 0, policy_version 93500 (0.0114) [2024-06-15 12:40:25,777][1648981] Fps is (10 sec: 55648.8, 60 sec: 45876.5, 300 sec: 45762.5). Total num frames: 191496192. Throughput: 0: 11477.6. Samples: 47921152. Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:25,777][1648981] Avg episode reward: [(0, '187.010')] [2024-06-15 12:40:28,071][1651669] Updated weights for policy 0, policy_version 93543 (0.0034) [2024-06-15 12:40:29,263][1651669] Updated weights for policy 0, policy_version 93600 (0.0015) [2024-06-15 12:40:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 47513.6, 300 sec: 45764.1). Total num frames: 191758336. Throughput: 0: 11582.6. Samples: 47986176. Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:30,767][1648981] Avg episode reward: [(0, '181.110')] [2024-06-15 12:40:33,494][1651669] Updated weights for policy 0, policy_version 93689 (0.0014) [2024-06-15 12:40:35,771][1648981] Fps is (10 sec: 42624.3, 60 sec: 44233.7, 300 sec: 45652.4). Total num frames: 191922176. Throughput: 0: 11649.8. Samples: 48070656. 
Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:35,771][1648981] Avg episode reward: [(0, '168.670')] [2024-06-15 12:40:36,317][1651669] Updated weights for policy 0, policy_version 93744 (0.0014) [2024-06-15 12:40:38,200][1651669] Updated weights for policy 0, policy_version 93801 (0.0026) [2024-06-15 12:40:38,691][1651669] Updated weights for policy 0, policy_version 93824 (0.0087) [2024-06-15 12:40:40,154][1651669] Updated weights for policy 0, policy_version 93879 (0.0013) [2024-06-15 12:40:40,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49152.1, 300 sec: 45764.1). Total num frames: 192282624. Throughput: 0: 11616.7. Samples: 48102400. Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:40,767][1648981] Avg episode reward: [(0, '166.940')] [2024-06-15 12:40:43,856][1651274] Signal inference workers to stop experience collection... (4950 times) [2024-06-15 12:40:43,884][1651669] InferenceWorker_p0-w0: stopping experience collection (4950 times) [2024-06-15 12:40:43,912][1651669] Updated weights for policy 0, policy_version 93923 (0.0013) [2024-06-15 12:40:44,063][1651274] Signal inference workers to resume experience collection... (4950 times) [2024-06-15 12:40:44,064][1651669] InferenceWorker_p0-w0: resuming experience collection (4950 times) [2024-06-15 12:40:45,767][1648981] Fps is (10 sec: 49169.1, 60 sec: 45874.8, 300 sec: 45986.3). Total num frames: 192413696. Throughput: 0: 11878.2. Samples: 48177152. 
Policy #0 lag: (min: 15.0, avg: 80.3, max: 271.0) [2024-06-15 12:40:45,767][1648981] Avg episode reward: [(0, '184.400')] [2024-06-15 12:40:46,099][1651669] Updated weights for policy 0, policy_version 93971 (0.0012) [2024-06-15 12:40:48,098][1651669] Updated weights for policy 0, policy_version 94033 (0.0013) [2024-06-15 12:40:49,586][1651669] Updated weights for policy 0, policy_version 94099 (0.0013) [2024-06-15 12:40:50,471][1651669] Updated weights for policy 0, policy_version 94142 (0.0034) [2024-06-15 12:40:50,798][1648981] Fps is (10 sec: 52261.7, 60 sec: 50217.6, 300 sec: 46096.1). Total num frames: 192806912. Throughput: 0: 11960.9. Samples: 48251392. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:40:50,799][1648981] Avg episode reward: [(0, '185.610')] [2024-06-15 12:40:55,766][1648981] Fps is (10 sec: 52431.6, 60 sec: 46967.4, 300 sec: 46208.4). Total num frames: 192937984. Throughput: 0: 12208.3. Samples: 48291840. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:40:55,767][1648981] Avg episode reward: [(0, '187.410')] [2024-06-15 12:40:55,785][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000094208_192937984.pth... [2024-06-15 12:40:55,868][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000088832_181927936.pth [2024-06-15 12:40:56,751][1651669] Updated weights for policy 0, policy_version 94224 (0.0015) [2024-06-15 12:40:58,240][1651669] Updated weights for policy 0, policy_version 94288 (0.0079) [2024-06-15 12:40:59,062][1651669] Updated weights for policy 0, policy_version 94333 (0.0012) [2024-06-15 12:41:00,675][1651669] Updated weights for policy 0, policy_version 94373 (0.0020) [2024-06-15 12:41:00,768][1648981] Fps is (10 sec: 46012.5, 60 sec: 49150.3, 300 sec: 45985.9). Total num frames: 193265664. Throughput: 0: 12150.9. Samples: 48366080. 
Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:00,769][1648981] Avg episode reward: [(0, '191.730')] [2024-06-15 12:41:05,000][1651669] Updated weights for policy 0, policy_version 94433 (0.0013) [2024-06-15 12:41:05,798][1648981] Fps is (10 sec: 52262.9, 60 sec: 48580.1, 300 sec: 46203.4). Total num frames: 193462272. Throughput: 0: 12313.4. Samples: 48437248. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:05,799][1648981] Avg episode reward: [(0, '189.030')] [2024-06-15 12:41:07,371][1651669] Updated weights for policy 0, policy_version 94480 (0.0078) [2024-06-15 12:41:08,561][1651669] Updated weights for policy 0, policy_version 94521 (0.0014) [2024-06-15 12:41:09,922][1651669] Updated weights for policy 0, policy_version 94586 (0.0016) [2024-06-15 12:41:10,768][1648981] Fps is (10 sec: 45876.8, 60 sec: 49696.6, 300 sec: 46097.1). Total num frames: 193724416. Throughput: 0: 12278.9. Samples: 48473600. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:10,769][1648981] Avg episode reward: [(0, '194.440')] [2024-06-15 12:41:12,097][1651669] Updated weights for policy 0, policy_version 94644 (0.0013) [2024-06-15 12:41:15,766][1648981] Fps is (10 sec: 46021.8, 60 sec: 49698.1, 300 sec: 45986.3). Total num frames: 193921024. Throughput: 0: 12572.5. Samples: 48551936. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:15,767][1648981] Avg episode reward: [(0, '197.020')] [2024-06-15 12:41:15,781][1651669] Updated weights for policy 0, policy_version 94692 (0.0032) [2024-06-15 12:41:17,701][1651669] Updated weights for policy 0, policy_version 94752 (0.0060) [2024-06-15 12:41:19,708][1651669] Updated weights for policy 0, policy_version 94805 (0.0012) [2024-06-15 12:41:20,767][1648981] Fps is (10 sec: 52436.5, 60 sec: 50244.1, 300 sec: 46542.9). Total num frames: 194248704. Throughput: 0: 12209.4. Samples: 48620032. 
Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:20,768][1648981] Avg episode reward: [(0, '200.720')] [2024-06-15 12:41:21,421][1651669] Updated weights for policy 0, policy_version 94855 (0.0013) [2024-06-15 12:41:22,437][1651669] Updated weights for policy 0, policy_version 94903 (0.0012) [2024-06-15 12:41:25,778][1648981] Fps is (10 sec: 45820.6, 60 sec: 48058.4, 300 sec: 45762.3). Total num frames: 194379776. Throughput: 0: 12466.7. Samples: 48663552. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:25,779][1648981] Avg episode reward: [(0, '189.370')] [2024-06-15 12:41:26,064][1651669] Updated weights for policy 0, policy_version 94933 (0.0015) [2024-06-15 12:41:26,446][1651274] Signal inference workers to stop experience collection... (5000 times) [2024-06-15 12:41:26,492][1651669] InferenceWorker_p0-w0: stopping experience collection (5000 times) [2024-06-15 12:41:26,776][1651274] Signal inference workers to resume experience collection... (5000 times) [2024-06-15 12:41:26,778][1651669] InferenceWorker_p0-w0: resuming experience collection (5000 times) [2024-06-15 12:41:27,936][1651669] Updated weights for policy 0, policy_version 94977 (0.0014) [2024-06-15 12:41:29,350][1651669] Updated weights for policy 0, policy_version 95042 (0.0074) [2024-06-15 12:41:30,784][1648981] Fps is (10 sec: 49068.0, 60 sec: 49683.8, 300 sec: 46650.0). Total num frames: 194740224. Throughput: 0: 12476.8. Samples: 48738816. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:30,784][1648981] Avg episode reward: [(0, '190.430')] [2024-06-15 12:41:30,836][1651669] Updated weights for policy 0, policy_version 95097 (0.0012) [2024-06-15 12:41:32,589][1651669] Updated weights for policy 0, policy_version 95140 (0.0013) [2024-06-15 12:41:32,993][1651669] Updated weights for policy 0, policy_version 95168 (0.0013) [2024-06-15 12:41:35,766][1648981] Fps is (10 sec: 52491.2, 60 sec: 49701.5, 300 sec: 45986.3). 
Total num frames: 194904064. Throughput: 0: 12661.1. Samples: 48820736. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:35,767][1648981] Avg episode reward: [(0, '191.760')] [2024-06-15 12:41:37,101][1651669] Updated weights for policy 0, policy_version 95232 (0.0061) [2024-06-15 12:41:40,586][1651669] Updated weights for policy 0, policy_version 95312 (0.0042) [2024-06-15 12:41:40,766][1648981] Fps is (10 sec: 45954.6, 60 sec: 48605.7, 300 sec: 46763.8). Total num frames: 195198976. Throughput: 0: 12515.6. Samples: 48855040. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:40,767][1648981] Avg episode reward: [(0, '195.990')] [2024-06-15 12:41:41,717][1651669] Updated weights for policy 0, policy_version 95355 (0.0011) [2024-06-15 12:41:43,324][1651669] Updated weights for policy 0, policy_version 95424 (0.0014) [2024-06-15 12:41:45,769][1648981] Fps is (10 sec: 52416.4, 60 sec: 50242.8, 300 sec: 46208.1). Total num frames: 195428352. Throughput: 0: 12265.2. Samples: 48918016. Policy #0 lag: (min: 47.0, avg: 157.4, max: 303.0) [2024-06-15 12:41:45,770][1648981] Avg episode reward: [(0, '191.710')] [2024-06-15 12:41:47,977][1651669] Updated weights for policy 0, policy_version 95477 (0.0014) [2024-06-15 12:41:50,088][1651669] Updated weights for policy 0, policy_version 95493 (0.0013) [2024-06-15 12:41:50,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46992.4, 300 sec: 46431.8). Total num frames: 195624960. Throughput: 0: 12615.5. Samples: 49004544. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:41:50,767][1648981] Avg episode reward: [(0, '189.470')] [2024-06-15 12:41:52,304][1651669] Updated weights for policy 0, policy_version 95584 (0.0016) [2024-06-15 12:41:53,748][1651669] Updated weights for policy 0, policy_version 95649 (0.0096) [2024-06-15 12:41:55,778][1648981] Fps is (10 sec: 52379.1, 60 sec: 50234.4, 300 sec: 46317.6). Total num frames: 195952640. Throughput: 0: 12308.0. Samples: 49027584. 
Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:41:55,779][1648981] Avg episode reward: [(0, '188.970')] [2024-06-15 12:41:57,701][1651669] Updated weights for policy 0, policy_version 95704 (0.0013) [2024-06-15 12:42:00,476][1651669] Updated weights for policy 0, policy_version 95749 (0.0012) [2024-06-15 12:42:00,779][1648981] Fps is (10 sec: 49089.2, 60 sec: 47505.1, 300 sec: 46317.5). Total num frames: 196116480. Throughput: 0: 12489.2. Samples: 49114112. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:00,780][1648981] Avg episode reward: [(0, '193.130')] [2024-06-15 12:42:02,303][1651669] Updated weights for policy 0, policy_version 95824 (0.0112) [2024-06-15 12:42:04,070][1651274] Signal inference workers to stop experience collection... (5050 times) [2024-06-15 12:42:04,108][1651669] Updated weights for policy 0, policy_version 95889 (0.0126) [2024-06-15 12:42:04,191][1651669] InferenceWorker_p0-w0: stopping experience collection (5050 times) [2024-06-15 12:42:04,309][1651274] Signal inference workers to resume experience collection... (5050 times) [2024-06-15 12:42:04,310][1651669] InferenceWorker_p0-w0: resuming experience collection (5050 times) [2024-06-15 12:42:05,767][1648981] Fps is (10 sec: 52489.5, 60 sec: 50270.7, 300 sec: 46763.8). Total num frames: 196476928. Throughput: 0: 12276.6. Samples: 49172480. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:05,767][1648981] Avg episode reward: [(0, '193.990')] [2024-06-15 12:42:08,551][1651669] Updated weights for policy 0, policy_version 95937 (0.0016) [2024-06-15 12:42:09,659][1651669] Updated weights for policy 0, policy_version 95998 (0.0054) [2024-06-15 12:42:10,766][1648981] Fps is (10 sec: 49214.9, 60 sec: 48061.1, 300 sec: 46208.4). Total num frames: 196608000. Throughput: 0: 12291.2. Samples: 49216512. 
Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:10,767][1648981] Avg episode reward: [(0, '187.590')] [2024-06-15 12:42:12,441][1651669] Updated weights for policy 0, policy_version 96050 (0.0014) [2024-06-15 12:42:13,642][1651669] Updated weights for policy 0, policy_version 96098 (0.0014) [2024-06-15 12:42:15,238][1651669] Updated weights for policy 0, policy_version 96161 (0.0035) [2024-06-15 12:42:15,766][1648981] Fps is (10 sec: 49153.9, 60 sec: 50790.4, 300 sec: 47097.1). Total num frames: 196968448. Throughput: 0: 12076.5. Samples: 49282048. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:15,767][1648981] Avg episode reward: [(0, '190.200')] [2024-06-15 12:42:19,517][1651669] Updated weights for policy 0, policy_version 96197 (0.0026) [2024-06-15 12:42:20,493][1651669] Updated weights for policy 0, policy_version 96251 (0.0012) [2024-06-15 12:42:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.9, 300 sec: 46208.4). Total num frames: 197132288. Throughput: 0: 12003.6. Samples: 49360896. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:20,767][1648981] Avg episode reward: [(0, '194.940')] [2024-06-15 12:42:23,504][1651669] Updated weights for policy 0, policy_version 96304 (0.0012) [2024-06-15 12:42:24,468][1651669] Updated weights for policy 0, policy_version 96337 (0.0013) [2024-06-15 12:42:25,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 50254.1, 300 sec: 47097.0). Total num frames: 197394432. Throughput: 0: 11958.0. Samples: 49393152. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:25,767][1648981] Avg episode reward: [(0, '195.640')] [2024-06-15 12:42:26,169][1651669] Updated weights for policy 0, policy_version 96401 (0.0011) [2024-06-15 12:42:27,191][1651669] Updated weights for policy 0, policy_version 96445 (0.0013) [2024-06-15 12:42:30,771][1648981] Fps is (10 sec: 39305.6, 60 sec: 46431.6, 300 sec: 46541.0). Total num frames: 197525504. Throughput: 0: 12094.1. 
Samples: 49462272. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:30,772][1648981] Avg episode reward: [(0, '205.920')] [2024-06-15 12:42:31,188][1651274] Saving new best policy, reward=205.920! [2024-06-15 12:42:31,780][1651669] Updated weights for policy 0, policy_version 96502 (0.0015) [2024-06-15 12:42:33,696][1651669] Updated weights for policy 0, policy_version 96529 (0.0011) [2024-06-15 12:42:34,517][1651669] Updated weights for policy 0, policy_version 96575 (0.0025) [2024-06-15 12:42:35,780][1648981] Fps is (10 sec: 42542.6, 60 sec: 48595.1, 300 sec: 46764.6). Total num frames: 197820416. Throughput: 0: 11829.4. Samples: 49537024. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:35,780][1648981] Avg episode reward: [(0, '208.670')] [2024-06-15 12:42:36,541][1651274] Saving new best policy, reward=208.670! [2024-06-15 12:42:37,727][1651669] Updated weights for policy 0, policy_version 96672 (0.0013) [2024-06-15 12:42:38,414][1651669] Updated weights for policy 0, policy_version 96702 (0.0029) [2024-06-15 12:42:40,766][1648981] Fps is (10 sec: 52449.7, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 198049792. Throughput: 0: 11779.1. Samples: 49557504. Policy #0 lag: (min: 9.0, avg: 106.3, max: 265.0) [2024-06-15 12:42:40,767][1648981] Avg episode reward: [(0, '208.020')] [2024-06-15 12:42:42,995][1651669] Updated weights for policy 0, policy_version 96756 (0.0013) [2024-06-15 12:42:45,338][1651669] Updated weights for policy 0, policy_version 96800 (0.0107) [2024-06-15 12:42:45,767][1648981] Fps is (10 sec: 45936.1, 60 sec: 47515.4, 300 sec: 46541.7). Total num frames: 198279168. Throughput: 0: 11665.5. Samples: 49638912. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:42:45,767][1648981] Avg episode reward: [(0, '210.970')] [2024-06-15 12:42:45,933][1651274] Saving new best policy, reward=210.970! 
[2024-06-15 12:42:46,519][1651669] Updated weights for policy 0, policy_version 96833 (0.0015) [2024-06-15 12:42:46,913][1651274] Signal inference workers to stop experience collection... (5100 times) [2024-06-15 12:42:46,993][1651669] InferenceWorker_p0-w0: stopping experience collection (5100 times) [2024-06-15 12:42:47,158][1651274] Signal inference workers to resume experience collection... (5100 times) [2024-06-15 12:42:47,159][1651669] InferenceWorker_p0-w0: resuming experience collection (5100 times) [2024-06-15 12:42:47,807][1651669] Updated weights for policy 0, policy_version 96881 (0.0013) [2024-06-15 12:42:49,457][1651669] Updated weights for policy 0, policy_version 96948 (0.0011) [2024-06-15 12:42:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 198574080. Throughput: 0: 11821.6. Samples: 49704448. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:42:50,767][1648981] Avg episode reward: [(0, '212.800')] [2024-06-15 12:42:50,768][1651274] Saving new best policy, reward=212.800! [2024-06-15 12:42:53,959][1651669] Updated weights for policy 0, policy_version 97018 (0.0143) [2024-06-15 12:42:55,767][1648981] Fps is (10 sec: 42597.9, 60 sec: 45884.1, 300 sec: 46320.1). Total num frames: 198705152. Throughput: 0: 11719.1. Samples: 49743872. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:42:55,767][1648981] Avg episode reward: [(0, '210.580')] [2024-06-15 12:42:55,795][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000097024_198705152.pth... 
[2024-06-15 12:42:55,883][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000091456_187301888.pth [2024-06-15 12:42:56,813][1651669] Updated weights for policy 0, policy_version 97057 (0.0012) [2024-06-15 12:42:58,355][1651669] Updated weights for policy 0, policy_version 97125 (0.0125) [2024-06-15 12:43:00,072][1651669] Updated weights for policy 0, policy_version 97185 (0.0015) [2024-06-15 12:43:00,745][1651669] Updated weights for policy 0, policy_version 97216 (0.0011) [2024-06-15 12:43:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49708.8, 300 sec: 47541.4). Total num frames: 199098368. Throughput: 0: 11650.8. Samples: 49806336. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:00,767][1648981] Avg episode reward: [(0, '209.570')] [2024-06-15 12:43:05,282][1651669] Updated weights for policy 0, policy_version 97280 (0.0013) [2024-06-15 12:43:05,768][1648981] Fps is (10 sec: 52422.6, 60 sec: 45874.4, 300 sec: 46985.8). Total num frames: 199229440. Throughput: 0: 11525.3. Samples: 49879552. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:05,768][1648981] Avg episode reward: [(0, '200.680')] [2024-06-15 12:43:09,160][1651669] Updated weights for policy 0, policy_version 97349 (0.0019) [2024-06-15 12:43:10,524][1651669] Updated weights for policy 0, policy_version 97403 (0.0024) [2024-06-15 12:43:10,774][1648981] Fps is (10 sec: 39290.8, 60 sec: 48053.5, 300 sec: 47095.8). Total num frames: 199491584. Throughput: 0: 11614.8. Samples: 49915904. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:10,775][1648981] Avg episode reward: [(0, '191.780')] [2024-06-15 12:43:12,197][1651669] Updated weights for policy 0, policy_version 97465 (0.0012) [2024-06-15 12:43:15,766][1648981] Fps is (10 sec: 49159.3, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 199720960. Throughput: 0: 11720.2. Samples: 49989632. 
Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:15,767][1648981] Avg episode reward: [(0, '205.060')] [2024-06-15 12:43:15,767][1651669] Updated weights for policy 0, policy_version 97520 (0.0119) [2024-06-15 12:43:18,868][1651669] Updated weights for policy 0, policy_version 97540 (0.0012) [2024-06-15 12:43:20,312][1651669] Updated weights for policy 0, policy_version 97616 (0.0014) [2024-06-15 12:43:20,767][1648981] Fps is (10 sec: 45910.0, 60 sec: 46967.3, 300 sec: 46874.9). Total num frames: 199950336. Throughput: 0: 11563.2. Samples: 50057216. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:20,767][1648981] Avg episode reward: [(0, '205.910')] [2024-06-15 12:43:21,437][1651669] Updated weights for policy 0, policy_version 97664 (0.0012) [2024-06-15 12:43:23,244][1651669] Updated weights for policy 0, policy_version 97718 (0.0016) [2024-06-15 12:43:25,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 46421.3, 300 sec: 47542.0). Total num frames: 200179712. Throughput: 0: 11764.6. Samples: 50086912. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:25,767][1648981] Avg episode reward: [(0, '202.180')] [2024-06-15 12:43:26,293][1651669] Updated weights for policy 0, policy_version 97768 (0.0012) [2024-06-15 12:43:29,832][1651669] Updated weights for policy 0, policy_version 97793 (0.0012) [2024-06-15 12:43:30,138][1651274] Signal inference workers to stop experience collection... (5150 times) [2024-06-15 12:43:30,166][1651669] InferenceWorker_p0-w0: stopping experience collection (5150 times) [2024-06-15 12:43:30,318][1651274] Signal inference workers to resume experience collection... (5150 times) [2024-06-15 12:43:30,319][1651669] InferenceWorker_p0-w0: resuming experience collection (5150 times) [2024-06-15 12:43:30,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 47516.8, 300 sec: 46763.8). Total num frames: 200376320. Throughput: 0: 11821.5. Samples: 50170880. 
Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:30,767][1648981] Avg episode reward: [(0, '204.820')] [2024-06-15 12:43:32,339][1651669] Updated weights for policy 0, policy_version 97904 (0.0016) [2024-06-15 12:43:34,437][1651669] Updated weights for policy 0, policy_version 97940 (0.0011) [2024-06-15 12:43:35,769][1648981] Fps is (10 sec: 49141.4, 60 sec: 47522.3, 300 sec: 47541.0). Total num frames: 200671232. Throughput: 0: 11729.9. Samples: 50232320. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:35,769][1648981] Avg episode reward: [(0, '203.490')] [2024-06-15 12:43:37,131][1651669] Updated weights for policy 0, policy_version 98002 (0.0011) [2024-06-15 12:43:40,776][1648981] Fps is (10 sec: 42558.2, 60 sec: 45868.0, 300 sec: 47095.6). Total num frames: 200802304. Throughput: 0: 11659.8. Samples: 50268672. Policy #0 lag: (min: 26.0, avg: 116.2, max: 282.0) [2024-06-15 12:43:40,777][1648981] Avg episode reward: [(0, '203.100')] [2024-06-15 12:43:40,900][1651669] Updated weights for policy 0, policy_version 98049 (0.0014) [2024-06-15 12:43:42,457][1651669] Updated weights for policy 0, policy_version 98115 (0.0013) [2024-06-15 12:43:44,010][1651669] Updated weights for policy 0, policy_version 98170 (0.0123) [2024-06-15 12:43:45,623][1651669] Updated weights for policy 0, policy_version 98196 (0.0011) [2024-06-15 12:43:45,766][1648981] Fps is (10 sec: 45886.2, 60 sec: 47513.7, 300 sec: 47319.2). Total num frames: 201129984. Throughput: 0: 11696.4. Samples: 50332672. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:43:45,767][1648981] Avg episode reward: [(0, '204.680')] [2024-06-15 12:43:46,380][1651669] Updated weights for policy 0, policy_version 98240 (0.0013) [2024-06-15 12:43:49,613][1651669] Updated weights for policy 0, policy_version 98293 (0.0015) [2024-06-15 12:43:50,766][1648981] Fps is (10 sec: 52478.7, 60 sec: 45875.2, 300 sec: 47544.4). Total num frames: 201326592. Throughput: 0: 11730.9. 
Samples: 50407424. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:43:50,767][1648981] Avg episode reward: [(0, '205.920')] [2024-06-15 12:43:53,330][1651669] Updated weights for policy 0, policy_version 98339 (0.0171) [2024-06-15 12:43:54,337][1651669] Updated weights for policy 0, policy_version 98384 (0.0013) [2024-06-15 12:43:55,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 201588736. Throughput: 0: 11630.1. Samples: 50439168. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:43:55,767][1648981] Avg episode reward: [(0, '199.450')] [2024-06-15 12:43:56,744][1651669] Updated weights for policy 0, policy_version 98448 (0.0018) [2024-06-15 12:43:59,306][1651669] Updated weights for policy 0, policy_version 98498 (0.0050) [2024-06-15 12:44:00,770][1648981] Fps is (10 sec: 52408.1, 60 sec: 45872.2, 300 sec: 47985.1). Total num frames: 201850880. Throughput: 0: 11695.3. Samples: 50515968. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:00,771][1648981] Avg episode reward: [(0, '195.870')] [2024-06-15 12:44:04,014][1651669] Updated weights for policy 0, policy_version 98561 (0.0014) [2024-06-15 12:44:05,717][1651669] Updated weights for policy 0, policy_version 98640 (0.0012) [2024-06-15 12:44:05,774][1648981] Fps is (10 sec: 42565.7, 60 sec: 46416.4, 300 sec: 47318.0). Total num frames: 202014720. Throughput: 0: 11694.4. Samples: 50583552. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:05,775][1648981] Avg episode reward: [(0, '198.110')] [2024-06-15 12:44:06,934][1651669] Updated weights for policy 0, policy_version 98688 (0.0013) [2024-06-15 12:44:08,977][1651669] Updated weights for policy 0, policy_version 98752 (0.0176) [2024-06-15 12:44:10,778][1648981] Fps is (10 sec: 39290.8, 60 sec: 45872.2, 300 sec: 47539.5). Total num frames: 202244096. Throughput: 0: 11602.4. Samples: 50609152. 
Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:10,779][1648981] Avg episode reward: [(0, '190.260')] [2024-06-15 12:44:11,388][1651274] Signal inference workers to stop experience collection... (5200 times) [2024-06-15 12:44:11,438][1651669] InferenceWorker_p0-w0: stopping experience collection (5200 times) [2024-06-15 12:44:11,579][1651274] Signal inference workers to resume experience collection... (5200 times) [2024-06-15 12:44:11,580][1651669] InferenceWorker_p0-w0: resuming experience collection (5200 times) [2024-06-15 12:44:11,783][1651669] Updated weights for policy 0, policy_version 98805 (0.0092) [2024-06-15 12:44:15,257][1651669] Updated weights for policy 0, policy_version 98833 (0.0011) [2024-06-15 12:44:15,766][1648981] Fps is (10 sec: 45911.3, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 202473472. Throughput: 0: 11673.6. Samples: 50696192. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:15,767][1648981] Avg episode reward: [(0, '189.030')] [2024-06-15 12:44:16,859][1651669] Updated weights for policy 0, policy_version 98914 (0.0019) [2024-06-15 12:44:18,692][1651669] Updated weights for policy 0, policy_version 98963 (0.0014) [2024-06-15 12:44:20,766][1648981] Fps is (10 sec: 52490.8, 60 sec: 46967.7, 300 sec: 47543.3). Total num frames: 202768384. Throughput: 0: 11697.0. Samples: 50758656. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:20,767][1648981] Avg episode reward: [(0, '190.210')] [2024-06-15 12:44:21,886][1651669] Updated weights for policy 0, policy_version 99040 (0.0026) [2024-06-15 12:44:25,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 45329.0, 300 sec: 47430.2). Total num frames: 202899456. Throughput: 0: 11869.4. Samples: 50802688. 
Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:25,767][1648981] Avg episode reward: [(0, '194.700')] [2024-06-15 12:44:25,998][1651669] Updated weights for policy 0, policy_version 99088 (0.0012) [2024-06-15 12:44:27,408][1651669] Updated weights for policy 0, policy_version 99157 (0.0013) [2024-06-15 12:44:29,309][1651669] Updated weights for policy 0, policy_version 99234 (0.0016) [2024-06-15 12:44:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 203292672. Throughput: 0: 11867.0. Samples: 50866688. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:30,767][1648981] Avg episode reward: [(0, '192.140')] [2024-06-15 12:44:32,441][1651669] Updated weights for policy 0, policy_version 99280 (0.0018) [2024-06-15 12:44:35,766][1648981] Fps is (10 sec: 52430.7, 60 sec: 45877.0, 300 sec: 47763.5). Total num frames: 203423744. Throughput: 0: 12026.3. Samples: 50948608. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:35,767][1648981] Avg episode reward: [(0, '198.300')] [2024-06-15 12:44:37,152][1651669] Updated weights for policy 0, policy_version 99345 (0.0143) [2024-06-15 12:44:38,677][1651669] Updated weights for policy 0, policy_version 99414 (0.0014) [2024-06-15 12:44:40,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49159.7, 300 sec: 47763.5). Total num frames: 203751424. Throughput: 0: 12049.1. Samples: 50981376. Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:40,767][1648981] Avg episode reward: [(0, '201.640')] [2024-06-15 12:44:40,794][1651669] Updated weights for policy 0, policy_version 99504 (0.0014) [2024-06-15 12:44:44,089][1651669] Updated weights for policy 0, policy_version 99553 (0.0017) [2024-06-15 12:44:45,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 46967.3, 300 sec: 47985.7). Total num frames: 203948032. Throughput: 0: 11799.8. Samples: 51046912. 
Policy #0 lag: (min: 5.0, avg: 135.4, max: 261.0) [2024-06-15 12:44:45,767][1648981] Avg episode reward: [(0, '205.440')] [2024-06-15 12:44:48,152][1651669] Updated weights for policy 0, policy_version 99616 (0.0013) [2024-06-15 12:44:49,653][1651669] Updated weights for policy 0, policy_version 99683 (0.0095) [2024-06-15 12:44:50,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 204242944. Throughput: 0: 12039.8. Samples: 51125248. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:44:50,767][1648981] Avg episode reward: [(0, '202.740')] [2024-06-15 12:44:50,900][1651274] Signal inference workers to stop experience collection... (5250 times) [2024-06-15 12:44:50,960][1651669] InferenceWorker_p0-w0: stopping experience collection (5250 times) [2024-06-15 12:44:51,193][1651274] Signal inference workers to resume experience collection... (5250 times) [2024-06-15 12:44:51,202][1651669] InferenceWorker_p0-w0: resuming experience collection (5250 times) [2024-06-15 12:44:51,204][1651669] Updated weights for policy 0, policy_version 99744 (0.0013) [2024-06-15 12:44:54,212][1651669] Updated weights for policy 0, policy_version 99779 (0.0024) [2024-06-15 12:44:55,156][1651669] Updated weights for policy 0, policy_version 99828 (0.0012) [2024-06-15 12:44:55,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 204472320. Throughput: 0: 12211.6. Samples: 51158528. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:44:55,767][1648981] Avg episode reward: [(0, '218.780')] [2024-06-15 12:44:55,813][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000099840_204472320.pth... [2024-06-15 12:44:55,861][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000094208_192937984.pth [2024-06-15 12:44:55,864][1651274] Saving new best policy, reward=218.780! 
[2024-06-15 12:44:59,013][1651669] Updated weights for policy 0, policy_version 99872 (0.0047) [2024-06-15 12:45:00,672][1651669] Updated weights for policy 0, policy_version 99936 (0.0015) [2024-06-15 12:45:00,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 46970.3, 300 sec: 47874.5). Total num frames: 204668928. Throughput: 0: 11957.9. Samples: 51234304. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:00,767][1648981] Avg episode reward: [(0, '213.970')] [2024-06-15 12:45:02,476][1651669] Updated weights for policy 0, policy_version 99988 (0.0014) [2024-06-15 12:45:03,468][1651669] Updated weights for policy 0, policy_version 100032 (0.0027) [2024-06-15 12:45:05,794][1648981] Fps is (10 sec: 45747.0, 60 sec: 48589.5, 300 sec: 48092.2). Total num frames: 204931072. Throughput: 0: 11984.7. Samples: 51298304. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:05,795][1648981] Avg episode reward: [(0, '213.560')] [2024-06-15 12:45:09,721][1651669] Updated weights for policy 0, policy_version 100099 (0.0015) [2024-06-15 12:45:10,767][1648981] Fps is (10 sec: 39322.5, 60 sec: 46976.6, 300 sec: 47874.6). Total num frames: 205062144. Throughput: 0: 11924.0. Samples: 51339264. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:10,767][1648981] Avg episode reward: [(0, '202.460')] [2024-06-15 12:45:12,075][1651669] Updated weights for policy 0, policy_version 100178 (0.0017) [2024-06-15 12:45:13,848][1651669] Updated weights for policy 0, policy_version 100240 (0.0015) [2024-06-15 12:45:15,766][1648981] Fps is (10 sec: 46003.7, 60 sec: 48605.7, 300 sec: 47985.7). Total num frames: 205389824. Throughput: 0: 11867.0. Samples: 51400704. 
Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:15,767][1648981] Avg episode reward: [(0, '200.850')] [2024-06-15 12:45:16,970][1651669] Updated weights for policy 0, policy_version 100322 (0.0017) [2024-06-15 12:45:20,774][1648981] Fps is (10 sec: 45839.9, 60 sec: 45869.2, 300 sec: 47541.8). Total num frames: 205520896. Throughput: 0: 11751.2. Samples: 51477504. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:20,775][1648981] Avg episode reward: [(0, '203.110')] [2024-06-15 12:45:21,325][1651669] Updated weights for policy 0, policy_version 100353 (0.0012) [2024-06-15 12:45:22,963][1651669] Updated weights for policy 0, policy_version 100420 (0.0015) [2024-06-15 12:45:24,969][1651669] Updated weights for policy 0, policy_version 100485 (0.0014) [2024-06-15 12:45:25,768][1648981] Fps is (10 sec: 49143.7, 60 sec: 49696.9, 300 sec: 47874.3). Total num frames: 205881344. Throughput: 0: 11650.4. Samples: 51505664. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:25,769][1648981] Avg episode reward: [(0, '196.370')] [2024-06-15 12:45:25,974][1651669] Updated weights for policy 0, policy_version 100544 (0.0014) [2024-06-15 12:45:28,432][1651669] Updated weights for policy 0, policy_version 100601 (0.0013) [2024-06-15 12:45:30,766][1648981] Fps is (10 sec: 52469.7, 60 sec: 45875.2, 300 sec: 47875.3). Total num frames: 206045184. Throughput: 0: 11730.5. Samples: 51574784. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:30,767][1648981] Avg episode reward: [(0, '191.920')] [2024-06-15 12:45:33,195][1651669] Updated weights for policy 0, policy_version 100640 (0.0081) [2024-06-15 12:45:34,665][1651274] Signal inference workers to stop experience collection... (5300 times) [2024-06-15 12:45:34,711][1651669] InferenceWorker_p0-w0: stopping experience collection (5300 times) [2024-06-15 12:45:34,955][1651274] Signal inference workers to resume experience collection... 
(5300 times) [2024-06-15 12:45:34,955][1651669] InferenceWorker_p0-w0: resuming experience collection (5300 times) [2024-06-15 12:45:35,143][1651669] Updated weights for policy 0, policy_version 100729 (0.0014) [2024-06-15 12:45:35,766][1648981] Fps is (10 sec: 42606.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 206307328. Throughput: 0: 11537.1. Samples: 51644416. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:35,767][1648981] Avg episode reward: [(0, '193.260')] [2024-06-15 12:45:37,162][1651669] Updated weights for policy 0, policy_version 100798 (0.0091) [2024-06-15 12:45:40,147][1651669] Updated weights for policy 0, policy_version 100857 (0.0013) [2024-06-15 12:45:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.5, 300 sec: 47985.8). Total num frames: 206569472. Throughput: 0: 11571.2. Samples: 51679232. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:40,767][1648981] Avg episode reward: [(0, '187.860')] [2024-06-15 12:45:45,033][1651669] Updated weights for policy 0, policy_version 100896 (0.0012) [2024-06-15 12:45:45,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 45875.3, 300 sec: 47102.2). Total num frames: 206700544. Throughput: 0: 11468.9. Samples: 51750400. Policy #0 lag: (min: 15.0, avg: 88.3, max: 271.0) [2024-06-15 12:45:45,767][1648981] Avg episode reward: [(0, '190.600')] [2024-06-15 12:45:46,399][1651669] Updated weights for policy 0, policy_version 100960 (0.0013) [2024-06-15 12:45:47,275][1651669] Updated weights for policy 0, policy_version 100992 (0.0127) [2024-06-15 12:45:48,627][1651669] Updated weights for policy 0, policy_version 101049 (0.0014) [2024-06-15 12:45:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 45875.1, 300 sec: 47652.5). Total num frames: 206995456. Throughput: 0: 11532.9. Samples: 51816960. 
Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:45:50,767][1648981] Avg episode reward: [(0, '190.910')] [2024-06-15 12:45:51,592][1651669] Updated weights for policy 0, policy_version 101104 (0.0013) [2024-06-15 12:45:55,557][1651669] Updated weights for policy 0, policy_version 101122 (0.0012) [2024-06-15 12:45:55,790][1648981] Fps is (10 sec: 42496.9, 60 sec: 44219.2, 300 sec: 46982.5). Total num frames: 207126528. Throughput: 0: 11360.4. Samples: 51850752. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:45:55,791][1648981] Avg episode reward: [(0, '189.640')] [2024-06-15 12:45:57,019][1651669] Updated weights for policy 0, policy_version 101200 (0.0015) [2024-06-15 12:45:58,004][1651669] Updated weights for policy 0, policy_version 101248 (0.0129) [2024-06-15 12:45:59,486][1651669] Updated weights for policy 0, policy_version 101307 (0.0014) [2024-06-15 12:46:00,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46967.7, 300 sec: 47546.5). Total num frames: 207486976. Throughput: 0: 11502.9. Samples: 51918336. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:00,767][1648981] Avg episode reward: [(0, '195.240')] [2024-06-15 12:46:02,588][1651669] Updated weights for policy 0, policy_version 101351 (0.0019) [2024-06-15 12:46:03,129][1651669] Updated weights for policy 0, policy_version 101376 (0.0012) [2024-06-15 12:46:05,766][1648981] Fps is (10 sec: 49269.8, 60 sec: 44803.9, 300 sec: 47097.3). Total num frames: 207618048. Throughput: 0: 11675.6. Samples: 52002816. 
Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:05,767][1648981] Avg episode reward: [(0, '193.820')] [2024-06-15 12:46:07,696][1651669] Updated weights for policy 0, policy_version 101437 (0.0145) [2024-06-15 12:46:09,571][1651669] Updated weights for policy 0, policy_version 101520 (0.0018) [2024-06-15 12:46:10,668][1651669] Updated weights for policy 0, policy_version 101566 (0.0015) [2024-06-15 12:46:10,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 208011264. Throughput: 0: 11685.4. Samples: 52031488. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:10,767][1648981] Avg episode reward: [(0, '196.580')] [2024-06-15 12:46:13,596][1651669] Updated weights for policy 0, policy_version 101616 (0.0015) [2024-06-15 12:46:15,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 208142336. Throughput: 0: 11685.0. Samples: 52100608. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:15,767][1648981] Avg episode reward: [(0, '197.670')] [2024-06-15 12:46:17,332][1651669] Updated weights for policy 0, policy_version 101648 (0.0012) [2024-06-15 12:46:18,396][1651669] Updated weights for policy 0, policy_version 101693 (0.0014) [2024-06-15 12:46:18,805][1651274] Signal inference workers to stop experience collection... (5350 times) [2024-06-15 12:46:18,867][1651669] InferenceWorker_p0-w0: stopping experience collection (5350 times) [2024-06-15 12:46:19,039][1651274] Signal inference workers to resume experience collection... (5350 times) [2024-06-15 12:46:19,040][1651669] InferenceWorker_p0-w0: resuming experience collection (5350 times) [2024-06-15 12:46:20,275][1651669] Updated weights for policy 0, policy_version 101761 (0.0014) [2024-06-15 12:46:20,767][1648981] Fps is (10 sec: 42598.6, 60 sec: 48612.1, 300 sec: 47654.4). Total num frames: 208437248. Throughput: 0: 11719.1. Samples: 52171776. 
Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:20,767][1648981] Avg episode reward: [(0, '196.290')] [2024-06-15 12:46:21,512][1651669] Updated weights for policy 0, policy_version 101822 (0.0037) [2024-06-15 12:46:24,981][1651669] Updated weights for policy 0, policy_version 101877 (0.0021) [2024-06-15 12:46:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46422.6, 300 sec: 47210.9). Total num frames: 208666624. Throughput: 0: 11764.6. Samples: 52208640. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:25,767][1648981] Avg episode reward: [(0, '193.790')] [2024-06-15 12:46:28,714][1651669] Updated weights for policy 0, policy_version 101936 (0.0012) [2024-06-15 12:46:30,749][1651669] Updated weights for policy 0, policy_version 102000 (0.0101) [2024-06-15 12:46:30,767][1648981] Fps is (10 sec: 45872.3, 60 sec: 47513.0, 300 sec: 47430.2). Total num frames: 208896000. Throughput: 0: 11901.0. Samples: 52285952. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:30,768][1648981] Avg episode reward: [(0, '189.790')] [2024-06-15 12:46:32,364][1651669] Updated weights for policy 0, policy_version 102064 (0.0013) [2024-06-15 12:46:35,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 209092608. Throughput: 0: 11878.4. Samples: 52351488. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:35,767][1648981] Avg episode reward: [(0, '192.400')] [2024-06-15 12:46:36,103][1651669] Updated weights for policy 0, policy_version 102112 (0.0043) [2024-06-15 12:46:39,340][1651669] Updated weights for policy 0, policy_version 102176 (0.0013) [2024-06-15 12:46:40,766][1648981] Fps is (10 sec: 42601.2, 60 sec: 45875.2, 300 sec: 47097.4). Total num frames: 209321984. Throughput: 0: 11998.5. Samples: 52390400. 
Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:40,767][1648981] Avg episode reward: [(0, '188.690')] [2024-06-15 12:46:41,574][1651669] Updated weights for policy 0, policy_version 102224 (0.0020) [2024-06-15 12:46:43,145][1651669] Updated weights for policy 0, policy_version 102273 (0.0014) [2024-06-15 12:46:45,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 209584128. Throughput: 0: 11798.7. Samples: 52449280. Policy #0 lag: (min: 52.0, avg: 179.8, max: 292.0) [2024-06-15 12:46:45,767][1648981] Avg episode reward: [(0, '183.520')] [2024-06-15 12:46:47,221][1651669] Updated weights for policy 0, policy_version 102337 (0.0016) [2024-06-15 12:46:48,068][1651669] Updated weights for policy 0, policy_version 102391 (0.0012) [2024-06-15 12:46:50,132][1651669] Updated weights for policy 0, policy_version 102435 (0.0014) [2024-06-15 12:46:50,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 47513.5, 300 sec: 47098.9). Total num frames: 209846272. Throughput: 0: 11821.4. Samples: 52534784. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:46:50,767][1648981] Avg episode reward: [(0, '184.010')] [2024-06-15 12:46:52,504][1651669] Updated weights for policy 0, policy_version 102496 (0.0025) [2024-06-15 12:46:54,385][1651669] Updated weights for policy 0, policy_version 102576 (0.0013) [2024-06-15 12:46:55,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49717.9, 300 sec: 47432.4). Total num frames: 210108416. Throughput: 0: 11901.2. Samples: 52567040. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:46:55,767][1648981] Avg episode reward: [(0, '183.970')] [2024-06-15 12:46:55,778][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000102592_210108416.pth... 
[2024-06-15 12:46:55,862][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000097024_198705152.pth [2024-06-15 12:46:58,256][1651669] Updated weights for policy 0, policy_version 102626 (0.0012) [2024-06-15 12:47:00,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 46421.3, 300 sec: 46763.9). Total num frames: 210272256. Throughput: 0: 11980.8. Samples: 52639744. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:00,769][1648981] Avg episode reward: [(0, '183.180')] [2024-06-15 12:47:00,834][1651274] Signal inference workers to stop experience collection... (5400 times) [2024-06-15 12:47:00,912][1651669] Updated weights for policy 0, policy_version 102676 (0.0014) [2024-06-15 12:47:00,938][1651669] InferenceWorker_p0-w0: stopping experience collection (5400 times) [2024-06-15 12:47:01,065][1651274] Signal inference workers to resume experience collection... (5400 times) [2024-06-15 12:47:01,066][1651669] InferenceWorker_p0-w0: resuming experience collection (5400 times) [2024-06-15 12:47:02,666][1651669] Updated weights for policy 0, policy_version 102722 (0.0017) [2024-06-15 12:47:04,547][1651669] Updated weights for policy 0, policy_version 102800 (0.0023) [2024-06-15 12:47:05,453][1651669] Updated weights for policy 0, policy_version 102848 (0.0013) [2024-06-15 12:47:05,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 210632704. Throughput: 0: 11935.3. Samples: 52708864. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:05,767][1648981] Avg episode reward: [(0, '187.960')] [2024-06-15 12:47:08,987][1651669] Updated weights for policy 0, policy_version 102909 (0.0013) [2024-06-15 12:47:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45875.3, 300 sec: 46763.8). Total num frames: 210763776. Throughput: 0: 11992.2. Samples: 52748288. 
Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:10,767][1648981] Avg episode reward: [(0, '182.460')] [2024-06-15 12:47:11,854][1651669] Updated weights for policy 0, policy_version 102960 (0.0014) [2024-06-15 12:47:13,736][1651669] Updated weights for policy 0, policy_version 102996 (0.0014) [2024-06-15 12:47:14,890][1651669] Updated weights for policy 0, policy_version 103045 (0.0013) [2024-06-15 12:47:15,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49152.1, 300 sec: 47319.2). Total num frames: 211091456. Throughput: 0: 12094.8. Samples: 52830208. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:15,767][1648981] Avg episode reward: [(0, '186.280')] [2024-06-15 12:47:16,211][1651669] Updated weights for policy 0, policy_version 103101 (0.0105) [2024-06-15 12:47:19,453][1651669] Updated weights for policy 0, policy_version 103165 (0.0101) [2024-06-15 12:47:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.7, 300 sec: 47097.1). Total num frames: 211288064. Throughput: 0: 12151.5. Samples: 52898304. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:20,767][1648981] Avg episode reward: [(0, '184.120')] [2024-06-15 12:47:23,412][1651669] Updated weights for policy 0, policy_version 103222 (0.0022) [2024-06-15 12:47:25,155][1651669] Updated weights for policy 0, policy_version 103264 (0.0013) [2024-06-15 12:47:25,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 47513.4, 300 sec: 47430.9). Total num frames: 211517440. Throughput: 0: 12083.1. Samples: 52934144. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:25,767][1648981] Avg episode reward: [(0, '187.450')] [2024-06-15 12:47:27,133][1651669] Updated weights for policy 0, policy_version 103354 (0.0014) [2024-06-15 12:47:30,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 47514.2, 300 sec: 47210.3). Total num frames: 211746816. Throughput: 0: 12310.8. Samples: 53003264. 
Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:30,767][1648981] Avg episode reward: [(0, '185.000')] [2024-06-15 12:47:30,968][1651669] Updated weights for policy 0, policy_version 103411 (0.0013) [2024-06-15 12:47:34,311][1651669] Updated weights for policy 0, policy_version 103456 (0.0011) [2024-06-15 12:47:35,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 211943424. Throughput: 0: 11946.7. Samples: 53072384. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:35,767][1648981] Avg episode reward: [(0, '191.820')] [2024-06-15 12:47:36,138][1651669] Updated weights for policy 0, policy_version 103504 (0.0011) [2024-06-15 12:47:38,254][1651669] Updated weights for policy 0, policy_version 103586 (0.0118) [2024-06-15 12:47:40,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 212205568. Throughput: 0: 11832.9. Samples: 53099520. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:40,767][1648981] Avg episode reward: [(0, '193.360')] [2024-06-15 12:47:41,268][1651669] Updated weights for policy 0, policy_version 103632 (0.0012) [2024-06-15 12:47:42,364][1651669] Updated weights for policy 0, policy_version 103674 (0.0013) [2024-06-15 12:47:45,141][1651274] Signal inference workers to stop experience collection... (5450 times) [2024-06-15 12:47:45,231][1651669] InferenceWorker_p0-w0: stopping experience collection (5450 times) [2024-06-15 12:47:45,384][1651274] Signal inference workers to resume experience collection... (5450 times) [2024-06-15 12:47:45,385][1651669] InferenceWorker_p0-w0: resuming experience collection (5450 times) [2024-06-15 12:47:45,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 212402176. Throughput: 0: 11878.4. Samples: 53174272. 
Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:45,767][1648981] Avg episode reward: [(0, '196.510')] [2024-06-15 12:47:45,896][1651669] Updated weights for policy 0, policy_version 103714 (0.0013) [2024-06-15 12:47:47,373][1651669] Updated weights for policy 0, policy_version 103764 (0.0014) [2024-06-15 12:47:49,249][1651669] Updated weights for policy 0, policy_version 103845 (0.0013) [2024-06-15 12:47:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 212729856. Throughput: 0: 11935.3. Samples: 53245952. Policy #0 lag: (min: 15.0, avg: 119.6, max: 271.0) [2024-06-15 12:47:50,767][1648981] Avg episode reward: [(0, '204.840')] [2024-06-15 12:47:52,340][1651669] Updated weights for policy 0, policy_version 103888 (0.0012) [2024-06-15 12:47:53,390][1651669] Updated weights for policy 0, policy_version 103928 (0.0014) [2024-06-15 12:47:55,767][1648981] Fps is (10 sec: 45871.1, 60 sec: 45874.5, 300 sec: 46652.6). Total num frames: 212860928. Throughput: 0: 11809.9. Samples: 53279744. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:47:55,768][1648981] Avg episode reward: [(0, '199.670')] [2024-06-15 12:47:56,443][1651669] Updated weights for policy 0, policy_version 103968 (0.0037) [2024-06-15 12:47:58,573][1651669] Updated weights for policy 0, policy_version 104032 (0.0011) [2024-06-15 12:48:00,508][1651669] Updated weights for policy 0, policy_version 104112 (0.0017) [2024-06-15 12:48:00,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 49152.0, 300 sec: 47430.5). Total num frames: 213221376. Throughput: 0: 11616.7. Samples: 53352960. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:00,767][1648981] Avg episode reward: [(0, '200.170')] [2024-06-15 12:48:03,620][1651669] Updated weights for policy 0, policy_version 104163 (0.0012) [2024-06-15 12:48:05,767][1648981] Fps is (10 sec: 52432.1, 60 sec: 45875.0, 300 sec: 47098.3). Total num frames: 213385216. 
Throughput: 0: 11719.0. Samples: 53425664. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:05,767][1648981] Avg episode reward: [(0, '198.450')] [2024-06-15 12:48:06,897][1651669] Updated weights for policy 0, policy_version 104195 (0.0013) [2024-06-15 12:48:07,907][1651669] Updated weights for policy 0, policy_version 104251 (0.0129) [2024-06-15 12:48:10,112][1651669] Updated weights for policy 0, policy_version 104304 (0.0010) [2024-06-15 12:48:10,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 213647360. Throughput: 0: 11730.6. Samples: 53462016. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:10,767][1648981] Avg episode reward: [(0, '199.030')] [2024-06-15 12:48:11,665][1651669] Updated weights for policy 0, policy_version 104368 (0.0013) [2024-06-15 12:48:14,163][1651669] Updated weights for policy 0, policy_version 104416 (0.0014) [2024-06-15 12:48:15,766][1648981] Fps is (10 sec: 52430.6, 60 sec: 46967.5, 300 sec: 47319.3). Total num frames: 213909504. Throughput: 0: 11696.4. Samples: 53529600. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:15,767][1648981] Avg episode reward: [(0, '205.140')] [2024-06-15 12:48:18,247][1651669] Updated weights for policy 0, policy_version 104480 (0.0012) [2024-06-15 12:48:20,754][1651669] Updated weights for policy 0, policy_version 104544 (0.0013) [2024-06-15 12:48:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 47208.2). Total num frames: 214106112. Throughput: 0: 11901.1. Samples: 53607936. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:20,767][1648981] Avg episode reward: [(0, '200.990')] [2024-06-15 12:48:22,676][1651669] Updated weights for policy 0, policy_version 104636 (0.0097) [2024-06-15 12:48:24,959][1651274] Signal inference workers to stop experience collection... 
(5500 times) [2024-06-15 12:48:25,003][1651669] InferenceWorker_p0-w0: stopping experience collection (5500 times) [2024-06-15 12:48:25,213][1651274] Signal inference workers to resume experience collection... (5500 times) [2024-06-15 12:48:25,222][1651669] InferenceWorker_p0-w0: resuming experience collection (5500 times) [2024-06-15 12:48:25,670][1651669] Updated weights for policy 0, policy_version 104693 (0.0012) [2024-06-15 12:48:25,767][1648981] Fps is (10 sec: 49146.8, 60 sec: 48059.1, 300 sec: 47541.2). Total num frames: 214401024. Throughput: 0: 11935.0. Samples: 53636608. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:25,768][1648981] Avg episode reward: [(0, '197.470')] [2024-06-15 12:48:28,944][1651669] Updated weights for policy 0, policy_version 104736 (0.0039) [2024-06-15 12:48:30,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47097.4). Total num frames: 214564864. Throughput: 0: 11992.2. Samples: 53713920. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:30,767][1648981] Avg episode reward: [(0, '189.050')] [2024-06-15 12:48:32,116][1651669] Updated weights for policy 0, policy_version 104803 (0.0012) [2024-06-15 12:48:33,870][1651669] Updated weights for policy 0, policy_version 104880 (0.0013) [2024-06-15 12:48:35,436][1651669] Updated weights for policy 0, policy_version 104944 (0.0013) [2024-06-15 12:48:35,774][1648981] Fps is (10 sec: 55668.1, 60 sec: 50237.7, 300 sec: 47986.0). Total num frames: 214958080. Throughput: 0: 11921.8. Samples: 53782528. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:35,775][1648981] Avg episode reward: [(0, '190.430')] [2024-06-15 12:48:39,454][1651669] Updated weights for policy 0, policy_version 104976 (0.0021) [2024-06-15 12:48:40,546][1651669] Updated weights for policy 0, policy_version 105021 (0.0011) [2024-06-15 12:48:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 215089152. 
Throughput: 0: 12197.2. Samples: 53828608. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:40,767][1648981] Avg episode reward: [(0, '197.690')] [2024-06-15 12:48:43,056][1651669] Updated weights for policy 0, policy_version 105090 (0.0022) [2024-06-15 12:48:44,305][1651669] Updated weights for policy 0, policy_version 105148 (0.0014) [2024-06-15 12:48:45,766][1648981] Fps is (10 sec: 39352.1, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 215351296. Throughput: 0: 12026.3. Samples: 53894144. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:45,767][1648981] Avg episode reward: [(0, '213.590')] [2024-06-15 12:48:46,556][1651669] Updated weights for policy 0, policy_version 105200 (0.0129) [2024-06-15 12:48:50,447][1651669] Updated weights for policy 0, policy_version 105233 (0.0011) [2024-06-15 12:48:50,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 215547904. Throughput: 0: 12253.9. Samples: 53977088. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 12:48:50,767][1648981] Avg episode reward: [(0, '221.570')] [2024-06-15 12:48:51,170][1651274] Saving new best policy, reward=221.570! [2024-06-15 12:48:51,468][1651669] Updated weights for policy 0, policy_version 105274 (0.0030) [2024-06-15 12:48:53,704][1651669] Updated weights for policy 0, policy_version 105330 (0.0013) [2024-06-15 12:48:55,187][1651669] Updated weights for policy 0, policy_version 105400 (0.0015) [2024-06-15 12:48:55,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 50245.0, 300 sec: 47542.0). Total num frames: 215875584. Throughput: 0: 12151.4. Samples: 54008832. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:48:55,767][1648981] Avg episode reward: [(0, '220.160')] [2024-06-15 12:48:56,066][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000105424_215908352.pth... 
[2024-06-15 12:48:56,225][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000099840_204472320.pth [2024-06-15 12:48:56,983][1651669] Updated weights for policy 0, policy_version 105472 (0.0013) [2024-06-15 12:49:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46421.4, 300 sec: 47431.5). Total num frames: 216006656. Throughput: 0: 12276.6. Samples: 54082048. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:00,767][1648981] Avg episode reward: [(0, '220.920')] [2024-06-15 12:49:02,309][1651669] Updated weights for policy 0, policy_version 105536 (0.0015) [2024-06-15 12:49:05,323][1651274] Signal inference workers to stop experience collection... (5550 times) [2024-06-15 12:49:05,462][1651669] InferenceWorker_p0-w0: stopping experience collection (5550 times) [2024-06-15 12:49:05,496][1651669] Updated weights for policy 0, policy_version 105628 (0.0227) [2024-06-15 12:49:05,542][1651274] Signal inference workers to resume experience collection... (5550 times) [2024-06-15 12:49:05,542][1651669] InferenceWorker_p0-w0: resuming experience collection (5550 times) [2024-06-15 12:49:05,779][1648981] Fps is (10 sec: 45817.6, 60 sec: 49141.9, 300 sec: 47763.4). Total num frames: 216334336. Throughput: 0: 12011.6. Samples: 54148608. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:05,780][1648981] Avg episode reward: [(0, '223.390')] [2024-06-15 12:49:06,474][1651274] Saving new best policy, reward=223.390! [2024-06-15 12:49:07,008][1651669] Updated weights for policy 0, policy_version 105684 (0.0012) [2024-06-15 12:49:10,767][1648981] Fps is (10 sec: 52425.5, 60 sec: 48059.2, 300 sec: 47652.3). Total num frames: 216530944. Throughput: 0: 12083.3. Samples: 54180352. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:10,768][1648981] Avg episode reward: [(0, '235.700')] [2024-06-15 12:49:10,772][1651274] Saving new best policy, reward=235.700! 
[2024-06-15 12:49:13,289][1651669] Updated weights for policy 0, policy_version 105752 (0.0014) [2024-06-15 12:49:14,219][1651669] Updated weights for policy 0, policy_version 105792 (0.0022) [2024-06-15 12:49:15,766][1648981] Fps is (10 sec: 42652.7, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 216760320. Throughput: 0: 12162.9. Samples: 54261248. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:15,767][1648981] Avg episode reward: [(0, '232.310')] [2024-06-15 12:49:15,903][1651669] Updated weights for policy 0, policy_version 105851 (0.0013) [2024-06-15 12:49:17,812][1651669] Updated weights for policy 0, policy_version 105930 (0.0235) [2024-06-15 12:49:20,766][1648981] Fps is (10 sec: 52432.2, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 217055232. Throughput: 0: 12017.0. Samples: 54323200. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:20,767][1648981] Avg episode reward: [(0, '233.610')] [2024-06-15 12:49:23,816][1651669] Updated weights for policy 0, policy_version 105987 (0.0013) [2024-06-15 12:49:25,154][1651669] Updated weights for policy 0, policy_version 106048 (0.0015) [2024-06-15 12:49:25,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 46422.1, 300 sec: 47097.0). Total num frames: 217186304. Throughput: 0: 11935.3. Samples: 54365696. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:25,767][1648981] Avg episode reward: [(0, '217.760')] [2024-06-15 12:49:28,067][1651669] Updated weights for policy 0, policy_version 106114 (0.0013) [2024-06-15 12:49:29,551][1651669] Updated weights for policy 0, policy_version 106184 (0.0016) [2024-06-15 12:49:30,684][1651669] Updated weights for policy 0, policy_version 106236 (0.0038) [2024-06-15 12:49:30,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 217579520. Throughput: 0: 11798.8. Samples: 54425088. 
Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:30,767][1648981] Avg episode reward: [(0, '212.690')] [2024-06-15 12:49:35,778][1648981] Fps is (10 sec: 45820.7, 60 sec: 44779.8, 300 sec: 47095.2). Total num frames: 217645056. Throughput: 0: 11750.1. Samples: 54505984. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:35,779][1648981] Avg episode reward: [(0, '217.740')] [2024-06-15 12:49:35,867][1651669] Updated weights for policy 0, policy_version 106273 (0.0014) [2024-06-15 12:49:37,657][1651669] Updated weights for policy 0, policy_version 106306 (0.0068) [2024-06-15 12:49:39,384][1651669] Updated weights for policy 0, policy_version 106369 (0.0013) [2024-06-15 12:49:40,766][1648981] Fps is (10 sec: 36044.3, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 217939968. Throughput: 0: 11776.0. Samples: 54538752. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:40,790][1648981] Avg episode reward: [(0, '215.060')] [2024-06-15 12:49:41,004][1651669] Updated weights for policy 0, policy_version 106434 (0.0012) [2024-06-15 12:49:42,136][1651669] Updated weights for policy 0, policy_version 106495 (0.0015) [2024-06-15 12:49:45,770][1648981] Fps is (10 sec: 45912.6, 60 sec: 45872.3, 300 sec: 46985.4). Total num frames: 218103808. Throughput: 0: 11581.6. Samples: 54603264. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:45,771][1648981] Avg episode reward: [(0, '216.020')] [2024-06-15 12:49:47,360][1651274] Signal inference workers to stop experience collection... (5600 times) [2024-06-15 12:49:47,466][1651669] InferenceWorker_p0-w0: stopping experience collection (5600 times) [2024-06-15 12:49:47,644][1651274] Signal inference workers to resume experience collection... 
(5600 times) [2024-06-15 12:49:47,645][1651669] InferenceWorker_p0-w0: resuming experience collection (5600 times) [2024-06-15 12:49:47,821][1651669] Updated weights for policy 0, policy_version 106552 (0.0034) [2024-06-15 12:49:49,298][1651669] Updated weights for policy 0, policy_version 106595 (0.0017) [2024-06-15 12:49:50,422][1651669] Updated weights for policy 0, policy_version 106641 (0.0014) [2024-06-15 12:49:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 218431488. Throughput: 0: 11722.4. Samples: 54675968. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:50,767][1648981] Avg episode reward: [(0, '215.820')] [2024-06-15 12:49:52,984][1651669] Updated weights for policy 0, policy_version 106736 (0.0014) [2024-06-15 12:49:55,766][1648981] Fps is (10 sec: 52448.8, 60 sec: 45875.3, 300 sec: 47319.3). Total num frames: 218628096. Throughput: 0: 11537.2. Samples: 54699520. Policy #0 lag: (min: 31.0, avg: 125.3, max: 287.0) [2024-06-15 12:49:55,767][1648981] Avg episode reward: [(0, '208.770')] [2024-06-15 12:49:58,453][1651669] Updated weights for policy 0, policy_version 106769 (0.0019) [2024-06-15 12:49:59,724][1651669] Updated weights for policy 0, policy_version 106832 (0.0078) [2024-06-15 12:50:00,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 47212.6). Total num frames: 218857472. Throughput: 0: 11605.3. Samples: 54783488. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:00,767][1648981] Avg episode reward: [(0, '204.690')] [2024-06-15 12:50:01,561][1651669] Updated weights for policy 0, policy_version 106896 (0.0023) [2024-06-15 12:50:03,048][1651669] Updated weights for policy 0, policy_version 106947 (0.0013) [2024-06-15 12:50:05,768][1648981] Fps is (10 sec: 52420.3, 60 sec: 46976.1, 300 sec: 47763.3). Total num frames: 219152384. Throughput: 0: 11445.6. Samples: 54838272. 
Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:05,769][1648981] Avg episode reward: [(0, '207.900')] [2024-06-15 12:50:09,713][1651669] Updated weights for policy 0, policy_version 107009 (0.0180) [2024-06-15 12:50:10,785][1648981] Fps is (10 sec: 35976.9, 60 sec: 44769.4, 300 sec: 46871.9). Total num frames: 219217920. Throughput: 0: 11543.6. Samples: 54885376. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:10,786][1648981] Avg episode reward: [(0, '208.260')] [2024-06-15 12:50:11,248][1651669] Updated weights for policy 0, policy_version 107072 (0.0130) [2024-06-15 12:50:12,667][1651669] Updated weights for policy 0, policy_version 107123 (0.0011) [2024-06-15 12:50:14,454][1651669] Updated weights for policy 0, policy_version 107192 (0.0150) [2024-06-15 12:50:15,767][1648981] Fps is (10 sec: 45881.9, 60 sec: 47513.5, 300 sec: 47764.8). Total num frames: 219611136. Throughput: 0: 11514.3. Samples: 54943232. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:15,767][1648981] Avg episode reward: [(0, '211.590')] [2024-06-15 12:50:15,777][1651669] Updated weights for policy 0, policy_version 107234 (0.0013) [2024-06-15 12:50:20,766][1648981] Fps is (10 sec: 45961.9, 60 sec: 43690.7, 300 sec: 46764.1). Total num frames: 219676672. Throughput: 0: 11551.5. Samples: 55025664. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:20,767][1648981] Avg episode reward: [(0, '211.850')] [2024-06-15 12:50:21,340][1651669] Updated weights for policy 0, policy_version 107280 (0.0069) [2024-06-15 12:50:23,047][1651669] Updated weights for policy 0, policy_version 107344 (0.0013) [2024-06-15 12:50:24,783][1651274] Signal inference workers to stop experience collection... 
(5650 times) [2024-06-15 12:50:24,855][1651669] InferenceWorker_p0-w0: stopping experience collection (5650 times) [2024-06-15 12:50:24,883][1651669] Updated weights for policy 0, policy_version 107415 (0.0120) [2024-06-15 12:50:25,051][1651274] Signal inference workers to resume experience collection... (5650 times) [2024-06-15 12:50:25,052][1651669] InferenceWorker_p0-w0: resuming experience collection (5650 times) [2024-06-15 12:50:25,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 220037120. Throughput: 0: 11468.8. Samples: 55054848. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:25,767][1648981] Avg episode reward: [(0, '211.940')] [2024-06-15 12:50:26,816][1651669] Updated weights for policy 0, policy_version 107458 (0.0013) [2024-06-15 12:50:30,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 220200960. Throughput: 0: 11469.8. Samples: 55119360. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:30,767][1648981] Avg episode reward: [(0, '219.260')] [2024-06-15 12:50:32,143][1651669] Updated weights for policy 0, policy_version 107522 (0.0014) [2024-06-15 12:50:33,898][1651669] Updated weights for policy 0, policy_version 107600 (0.0045) [2024-06-15 12:50:35,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 47523.0, 300 sec: 47208.1). Total num frames: 220495872. Throughput: 0: 11457.4. Samples: 55191552. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:35,767][1648981] Avg episode reward: [(0, '213.940')] [2024-06-15 12:50:36,179][1651669] Updated weights for policy 0, policy_version 107680 (0.0013) [2024-06-15 12:50:38,755][1651669] Updated weights for policy 0, policy_version 107744 (0.0018) [2024-06-15 12:50:40,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 220725248. Throughput: 0: 11582.6. Samples: 55220736. 
Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:40,767][1648981] Avg episode reward: [(0, '220.630')] [2024-06-15 12:50:43,293][1651669] Updated weights for policy 0, policy_version 107778 (0.0049) [2024-06-15 12:50:44,747][1651669] Updated weights for policy 0, policy_version 107842 (0.0014) [2024-06-15 12:50:45,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 46970.4, 300 sec: 47208.1). Total num frames: 220921856. Throughput: 0: 11719.1. Samples: 55310848. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:45,767][1648981] Avg episode reward: [(0, '217.550')] [2024-06-15 12:50:47,080][1651669] Updated weights for policy 0, policy_version 107922 (0.0013) [2024-06-15 12:50:47,954][1651669] Updated weights for policy 0, policy_version 107965 (0.0012) [2024-06-15 12:50:50,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 46967.6, 300 sec: 47878.5). Total num frames: 221249536. Throughput: 0: 11685.4. Samples: 55364096. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:50,767][1648981] Avg episode reward: [(0, '222.500')] [2024-06-15 12:50:54,202][1651669] Updated weights for policy 0, policy_version 108035 (0.0029) [2024-06-15 12:50:55,556][1651669] Updated weights for policy 0, policy_version 108096 (0.0012) [2024-06-15 12:50:55,775][1648981] Fps is (10 sec: 45837.7, 60 sec: 45868.9, 300 sec: 47095.7). Total num frames: 221380608. Throughput: 0: 11733.3. Samples: 55413248. Policy #0 lag: (min: 15.0, avg: 86.5, max: 271.0) [2024-06-15 12:50:55,775][1648981] Avg episode reward: [(0, '215.750')] [2024-06-15 12:50:56,378][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000108128_221446144.pth... 
[2024-06-15 12:50:56,538][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000102592_210108416.pth [2024-06-15 12:50:56,543][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000108128_221446144.pth [2024-06-15 12:50:58,222][1651669] Updated weights for policy 0, policy_version 108192 (0.0014) [2024-06-15 12:50:59,792][1651669] Updated weights for policy 0, policy_version 108225 (0.0011) [2024-06-15 12:51:00,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 221741056. Throughput: 0: 11787.4. Samples: 55473664. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:00,767][1648981] Avg episode reward: [(0, '219.190')] [2024-06-15 12:51:05,108][1651669] Updated weights for policy 0, policy_version 108292 (0.0012) [2024-06-15 12:51:05,766][1648981] Fps is (10 sec: 45913.2, 60 sec: 44784.2, 300 sec: 46874.9). Total num frames: 221839360. Throughput: 0: 11821.5. Samples: 55557632. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:05,767][1648981] Avg episode reward: [(0, '208.440')] [2024-06-15 12:51:06,196][1651274] Signal inference workers to stop experience collection... (5700 times) [2024-06-15 12:51:06,257][1651669] InferenceWorker_p0-w0: stopping experience collection (5700 times) [2024-06-15 12:51:06,447][1651274] Signal inference workers to resume experience collection... (5700 times) [2024-06-15 12:51:06,448][1651669] InferenceWorker_p0-w0: resuming experience collection (5700 times) [2024-06-15 12:51:06,891][1651669] Updated weights for policy 0, policy_version 108368 (0.0013) [2024-06-15 12:51:07,978][1651669] Updated weights for policy 0, policy_version 108416 (0.0047) [2024-06-15 12:51:09,435][1651669] Updated weights for policy 0, policy_version 108475 (0.0013) [2024-06-15 12:51:10,767][1648981] Fps is (10 sec: 45871.3, 60 sec: 49713.1, 300 sec: 47652.3). Total num frames: 222199808. 
Throughput: 0: 11798.5. Samples: 55585792. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:10,768][1648981] Avg episode reward: [(0, '204.720')] [2024-06-15 12:51:11,328][1651669] Updated weights for policy 0, policy_version 108528 (0.0025) [2024-06-15 12:51:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 46986.0). Total num frames: 222298112. Throughput: 0: 12208.3. Samples: 55668736. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:15,767][1648981] Avg episode reward: [(0, '205.590')] [2024-06-15 12:51:16,131][1651669] Updated weights for policy 0, policy_version 108560 (0.0011) [2024-06-15 12:51:18,379][1651669] Updated weights for policy 0, policy_version 108640 (0.0032) [2024-06-15 12:51:19,696][1651669] Updated weights for policy 0, policy_version 108692 (0.0012) [2024-06-15 12:51:20,766][1648981] Fps is (10 sec: 49156.4, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 222691328. Throughput: 0: 11867.1. Samples: 55725568. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:20,767][1648981] Avg episode reward: [(0, '197.730')] [2024-06-15 12:51:21,726][1651669] Updated weights for policy 0, policy_version 108752 (0.0014) [2024-06-15 12:51:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46421.4, 300 sec: 47208.3). Total num frames: 222822400. Throughput: 0: 12071.8. Samples: 55763968. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:25,767][1648981] Avg episode reward: [(0, '195.300')] [2024-06-15 12:51:27,343][1651669] Updated weights for policy 0, policy_version 108817 (0.0013) [2024-06-15 12:51:28,310][1651669] Updated weights for policy 0, policy_version 108861 (0.0012) [2024-06-15 12:51:29,875][1651669] Updated weights for policy 0, policy_version 108913 (0.0012) [2024-06-15 12:51:30,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 48605.7, 300 sec: 47541.3). Total num frames: 223117312. Throughput: 0: 11764.6. Samples: 55840256. 
Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:30,767][1648981] Avg episode reward: [(0, '196.110')] [2024-06-15 12:51:31,389][1651669] Updated weights for policy 0, policy_version 108983 (0.0013) [2024-06-15 12:51:33,163][1651669] Updated weights for policy 0, policy_version 109028 (0.0014) [2024-06-15 12:51:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 223346688. Throughput: 0: 12185.6. Samples: 55912448. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:35,767][1648981] Avg episode reward: [(0, '202.630')] [2024-06-15 12:51:38,205][1651669] Updated weights for policy 0, policy_version 109104 (0.0014) [2024-06-15 12:51:40,314][1651669] Updated weights for policy 0, policy_version 109153 (0.0013) [2024-06-15 12:51:40,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 223576064. Throughput: 0: 12017.2. Samples: 55953920. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:40,767][1648981] Avg episode reward: [(0, '207.170')] [2024-06-15 12:51:41,563][1651669] Updated weights for policy 0, policy_version 109203 (0.0012) [2024-06-15 12:51:42,505][1651669] Updated weights for policy 0, policy_version 109248 (0.0014) [2024-06-15 12:51:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 223870976. Throughput: 0: 12117.3. Samples: 56018944. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:45,767][1648981] Avg episode reward: [(0, '206.220')] [2024-06-15 12:51:48,317][1651274] Signal inference workers to stop experience collection... (5750 times) [2024-06-15 12:51:48,327][1651669] Updated weights for policy 0, policy_version 109314 (0.0047) [2024-06-15 12:51:48,406][1651669] InferenceWorker_p0-w0: stopping experience collection (5750 times) [2024-06-15 12:51:48,481][1651274] Signal inference workers to resume experience collection... 
(5750 times) [2024-06-15 12:51:48,482][1651669] InferenceWorker_p0-w0: resuming experience collection (5750 times) [2024-06-15 12:51:50,180][1651669] Updated weights for policy 0, policy_version 109392 (0.0028) [2024-06-15 12:51:50,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 224067584. Throughput: 0: 12003.6. Samples: 56097792. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:50,767][1648981] Avg episode reward: [(0, '210.300')] [2024-06-15 12:51:52,071][1651669] Updated weights for policy 0, policy_version 109461 (0.0060) [2024-06-15 12:51:54,744][1651669] Updated weights for policy 0, policy_version 109520 (0.0013) [2024-06-15 12:51:55,770][1648981] Fps is (10 sec: 52407.9, 60 sec: 50247.8, 300 sec: 47874.0). Total num frames: 224395264. Throughput: 0: 11968.6. Samples: 56124416. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:51:55,771][1648981] Avg episode reward: [(0, '210.890')] [2024-06-15 12:51:59,809][1651669] Updated weights for policy 0, policy_version 109572 (0.0015) [2024-06-15 12:52:00,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 45328.9, 300 sec: 46874.9). Total num frames: 224460800. Throughput: 0: 11969.3. Samples: 56207360. Policy #0 lag: (min: 2.0, avg: 149.6, max: 258.0) [2024-06-15 12:52:00,767][1648981] Avg episode reward: [(0, '214.820')] [2024-06-15 12:52:01,101][1651669] Updated weights for policy 0, policy_version 109623 (0.0009) [2024-06-15 12:52:03,002][1651669] Updated weights for policy 0, policy_version 109698 (0.0011) [2024-06-15 12:52:04,407][1651669] Updated weights for policy 0, policy_version 109759 (0.0012) [2024-06-15 12:52:05,766][1648981] Fps is (10 sec: 42615.6, 60 sec: 49698.1, 300 sec: 47652.5). Total num frames: 224821248. Throughput: 0: 12071.8. Samples: 56268800. 
Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:05,767][1648981] Avg episode reward: [(0, '213.420')] [2024-06-15 12:52:06,507][1651669] Updated weights for policy 0, policy_version 109824 (0.0015) [2024-06-15 12:52:10,766][1648981] Fps is (10 sec: 49153.4, 60 sec: 45875.9, 300 sec: 46986.0). Total num frames: 224952320. Throughput: 0: 12026.3. Samples: 56305152. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:10,767][1648981] Avg episode reward: [(0, '203.150')] [2024-06-15 12:52:11,651][1651669] Updated weights for policy 0, policy_version 109884 (0.0011) [2024-06-15 12:52:13,663][1651669] Updated weights for policy 0, policy_version 109952 (0.0105) [2024-06-15 12:52:15,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 225312768. Throughput: 0: 11855.7. Samples: 56373760. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:15,767][1648981] Avg episode reward: [(0, '207.120')] [2024-06-15 12:52:16,991][1651669] Updated weights for policy 0, policy_version 110032 (0.0021) [2024-06-15 12:52:17,961][1651669] Updated weights for policy 0, policy_version 110078 (0.0016) [2024-06-15 12:52:20,768][1648981] Fps is (10 sec: 49146.8, 60 sec: 45874.4, 300 sec: 47208.0). Total num frames: 225443840. Throughput: 0: 11923.6. Samples: 56449024. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:20,769][1648981] Avg episode reward: [(0, '211.820')] [2024-06-15 12:52:22,732][1651669] Updated weights for policy 0, policy_version 110133 (0.0014) [2024-06-15 12:52:23,899][1651669] Updated weights for policy 0, policy_version 110176 (0.0021) [2024-06-15 12:52:25,302][1651274] Signal inference workers to stop experience collection... (5800 times) [2024-06-15 12:52:25,383][1651669] InferenceWorker_p0-w0: stopping experience collection (5800 times) [2024-06-15 12:52:25,596][1651274] Signal inference workers to resume experience collection... 
(5800 times) [2024-06-15 12:52:25,597][1651669] InferenceWorker_p0-w0: resuming experience collection (5800 times) [2024-06-15 12:52:25,598][1651669] Updated weights for policy 0, policy_version 110240 (0.0011) [2024-06-15 12:52:25,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 225771520. Throughput: 0: 11855.6. Samples: 56487424. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:25,767][1648981] Avg episode reward: [(0, '205.220')] [2024-06-15 12:52:28,029][1651669] Updated weights for policy 0, policy_version 110289 (0.0031) [2024-06-15 12:52:30,768][1648981] Fps is (10 sec: 52424.2, 60 sec: 47512.2, 300 sec: 47541.1). Total num frames: 225968128. Throughput: 0: 11855.1. Samples: 56552448. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:30,769][1648981] Avg episode reward: [(0, '207.430')] [2024-06-15 12:52:32,660][1651669] Updated weights for policy 0, policy_version 110337 (0.0013) [2024-06-15 12:52:34,055][1651669] Updated weights for policy 0, policy_version 110396 (0.0117) [2024-06-15 12:52:35,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 226197504. Throughput: 0: 11719.1. Samples: 56625152. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:35,767][1648981] Avg episode reward: [(0, '207.260')] [2024-06-15 12:52:35,831][1651669] Updated weights for policy 0, policy_version 110464 (0.0013) [2024-06-15 12:52:37,439][1651669] Updated weights for policy 0, policy_version 110528 (0.0014) [2024-06-15 12:52:40,288][1651669] Updated weights for policy 0, policy_version 110589 (0.0017) [2024-06-15 12:52:40,766][1648981] Fps is (10 sec: 52439.1, 60 sec: 48605.8, 300 sec: 47763.5). Total num frames: 226492416. Throughput: 0: 11834.0. Samples: 56656896. 
Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:40,767][1648981] Avg episode reward: [(0, '209.940')] [2024-06-15 12:52:45,024][1651669] Updated weights for policy 0, policy_version 110648 (0.0199) [2024-06-15 12:52:45,770][1648981] Fps is (10 sec: 42582.4, 60 sec: 45872.3, 300 sec: 47096.4). Total num frames: 226623488. Throughput: 0: 11570.3. Samples: 56728064. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:45,771][1648981] Avg episode reward: [(0, '213.520')] [2024-06-15 12:52:47,140][1651669] Updated weights for policy 0, policy_version 110709 (0.0097) [2024-06-15 12:52:48,273][1651669] Updated weights for policy 0, policy_version 110736 (0.0013) [2024-06-15 12:52:50,716][1651669] Updated weights for policy 0, policy_version 110800 (0.0011) [2024-06-15 12:52:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 47513.6, 300 sec: 47652.6). Total num frames: 226918400. Throughput: 0: 11673.6. Samples: 56794112. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:50,767][1648981] Avg episode reward: [(0, '211.610')] [2024-06-15 12:52:51,570][1651669] Updated weights for policy 0, policy_version 110848 (0.0014) [2024-06-15 12:52:55,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 44785.9, 300 sec: 46986.0). Total num frames: 227082240. Throughput: 0: 11719.1. Samples: 56832512. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:52:55,767][1648981] Avg episode reward: [(0, '216.050')] [2024-06-15 12:52:56,321][1651669] Updated weights for policy 0, policy_version 110908 (0.0015) [2024-06-15 12:52:56,376][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000110912_227147776.pth... 
[2024-06-15 12:52:56,468][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000105424_215908352.pth [2024-06-15 12:52:57,901][1651669] Updated weights for policy 0, policy_version 110962 (0.0105) [2024-06-15 12:52:58,868][1651669] Updated weights for policy 0, policy_version 110997 (0.0016) [2024-06-15 12:53:00,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49152.2, 300 sec: 47541.4). Total num frames: 227409920. Throughput: 0: 11662.2. Samples: 56898560. Policy #0 lag: (min: 36.0, avg: 103.3, max: 292.0) [2024-06-15 12:53:00,767][1648981] Avg episode reward: [(0, '218.380')] [2024-06-15 12:53:02,224][1651669] Updated weights for policy 0, policy_version 111072 (0.0074) [2024-06-15 12:53:05,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 45329.1, 300 sec: 47097.1). Total num frames: 227540992. Throughput: 0: 11821.8. Samples: 56980992. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:05,767][1648981] Avg episode reward: [(0, '216.550')] [2024-06-15 12:53:06,820][1651669] Updated weights for policy 0, policy_version 111137 (0.0013) [2024-06-15 12:53:08,463][1651274] Signal inference workers to stop experience collection... (5850 times) [2024-06-15 12:53:08,489][1651669] InferenceWorker_p0-w0: stopping experience collection (5850 times) [2024-06-15 12:53:08,716][1651274] Signal inference workers to resume experience collection... (5850 times) [2024-06-15 12:53:08,716][1651669] InferenceWorker_p0-w0: resuming experience collection (5850 times) [2024-06-15 12:53:08,719][1651669] Updated weights for policy 0, policy_version 111216 (0.0099) [2024-06-15 12:53:09,836][1651669] Updated weights for policy 0, policy_version 111250 (0.0012) [2024-06-15 12:53:10,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 227901440. Throughput: 0: 11673.6. Samples: 57012736. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:10,767][1648981] Avg episode reward: [(0, '217.100')] [2024-06-15 12:53:13,809][1651669] Updated weights for policy 0, policy_version 111329 (0.0021) [2024-06-15 12:53:15,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 45875.3, 300 sec: 47319.2). Total num frames: 228065280. Throughput: 0: 11662.7. Samples: 57077248. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:15,767][1648981] Avg episode reward: [(0, '224.120')] [2024-06-15 12:53:17,348][1651669] Updated weights for policy 0, policy_version 111378 (0.0013) [2024-06-15 12:53:18,164][1651669] Updated weights for policy 0, policy_version 111417 (0.0051) [2024-06-15 12:53:19,704][1651669] Updated weights for policy 0, policy_version 111458 (0.0013) [2024-06-15 12:53:20,730][1651669] Updated weights for policy 0, policy_version 111492 (0.0013) [2024-06-15 12:53:20,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48060.6, 300 sec: 47208.3). Total num frames: 228327424. Throughput: 0: 11810.1. Samples: 57156608. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:20,767][1648981] Avg episode reward: [(0, '228.130')] [2024-06-15 12:53:22,044][1651669] Updated weights for policy 0, policy_version 111549 (0.0012) [2024-06-15 12:53:24,566][1651669] Updated weights for policy 0, policy_version 111612 (0.0013) [2024-06-15 12:53:25,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 228589568. Throughput: 0: 11912.5. Samples: 57192960. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:25,767][1648981] Avg episode reward: [(0, '230.760')] [2024-06-15 12:53:29,288][1651669] Updated weights for policy 0, policy_version 111674 (0.0013) [2024-06-15 12:53:30,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 46422.9, 300 sec: 46765.1). Total num frames: 228753408. Throughput: 0: 11777.0. Samples: 57257984. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:30,767][1648981] Avg episode reward: [(0, '230.310')] [2024-06-15 12:53:31,443][1651669] Updated weights for policy 0, policy_version 111732 (0.0013) [2024-06-15 12:53:32,758][1651669] Updated weights for policy 0, policy_version 111777 (0.0014) [2024-06-15 12:53:34,671][1651669] Updated weights for policy 0, policy_version 111815 (0.0013) [2024-06-15 12:53:35,743][1651669] Updated weights for policy 0, policy_version 111872 (0.0014) [2024-06-15 12:53:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 229113856. Throughput: 0: 11878.4. Samples: 57328640. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:35,767][1648981] Avg episode reward: [(0, '231.030')] [2024-06-15 12:53:40,373][1651669] Updated weights for policy 0, policy_version 111928 (0.0013) [2024-06-15 12:53:40,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 229244928. Throughput: 0: 11867.0. Samples: 57366528. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:40,767][1648981] Avg episode reward: [(0, '227.770')] [2024-06-15 12:53:42,125][1651669] Updated weights for policy 0, policy_version 111984 (0.0012) [2024-06-15 12:53:43,235][1651669] Updated weights for policy 0, policy_version 112018 (0.0014) [2024-06-15 12:53:45,104][1651669] Updated weights for policy 0, policy_version 112065 (0.0012) [2024-06-15 12:53:45,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49155.1, 300 sec: 47541.4). Total num frames: 229572608. Throughput: 0: 12071.8. Samples: 57441792. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:45,767][1648981] Avg episode reward: [(0, '231.010')] [2024-06-15 12:53:49,222][1651669] Updated weights for policy 0, policy_version 112130 (0.0090) [2024-06-15 12:53:50,582][1651669] Updated weights for policy 0, policy_version 112186 (0.0014) [2024-06-15 12:53:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47513.5, 300 sec: 47097.1). Total num frames: 229769216. Throughput: 0: 11787.4. Samples: 57511424. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:50,767][1648981] Avg episode reward: [(0, '225.730')] [2024-06-15 12:53:53,177][1651669] Updated weights for policy 0, policy_version 112247 (0.0015) [2024-06-15 12:53:54,395][1651274] Signal inference workers to stop experience collection... (5900 times) [2024-06-15 12:53:54,431][1651669] InferenceWorker_p0-w0: stopping experience collection (5900 times) [2024-06-15 12:53:54,626][1651274] Signal inference workers to resume experience collection... (5900 times) [2024-06-15 12:53:54,627][1651669] InferenceWorker_p0-w0: resuming experience collection (5900 times) [2024-06-15 12:53:54,792][1651669] Updated weights for policy 0, policy_version 112290 (0.0012) [2024-06-15 12:53:55,774][1648981] Fps is (10 sec: 45839.1, 60 sec: 49145.6, 300 sec: 47540.1). Total num frames: 230031360. Throughput: 0: 11876.3. Samples: 57547264. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:53:55,775][1648981] Avg episode reward: [(0, '224.580')] [2024-06-15 12:53:56,912][1651669] Updated weights for policy 0, policy_version 112352 (0.0013) [2024-06-15 12:54:00,544][1651669] Updated weights for policy 0, policy_version 112421 (0.0015) [2024-06-15 12:54:00,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 47210.2). Total num frames: 230260736. Throughput: 0: 12185.6. Samples: 57625600. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 12:54:00,767][1648981] Avg episode reward: [(0, '225.090')] [2024-06-15 12:54:03,415][1651669] Updated weights for policy 0, policy_version 112483 (0.0044) [2024-06-15 12:54:05,093][1651669] Updated weights for policy 0, policy_version 112529 (0.0014) [2024-06-15 12:54:05,790][1648981] Fps is (10 sec: 49073.7, 60 sec: 49678.4, 300 sec: 47426.6). Total num frames: 230522880. Throughput: 0: 11974.5. Samples: 57695744. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:05,791][1648981] Avg episode reward: [(0, '216.760')] [2024-06-15 12:54:06,013][1651669] Updated weights for policy 0, policy_version 112574 (0.0020) [2024-06-15 12:54:08,031][1651669] Updated weights for policy 0, policy_version 112635 (0.0013) [2024-06-15 12:54:10,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 230752256. Throughput: 0: 11958.1. Samples: 57731072. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:10,767][1648981] Avg episode reward: [(0, '214.170')] [2024-06-15 12:54:11,013][1651669] Updated weights for policy 0, policy_version 112692 (0.0013) [2024-06-15 12:54:14,502][1651669] Updated weights for policy 0, policy_version 112736 (0.0012) [2024-06-15 12:54:15,766][1648981] Fps is (10 sec: 42699.6, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 230948864. Throughput: 0: 12162.8. Samples: 57805312. 
Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:15,767][1648981] Avg episode reward: [(0, '212.740')] [2024-06-15 12:54:16,221][1651669] Updated weights for policy 0, policy_version 112785 (0.0014) [2024-06-15 12:54:17,172][1651669] Updated weights for policy 0, policy_version 112832 (0.0011) [2024-06-15 12:54:18,704][1651669] Updated weights for policy 0, policy_version 112888 (0.0013) [2024-06-15 12:54:20,631][1651669] Updated weights for policy 0, policy_version 112914 (0.0019) [2024-06-15 12:54:20,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 231276544. Throughput: 0: 12401.8. Samples: 57886720. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:20,767][1648981] Avg episode reward: [(0, '207.790')] [2024-06-15 12:54:21,368][1651669] Updated weights for policy 0, policy_version 112960 (0.0179) [2024-06-15 12:54:24,792][1651669] Updated weights for policy 0, policy_version 113013 (0.0124) [2024-06-15 12:54:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 231473152. Throughput: 0: 12481.4. Samples: 57928192. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:25,767][1648981] Avg episode reward: [(0, '210.830')] [2024-06-15 12:54:27,304][1651669] Updated weights for policy 0, policy_version 113072 (0.0013) [2024-06-15 12:54:27,793][1651669] Updated weights for policy 0, policy_version 113088 (0.0010) [2024-06-15 12:54:29,352][1651669] Updated weights for policy 0, policy_version 113152 (0.0012) [2024-06-15 12:54:30,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 49697.9, 300 sec: 47765.4). Total num frames: 231735296. Throughput: 0: 12242.4. Samples: 57992704. 
Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:30,768][1648981] Avg episode reward: [(0, '208.090')] [2024-06-15 12:54:31,835][1651669] Updated weights for policy 0, policy_version 113215 (0.0014) [2024-06-15 12:54:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 231931904. Throughput: 0: 12458.7. Samples: 58072064. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:35,767][1648981] Avg episode reward: [(0, '217.460')] [2024-06-15 12:54:36,144][1651669] Updated weights for policy 0, policy_version 113280 (0.0120) [2024-06-15 12:54:38,304][1651274] Signal inference workers to stop experience collection... (5950 times) [2024-06-15 12:54:38,316][1651669] InferenceWorker_p0-w0: stopping experience collection (5950 times) [2024-06-15 12:54:38,617][1651274] Signal inference workers to resume experience collection... (5950 times) [2024-06-15 12:54:38,617][1651669] InferenceWorker_p0-w0: resuming experience collection (5950 times) [2024-06-15 12:54:38,967][1651669] Updated weights for policy 0, policy_version 113339 (0.0013) [2024-06-15 12:54:40,353][1651669] Updated weights for policy 0, policy_version 113398 (0.0012) [2024-06-15 12:54:40,767][1648981] Fps is (10 sec: 52426.7, 60 sec: 50243.8, 300 sec: 47986.2). Total num frames: 232259584. Throughput: 0: 12312.7. Samples: 58101248. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:40,768][1648981] Avg episode reward: [(0, '222.870')] [2024-06-15 12:54:42,059][1651669] Updated weights for policy 0, policy_version 113440 (0.0013) [2024-06-15 12:54:45,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 232390656. Throughput: 0: 12219.7. Samples: 58175488. 
Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:45,767][1648981] Avg episode reward: [(0, '227.360')] [2024-06-15 12:54:46,164][1651669] Updated weights for policy 0, policy_version 113477 (0.0011) [2024-06-15 12:54:47,647][1651669] Updated weights for policy 0, policy_version 113532 (0.0012) [2024-06-15 12:54:49,848][1651669] Updated weights for policy 0, policy_version 113590 (0.0026) [2024-06-15 12:54:50,766][1648981] Fps is (10 sec: 45878.1, 60 sec: 49152.1, 300 sec: 47763.5). Total num frames: 232718336. Throughput: 0: 12135.1. Samples: 58241536. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:50,767][1648981] Avg episode reward: [(0, '234.680')] [2024-06-15 12:54:50,868][1651669] Updated weights for policy 0, policy_version 113639 (0.0012) [2024-06-15 12:54:53,694][1651669] Updated weights for policy 0, policy_version 113712 (0.0013) [2024-06-15 12:54:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 48065.9, 300 sec: 47652.4). Total num frames: 232914944. Throughput: 0: 12049.1. Samples: 58273280. Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:54:55,767][1648981] Avg episode reward: [(0, '234.700')] [2024-06-15 12:54:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000113728_232914944.pth... [2024-06-15 12:54:55,827][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000108128_221446144.pth [2024-06-15 12:54:57,634][1651669] Updated weights for policy 0, policy_version 113749 (0.0013) [2024-06-15 12:55:00,045][1651669] Updated weights for policy 0, policy_version 113796 (0.0013) [2024-06-15 12:55:00,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 47513.7, 300 sec: 47319.5). Total num frames: 233111552. Throughput: 0: 12231.2. Samples: 58355712. 
Policy #0 lag: (min: 26.0, avg: 131.9, max: 282.0) [2024-06-15 12:55:00,767][1648981] Avg episode reward: [(0, '236.120')] [2024-06-15 12:55:01,092][1651274] Saving new best policy, reward=236.120! [2024-06-15 12:55:01,938][1651669] Updated weights for policy 0, policy_version 113872 (0.0014) [2024-06-15 12:55:03,655][1651669] Updated weights for policy 0, policy_version 113936 (0.0025) [2024-06-15 12:55:04,862][1651669] Updated weights for policy 0, policy_version 113983 (0.0012) [2024-06-15 12:55:05,803][1648981] Fps is (10 sec: 52235.6, 60 sec: 48595.1, 300 sec: 48204.9). Total num frames: 233439232. Throughput: 0: 11766.3. Samples: 58416640. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:05,804][1648981] Avg episode reward: [(0, '242.890')] [2024-06-15 12:55:05,805][1651274] Saving new best policy, reward=242.890! [2024-06-15 12:55:09,004][1651669] Updated weights for policy 0, policy_version 114039 (0.0014) [2024-06-15 12:55:10,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 233570304. Throughput: 0: 11867.0. Samples: 58462208. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:10,767][1648981] Avg episode reward: [(0, '231.410')] [2024-06-15 12:55:12,345][1651669] Updated weights for policy 0, policy_version 114112 (0.0014) [2024-06-15 12:55:13,717][1651669] Updated weights for policy 0, policy_version 114169 (0.0013) [2024-06-15 12:55:15,671][1651669] Updated weights for policy 0, policy_version 114232 (0.0025) [2024-06-15 12:55:15,798][1648981] Fps is (10 sec: 49180.2, 60 sec: 49672.3, 300 sec: 48313.8). Total num frames: 233930752. Throughput: 0: 11870.2. Samples: 58527232. 
Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:15,798][1648981] Avg episode reward: [(0, '229.400')] [2024-06-15 12:55:19,883][1651669] Updated weights for policy 0, policy_version 114278 (0.0011) [2024-06-15 12:55:20,774][1648981] Fps is (10 sec: 52387.5, 60 sec: 46961.2, 300 sec: 47651.2). Total num frames: 234094592. Throughput: 0: 11819.4. Samples: 58604032. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:20,775][1648981] Avg episode reward: [(0, '235.210')] [2024-06-15 12:55:21,965][1651274] Signal inference workers to stop experience collection... (6000 times) [2024-06-15 12:55:22,031][1651669] InferenceWorker_p0-w0: stopping experience collection (6000 times) [2024-06-15 12:55:22,280][1651274] Signal inference workers to resume experience collection... (6000 times) [2024-06-15 12:55:22,281][1651669] InferenceWorker_p0-w0: resuming experience collection (6000 times) [2024-06-15 12:55:22,608][1651669] Updated weights for policy 0, policy_version 114336 (0.0013) [2024-06-15 12:55:24,325][1651669] Updated weights for policy 0, policy_version 114416 (0.0119) [2024-06-15 12:55:25,692][1651669] Updated weights for policy 0, policy_version 114464 (0.0013) [2024-06-15 12:55:25,766][1648981] Fps is (10 sec: 49306.2, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 234422272. Throughput: 0: 11924.1. Samples: 58637824. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:25,767][1648981] Avg episode reward: [(0, '238.050')] [2024-06-15 12:55:30,274][1651669] Updated weights for policy 0, policy_version 114512 (0.0014) [2024-06-15 12:55:30,780][1648981] Fps is (10 sec: 45851.4, 60 sec: 46957.3, 300 sec: 47650.3). Total num frames: 234553344. Throughput: 0: 11874.9. Samples: 58710016. 
Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:30,780][1648981] Avg episode reward: [(0, '227.910')] [2024-06-15 12:55:31,173][1651669] Updated weights for policy 0, policy_version 114558 (0.0018) [2024-06-15 12:55:33,991][1651669] Updated weights for policy 0, policy_version 114617 (0.0133) [2024-06-15 12:55:35,055][1651669] Updated weights for policy 0, policy_version 114658 (0.0012) [2024-06-15 12:55:35,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 234881024. Throughput: 0: 11946.7. Samples: 58779136. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:35,767][1648981] Avg episode reward: [(0, '225.100')] [2024-06-15 12:55:36,667][1651669] Updated weights for policy 0, policy_version 114737 (0.0013) [2024-06-15 12:55:40,752][1651669] Updated weights for policy 0, policy_version 114770 (0.0013) [2024-06-15 12:55:40,766][1648981] Fps is (10 sec: 49217.0, 60 sec: 46421.8, 300 sec: 47874.6). Total num frames: 235044864. Throughput: 0: 12083.2. Samples: 58817024. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:40,767][1648981] Avg episode reward: [(0, '229.930')] [2024-06-15 12:55:43,362][1651669] Updated weights for policy 0, policy_version 114834 (0.0015) [2024-06-15 12:55:44,615][1651669] Updated weights for policy 0, policy_version 114896 (0.0015) [2024-06-15 12:55:45,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 235405312. Throughput: 0: 12049.0. Samples: 58897920. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:45,767][1648981] Avg episode reward: [(0, '235.000')] [2024-06-15 12:55:46,380][1651669] Updated weights for policy 0, policy_version 114953 (0.0014) [2024-06-15 12:55:50,791][1648981] Fps is (10 sec: 49032.5, 60 sec: 46948.4, 300 sec: 47983.1). Total num frames: 235536384. Throughput: 0: 12325.6. Samples: 58971136. 
Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:50,791][1648981] Avg episode reward: [(0, '240.630')] [2024-06-15 12:55:51,287][1651669] Updated weights for policy 0, policy_version 115012 (0.0134) [2024-06-15 12:55:52,447][1651669] Updated weights for policy 0, policy_version 115072 (0.0014) [2024-06-15 12:55:54,604][1651669] Updated weights for policy 0, policy_version 115120 (0.0013) [2024-06-15 12:55:55,641][1651669] Updated weights for policy 0, policy_version 115157 (0.0013) [2024-06-15 12:55:55,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 48605.9, 300 sec: 47763.5). Total num frames: 235831296. Throughput: 0: 12174.3. Samples: 59010048. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:55:55,767][1648981] Avg episode reward: [(0, '239.280')] [2024-06-15 12:55:58,123][1651669] Updated weights for policy 0, policy_version 115232 (0.0034) [2024-06-15 12:56:00,767][1648981] Fps is (10 sec: 52556.1, 60 sec: 49151.9, 300 sec: 48207.8). Total num frames: 236060672. Throughput: 0: 12137.1. Samples: 59073024. Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:56:00,767][1648981] Avg episode reward: [(0, '225.170')] [2024-06-15 12:56:02,467][1651274] Signal inference workers to stop experience collection... (6050 times) [2024-06-15 12:56:02,576][1651669] InferenceWorker_p0-w0: stopping experience collection (6050 times) [2024-06-15 12:56:02,782][1651274] Signal inference workers to resume experience collection... (6050 times) [2024-06-15 12:56:02,798][1651669] InferenceWorker_p0-w0: resuming experience collection (6050 times) [2024-06-15 12:56:03,234][1651669] Updated weights for policy 0, policy_version 115312 (0.0016) [2024-06-15 12:56:04,574][1651669] Updated weights for policy 0, policy_version 115345 (0.0010) [2024-06-15 12:56:05,767][1648981] Fps is (10 sec: 49150.8, 60 sec: 48089.3, 300 sec: 47874.7). Total num frames: 236322816. Throughput: 0: 12176.3. Samples: 59151872. 
Policy #0 lag: (min: 40.0, avg: 192.5, max: 296.0) [2024-06-15 12:56:05,767][1648981] Avg episode reward: [(0, '223.850')] [2024-06-15 12:56:06,308][1651669] Updated weights for policy 0, policy_version 115396 (0.0013) [2024-06-15 12:56:07,842][1651669] Updated weights for policy 0, policy_version 115459 (0.0015) [2024-06-15 12:56:10,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 236584960. Throughput: 0: 12174.2. Samples: 59185664. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:10,767][1648981] Avg episode reward: [(0, '222.580')] [2024-06-15 12:56:13,305][1651669] Updated weights for policy 0, policy_version 115523 (0.0017) [2024-06-15 12:56:14,747][1651669] Updated weights for policy 0, policy_version 115584 (0.0014) [2024-06-15 12:56:15,768][1648981] Fps is (10 sec: 42591.3, 60 sec: 46990.5, 300 sec: 47652.1). Total num frames: 236748800. Throughput: 0: 12200.0. Samples: 59258880. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:15,769][1648981] Avg episode reward: [(0, '220.070')] [2024-06-15 12:56:16,449][1651669] Updated weights for policy 0, policy_version 115648 (0.0013) [2024-06-15 12:56:19,141][1651669] Updated weights for policy 0, policy_version 115699 (0.0012) [2024-06-15 12:56:20,579][1651669] Updated weights for policy 0, policy_version 115767 (0.0023) [2024-06-15 12:56:20,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 50250.9, 300 sec: 48430.0). Total num frames: 237109248. Throughput: 0: 12014.9. Samples: 59319808. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:20,767][1648981] Avg episode reward: [(0, '216.660')] [2024-06-15 12:56:25,775][1648981] Fps is (10 sec: 42570.2, 60 sec: 45868.7, 300 sec: 47651.1). Total num frames: 237174784. Throughput: 0: 12149.2. Samples: 59363840. 
Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:25,775][1648981] Avg episode reward: [(0, '216.670')] [2024-06-15 12:56:26,153][1651669] Updated weights for policy 0, policy_version 115825 (0.0029) [2024-06-15 12:56:27,670][1651669] Updated weights for policy 0, policy_version 115888 (0.0016) [2024-06-15 12:56:30,201][1651669] Updated weights for policy 0, policy_version 115952 (0.0057) [2024-06-15 12:56:30,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 49162.8, 300 sec: 47985.7). Total num frames: 237502464. Throughput: 0: 11935.3. Samples: 59435008. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:30,767][1648981] Avg episode reward: [(0, '217.890')] [2024-06-15 12:56:32,017][1651669] Updated weights for policy 0, policy_version 116018 (0.0107) [2024-06-15 12:56:35,767][1648981] Fps is (10 sec: 45913.8, 60 sec: 45875.1, 300 sec: 47652.4). Total num frames: 237633536. Throughput: 0: 11896.2. Samples: 59506176. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:35,767][1648981] Avg episode reward: [(0, '223.790')] [2024-06-15 12:56:36,363][1651669] Updated weights for policy 0, policy_version 116052 (0.0014) [2024-06-15 12:56:37,986][1651669] Updated weights for policy 0, policy_version 116112 (0.0103) [2024-06-15 12:56:39,022][1651669] Updated weights for policy 0, policy_version 116160 (0.0010) [2024-06-15 12:56:40,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 47513.3, 300 sec: 47541.3). Total num frames: 237895680. Throughput: 0: 11639.4. Samples: 59533824. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:40,767][1648981] Avg episode reward: [(0, '209.430')] [2024-06-15 12:56:42,676][1651669] Updated weights for policy 0, policy_version 116227 (0.0013) [2024-06-15 12:56:43,059][1651274] Signal inference workers to stop experience collection... 
(6100 times) [2024-06-15 12:56:43,147][1651669] InferenceWorker_p0-w0: stopping experience collection (6100 times) [2024-06-15 12:56:43,243][1651274] Signal inference workers to resume experience collection... (6100 times) [2024-06-15 12:56:43,244][1651669] InferenceWorker_p0-w0: resuming experience collection (6100 times) [2024-06-15 12:56:43,811][1651669] Updated weights for policy 0, policy_version 116287 (0.0142) [2024-06-15 12:56:45,774][1648981] Fps is (10 sec: 52388.7, 60 sec: 45869.3, 300 sec: 47762.3). Total num frames: 238157824. Throughput: 0: 11648.9. Samples: 59597312. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:45,775][1648981] Avg episode reward: [(0, '209.240')] [2024-06-15 12:56:48,536][1651669] Updated weights for policy 0, policy_version 116337 (0.0014) [2024-06-15 12:56:50,063][1651669] Updated weights for policy 0, policy_version 116403 (0.0013) [2024-06-15 12:56:50,766][1648981] Fps is (10 sec: 52430.6, 60 sec: 48079.2, 300 sec: 47542.0). Total num frames: 238419968. Throughput: 0: 11685.0. Samples: 59677696. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:50,767][1648981] Avg episode reward: [(0, '217.750')] [2024-06-15 12:56:52,072][1651669] Updated weights for policy 0, policy_version 116438 (0.0029) [2024-06-15 12:56:53,769][1651669] Updated weights for policy 0, policy_version 116512 (0.0016) [2024-06-15 12:56:55,766][1648981] Fps is (10 sec: 52469.5, 60 sec: 47513.6, 300 sec: 48207.9). Total num frames: 238682112. Throughput: 0: 11764.6. Samples: 59715072. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:56:55,767][1648981] Avg episode reward: [(0, '211.290')] [2024-06-15 12:56:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000116544_238682112.pth... 
[2024-06-15 12:56:55,835][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000110912_227147776.pth [2024-06-15 12:56:58,115][1651669] Updated weights for policy 0, policy_version 116562 (0.0014) [2024-06-15 12:56:59,810][1651669] Updated weights for policy 0, policy_version 116640 (0.0012) [2024-06-15 12:57:00,772][1648981] Fps is (10 sec: 52398.4, 60 sec: 48055.2, 300 sec: 47873.7). Total num frames: 238944256. Throughput: 0: 11786.4. Samples: 59789312. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:57:00,773][1648981] Avg episode reward: [(0, '217.570')] [2024-06-15 12:57:03,560][1651669] Updated weights for policy 0, policy_version 116693 (0.0013) [2024-06-15 12:57:05,097][1651669] Updated weights for policy 0, policy_version 116768 (0.0013) [2024-06-15 12:57:05,778][1648981] Fps is (10 sec: 49093.8, 60 sec: 47504.4, 300 sec: 48205.9). Total num frames: 239173632. Throughput: 0: 11977.7. Samples: 59858944. Policy #0 lag: (min: 15.0, avg: 137.5, max: 271.0) [2024-06-15 12:57:05,779][1648981] Avg episode reward: [(0, '221.520')] [2024-06-15 12:57:08,763][1651669] Updated weights for policy 0, policy_version 116802 (0.0036) [2024-06-15 12:57:10,654][1651669] Updated weights for policy 0, policy_version 116880 (0.0011) [2024-06-15 12:57:10,767][1648981] Fps is (10 sec: 42619.6, 60 sec: 46420.7, 300 sec: 47652.3). Total num frames: 239370240. Throughput: 0: 11800.8. Samples: 59894784. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:10,768][1648981] Avg episode reward: [(0, '224.840')] [2024-06-15 12:57:14,605][1651669] Updated weights for policy 0, policy_version 116929 (0.0013) [2024-06-15 12:57:15,768][1648981] Fps is (10 sec: 39363.4, 60 sec: 46968.0, 300 sec: 47874.6). Total num frames: 239566848. Throughput: 0: 11855.3. Samples: 59968512. 
Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:15,768][1648981] Avg episode reward: [(0, '221.980')] [2024-06-15 12:57:16,068][1651669] Updated weights for policy 0, policy_version 116998 (0.0079) [2024-06-15 12:57:19,905][1651669] Updated weights for policy 0, policy_version 117072 (0.0014) [2024-06-15 12:57:20,766][1648981] Fps is (10 sec: 45878.4, 60 sec: 45329.1, 300 sec: 47652.4). Total num frames: 239828992. Throughput: 0: 11764.6. Samples: 60035584. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:20,767][1648981] Avg episode reward: [(0, '220.470')] [2024-06-15 12:57:21,184][1651669] Updated weights for policy 0, policy_version 117124 (0.0013) [2024-06-15 12:57:21,752][1651274] Signal inference workers to stop experience collection... (6150 times) [2024-06-15 12:57:21,802][1651669] InferenceWorker_p0-w0: stopping experience collection (6150 times) [2024-06-15 12:57:22,082][1651274] Signal inference workers to resume experience collection... (6150 times) [2024-06-15 12:57:22,083][1651669] InferenceWorker_p0-w0: resuming experience collection (6150 times) [2024-06-15 12:57:22,429][1651669] Updated weights for policy 0, policy_version 117184 (0.0011) [2024-06-15 12:57:25,767][1648981] Fps is (10 sec: 42602.3, 60 sec: 46973.9, 300 sec: 47541.6). Total num frames: 239992832. Throughput: 0: 11912.5. Samples: 60069888. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:25,768][1648981] Avg episode reward: [(0, '222.390')] [2024-06-15 12:57:27,067][1651669] Updated weights for policy 0, policy_version 117255 (0.0029) [2024-06-15 12:57:28,067][1651669] Updated weights for policy 0, policy_version 117304 (0.0015) [2024-06-15 12:57:30,676][1651669] Updated weights for policy 0, policy_version 117347 (0.0014) [2024-06-15 12:57:30,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 240320512. Throughput: 0: 12199.1. Samples: 60146176. 
Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:30,767][1648981] Avg episode reward: [(0, '224.830')] [2024-06-15 12:57:32,464][1651669] Updated weights for policy 0, policy_version 117424 (0.0020) [2024-06-15 12:57:35,767][1648981] Fps is (10 sec: 52430.2, 60 sec: 48059.8, 300 sec: 47541.3). Total num frames: 240517120. Throughput: 0: 12014.9. Samples: 60218368. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:35,767][1648981] Avg episode reward: [(0, '227.450')] [2024-06-15 12:57:37,325][1651669] Updated weights for policy 0, policy_version 117473 (0.0020) [2024-06-15 12:57:38,940][1651669] Updated weights for policy 0, policy_version 117543 (0.0012) [2024-06-15 12:57:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48060.0, 300 sec: 47986.3). Total num frames: 240779264. Throughput: 0: 11901.2. Samples: 60250624. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:40,767][1648981] Avg episode reward: [(0, '234.920')] [2024-06-15 12:57:41,991][1651669] Updated weights for policy 0, policy_version 117586 (0.0020) [2024-06-15 12:57:43,136][1651669] Updated weights for policy 0, policy_version 117648 (0.0015) [2024-06-15 12:57:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48065.9, 300 sec: 47874.6). Total num frames: 241041408. Throughput: 0: 12005.1. Samples: 60329472. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:45,767][1648981] Avg episode reward: [(0, '236.740')] [2024-06-15 12:57:47,013][1651669] Updated weights for policy 0, policy_version 117698 (0.0013) [2024-06-15 12:57:48,504][1651669] Updated weights for policy 0, policy_version 117763 (0.0108) [2024-06-15 12:57:49,767][1651669] Updated weights for policy 0, policy_version 117824 (0.0026) [2024-06-15 12:57:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 241303552. Throughput: 0: 11881.5. Samples: 60393472. 
Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:50,767][1648981] Avg episode reward: [(0, '236.170')] [2024-06-15 12:57:53,962][1651669] Updated weights for policy 0, policy_version 117888 (0.0016) [2024-06-15 12:57:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 241565696. Throughput: 0: 11946.9. Samples: 60432384. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:57:55,767][1648981] Avg episode reward: [(0, '243.930')] [2024-06-15 12:57:55,772][1651274] Saving new best policy, reward=243.930! [2024-06-15 12:57:58,820][1651669] Updated weights for policy 0, policy_version 117984 (0.0014) [2024-06-15 12:58:00,353][1651669] Updated weights for policy 0, policy_version 118048 (0.0185) [2024-06-15 12:58:00,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 47518.2, 300 sec: 48318.9). Total num frames: 241795072. Throughput: 0: 11912.9. Samples: 60504576. Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:58:00,767][1648981] Avg episode reward: [(0, '243.570')] [2024-06-15 12:58:03,612][1651274] Signal inference workers to stop experience collection... (6200 times) [2024-06-15 12:58:03,652][1651669] InferenceWorker_p0-w0: stopping experience collection (6200 times) [2024-06-15 12:58:03,891][1651274] Signal inference workers to resume experience collection... (6200 times) [2024-06-15 12:58:03,893][1651669] InferenceWorker_p0-w0: resuming experience collection (6200 times) [2024-06-15 12:58:04,465][1651669] Updated weights for policy 0, policy_version 118115 (0.0099) [2024-06-15 12:58:05,770][1648981] Fps is (10 sec: 45857.9, 60 sec: 47520.0, 300 sec: 47874.0). Total num frames: 242024448. Throughput: 0: 11843.3. Samples: 60568576. 
Policy #0 lag: (min: 4.0, avg: 101.9, max: 260.0) [2024-06-15 12:58:05,771][1648981] Avg episode reward: [(0, '231.860')] [2024-06-15 12:58:05,932][1651669] Updated weights for policy 0, policy_version 118180 (0.0013) [2024-06-15 12:58:09,599][1651669] Updated weights for policy 0, policy_version 118224 (0.0077) [2024-06-15 12:58:10,783][1648981] Fps is (10 sec: 42528.4, 60 sec: 47501.2, 300 sec: 47983.0). Total num frames: 242221056. Throughput: 0: 11999.3. Samples: 60610048. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:10,783][1648981] Avg episode reward: [(0, '236.880')] [2024-06-15 12:58:11,528][1651669] Updated weights for policy 0, policy_version 118304 (0.0013) [2024-06-15 12:58:15,306][1651669] Updated weights for policy 0, policy_version 118339 (0.0018) [2024-06-15 12:58:15,767][1648981] Fps is (10 sec: 39335.4, 60 sec: 47514.4, 300 sec: 47763.5). Total num frames: 242417664. Throughput: 0: 11730.4. Samples: 60674048. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:15,767][1648981] Avg episode reward: [(0, '244.420')] [2024-06-15 12:58:16,095][1651274] Saving new best policy, reward=244.420! [2024-06-15 12:58:16,587][1651669] Updated weights for policy 0, policy_version 118402 (0.0012) [2024-06-15 12:58:17,679][1651669] Updated weights for policy 0, policy_version 118454 (0.0012) [2024-06-15 12:58:20,766][1648981] Fps is (10 sec: 42668.5, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 242647040. Throughput: 0: 11889.8. Samples: 60753408. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:20,767][1648981] Avg episode reward: [(0, '242.270')] [2024-06-15 12:58:21,097][1651669] Updated weights for policy 0, policy_version 118497 (0.0013) [2024-06-15 12:58:22,767][1651669] Updated weights for policy 0, policy_version 118560 (0.0013) [2024-06-15 12:58:25,766][1648981] Fps is (10 sec: 45876.4, 60 sec: 48060.0, 300 sec: 47874.6). Total num frames: 242876416. Throughput: 0: 11855.6. 
Samples: 60784128. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:25,767][1648981] Avg episode reward: [(0, '254.670')] [2024-06-15 12:58:25,782][1651274] Saving new best policy, reward=254.670! [2024-06-15 12:58:26,647][1651669] Updated weights for policy 0, policy_version 118624 (0.0035) [2024-06-15 12:58:28,497][1651669] Updated weights for policy 0, policy_version 118690 (0.0013) [2024-06-15 12:58:30,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 243138560. Throughput: 0: 11571.2. Samples: 60850176. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:30,767][1648981] Avg episode reward: [(0, '245.990')] [2024-06-15 12:58:32,335][1651669] Updated weights for policy 0, policy_version 118752 (0.0014) [2024-06-15 12:58:33,949][1651669] Updated weights for policy 0, policy_version 118817 (0.0013) [2024-06-15 12:58:35,786][1648981] Fps is (10 sec: 52325.3, 60 sec: 48043.9, 300 sec: 47982.5). Total num frames: 243400704. Throughput: 0: 11691.2. Samples: 60919808. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:35,787][1648981] Avg episode reward: [(0, '243.370')] [2024-06-15 12:58:37,341][1651669] Updated weights for policy 0, policy_version 118864 (0.0015) [2024-06-15 12:58:38,576][1651669] Updated weights for policy 0, policy_version 118912 (0.0012) [2024-06-15 12:58:40,818][1648981] Fps is (10 sec: 52160.7, 60 sec: 48018.6, 300 sec: 47755.2). Total num frames: 243662848. Throughput: 0: 11603.5. Samples: 60955136. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:40,819][1648981] Avg episode reward: [(0, '247.570')] [2024-06-15 12:58:43,180][1651669] Updated weights for policy 0, policy_version 118980 (0.0013) [2024-06-15 12:58:43,795][1651274] Signal inference workers to stop experience collection... 
(6250 times) [2024-06-15 12:58:43,848][1651669] InferenceWorker_p0-w0: stopping experience collection (6250 times) [2024-06-15 12:58:43,987][1651274] Signal inference workers to resume experience collection... (6250 times) [2024-06-15 12:58:43,988][1651669] InferenceWorker_p0-w0: resuming experience collection (6250 times) [2024-06-15 12:58:44,200][1651669] Updated weights for policy 0, policy_version 119029 (0.0012) [2024-06-15 12:58:45,448][1651669] Updated weights for policy 0, policy_version 119092 (0.0014) [2024-06-15 12:58:45,770][1648981] Fps is (10 sec: 52514.9, 60 sec: 48057.0, 300 sec: 47985.1). Total num frames: 243924992. Throughput: 0: 11581.7. Samples: 61025792. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:45,770][1648981] Avg episode reward: [(0, '239.100')] [2024-06-15 12:58:48,845][1651669] Updated weights for policy 0, policy_version 119152 (0.0015) [2024-06-15 12:58:50,766][1648981] Fps is (10 sec: 42818.4, 60 sec: 46421.3, 300 sec: 47653.7). Total num frames: 244088832. Throughput: 0: 11799.8. Samples: 61099520. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:50,767][1648981] Avg episode reward: [(0, '236.480')] [2024-06-15 12:58:50,962][1651669] Updated weights for policy 0, policy_version 119202 (0.0013) [2024-06-15 12:58:53,608][1651669] Updated weights for policy 0, policy_version 119248 (0.0013) [2024-06-15 12:58:55,670][1651669] Updated weights for policy 0, policy_version 119328 (0.0011) [2024-06-15 12:58:55,766][1648981] Fps is (10 sec: 45890.9, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 244383744. Throughput: 0: 11837.2. Samples: 61142528. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:58:55,767][1648981] Avg episode reward: [(0, '229.470')] [2024-06-15 12:58:56,021][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000119344_244416512.pth... 
[2024-06-15 12:58:56,133][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000113728_232914944.pth [2024-06-15 12:58:56,457][1651669] Updated weights for policy 0, policy_version 119360 (0.0012) [2024-06-15 12:59:00,528][1651669] Updated weights for policy 0, policy_version 119408 (0.0012) [2024-06-15 12:59:00,774][1648981] Fps is (10 sec: 45839.6, 60 sec: 45869.3, 300 sec: 47543.9). Total num frames: 244547584. Throughput: 0: 11899.2. Samples: 61209600. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:59:00,775][1648981] Avg episode reward: [(0, '237.880')] [2024-06-15 12:59:02,054][1651669] Updated weights for policy 0, policy_version 119456 (0.0014) [2024-06-15 12:59:02,980][1651669] Updated weights for policy 0, policy_version 119486 (0.0012) [2024-06-15 12:59:05,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 45878.1, 300 sec: 47541.4). Total num frames: 244776960. Throughput: 0: 11730.5. Samples: 61281280. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:59:05,767][1648981] Avg episode reward: [(0, '236.930')] [2024-06-15 12:59:06,430][1651669] Updated weights for policy 0, policy_version 119556 (0.0032) [2024-06-15 12:59:10,497][1651669] Updated weights for policy 0, policy_version 119632 (0.0061) [2024-06-15 12:59:10,766][1648981] Fps is (10 sec: 45910.5, 60 sec: 46434.0, 300 sec: 47652.4). Total num frames: 245006336. Throughput: 0: 11810.1. Samples: 61315584. Policy #0 lag: (min: 45.0, avg: 130.9, max: 301.0) [2024-06-15 12:59:10,767][1648981] Avg episode reward: [(0, '233.490')] [2024-06-15 12:59:12,378][1651669] Updated weights for policy 0, policy_version 119696 (0.0014) [2024-06-15 12:59:15,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46967.7, 300 sec: 47319.2). Total num frames: 245235712. Throughput: 0: 11855.6. Samples: 61383680. 
Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:15,767][1648981] Avg episode reward: [(0, '238.500')] [2024-06-15 12:59:16,490][1651669] Updated weights for policy 0, policy_version 119761 (0.0013) [2024-06-15 12:59:18,446][1651669] Updated weights for policy 0, policy_version 119844 (0.0018) [2024-06-15 12:59:20,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 245497856. Throughput: 0: 11986.1. Samples: 61458944. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:20,767][1648981] Avg episode reward: [(0, '231.110')] [2024-06-15 12:59:22,067][1651669] Updated weights for policy 0, policy_version 119904 (0.0146) [2024-06-15 12:59:23,404][1651274] Signal inference workers to stop experience collection... (6300 times) [2024-06-15 12:59:23,436][1651669] InferenceWorker_p0-w0: stopping experience collection (6300 times) [2024-06-15 12:59:23,438][1651669] Updated weights for policy 0, policy_version 119954 (0.0026) [2024-06-15 12:59:23,719][1651274] Signal inference workers to resume experience collection... (6300 times) [2024-06-15 12:59:23,722][1651669] InferenceWorker_p0-w0: resuming experience collection (6300 times) [2024-06-15 12:59:25,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 245760000. Throughput: 0: 11891.9. Samples: 61489664. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:25,767][1648981] Avg episode reward: [(0, '223.130')] [2024-06-15 12:59:27,646][1651669] Updated weights for policy 0, policy_version 120002 (0.0013) [2024-06-15 12:59:29,334][1651669] Updated weights for policy 0, policy_version 120080 (0.0012) [2024-06-15 12:59:30,790][1648981] Fps is (10 sec: 52304.3, 60 sec: 48040.7, 300 sec: 47759.7). Total num frames: 246022144. Throughput: 0: 11782.0. Samples: 61556224. 
Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:30,791][1648981] Avg episode reward: [(0, '227.940')] [2024-06-15 12:59:33,194][1651669] Updated weights for policy 0, policy_version 120151 (0.0014) [2024-06-15 12:59:34,449][1651669] Updated weights for policy 0, policy_version 120196 (0.0012) [2024-06-15 12:59:35,767][1648981] Fps is (10 sec: 49151.9, 60 sec: 47529.2, 300 sec: 47430.4). Total num frames: 246251520. Throughput: 0: 11684.9. Samples: 61625344. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:35,767][1648981] Avg episode reward: [(0, '238.760')] [2024-06-15 12:59:35,900][1651669] Updated weights for policy 0, policy_version 120256 (0.0011) [2024-06-15 12:59:39,870][1651669] Updated weights for policy 0, policy_version 120304 (0.0018) [2024-06-15 12:59:40,767][1648981] Fps is (10 sec: 42699.1, 60 sec: 46461.0, 300 sec: 47652.4). Total num frames: 246448128. Throughput: 0: 11571.2. Samples: 61663232. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:40,767][1648981] Avg episode reward: [(0, '246.090')] [2024-06-15 12:59:41,196][1651669] Updated weights for policy 0, policy_version 120368 (0.0014) [2024-06-15 12:59:43,546][1651669] Updated weights for policy 0, policy_version 120387 (0.0013) [2024-06-15 12:59:44,572][1651669] Updated weights for policy 0, policy_version 120434 (0.0012) [2024-06-15 12:59:45,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 46970.3, 300 sec: 47541.4). Total num frames: 246743040. Throughput: 0: 11755.3. Samples: 61738496. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:45,767][1648981] Avg episode reward: [(0, '247.990')] [2024-06-15 12:59:46,517][1651669] Updated weights for policy 0, policy_version 120512 (0.0012) [2024-06-15 12:59:50,770][1648981] Fps is (10 sec: 42582.9, 60 sec: 46418.4, 300 sec: 47318.6). Total num frames: 246874112. Throughput: 0: 11661.2. Samples: 61806080. 
Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:50,771][1648981] Avg episode reward: [(0, '241.040')] [2024-06-15 12:59:51,042][1651669] Updated weights for policy 0, policy_version 120570 (0.0012) [2024-06-15 12:59:51,911][1651669] Updated weights for policy 0, policy_version 120609 (0.0016) [2024-06-15 12:59:55,143][1651669] Updated weights for policy 0, policy_version 120659 (0.0015) [2024-06-15 12:59:55,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 46421.2, 300 sec: 47652.4). Total num frames: 247169024. Throughput: 0: 11753.2. Samples: 61844480. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 12:59:55,767][1648981] Avg episode reward: [(0, '234.330')] [2024-06-15 12:59:56,448][1651669] Updated weights for policy 0, policy_version 120720 (0.0103) [2024-06-15 12:59:57,751][1651669] Updated weights for policy 0, policy_version 120768 (0.0036) [2024-06-15 13:00:00,794][1648981] Fps is (10 sec: 45764.8, 60 sec: 46405.7, 300 sec: 47098.5). Total num frames: 247332864. Throughput: 0: 11723.2. Samples: 61911552. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 13:00:00,795][1648981] Avg episode reward: [(0, '235.870')] [2024-06-15 13:00:02,741][1651669] Updated weights for policy 0, policy_version 120834 (0.0013) [2024-06-15 13:00:03,966][1651669] Updated weights for policy 0, policy_version 120890 (0.0013) [2024-06-15 13:00:05,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 247595008. Throughput: 0: 11685.0. Samples: 61984768. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 13:00:05,767][1648981] Avg episode reward: [(0, '241.790')] [2024-06-15 13:00:05,922][1651274] Signal inference workers to stop experience collection... (6350 times) [2024-06-15 13:00:05,995][1651669] InferenceWorker_p0-w0: stopping experience collection (6350 times) [2024-06-15 13:00:06,222][1651274] Signal inference workers to resume experience collection... 
(6350 times) [2024-06-15 13:00:06,223][1651669] InferenceWorker_p0-w0: resuming experience collection (6350 times) [2024-06-15 13:00:07,015][1651669] Updated weights for policy 0, policy_version 120944 (0.0013) [2024-06-15 13:00:08,588][1651669] Updated weights for policy 0, policy_version 121008 (0.0014) [2024-06-15 13:00:10,775][1648981] Fps is (10 sec: 52532.4, 60 sec: 47507.2, 300 sec: 47211.8). Total num frames: 247857152. Throughput: 0: 11591.9. Samples: 62011392. Policy #0 lag: (min: 31.0, avg: 147.8, max: 271.0) [2024-06-15 13:00:10,775][1648981] Avg episode reward: [(0, '240.260')] [2024-06-15 13:00:12,845][1651669] Updated weights for policy 0, policy_version 121052 (0.0023) [2024-06-15 13:00:13,924][1651669] Updated weights for policy 0, policy_version 121105 (0.0020) [2024-06-15 13:00:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47542.7). Total num frames: 248119296. Throughput: 0: 11725.3. Samples: 62083584. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:15,767][1648981] Avg episode reward: [(0, '238.690')] [2024-06-15 13:00:17,545][1651669] Updated weights for policy 0, policy_version 121169 (0.0013) [2024-06-15 13:00:19,537][1651669] Updated weights for policy 0, policy_version 121249 (0.0098) [2024-06-15 13:00:20,769][1648981] Fps is (10 sec: 52460.5, 60 sec: 48058.0, 300 sec: 47318.9). Total num frames: 248381440. Throughput: 0: 11730.0. Samples: 62153216. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:20,769][1648981] Avg episode reward: [(0, '231.640')] [2024-06-15 13:00:24,426][1651669] Updated weights for policy 0, policy_version 121316 (0.0029) [2024-06-15 13:00:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46967.6, 300 sec: 47543.5). Total num frames: 248578048. Throughput: 0: 11798.8. Samples: 62194176. 
Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:25,767][1648981] Avg episode reward: [(0, '220.040')] [2024-06-15 13:00:25,786][1651669] Updated weights for policy 0, policy_version 121380 (0.0119) [2024-06-15 13:00:27,849][1651669] Updated weights for policy 0, policy_version 121414 (0.0010) [2024-06-15 13:00:29,366][1651669] Updated weights for policy 0, policy_version 121472 (0.0015) [2024-06-15 13:00:30,766][1648981] Fps is (10 sec: 42607.6, 60 sec: 46439.7, 300 sec: 47208.1). Total num frames: 248807424. Throughput: 0: 11616.7. Samples: 62261248. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:30,767][1648981] Avg episode reward: [(0, '222.520')] [2024-06-15 13:00:31,287][1651669] Updated weights for policy 0, policy_version 121520 (0.0012) [2024-06-15 13:00:35,580][1651669] Updated weights for policy 0, policy_version 121584 (0.0011) [2024-06-15 13:00:35,776][1648981] Fps is (10 sec: 42555.7, 60 sec: 45867.7, 300 sec: 47317.6). Total num frames: 249004032. Throughput: 0: 11751.6. Samples: 62334976. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:35,777][1648981] Avg episode reward: [(0, '222.780')] [2024-06-15 13:00:36,853][1651669] Updated weights for policy 0, policy_version 121635 (0.0021) [2024-06-15 13:00:39,268][1651669] Updated weights for policy 0, policy_version 121680 (0.0088) [2024-06-15 13:00:40,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 47513.8, 300 sec: 47097.1). Total num frames: 249298944. Throughput: 0: 11730.5. Samples: 62372352. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:40,767][1648981] Avg episode reward: [(0, '217.880')] [2024-06-15 13:00:40,860][1651669] Updated weights for policy 0, policy_version 121729 (0.0015) [2024-06-15 13:00:42,240][1651669] Updated weights for policy 0, policy_version 121787 (0.0012) [2024-06-15 13:00:45,767][1648981] Fps is (10 sec: 49200.5, 60 sec: 45875.0, 300 sec: 47323.1). Total num frames: 249495552. 
Throughput: 0: 11999.6. Samples: 62451200. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:45,767][1648981] Avg episode reward: [(0, '227.780')] [2024-06-15 13:00:46,078][1651274] Signal inference workers to stop experience collection... (6400 times) [2024-06-15 13:00:46,123][1651669] InferenceWorker_p0-w0: stopping experience collection (6400 times) [2024-06-15 13:00:46,127][1651669] Updated weights for policy 0, policy_version 121842 (0.0015) [2024-06-15 13:00:46,337][1651274] Signal inference workers to resume experience collection... (6400 times) [2024-06-15 13:00:46,366][1651669] InferenceWorker_p0-w0: resuming experience collection (6400 times) [2024-06-15 13:00:47,782][1651669] Updated weights for policy 0, policy_version 121919 (0.0026) [2024-06-15 13:00:50,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48062.9, 300 sec: 47208.1). Total num frames: 249757696. Throughput: 0: 11753.3. Samples: 62513664. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:50,767][1648981] Avg episode reward: [(0, '223.880')] [2024-06-15 13:00:51,038][1651669] Updated weights for policy 0, policy_version 121979 (0.0013) [2024-06-15 13:00:53,349][1651669] Updated weights for policy 0, policy_version 122045 (0.0012) [2024-06-15 13:00:55,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 46418.4, 300 sec: 47096.4). Total num frames: 249954304. Throughput: 0: 11822.6. Samples: 62543360. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:00:55,771][1648981] Avg episode reward: [(0, '220.200')] [2024-06-15 13:00:55,778][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000122048_249954304.pth... 
[2024-06-15 13:00:55,816][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000116544_238682112.pth [2024-06-15 13:00:57,002][1651669] Updated weights for policy 0, policy_version 122096 (0.0015) [2024-06-15 13:00:59,069][1651669] Updated weights for policy 0, policy_version 122173 (0.0177) [2024-06-15 13:01:00,796][1648981] Fps is (10 sec: 45740.2, 60 sec: 48058.6, 300 sec: 47092.4). Total num frames: 250216448. Throughput: 0: 11995.7. Samples: 62623744. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:01:00,797][1648981] Avg episode reward: [(0, '224.480')] [2024-06-15 13:01:01,593][1651669] Updated weights for policy 0, policy_version 122230 (0.0095) [2024-06-15 13:01:03,429][1651669] Updated weights for policy 0, policy_version 122275 (0.0017) [2024-06-15 13:01:04,228][1651669] Updated weights for policy 0, policy_version 122304 (0.0011) [2024-06-15 13:01:05,766][1648981] Fps is (10 sec: 52449.4, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 250478592. Throughput: 0: 12083.8. Samples: 62696960. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:01:05,767][1648981] Avg episode reward: [(0, '225.500')] [2024-06-15 13:01:08,857][1651669] Updated weights for policy 0, policy_version 122384 (0.0012) [2024-06-15 13:01:10,766][1648981] Fps is (10 sec: 52583.4, 60 sec: 48066.3, 300 sec: 47430.6). Total num frames: 250740736. Throughput: 0: 11958.0. Samples: 62732288. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:01:10,767][1648981] Avg episode reward: [(0, '230.860')] [2024-06-15 13:01:12,067][1651669] Updated weights for policy 0, policy_version 122453 (0.0110) [2024-06-15 13:01:13,084][1651669] Updated weights for policy 0, policy_version 122492 (0.0021) [2024-06-15 13:01:15,433][1651669] Updated weights for policy 0, policy_version 122547 (0.0024) [2024-06-15 13:01:15,770][1648981] Fps is (10 sec: 52409.0, 60 sec: 48056.7, 300 sec: 47096.5). Total num frames: 251002880. 
Throughput: 0: 11900.2. Samples: 62796800. Policy #0 lag: (min: 82.0, avg: 151.6, max: 290.0) [2024-06-15 13:01:15,771][1648981] Avg episode reward: [(0, '220.830')] [2024-06-15 13:01:18,888][1651669] Updated weights for policy 0, policy_version 122580 (0.0028) [2024-06-15 13:01:20,258][1651669] Updated weights for policy 0, policy_version 122643 (0.0042) [2024-06-15 13:01:20,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46969.1, 300 sec: 47542.7). Total num frames: 251199488. Throughput: 0: 11903.8. Samples: 62870528. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:20,767][1648981] Avg episode reward: [(0, '225.110')] [2024-06-15 13:01:23,351][1651669] Updated weights for policy 0, policy_version 122690 (0.0015) [2024-06-15 13:01:24,862][1651669] Updated weights for policy 0, policy_version 122752 (0.0015) [2024-06-15 13:01:25,766][1648981] Fps is (10 sec: 45891.9, 60 sec: 48059.6, 300 sec: 47319.2). Total num frames: 251461632. Throughput: 0: 11912.5. Samples: 62908416. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:25,767][1648981] Avg episode reward: [(0, '229.260')] [2024-06-15 13:01:26,338][1651669] Updated weights for policy 0, policy_version 122809 (0.0014) [2024-06-15 13:01:29,278][1651274] Signal inference workers to stop experience collection... (6450 times) [2024-06-15 13:01:29,310][1651669] InferenceWorker_p0-w0: stopping experience collection (6450 times) [2024-06-15 13:01:29,526][1651274] Signal inference workers to resume experience collection... (6450 times) [2024-06-15 13:01:29,527][1651669] InferenceWorker_p0-w0: resuming experience collection (6450 times) [2024-06-15 13:01:30,186][1651669] Updated weights for policy 0, policy_version 122850 (0.0014) [2024-06-15 13:01:30,767][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 251658240. Throughput: 0: 11935.3. Samples: 62988288. 
Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:30,767][1648981] Avg episode reward: [(0, '229.630')] [2024-06-15 13:01:31,342][1651669] Updated weights for policy 0, policy_version 122912 (0.0015) [2024-06-15 13:01:33,619][1651669] Updated weights for policy 0, policy_version 122960 (0.0012) [2024-06-15 13:01:35,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 49160.2, 300 sec: 47652.5). Total num frames: 251953152. Throughput: 0: 11980.8. Samples: 63052800. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:35,767][1648981] Avg episode reward: [(0, '227.170')] [2024-06-15 13:01:35,894][1651669] Updated weights for policy 0, policy_version 123028 (0.0016) [2024-06-15 13:01:36,852][1651669] Updated weights for policy 0, policy_version 123070 (0.0144) [2024-06-15 13:01:40,770][1648981] Fps is (10 sec: 45858.4, 60 sec: 46964.5, 300 sec: 47319.9). Total num frames: 252116992. Throughput: 0: 12117.4. Samples: 63088640. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:40,771][1648981] Avg episode reward: [(0, '227.900')] [2024-06-15 13:01:41,279][1651669] Updated weights for policy 0, policy_version 123124 (0.0015) [2024-06-15 13:01:42,650][1651669] Updated weights for policy 0, policy_version 123188 (0.0012) [2024-06-15 13:01:44,507][1651669] Updated weights for policy 0, policy_version 123222 (0.0051) [2024-06-15 13:01:45,806][1648981] Fps is (10 sec: 48955.6, 60 sec: 49119.3, 300 sec: 47534.9). Total num frames: 252444672. Throughput: 0: 11978.0. Samples: 63162880. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:45,807][1648981] Avg episode reward: [(0, '231.080')] [2024-06-15 13:01:46,631][1651669] Updated weights for policy 0, policy_version 123265 (0.0122) [2024-06-15 13:01:47,952][1651669] Updated weights for policy 0, policy_version 123324 (0.0015) [2024-06-15 13:01:50,799][1648981] Fps is (10 sec: 45745.6, 60 sec: 46942.3, 300 sec: 47091.9). Total num frames: 252575744. 
Throughput: 0: 11938.2. Samples: 63234560. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:50,799][1648981] Avg episode reward: [(0, '236.630')] [2024-06-15 13:01:51,982][1651669] Updated weights for policy 0, policy_version 123376 (0.0013) [2024-06-15 13:01:53,232][1651669] Updated weights for policy 0, policy_version 123427 (0.0021) [2024-06-15 13:01:55,548][1651669] Updated weights for policy 0, policy_version 123476 (0.0011) [2024-06-15 13:01:55,776][1648981] Fps is (10 sec: 46015.2, 60 sec: 49147.3, 300 sec: 47318.6). Total num frames: 252903424. Throughput: 0: 12012.4. Samples: 63272960. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:01:55,777][1648981] Avg episode reward: [(0, '234.750')] [2024-06-15 13:01:57,853][1651669] Updated weights for policy 0, policy_version 123557 (0.0013) [2024-06-15 13:02:00,766][1648981] Fps is (10 sec: 52597.8, 60 sec: 48083.3, 300 sec: 47210.0). Total num frames: 253100032. Throughput: 0: 12175.3. Samples: 63344640. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:02:00,767][1648981] Avg episode reward: [(0, '245.860')] [2024-06-15 13:02:02,349][1651669] Updated weights for policy 0, policy_version 123644 (0.0016) [2024-06-15 13:02:04,277][1651669] Updated weights for policy 0, policy_version 123696 (0.0015) [2024-06-15 13:02:05,800][1648981] Fps is (10 sec: 45763.9, 60 sec: 48032.5, 300 sec: 47425.0). Total num frames: 253362176. Throughput: 0: 12165.1. Samples: 63418368. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:02:05,801][1648981] Avg episode reward: [(0, '246.720')] [2024-06-15 13:02:06,195][1651669] Updated weights for policy 0, policy_version 123713 (0.0060) [2024-06-15 13:02:07,327][1651669] Updated weights for policy 0, policy_version 123772 (0.0018) [2024-06-15 13:02:08,983][1651274] Signal inference workers to stop experience collection... 
(6500 times) [2024-06-15 13:02:09,055][1651669] InferenceWorker_p0-w0: stopping experience collection (6500 times) [2024-06-15 13:02:09,314][1651274] Signal inference workers to resume experience collection... (6500 times) [2024-06-15 13:02:09,324][1651669] InferenceWorker_p0-w0: resuming experience collection (6500 times) [2024-06-15 13:02:09,503][1651669] Updated weights for policy 0, policy_version 123833 (0.0022) [2024-06-15 13:02:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 47652.7). Total num frames: 253624320. Throughput: 0: 12174.3. Samples: 63456256. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:02:10,767][1648981] Avg episode reward: [(0, '246.700')] [2024-06-15 13:02:14,662][1651669] Updated weights for policy 0, policy_version 123920 (0.0015) [2024-06-15 13:02:15,618][1651669] Updated weights for policy 0, policy_version 123966 (0.0012) [2024-06-15 13:02:15,798][1648981] Fps is (10 sec: 52439.9, 60 sec: 48037.3, 300 sec: 47647.3). Total num frames: 253886464. Throughput: 0: 11949.6. Samples: 63526400. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:02:15,799][1648981] Avg episode reward: [(0, '248.060')] [2024-06-15 13:02:18,304][1651669] Updated weights for policy 0, policy_version 124024 (0.0027) [2024-06-15 13:02:20,277][1651669] Updated weights for policy 0, policy_version 124086 (0.0012) [2024-06-15 13:02:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 47985.7). Total num frames: 254148608. Throughput: 0: 11980.8. Samples: 63591936. Policy #0 lag: (min: 11.0, avg: 97.5, max: 267.0) [2024-06-15 13:02:20,767][1648981] Avg episode reward: [(0, '249.210')] [2024-06-15 13:02:23,888][1651669] Updated weights for policy 0, policy_version 124130 (0.0017) [2024-06-15 13:02:25,766][1648981] Fps is (10 sec: 42734.5, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 254312448. Throughput: 0: 12038.7. Samples: 63630336. 
Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:02:25,767][1648981] Avg episode reward: [(0, '247.400')] [2024-06-15 13:02:25,869][1651669] Updated weights for policy 0, policy_version 124192 (0.0015) [2024-06-15 13:02:28,647][1651669] Updated weights for policy 0, policy_version 124256 (0.0011) [2024-06-15 13:02:30,702][1651669] Updated weights for policy 0, policy_version 124306 (0.0034) [2024-06-15 13:02:30,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 254574592. Throughput: 0: 12037.0. Samples: 63704064. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:02:30,767][1648981] Avg episode reward: [(0, '246.800')] [2024-06-15 13:02:31,433][1651669] Updated weights for policy 0, policy_version 124347 (0.0017) [2024-06-15 13:02:34,587][1651669] Updated weights for policy 0, policy_version 124407 (0.0014) [2024-06-15 13:02:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 254836736. Throughput: 0: 12160.1. Samples: 63781376. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:02:35,767][1648981] Avg episode reward: [(0, '241.430')] [2024-06-15 13:02:35,998][1651669] Updated weights for policy 0, policy_version 124448 (0.0017) [2024-06-15 13:02:38,591][1651669] Updated weights for policy 0, policy_version 124482 (0.0014) [2024-06-15 13:02:39,853][1651669] Updated weights for policy 0, policy_version 124535 (0.0013) [2024-06-15 13:02:40,770][1648981] Fps is (10 sec: 49133.3, 60 sec: 49152.0, 300 sec: 47540.8). Total num frames: 255066112. Throughput: 0: 12175.8. Samples: 63820800. 
Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:02:40,771][1648981] Avg episode reward: [(0, '249.990')] [2024-06-15 13:02:40,993][1651669] Updated weights for policy 0, policy_version 124564 (0.0011) [2024-06-15 13:02:43,755][1651669] Updated weights for policy 0, policy_version 124611 (0.0016) [2024-06-15 13:02:44,986][1651669] Updated weights for policy 0, policy_version 124672 (0.0012) [2024-06-15 13:02:45,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48091.9, 300 sec: 47541.4). Total num frames: 255328256. Throughput: 0: 12208.4. Samples: 63894016. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:02:45,767][1648981] Avg episode reward: [(0, '245.360')] [2024-06-15 13:02:49,267][1651669] Updated weights for policy 0, policy_version 124738 (0.0014) [2024-06-15 13:02:50,581][1651669] Updated weights for policy 0, policy_version 124790 (0.0030) [2024-06-15 13:02:50,766][1648981] Fps is (10 sec: 49170.7, 60 sec: 49724.7, 300 sec: 47430.3). Total num frames: 255557632. Throughput: 0: 12115.1. Samples: 63963136. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:02:50,767][1648981] Avg episode reward: [(0, '242.670')] [2024-06-15 13:02:51,976][1651669] Updated weights for policy 0, policy_version 124833 (0.0013) [2024-06-15 13:02:55,150][1651274] Signal inference workers to stop experience collection... (6550 times) [2024-06-15 13:02:55,256][1651669] InferenceWorker_p0-w0: stopping experience collection (6550 times) [2024-06-15 13:02:55,451][1651274] Signal inference workers to resume experience collection... (6550 times) [2024-06-15 13:02:55,453][1651669] InferenceWorker_p0-w0: resuming experience collection (6550 times) [2024-06-15 13:02:55,766][1648981] Fps is (10 sec: 42597.7, 60 sec: 47521.1, 300 sec: 47319.2). Total num frames: 255754240. Throughput: 0: 11980.8. Samples: 63995392. 
Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:02:55,767][1648981] Avg episode reward: [(0, '243.020')] [2024-06-15 13:02:56,186][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000124912_255819776.pth... [2024-06-15 13:02:56,209][1651669] Updated weights for policy 0, policy_version 124912 (0.0134) [2024-06-15 13:02:56,261][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000119344_244416512.pth [2024-06-15 13:02:58,235][1651669] Updated weights for policy 0, policy_version 124948 (0.0013) [2024-06-15 13:02:59,277][1651669] Updated weights for policy 0, policy_version 124990 (0.0012) [2024-06-15 13:03:00,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 47542.0). Total num frames: 256049152. Throughput: 0: 12046.2. Samples: 64068096. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:03:00,767][1648981] Avg episode reward: [(0, '249.000')] [2024-06-15 13:03:01,256][1651669] Updated weights for policy 0, policy_version 125051 (0.0012) [2024-06-15 13:03:03,989][1651669] Updated weights for policy 0, policy_version 125117 (0.0011) [2024-06-15 13:03:05,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 48087.0, 300 sec: 47544.0). Total num frames: 256245760. Throughput: 0: 12128.7. Samples: 64137728. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:03:05,767][1648981] Avg episode reward: [(0, '244.150')] [2024-06-15 13:03:07,476][1651669] Updated weights for policy 0, policy_version 125168 (0.0013) [2024-06-15 13:03:09,497][1651669] Updated weights for policy 0, policy_version 125211 (0.0082) [2024-06-15 13:03:10,346][1651669] Updated weights for policy 0, policy_version 125247 (0.0016) [2024-06-15 13:03:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47763.6). Total num frames: 256507904. Throughput: 0: 12071.8. Samples: 64173568. 
Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:03:10,767][1648981] Avg episode reward: [(0, '236.430')] [2024-06-15 13:03:12,787][1651669] Updated weights for policy 0, policy_version 125309 (0.0011) [2024-06-15 13:03:14,108][1651669] Updated weights for policy 0, policy_version 125351 (0.0012) [2024-06-15 13:03:15,767][1648981] Fps is (10 sec: 52426.2, 60 sec: 48084.9, 300 sec: 47874.5). Total num frames: 256770048. Throughput: 0: 12037.6. Samples: 64245760. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:03:15,767][1648981] Avg episode reward: [(0, '238.510')] [2024-06-15 13:03:17,632][1651669] Updated weights for policy 0, policy_version 125414 (0.0013) [2024-06-15 13:03:19,647][1651669] Updated weights for policy 0, policy_version 125459 (0.0013) [2024-06-15 13:03:20,374][1651669] Updated weights for policy 0, policy_version 125504 (0.0120) [2024-06-15 13:03:20,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 257032192. Throughput: 0: 12117.4. Samples: 64326656. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:03:20,767][1648981] Avg episode reward: [(0, '239.870')] [2024-06-15 13:03:22,566][1651669] Updated weights for policy 0, policy_version 125559 (0.0014) [2024-06-15 13:03:25,766][1648981] Fps is (10 sec: 52430.7, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 257294336. Throughput: 0: 12129.7. Samples: 64366592. Policy #0 lag: (min: 1.0, avg: 103.6, max: 257.0) [2024-06-15 13:03:25,767][1648981] Avg episode reward: [(0, '235.830')] [2024-06-15 13:03:28,585][1651669] Updated weights for policy 0, policy_version 125664 (0.0052) [2024-06-15 13:03:29,219][1651669] Updated weights for policy 0, policy_version 125692 (0.0013) [2024-06-15 13:03:30,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48059.8, 300 sec: 47655.7). Total num frames: 257458176. Throughput: 0: 11969.4. Samples: 64432640. 
Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:03:30,767][1648981] Avg episode reward: [(0, '237.690')] [2024-06-15 13:03:31,291][1651669] Updated weights for policy 0, policy_version 125754 (0.0013) [2024-06-15 13:03:33,717][1651669] Updated weights for policy 0, policy_version 125796 (0.0018) [2024-06-15 13:03:35,120][1651669] Updated weights for policy 0, policy_version 125840 (0.0015) [2024-06-15 13:03:35,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48605.9, 300 sec: 47771.9). Total num frames: 257753088. Throughput: 0: 12026.3. Samples: 64504320. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:03:35,767][1648981] Avg episode reward: [(0, '232.600')] [2024-06-15 13:03:36,368][1651669] Updated weights for policy 0, policy_version 125886 (0.0013) [2024-06-15 13:03:39,403][1651274] Signal inference workers to stop experience collection... (6600 times) [2024-06-15 13:03:39,492][1651669] InferenceWorker_p0-w0: stopping experience collection (6600 times) [2024-06-15 13:03:39,755][1651274] Signal inference workers to resume experience collection... (6600 times) [2024-06-15 13:03:39,757][1651669] InferenceWorker_p0-w0: resuming experience collection (6600 times) [2024-06-15 13:03:39,760][1651669] Updated weights for policy 0, policy_version 125936 (0.0016) [2024-06-15 13:03:40,771][1648981] Fps is (10 sec: 49131.4, 60 sec: 48059.4, 300 sec: 47541.2). Total num frames: 257949696. Throughput: 0: 12161.8. Samples: 64542720. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:03:40,771][1648981] Avg episode reward: [(0, '234.140')] [2024-06-15 13:03:42,106][1651669] Updated weights for policy 0, policy_version 125984 (0.0012) [2024-06-15 13:03:45,203][1651669] Updated weights for policy 0, policy_version 126067 (0.0087) [2024-06-15 13:03:45,793][1648981] Fps is (10 sec: 45752.4, 60 sec: 48038.2, 300 sec: 47870.3). Total num frames: 258211840. Throughput: 0: 12041.9. Samples: 64610304. 
Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:03:45,796][1648981] Avg episode reward: [(0, '224.320')] [2024-06-15 13:03:46,411][1651669] Updated weights for policy 0, policy_version 126098 (0.0011) [2024-06-15 13:03:47,408][1651669] Updated weights for policy 0, policy_version 126141 (0.0015) [2024-06-15 13:03:50,767][1648981] Fps is (10 sec: 45893.2, 60 sec: 47513.4, 300 sec: 47541.3). Total num frames: 258408448. Throughput: 0: 12128.6. Samples: 64683520. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:03:50,767][1648981] Avg episode reward: [(0, '218.710')] [2024-06-15 13:03:51,047][1651669] Updated weights for policy 0, policy_version 126196 (0.0013) [2024-06-15 13:03:52,310][1651669] Updated weights for policy 0, policy_version 126230 (0.0013) [2024-06-15 13:03:55,050][1651669] Updated weights for policy 0, policy_version 126291 (0.0013) [2024-06-15 13:03:55,766][1648981] Fps is (10 sec: 49284.2, 60 sec: 49152.1, 300 sec: 47986.9). Total num frames: 258703360. Throughput: 0: 12128.7. Samples: 64719360. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:03:55,767][1648981] Avg episode reward: [(0, '223.760')] [2024-06-15 13:03:55,957][1651669] Updated weights for policy 0, policy_version 126336 (0.0034) [2024-06-15 13:03:58,091][1651669] Updated weights for policy 0, policy_version 126391 (0.0012) [2024-06-15 13:04:00,671][1651669] Updated weights for policy 0, policy_version 126432 (0.0017) [2024-06-15 13:04:00,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 258932736. Throughput: 0: 12174.3. Samples: 64793600. 
Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:04:00,767][1648981] Avg episode reward: [(0, '226.130')] [2024-06-15 13:04:02,734][1651669] Updated weights for policy 0, policy_version 126480 (0.0014) [2024-06-15 13:04:03,912][1651669] Updated weights for policy 0, policy_version 126528 (0.0066) [2024-06-15 13:04:05,767][1648981] Fps is (10 sec: 42597.4, 60 sec: 48059.5, 300 sec: 47874.6). Total num frames: 259129344. Throughput: 0: 12014.9. Samples: 64867328. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:04:05,767][1648981] Avg episode reward: [(0, '229.250')] [2024-06-15 13:04:06,827][1651669] Updated weights for policy 0, policy_version 126589 (0.0012) [2024-06-15 13:04:08,630][1651669] Updated weights for policy 0, policy_version 126640 (0.0014) [2024-06-15 13:04:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 259391488. Throughput: 0: 11867.0. Samples: 64900608. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:04:10,767][1648981] Avg episode reward: [(0, '225.430')] [2024-06-15 13:04:11,228][1651669] Updated weights for policy 0, policy_version 126688 (0.0012) [2024-06-15 13:04:14,358][1651669] Updated weights for policy 0, policy_version 126736 (0.0098) [2024-06-15 13:04:15,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48060.1, 300 sec: 47985.7). Total num frames: 259653632. Throughput: 0: 12083.2. Samples: 64976384. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:04:15,767][1648981] Avg episode reward: [(0, '223.860')] [2024-06-15 13:04:16,652][1651669] Updated weights for policy 0, policy_version 126785 (0.0013) [2024-06-15 13:04:18,451][1651669] Updated weights for policy 0, policy_version 126850 (0.0013) [2024-06-15 13:04:19,740][1651669] Updated weights for policy 0, policy_version 126910 (0.0015) [2024-06-15 13:04:20,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48059.5, 300 sec: 47985.7). Total num frames: 259915776. 
Throughput: 0: 11969.4. Samples: 65042944. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:04:20,768][1648981] Avg episode reward: [(0, '235.430')] [2024-06-15 13:04:23,246][1651669] Updated weights for policy 0, policy_version 126974 (0.0013) [2024-06-15 13:04:25,327][1651274] Signal inference workers to stop experience collection... (6650 times) [2024-06-15 13:04:25,354][1651669] InferenceWorker_p0-w0: stopping experience collection (6650 times) [2024-06-15 13:04:25,504][1651274] Signal inference workers to resume experience collection... (6650 times) [2024-06-15 13:04:25,507][1651669] InferenceWorker_p0-w0: resuming experience collection (6650 times) [2024-06-15 13:04:25,774][1648981] Fps is (10 sec: 45839.6, 60 sec: 46961.4, 300 sec: 47766.1). Total num frames: 260112384. Throughput: 0: 11934.3. Samples: 65079808. Policy #0 lag: (min: 15.0, avg: 117.8, max: 271.0) [2024-06-15 13:04:25,775][1648981] Avg episode reward: [(0, '238.830')] [2024-06-15 13:04:26,008][1651669] Updated weights for policy 0, policy_version 127029 (0.0013) [2024-06-15 13:04:28,172][1651669] Updated weights for policy 0, policy_version 127074 (0.0014) [2024-06-15 13:04:29,219][1651669] Updated weights for policy 0, policy_version 127120 (0.0043) [2024-06-15 13:04:30,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 49698.1, 300 sec: 48096.8). Total num frames: 260440064. Throughput: 0: 12079.0. Samples: 65153536. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:04:30,767][1648981] Avg episode reward: [(0, '244.590')] [2024-06-15 13:04:33,019][1651669] Updated weights for policy 0, policy_version 127184 (0.0019) [2024-06-15 13:04:35,689][1651669] Updated weights for policy 0, policy_version 127233 (0.0013) [2024-06-15 13:04:35,767][1648981] Fps is (10 sec: 45906.1, 60 sec: 46966.6, 300 sec: 47874.5). Total num frames: 260571136. Throughput: 0: 12174.0. Samples: 65231360. 
Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:04:35,768][1648981] Avg episode reward: [(0, '242.320')] [2024-06-15 13:04:38,415][1651669] Updated weights for policy 0, policy_version 127299 (0.0022) [2024-06-15 13:04:39,767][1651669] Updated weights for policy 0, policy_version 127357 (0.0017) [2024-06-15 13:04:40,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 48609.2, 300 sec: 47874.6). Total num frames: 260866048. Throughput: 0: 12162.8. Samples: 65266688. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:04:40,767][1648981] Avg episode reward: [(0, '246.270')] [2024-06-15 13:04:41,560][1651669] Updated weights for policy 0, policy_version 127419 (0.0013) [2024-06-15 13:04:44,836][1651669] Updated weights for policy 0, policy_version 127480 (0.0076) [2024-06-15 13:04:45,767][1648981] Fps is (10 sec: 52433.7, 60 sec: 48081.1, 300 sec: 48208.4). Total num frames: 261095424. Throughput: 0: 11912.5. Samples: 65329664. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:04:45,767][1648981] Avg episode reward: [(0, '240.130')] [2024-06-15 13:04:48,184][1651669] Updated weights for policy 0, policy_version 127547 (0.0032) [2024-06-15 13:04:50,135][1651669] Updated weights for policy 0, policy_version 127584 (0.0019) [2024-06-15 13:04:50,746][1651669] Updated weights for policy 0, policy_version 127616 (0.0041) [2024-06-15 13:04:50,773][1648981] Fps is (10 sec: 49118.6, 60 sec: 49146.6, 300 sec: 48095.7). Total num frames: 261357568. Throughput: 0: 11899.4. Samples: 65402880. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:04:50,774][1648981] Avg episode reward: [(0, '243.160')] [2024-06-15 13:04:52,784][1651669] Updated weights for policy 0, policy_version 127674 (0.0013) [2024-06-15 13:04:55,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 47513.3, 300 sec: 48212.3). Total num frames: 261554176. Throughput: 0: 11969.3. Samples: 65439232. 
Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:04:55,767][1648981] Avg episode reward: [(0, '243.720')] [2024-06-15 13:04:56,228][1651669] Updated weights for policy 0, policy_version 127742 (0.0014) [2024-06-15 13:04:56,244][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000127744_261619712.pth... [2024-06-15 13:04:56,280][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000122048_249954304.pth [2024-06-15 13:04:58,157][1651669] Updated weights for policy 0, policy_version 127798 (0.0014) [2024-06-15 13:05:00,767][1648981] Fps is (10 sec: 45906.2, 60 sec: 48059.6, 300 sec: 48207.8). Total num frames: 261816320. Throughput: 0: 12026.3. Samples: 65517568. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:05:00,767][1648981] Avg episode reward: [(0, '239.360')] [2024-06-15 13:05:01,288][1651669] Updated weights for policy 0, policy_version 127868 (0.0173) [2024-06-15 13:05:03,930][1651669] Updated weights for policy 0, policy_version 127925 (0.0012) [2024-06-15 13:05:05,774][1648981] Fps is (10 sec: 45841.3, 60 sec: 48053.6, 300 sec: 47985.7). Total num frames: 262012928. Throughput: 0: 12160.8. Samples: 65590272. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:05:05,775][1648981] Avg episode reward: [(0, '238.540')] [2024-06-15 13:05:07,269][1651669] Updated weights for policy 0, policy_version 127984 (0.0015) [2024-06-15 13:05:08,415][1651274] Signal inference workers to stop experience collection... (6700 times) [2024-06-15 13:05:08,454][1651669] InferenceWorker_p0-w0: stopping experience collection (6700 times) [2024-06-15 13:05:08,622][1651274] Signal inference workers to resume experience collection... 
(6700 times) [2024-06-15 13:05:08,623][1651669] InferenceWorker_p0-w0: resuming experience collection (6700 times) [2024-06-15 13:05:09,006][1651669] Updated weights for policy 0, policy_version 128048 (0.0014) [2024-06-15 13:05:10,767][1648981] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 262275072. Throughput: 0: 12005.6. Samples: 65619968. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:05:10,767][1648981] Avg episode reward: [(0, '240.170')] [2024-06-15 13:05:12,031][1651669] Updated weights for policy 0, policy_version 128112 (0.0013) [2024-06-15 13:05:14,995][1651669] Updated weights for policy 0, policy_version 128160 (0.0013) [2024-06-15 13:05:15,766][1648981] Fps is (10 sec: 52470.3, 60 sec: 48059.8, 300 sec: 47986.1). Total num frames: 262537216. Throughput: 0: 12026.3. Samples: 65694720. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:05:15,767][1648981] Avg episode reward: [(0, '232.600')] [2024-06-15 13:05:17,134][1651669] Updated weights for policy 0, policy_version 128199 (0.0015) [2024-06-15 13:05:18,253][1651669] Updated weights for policy 0, policy_version 128248 (0.0017) [2024-06-15 13:05:19,668][1651669] Updated weights for policy 0, policy_version 128311 (0.0013) [2024-06-15 13:05:20,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48060.0, 300 sec: 48207.8). Total num frames: 262799360. Throughput: 0: 11833.2. Samples: 65763840. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:05:20,767][1648981] Avg episode reward: [(0, '239.300')] [2024-06-15 13:05:22,247][1651669] Updated weights for policy 0, policy_version 128354 (0.0013) [2024-06-15 13:05:24,989][1651669] Updated weights for policy 0, policy_version 128387 (0.0078) [2024-06-15 13:05:25,767][1648981] Fps is (10 sec: 45873.5, 60 sec: 48065.7, 300 sec: 48096.7). Total num frames: 262995968. Throughput: 0: 11946.6. Samples: 65804288. 
Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:05:25,768][1648981] Avg episode reward: [(0, '241.930')] [2024-06-15 13:05:28,089][1651669] Updated weights for policy 0, policy_version 128464 (0.0013) [2024-06-15 13:05:29,262][1651669] Updated weights for policy 0, policy_version 128512 (0.0066) [2024-06-15 13:05:30,494][1651669] Updated weights for policy 0, policy_version 128575 (0.0035) [2024-06-15 13:05:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 48542.7). Total num frames: 263323648. Throughput: 0: 12140.1. Samples: 65875968. Policy #0 lag: (min: 62.0, avg: 191.2, max: 313.0) [2024-06-15 13:05:30,767][1648981] Avg episode reward: [(0, '239.880')] [2024-06-15 13:05:33,040][1651669] Updated weights for policy 0, policy_version 128633 (0.0016) [2024-06-15 13:05:35,766][1648981] Fps is (10 sec: 45876.7, 60 sec: 48060.6, 300 sec: 47985.7). Total num frames: 263454720. Throughput: 0: 12437.8. Samples: 65962496. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:05:35,767][1648981] Avg episode reward: [(0, '240.770')] [2024-06-15 13:05:36,747][1651669] Updated weights for policy 0, policy_version 128697 (0.0012) [2024-06-15 13:05:39,727][1651669] Updated weights for policy 0, policy_version 128742 (0.0012) [2024-06-15 13:05:40,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 263749632. Throughput: 0: 12265.4. Samples: 65991168. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:05:40,767][1648981] Avg episode reward: [(0, '239.750')] [2024-06-15 13:05:41,061][1651669] Updated weights for policy 0, policy_version 128802 (0.0012) [2024-06-15 13:05:43,466][1651669] Updated weights for policy 0, policy_version 128854 (0.0013) [2024-06-15 13:05:44,342][1651669] Updated weights for policy 0, policy_version 128896 (0.0012) [2024-06-15 13:05:45,810][1648981] Fps is (10 sec: 52199.5, 60 sec: 48024.7, 300 sec: 48200.7). Total num frames: 263979008. 
Throughput: 0: 11980.5. Samples: 66057216. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:05:45,811][1648981] Avg episode reward: [(0, '239.460')] [2024-06-15 13:05:48,138][1651669] Updated weights for policy 0, policy_version 128953 (0.0015) [2024-06-15 13:05:49,835][1651669] Updated weights for policy 0, policy_version 128995 (0.0013) [2024-06-15 13:05:50,135][1651274] Signal inference workers to stop experience collection... (6750 times) [2024-06-15 13:05:50,193][1651669] InferenceWorker_p0-w0: stopping experience collection (6750 times) [2024-06-15 13:05:50,442][1651274] Signal inference workers to resume experience collection... (6750 times) [2024-06-15 13:05:50,442][1651669] InferenceWorker_p0-w0: resuming experience collection (6750 times) [2024-06-15 13:05:50,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 48065.2, 300 sec: 48430.6). Total num frames: 264241152. Throughput: 0: 12085.3. Samples: 66134016. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:05:50,767][1648981] Avg episode reward: [(0, '238.300')] [2024-06-15 13:05:51,343][1651669] Updated weights for policy 0, policy_version 129057 (0.0015) [2024-06-15 13:05:53,830][1651669] Updated weights for policy 0, policy_version 129107 (0.0013) [2024-06-15 13:05:55,766][1648981] Fps is (10 sec: 52659.5, 60 sec: 49152.3, 300 sec: 48434.8). Total num frames: 264503296. Throughput: 0: 12265.2. Samples: 66171904. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:05:55,767][1648981] Avg episode reward: [(0, '240.200')] [2024-06-15 13:05:57,840][1651669] Updated weights for policy 0, policy_version 129168 (0.0013) [2024-06-15 13:06:00,212][1651669] Updated weights for policy 0, policy_version 129232 (0.0087) [2024-06-15 13:06:00,794][1648981] Fps is (10 sec: 45749.1, 60 sec: 48037.7, 300 sec: 48203.3). Total num frames: 264699904. Throughput: 0: 12291.8. Samples: 66248192. 
Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:06:00,795][1648981] Avg episode reward: [(0, '248.020')] [2024-06-15 13:06:02,071][1651669] Updated weights for policy 0, policy_version 129312 (0.0013) [2024-06-15 13:06:04,979][1651669] Updated weights for policy 0, policy_version 129393 (0.0129) [2024-06-15 13:06:05,767][1648981] Fps is (10 sec: 52426.7, 60 sec: 50250.4, 300 sec: 48429.9). Total num frames: 265027584. Throughput: 0: 12105.8. Samples: 66308608. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:06:05,767][1648981] Avg episode reward: [(0, '251.170')] [2024-06-15 13:06:10,105][1651669] Updated weights for policy 0, policy_version 129462 (0.0134) [2024-06-15 13:06:10,767][1648981] Fps is (10 sec: 46002.1, 60 sec: 48059.7, 300 sec: 47986.3). Total num frames: 265158656. Throughput: 0: 12333.6. Samples: 66359296. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:06:10,767][1648981] Avg episode reward: [(0, '251.100')] [2024-06-15 13:06:11,311][1651669] Updated weights for policy 0, policy_version 129504 (0.0013) [2024-06-15 13:06:13,160][1651669] Updated weights for policy 0, policy_version 129568 (0.0161) [2024-06-15 13:06:14,969][1651669] Updated weights for policy 0, policy_version 129601 (0.0012) [2024-06-15 13:06:15,766][1648981] Fps is (10 sec: 45877.7, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 265486336. Throughput: 0: 12049.1. Samples: 66418176. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:06:15,767][1648981] Avg episode reward: [(0, '240.970')] [2024-06-15 13:06:16,120][1651669] Updated weights for policy 0, policy_version 129658 (0.0012) [2024-06-15 13:06:20,768][1648981] Fps is (10 sec: 45869.3, 60 sec: 46966.3, 300 sec: 47985.5). Total num frames: 265617408. Throughput: 0: 11946.3. Samples: 66500096. 
Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:06:20,768][1648981] Avg episode reward: [(0, '242.320')] [2024-06-15 13:06:21,224][1651669] Updated weights for policy 0, policy_version 129728 (0.0058) [2024-06-15 13:06:22,662][1651669] Updated weights for policy 0, policy_version 129783 (0.0014) [2024-06-15 13:06:24,116][1651669] Updated weights for policy 0, policy_version 129809 (0.0013) [2024-06-15 13:06:25,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49698.4, 300 sec: 48541.1). Total num frames: 265977856. Throughput: 0: 12060.4. Samples: 66533888. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:06:25,767][1648981] Avg episode reward: [(0, '242.410')] [2024-06-15 13:06:26,276][1651669] Updated weights for policy 0, policy_version 129891 (0.0013) [2024-06-15 13:06:30,779][1648981] Fps is (10 sec: 45824.9, 60 sec: 45865.7, 300 sec: 47872.6). Total num frames: 266076160. Throughput: 0: 12171.4. Samples: 66604544. Policy #0 lag: (min: 6.0, avg: 133.5, max: 262.0) [2024-06-15 13:06:30,779][1648981] Avg episode reward: [(0, '245.010')] [2024-06-15 13:06:31,373][1651669] Updated weights for policy 0, policy_version 129939 (0.0021) [2024-06-15 13:06:31,759][1651274] Signal inference workers to stop experience collection... (6800 times) [2024-06-15 13:06:31,813][1651669] InferenceWorker_p0-w0: stopping experience collection (6800 times) [2024-06-15 13:06:32,043][1651274] Signal inference workers to resume experience collection... (6800 times) [2024-06-15 13:06:32,043][1651669] InferenceWorker_p0-w0: resuming experience collection (6800 times) [2024-06-15 13:06:33,166][1651669] Updated weights for policy 0, policy_version 130016 (0.0012) [2024-06-15 13:06:35,097][1651669] Updated weights for policy 0, policy_version 130066 (0.0014) [2024-06-15 13:06:35,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49698.1, 300 sec: 48541.7). Total num frames: 266436608. Throughput: 0: 12060.5. Samples: 66676736. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:06:35,767][1648981] Avg episode reward: [(0, '255.960')] [2024-06-15 13:06:36,215][1651274] Saving new best policy, reward=255.960! [2024-06-15 13:06:37,286][1651669] Updated weights for policy 0, policy_version 130146 (0.0012) [2024-06-15 13:06:37,933][1651669] Updated weights for policy 0, policy_version 130176 (0.0014) [2024-06-15 13:06:40,766][1648981] Fps is (10 sec: 52493.8, 60 sec: 47513.6, 300 sec: 47992.2). Total num frames: 266600448. Throughput: 0: 11776.0. Samples: 66701824. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:06:40,767][1648981] Avg episode reward: [(0, '262.090')] [2024-06-15 13:06:40,771][1651274] Saving new best policy, reward=262.090! [2024-06-15 13:06:43,489][1651669] Updated weights for policy 0, policy_version 130226 (0.0013) [2024-06-15 13:06:45,770][1648981] Fps is (10 sec: 42583.6, 60 sec: 48092.1, 300 sec: 48434.7). Total num frames: 266862592. Throughput: 0: 11918.9. Samples: 66784256. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:06:45,770][1648981] Avg episode reward: [(0, '265.370')] [2024-06-15 13:06:45,771][1651274] Saving new best policy, reward=265.370! [2024-06-15 13:06:46,288][1651669] Updated weights for policy 0, policy_version 130305 (0.0013) [2024-06-15 13:06:48,019][1651669] Updated weights for policy 0, policy_version 130384 (0.0012) [2024-06-15 13:06:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 48209.4). Total num frames: 267124736. Throughput: 0: 11912.7. Samples: 66844672. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:06:50,767][1648981] Avg episode reward: [(0, '259.600')] [2024-06-15 13:06:53,424][1651669] Updated weights for policy 0, policy_version 130448 (0.0013) [2024-06-15 13:06:54,916][1651669] Updated weights for policy 0, policy_version 130503 (0.0015) [2024-06-15 13:06:55,767][1648981] Fps is (10 sec: 45890.7, 60 sec: 46967.4, 300 sec: 48207.8). 
Total num frames: 267321344. Throughput: 0: 11889.8. Samples: 66894336. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:06:55,767][1648981] Avg episode reward: [(0, '258.470')] [2024-06-15 13:06:56,273][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000130560_267386880.pth... [2024-06-15 13:06:56,340][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000124912_255819776.pth [2024-06-15 13:06:57,707][1651669] Updated weights for policy 0, policy_version 130562 (0.0030) [2024-06-15 13:06:59,698][1651669] Updated weights for policy 0, policy_version 130646 (0.0013) [2024-06-15 13:07:00,782][1648981] Fps is (10 sec: 52345.7, 60 sec: 49161.7, 300 sec: 48433.0). Total num frames: 267649024. Throughput: 0: 11760.5. Samples: 66947584. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:00,783][1648981] Avg episode reward: [(0, '255.770')] [2024-06-15 13:07:04,198][1651669] Updated weights for policy 0, policy_version 130692 (0.0025) [2024-06-15 13:07:05,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 45875.6, 300 sec: 47985.7). Total num frames: 267780096. Throughput: 0: 11708.1. Samples: 67026944. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:05,767][1648981] Avg episode reward: [(0, '260.420')] [2024-06-15 13:07:06,025][1651669] Updated weights for policy 0, policy_version 130769 (0.0013) [2024-06-15 13:07:07,106][1651669] Updated weights for policy 0, policy_version 130813 (0.0012) [2024-06-15 13:07:09,772][1651669] Updated weights for policy 0, policy_version 130875 (0.0017) [2024-06-15 13:07:10,105][1651274] Signal inference workers to stop experience collection... (6850 times) [2024-06-15 13:07:10,221][1651669] InferenceWorker_p0-w0: stopping experience collection (6850 times) [2024-06-15 13:07:10,408][1651274] Signal inference workers to resume experience collection... 
(6850 times) [2024-06-15 13:07:10,409][1651669] InferenceWorker_p0-w0: resuming experience collection (6850 times) [2024-06-15 13:07:10,766][1648981] Fps is (10 sec: 42666.2, 60 sec: 48606.0, 300 sec: 48102.0). Total num frames: 268075008. Throughput: 0: 11764.6. Samples: 67063296. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:10,767][1648981] Avg episode reward: [(0, '262.040')] [2024-06-15 13:07:11,336][1651669] Updated weights for policy 0, policy_version 130934 (0.0014) [2024-06-15 13:07:15,737][1651669] Updated weights for policy 0, policy_version 130976 (0.0159) [2024-06-15 13:07:15,781][1648981] Fps is (10 sec: 45808.8, 60 sec: 45864.1, 300 sec: 47761.2). Total num frames: 268238848. Throughput: 0: 11855.1. Samples: 67138048. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:15,782][1648981] Avg episode reward: [(0, '266.860')] [2024-06-15 13:07:16,270][1651274] Saving new best policy, reward=266.860! [2024-06-15 13:07:17,856][1651669] Updated weights for policy 0, policy_version 131042 (0.0014) [2024-06-15 13:07:19,406][1651669] Updated weights for policy 0, policy_version 131099 (0.0014) [2024-06-15 13:07:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49153.2, 300 sec: 48318.9). Total num frames: 268566528. Throughput: 0: 11764.6. Samples: 67206144. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:20,767][1648981] Avg episode reward: [(0, '268.300')] [2024-06-15 13:07:20,768][1651274] Saving new best policy, reward=268.300! [2024-06-15 13:07:21,402][1651669] Updated weights for policy 0, policy_version 131152 (0.0014) [2024-06-15 13:07:25,704][1651669] Updated weights for policy 0, policy_version 131218 (0.0016) [2024-06-15 13:07:25,766][1648981] Fps is (10 sec: 49223.0, 60 sec: 45875.1, 300 sec: 47985.7). Total num frames: 268730368. Throughput: 0: 11946.7. Samples: 67239424. 
Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:25,767][1648981] Avg episode reward: [(0, '276.580')] [2024-06-15 13:07:26,422][1651274] Saving new best policy, reward=276.580! [2024-06-15 13:07:26,725][1651669] Updated weights for policy 0, policy_version 131257 (0.0012) [2024-06-15 13:07:28,977][1651669] Updated weights for policy 0, policy_version 131297 (0.0013) [2024-06-15 13:07:30,254][1651669] Updated weights for policy 0, policy_version 131344 (0.0014) [2024-06-15 13:07:30,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 49162.1, 300 sec: 48096.8). Total num frames: 269025280. Throughput: 0: 11788.3. Samples: 67314688. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:30,767][1648981] Avg episode reward: [(0, '269.670')] [2024-06-15 13:07:32,715][1651669] Updated weights for policy 0, policy_version 131424 (0.0014) [2024-06-15 13:07:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 46421.3, 300 sec: 47986.3). Total num frames: 269221888. Throughput: 0: 12037.7. Samples: 67386368. Policy #0 lag: (min: 15.0, avg: 141.3, max: 271.0) [2024-06-15 13:07:35,767][1648981] Avg episode reward: [(0, '261.550')] [2024-06-15 13:07:36,460][1651669] Updated weights for policy 0, policy_version 131488 (0.0014) [2024-06-15 13:07:39,573][1651669] Updated weights for policy 0, policy_version 131536 (0.0012) [2024-06-15 13:07:40,769][1648981] Fps is (10 sec: 45865.3, 60 sec: 48058.0, 300 sec: 47985.3). Total num frames: 269484032. Throughput: 0: 11877.9. Samples: 67428864. 
Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:07:40,769][1648981] Avg episode reward: [(0, '262.680')] [2024-06-15 13:07:41,212][1651669] Updated weights for policy 0, policy_version 131603 (0.0012) [2024-06-15 13:07:42,251][1651669] Updated weights for policy 0, policy_version 131652 (0.0013) [2024-06-15 13:07:43,315][1651669] Updated weights for policy 0, policy_version 131709 (0.0013) [2024-06-15 13:07:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48062.5, 300 sec: 48096.8). Total num frames: 269746176. Throughput: 0: 12155.7. Samples: 67494400. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:07:45,767][1648981] Avg episode reward: [(0, '272.490')] [2024-06-15 13:07:50,162][1651669] Updated weights for policy 0, policy_version 131777 (0.0014) [2024-06-15 13:07:50,766][1648981] Fps is (10 sec: 45885.6, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 269942784. Throughput: 0: 12071.8. Samples: 67570176. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:07:50,767][1648981] Avg episode reward: [(0, '274.800')] [2024-06-15 13:07:51,182][1651669] Updated weights for policy 0, policy_version 131835 (0.0015) [2024-06-15 13:07:53,256][1651274] Signal inference workers to stop experience collection... (6900 times) [2024-06-15 13:07:53,317][1651669] InferenceWorker_p0-w0: stopping experience collection (6900 times) [2024-06-15 13:07:53,438][1651274] Signal inference workers to resume experience collection... (6900 times) [2024-06-15 13:07:53,438][1651669] InferenceWorker_p0-w0: resuming experience collection (6900 times) [2024-06-15 13:07:53,441][1651669] Updated weights for policy 0, policy_version 131920 (0.0095) [2024-06-15 13:07:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 48207.8). Total num frames: 270270464. Throughput: 0: 11969.4. Samples: 67601920. 
Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:07:55,767][1648981] Avg episode reward: [(0, '271.680')] [2024-06-15 13:07:57,841][1651669] Updated weights for policy 0, policy_version 131969 (0.0014) [2024-06-15 13:08:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 45887.3, 300 sec: 47985.7). Total num frames: 270401536. Throughput: 0: 11870.8. Samples: 67672064. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:00,767][1648981] Avg episode reward: [(0, '278.660')] [2024-06-15 13:08:00,768][1651274] Saving new best policy, reward=278.660! [2024-06-15 13:08:01,857][1651669] Updated weights for policy 0, policy_version 132033 (0.0018) [2024-06-15 13:08:03,140][1651669] Updated weights for policy 0, policy_version 132096 (0.0013) [2024-06-15 13:08:04,833][1651669] Updated weights for policy 0, policy_version 132169 (0.0014) [2024-06-15 13:08:05,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 48318.9). Total num frames: 270761984. Throughput: 0: 11946.7. Samples: 67743744. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:05,767][1648981] Avg episode reward: [(0, '269.610')] [2024-06-15 13:08:05,841][1651669] Updated weights for policy 0, policy_version 132213 (0.0013) [2024-06-15 13:08:07,877][1651669] Updated weights for policy 0, policy_version 132229 (0.0011) [2024-06-15 13:08:08,926][1651669] Updated weights for policy 0, policy_version 132279 (0.0012) [2024-06-15 13:08:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 270925824. Throughput: 0: 12151.5. Samples: 67786240. 
Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:10,767][1648981] Avg episode reward: [(0, '273.160')] [2024-06-15 13:08:13,173][1651669] Updated weights for policy 0, policy_version 132339 (0.0140) [2024-06-15 13:08:14,521][1651669] Updated weights for policy 0, policy_version 132415 (0.0022) [2024-06-15 13:08:15,814][1648981] Fps is (10 sec: 48920.7, 60 sec: 50216.9, 300 sec: 48200.1). Total num frames: 271253504. Throughput: 0: 12116.0. Samples: 67860480. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:15,814][1648981] Avg episode reward: [(0, '264.220')] [2024-06-15 13:08:16,225][1651669] Updated weights for policy 0, policy_version 132472 (0.0012) [2024-06-15 13:08:19,428][1651669] Updated weights for policy 0, policy_version 132516 (0.0015) [2024-06-15 13:08:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 271450112. Throughput: 0: 12197.0. Samples: 67935232. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:20,767][1648981] Avg episode reward: [(0, '267.410')] [2024-06-15 13:08:23,195][1651669] Updated weights for policy 0, policy_version 132565 (0.0013) [2024-06-15 13:08:24,559][1651669] Updated weights for policy 0, policy_version 132625 (0.0013) [2024-06-15 13:08:25,766][1648981] Fps is (10 sec: 46092.6, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 271712256. Throughput: 0: 12197.6. Samples: 67977728. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:25,767][1648981] Avg episode reward: [(0, '265.990')] [2024-06-15 13:08:26,256][1651669] Updated weights for policy 0, policy_version 132693 (0.0014) [2024-06-15 13:08:27,189][1651669] Updated weights for policy 0, policy_version 132732 (0.0016) [2024-06-15 13:08:30,774][1648981] Fps is (10 sec: 49113.3, 60 sec: 48599.6, 300 sec: 48095.5). Total num frames: 271941632. Throughput: 0: 12285.9. Samples: 68047360. 
Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:30,775][1648981] Avg episode reward: [(0, '250.100')] [2024-06-15 13:08:30,906][1651669] Updated weights for policy 0, policy_version 132795 (0.0026) [2024-06-15 13:08:34,588][1651669] Updated weights for policy 0, policy_version 132864 (0.0014) [2024-06-15 13:08:34,721][1651274] Signal inference workers to stop experience collection... (6950 times) [2024-06-15 13:08:34,799][1651669] InferenceWorker_p0-w0: stopping experience collection (6950 times) [2024-06-15 13:08:34,946][1651274] Signal inference workers to resume experience collection... (6950 times) [2024-06-15 13:08:34,946][1651669] InferenceWorker_p0-w0: resuming experience collection (6950 times) [2024-06-15 13:08:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 48430.7). Total num frames: 272236544. Throughput: 0: 12071.8. Samples: 68113408. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:35,767][1648981] Avg episode reward: [(0, '244.260')] [2024-06-15 13:08:37,044][1651669] Updated weights for policy 0, policy_version 132930 (0.0013) [2024-06-15 13:08:38,105][1651669] Updated weights for policy 0, policy_version 132992 (0.0105) [2024-06-15 13:08:40,786][1648981] Fps is (10 sec: 42545.9, 60 sec: 48045.4, 300 sec: 47986.8). Total num frames: 272367616. Throughput: 0: 12146.0. Samples: 68148736. Policy #0 lag: (min: 10.0, avg: 100.3, max: 266.0) [2024-06-15 13:08:40,787][1648981] Avg episode reward: [(0, '239.950')] [2024-06-15 13:08:42,024][1651669] Updated weights for policy 0, policy_version 133052 (0.0015) [2024-06-15 13:08:45,766][1648981] Fps is (10 sec: 36044.7, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 272596992. Throughput: 0: 12299.4. Samples: 68225536. 
Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:08:45,767][1648981] Avg episode reward: [(0, '236.930')] [2024-06-15 13:08:46,173][1651669] Updated weights for policy 0, policy_version 133124 (0.0012) [2024-06-15 13:08:48,257][1651669] Updated weights for policy 0, policy_version 133189 (0.0012) [2024-06-15 13:08:50,767][1648981] Fps is (10 sec: 52534.1, 60 sec: 49151.9, 300 sec: 48096.7). Total num frames: 272891904. Throughput: 0: 12094.5. Samples: 68288000. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:08:50,767][1648981] Avg episode reward: [(0, '248.240')] [2024-06-15 13:08:51,761][1651669] Updated weights for policy 0, policy_version 133254 (0.0016) [2024-06-15 13:08:55,767][1648981] Fps is (10 sec: 42597.3, 60 sec: 45875.0, 300 sec: 47763.5). Total num frames: 273022976. Throughput: 0: 11901.1. Samples: 68321792. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:08:55,767][1648981] Avg episode reward: [(0, '245.820')] [2024-06-15 13:08:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000133312_273022976.pth... [2024-06-15 13:08:56,006][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000127744_261619712.pth [2024-06-15 13:08:56,424][1651669] Updated weights for policy 0, policy_version 133329 (0.0014) [2024-06-15 13:08:57,809][1651669] Updated weights for policy 0, policy_version 133395 (0.0011) [2024-06-15 13:08:58,673][1651669] Updated weights for policy 0, policy_version 133440 (0.0013) [2024-06-15 13:09:00,625][1651669] Updated weights for policy 0, policy_version 133492 (0.0014) [2024-06-15 13:09:00,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49698.1, 300 sec: 48319.0). Total num frames: 273383424. Throughput: 0: 11913.7. Samples: 68396032. 
Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:00,767][1648981] Avg episode reward: [(0, '240.960')] [2024-06-15 13:09:03,492][1651669] Updated weights for policy 0, policy_version 133552 (0.0012) [2024-06-15 13:09:05,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 273547264. Throughput: 0: 11946.6. Samples: 68472832. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:05,767][1648981] Avg episode reward: [(0, '242.720')] [2024-06-15 13:09:07,332][1651669] Updated weights for policy 0, policy_version 133589 (0.0013) [2024-06-15 13:09:08,760][1651669] Updated weights for policy 0, policy_version 133654 (0.0246) [2024-06-15 13:09:10,591][1651669] Updated weights for policy 0, policy_version 133713 (0.0014) [2024-06-15 13:09:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 273842176. Throughput: 0: 11832.9. Samples: 68510208. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:10,767][1648981] Avg episode reward: [(0, '240.660')] [2024-06-15 13:09:14,111][1651669] Updated weights for policy 0, policy_version 133792 (0.0014) [2024-06-15 13:09:15,767][1648981] Fps is (10 sec: 52425.0, 60 sec: 47003.9, 300 sec: 47985.6). Total num frames: 274071552. Throughput: 0: 11698.2. Samples: 68573696. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:15,768][1648981] Avg episode reward: [(0, '249.520')] [2024-06-15 13:09:18,091][1651669] Updated weights for policy 0, policy_version 133825 (0.0013) [2024-06-15 13:09:18,851][1651274] Signal inference workers to stop experience collection... (7000 times) [2024-06-15 13:09:18,909][1651669] InferenceWorker_p0-w0: stopping experience collection (7000 times) [2024-06-15 13:09:19,117][1651274] Signal inference workers to resume experience collection... 
(7000 times) [2024-06-15 13:09:19,134][1651669] InferenceWorker_p0-w0: resuming experience collection (7000 times) [2024-06-15 13:09:19,646][1651669] Updated weights for policy 0, policy_version 133888 (0.0011) [2024-06-15 13:09:20,664][1651669] Updated weights for policy 0, policy_version 133936 (0.0012) [2024-06-15 13:09:20,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47513.5, 300 sec: 48098.0). Total num frames: 274300928. Throughput: 0: 11798.7. Samples: 68644352. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:20,767][1648981] Avg episode reward: [(0, '264.820')] [2024-06-15 13:09:22,814][1651669] Updated weights for policy 0, policy_version 133984 (0.0039) [2024-06-15 13:09:24,932][1651669] Updated weights for policy 0, policy_version 134037 (0.0012) [2024-06-15 13:09:25,766][1648981] Fps is (10 sec: 49156.0, 60 sec: 47513.7, 300 sec: 47874.6). Total num frames: 274563072. Throughput: 0: 11804.0. Samples: 68679680. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:25,767][1648981] Avg episode reward: [(0, '265.280')] [2024-06-15 13:09:29,423][1651669] Updated weights for policy 0, policy_version 134096 (0.0024) [2024-06-15 13:09:30,767][1648981] Fps is (10 sec: 42597.3, 60 sec: 46427.1, 300 sec: 47985.8). Total num frames: 274726912. Throughput: 0: 11673.5. Samples: 68750848. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:30,768][1648981] Avg episode reward: [(0, '275.870')] [2024-06-15 13:09:31,094][1651669] Updated weights for policy 0, policy_version 134164 (0.0012) [2024-06-15 13:09:31,735][1651669] Updated weights for policy 0, policy_version 134205 (0.0013) [2024-06-15 13:09:34,841][1651669] Updated weights for policy 0, policy_version 134268 (0.0012) [2024-06-15 13:09:35,793][1648981] Fps is (10 sec: 42484.4, 60 sec: 45854.7, 300 sec: 47870.3). Total num frames: 274989056. Throughput: 0: 11882.7. Samples: 68823040. 
Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:35,794][1648981] Avg episode reward: [(0, '275.230')] [2024-06-15 13:09:37,098][1651669] Updated weights for policy 0, policy_version 134328 (0.0014) [2024-06-15 13:09:40,731][1651669] Updated weights for policy 0, policy_version 134368 (0.0029) [2024-06-15 13:09:40,766][1648981] Fps is (10 sec: 45876.7, 60 sec: 46983.3, 300 sec: 47763.6). Total num frames: 275185664. Throughput: 0: 11912.6. Samples: 68857856. Policy #0 lag: (min: 58.0, avg: 169.6, max: 314.0) [2024-06-15 13:09:40,767][1648981] Avg episode reward: [(0, '271.100')] [2024-06-15 13:09:42,502][1651669] Updated weights for policy 0, policy_version 134448 (0.0012) [2024-06-15 13:09:45,635][1651669] Updated weights for policy 0, policy_version 134497 (0.0013) [2024-06-15 13:09:45,766][1648981] Fps is (10 sec: 45998.4, 60 sec: 47513.6, 300 sec: 47764.6). Total num frames: 275447808. Throughput: 0: 11810.1. Samples: 68927488. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:09:45,767][1648981] Avg episode reward: [(0, '286.060')] [2024-06-15 13:09:46,168][1651274] Saving new best policy, reward=286.060! [2024-06-15 13:09:46,816][1651669] Updated weights for policy 0, policy_version 134532 (0.0016) [2024-06-15 13:09:47,994][1651669] Updated weights for policy 0, policy_version 134592 (0.0087) [2024-06-15 13:09:50,770][1648981] Fps is (10 sec: 45858.7, 60 sec: 45872.5, 300 sec: 47763.0). Total num frames: 275644416. Throughput: 0: 11888.8. Samples: 69007872. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:09:50,771][1648981] Avg episode reward: [(0, '281.520')] [2024-06-15 13:09:51,938][1651669] Updated weights for policy 0, policy_version 134640 (0.0012) [2024-06-15 13:09:53,394][1651669] Updated weights for policy 0, policy_version 134720 (0.0013) [2024-06-15 13:09:55,766][1648981] Fps is (10 sec: 55705.8, 60 sec: 49698.4, 300 sec: 48096.8). Total num frames: 276004864. Throughput: 0: 11810.1. 
Samples: 69041664. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:09:55,767][1648981] Avg episode reward: [(0, '278.930')] [2024-06-15 13:09:55,885][1651669] Updated weights for policy 0, policy_version 134779 (0.0080) [2024-06-15 13:09:58,541][1651669] Updated weights for policy 0, policy_version 134848 (0.0014) [2024-06-15 13:10:00,770][1648981] Fps is (10 sec: 52426.4, 60 sec: 46418.2, 300 sec: 47986.3). Total num frames: 276168704. Throughput: 0: 12127.8. Samples: 69119488. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:00,771][1648981] Avg episode reward: [(0, '266.790')] [2024-06-15 13:10:00,956][1651274] Signal inference workers to stop experience collection... (7050 times) [2024-06-15 13:10:00,993][1651669] InferenceWorker_p0-w0: stopping experience collection (7050 times) [2024-06-15 13:10:01,195][1651274] Signal inference workers to resume experience collection... (7050 times) [2024-06-15 13:10:01,196][1651669] InferenceWorker_p0-w0: resuming experience collection (7050 times) [2024-06-15 13:10:02,210][1651669] Updated weights for policy 0, policy_version 134910 (0.0013) [2024-06-15 13:10:03,553][1651669] Updated weights for policy 0, policy_version 134969 (0.0013) [2024-06-15 13:10:05,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 276463616. Throughput: 0: 12117.3. Samples: 69189632. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:05,767][1648981] Avg episode reward: [(0, '262.470')] [2024-06-15 13:10:06,528][1651669] Updated weights for policy 0, policy_version 135024 (0.0036) [2024-06-15 13:10:08,915][1651669] Updated weights for policy 0, policy_version 135063 (0.0032) [2024-06-15 13:10:09,807][1651669] Updated weights for policy 0, policy_version 135104 (0.0015) [2024-06-15 13:10:10,770][1648981] Fps is (10 sec: 52429.5, 60 sec: 47510.5, 300 sec: 47985.0). Total num frames: 276692992. Throughput: 0: 12116.3. Samples: 69224960. 
Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:10,771][1648981] Avg episode reward: [(0, '256.730')] [2024-06-15 13:10:12,481][1651669] Updated weights for policy 0, policy_version 135168 (0.0016) [2024-06-15 13:10:14,484][1651669] Updated weights for policy 0, policy_version 135232 (0.0014) [2024-06-15 13:10:15,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48060.4, 300 sec: 47985.7). Total num frames: 276955136. Throughput: 0: 12140.2. Samples: 69297152. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:15,767][1648981] Avg episode reward: [(0, '251.270')] [2024-06-15 13:10:17,309][1651669] Updated weights for policy 0, policy_version 135288 (0.0014) [2024-06-15 13:10:20,629][1651669] Updated weights for policy 0, policy_version 135344 (0.0019) [2024-06-15 13:10:20,766][1648981] Fps is (10 sec: 49171.1, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 277184512. Throughput: 0: 12181.5. Samples: 69370880. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:20,767][1648981] Avg episode reward: [(0, '246.780')] [2024-06-15 13:10:22,620][1651669] Updated weights for policy 0, policy_version 135396 (0.0017) [2024-06-15 13:10:25,726][1651669] Updated weights for policy 0, policy_version 135472 (0.0053) [2024-06-15 13:10:25,767][1648981] Fps is (10 sec: 49150.4, 60 sec: 48059.4, 300 sec: 47874.5). Total num frames: 277446656. Throughput: 0: 12196.9. Samples: 69406720. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:25,768][1648981] Avg episode reward: [(0, '252.070')] [2024-06-15 13:10:27,445][1651669] Updated weights for policy 0, policy_version 135521 (0.0013) [2024-06-15 13:10:30,790][1648981] Fps is (10 sec: 42497.4, 60 sec: 48040.9, 300 sec: 47981.8). Total num frames: 277610496. Throughput: 0: 12190.5. Samples: 69476352. 
Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:30,791][1648981] Avg episode reward: [(0, '261.430')] [2024-06-15 13:10:31,885][1651669] Updated weights for policy 0, policy_version 135586 (0.0021) [2024-06-15 13:10:32,989][1651669] Updated weights for policy 0, policy_version 135648 (0.0012) [2024-06-15 13:10:35,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 48081.2, 300 sec: 47874.6). Total num frames: 277872640. Throughput: 0: 12061.4. Samples: 69550592. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:35,767][1648981] Avg episode reward: [(0, '261.080')] [2024-06-15 13:10:36,567][1651669] Updated weights for policy 0, policy_version 135712 (0.0014) [2024-06-15 13:10:38,242][1651669] Updated weights for policy 0, policy_version 135760 (0.0014) [2024-06-15 13:10:40,766][1648981] Fps is (10 sec: 52554.1, 60 sec: 49152.0, 300 sec: 47992.8). Total num frames: 278134784. Throughput: 0: 12071.8. Samples: 69584896. Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:40,767][1648981] Avg episode reward: [(0, '252.510')] [2024-06-15 13:10:42,255][1651669] Updated weights for policy 0, policy_version 135809 (0.0125) [2024-06-15 13:10:43,322][1651274] Signal inference workers to stop experience collection... (7100 times) [2024-06-15 13:10:43,345][1651669] InferenceWorker_p0-w0: stopping experience collection (7100 times) [2024-06-15 13:10:43,494][1651274] Signal inference workers to resume experience collection... (7100 times) [2024-06-15 13:10:43,495][1651669] InferenceWorker_p0-w0: resuming experience collection (7100 times) [2024-06-15 13:10:43,939][1651669] Updated weights for policy 0, policy_version 135894 (0.0129) [2024-06-15 13:10:45,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 49151.9, 300 sec: 47985.7). Total num frames: 278396928. Throughput: 0: 11765.6. Samples: 69648896. 
Policy #0 lag: (min: 11.0, avg: 123.9, max: 267.0) [2024-06-15 13:10:45,767][1648981] Avg episode reward: [(0, '254.830')] [2024-06-15 13:10:47,419][1651669] Updated weights for policy 0, policy_version 135968 (0.0012) [2024-06-15 13:10:48,228][1651669] Updated weights for policy 0, policy_version 136000 (0.0014) [2024-06-15 13:10:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49701.1, 300 sec: 47874.6). Total num frames: 278626304. Throughput: 0: 11901.2. Samples: 69725184. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:10:50,767][1648981] Avg episode reward: [(0, '253.260')] [2024-06-15 13:10:50,921][1651669] Updated weights for policy 0, policy_version 136056 (0.0011) [2024-06-15 13:10:54,281][1651669] Updated weights for policy 0, policy_version 136128 (0.0102) [2024-06-15 13:10:55,352][1651669] Updated weights for policy 0, policy_version 136185 (0.0012) [2024-06-15 13:10:55,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48605.9, 300 sec: 48212.4). Total num frames: 278921216. Throughput: 0: 11981.8. Samples: 69764096. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:10:55,767][1648981] Avg episode reward: [(0, '256.120')] [2024-06-15 13:10:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000136192_278921216.pth... [2024-06-15 13:10:55,834][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000130560_267386880.pth [2024-06-15 13:10:55,838][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000136192_278921216.pth [2024-06-15 13:10:58,931][1651669] Updated weights for policy 0, policy_version 136227 (0.0011) [2024-06-15 13:11:00,775][1648981] Fps is (10 sec: 49108.4, 60 sec: 49148.0, 300 sec: 47762.2). Total num frames: 279117824. Throughput: 0: 12046.7. Samples: 69839360. 
Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:00,776][1648981] Avg episode reward: [(0, '261.580')] [2024-06-15 13:11:00,815][1651669] Updated weights for policy 0, policy_version 136304 (0.0125) [2024-06-15 13:11:04,301][1651669] Updated weights for policy 0, policy_version 136341 (0.0014) [2024-06-15 13:11:05,475][1651669] Updated weights for policy 0, policy_version 136400 (0.0014) [2024-06-15 13:11:05,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 279347200. Throughput: 0: 12037.7. Samples: 69912576. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:05,767][1648981] Avg episode reward: [(0, '254.990')] [2024-06-15 13:11:08,446][1651669] Updated weights for policy 0, policy_version 136451 (0.0013) [2024-06-15 13:11:09,622][1651669] Updated weights for policy 0, policy_version 136502 (0.0047) [2024-06-15 13:11:10,434][1651669] Updated weights for policy 0, policy_version 136529 (0.0014) [2024-06-15 13:11:10,766][1648981] Fps is (10 sec: 52475.8, 60 sec: 49155.2, 300 sec: 47985.7). Total num frames: 279642112. Throughput: 0: 12185.7. Samples: 69955072. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:10,767][1648981] Avg episode reward: [(0, '257.280')] [2024-06-15 13:11:14,315][1651669] Updated weights for policy 0, policy_version 136592 (0.0013) [2024-06-15 13:11:15,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 48059.5, 300 sec: 48208.0). Total num frames: 279838720. Throughput: 0: 12351.3. Samples: 70031872. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:15,767][1648981] Avg episode reward: [(0, '264.480')] [2024-06-15 13:11:16,404][1651669] Updated weights for policy 0, policy_version 136672 (0.0013) [2024-06-15 13:11:19,537][1651669] Updated weights for policy 0, policy_version 136736 (0.0015) [2024-06-15 13:11:20,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 280133632. 
Throughput: 0: 12185.6. Samples: 70098944. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:20,767][1648981] Avg episode reward: [(0, '275.780')] [2024-06-15 13:11:20,859][1651274] Signal inference workers to stop experience collection... (7150 times) [2024-06-15 13:11:20,904][1651669] Updated weights for policy 0, policy_version 136787 (0.0013) [2024-06-15 13:11:20,959][1651669] InferenceWorker_p0-w0: stopping experience collection (7150 times) [2024-06-15 13:11:21,139][1651274] Signal inference workers to resume experience collection... (7150 times) [2024-06-15 13:11:21,140][1651669] InferenceWorker_p0-w0: resuming experience collection (7150 times) [2024-06-15 13:11:24,959][1651669] Updated weights for policy 0, policy_version 136834 (0.0013) [2024-06-15 13:11:25,769][1648981] Fps is (10 sec: 45863.8, 60 sec: 47511.6, 300 sec: 48209.4). Total num frames: 280297472. Throughput: 0: 12207.6. Samples: 70134272. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:25,770][1648981] Avg episode reward: [(0, '277.810')] [2024-06-15 13:11:26,829][1651669] Updated weights for policy 0, policy_version 136899 (0.0013) [2024-06-15 13:11:27,953][1651669] Updated weights for policy 0, policy_version 136954 (0.0045) [2024-06-15 13:11:30,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 49171.5, 300 sec: 47874.6). Total num frames: 280559616. Throughput: 0: 12310.8. Samples: 70202880. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:30,767][1648981] Avg episode reward: [(0, '271.730')] [2024-06-15 13:11:31,024][1651669] Updated weights for policy 0, policy_version 137010 (0.0013) [2024-06-15 13:11:32,528][1651669] Updated weights for policy 0, policy_version 137059 (0.0015) [2024-06-15 13:11:33,127][1651669] Updated weights for policy 0, policy_version 137087 (0.0011) [2024-06-15 13:11:35,766][1648981] Fps is (10 sec: 45887.9, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 280756224. Throughput: 0: 12515.6. 
Samples: 70288384. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:35,767][1648981] Avg episode reward: [(0, '274.940')] [2024-06-15 13:11:37,138][1651669] Updated weights for policy 0, policy_version 137155 (0.0145) [2024-06-15 13:11:38,449][1651669] Updated weights for policy 0, policy_version 137215 (0.0109) [2024-06-15 13:11:40,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 48097.3). Total num frames: 281051136. Throughput: 0: 12185.6. Samples: 70312448. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:40,767][1648981] Avg episode reward: [(0, '269.320')] [2024-06-15 13:11:41,509][1651669] Updated weights for policy 0, policy_version 137273 (0.0015) [2024-06-15 13:11:43,219][1651669] Updated weights for policy 0, policy_version 137334 (0.0014) [2024-06-15 13:11:45,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 281280512. Throughput: 0: 12381.5. Samples: 70396416. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:45,767][1648981] Avg episode reward: [(0, '270.690')] [2024-06-15 13:11:46,758][1651669] Updated weights for policy 0, policy_version 137376 (0.0014) [2024-06-15 13:11:48,451][1651669] Updated weights for policy 0, policy_version 137440 (0.0013) [2024-06-15 13:11:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48605.9, 300 sec: 48207.9). Total num frames: 281542656. Throughput: 0: 12162.8. Samples: 70459904. Policy #0 lag: (min: 37.0, avg: 144.4, max: 293.0) [2024-06-15 13:11:50,767][1648981] Avg episode reward: [(0, '272.490')] [2024-06-15 13:11:52,372][1651669] Updated weights for policy 0, policy_version 137504 (0.0013) [2024-06-15 13:11:53,235][1651669] Updated weights for policy 0, policy_version 137538 (0.0035) [2024-06-15 13:11:54,160][1651669] Updated weights for policy 0, policy_version 137596 (0.0015) [2024-06-15 13:11:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47988.3). 
Total num frames: 281804800. Throughput: 0: 11980.8. Samples: 70494208. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:11:55,767][1648981] Avg episode reward: [(0, '277.040')] [2024-06-15 13:11:58,237][1651669] Updated weights for policy 0, policy_version 137655 (0.0011) [2024-06-15 13:11:59,516][1651669] Updated weights for policy 0, policy_version 137712 (0.0106) [2024-06-15 13:12:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 49159.3, 300 sec: 48430.0). Total num frames: 282066944. Throughput: 0: 12026.4. Samples: 70573056. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:00,767][1648981] Avg episode reward: [(0, '282.330')] [2024-06-15 13:12:03,531][1651274] Signal inference workers to stop experience collection... (7200 times) [2024-06-15 13:12:03,610][1651669] InferenceWorker_p0-w0: stopping experience collection (7200 times) [2024-06-15 13:12:03,719][1651274] Signal inference workers to resume experience collection... (7200 times) [2024-06-15 13:12:03,726][1651669] InferenceWorker_p0-w0: resuming experience collection (7200 times) [2024-06-15 13:12:03,727][1651669] Updated weights for policy 0, policy_version 137776 (0.0033) [2024-06-15 13:12:05,060][1651669] Updated weights for policy 0, policy_version 137850 (0.0013) [2024-06-15 13:12:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 48318.9). Total num frames: 282329088. Throughput: 0: 12014.9. Samples: 70639616. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:05,767][1648981] Avg episode reward: [(0, '278.780')] [2024-06-15 13:12:09,507][1651669] Updated weights for policy 0, policy_version 137922 (0.0019) [2024-06-15 13:12:10,635][1651669] Updated weights for policy 0, policy_version 137976 (0.0017) [2024-06-15 13:12:10,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48605.9, 300 sec: 48543.5). Total num frames: 282558464. Throughput: 0: 12243.3. Samples: 70685184. 
Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:10,767][1648981] Avg episode reward: [(0, '285.210')] [2024-06-15 13:12:13,895][1651669] Updated weights for policy 0, policy_version 138008 (0.0020) [2024-06-15 13:12:15,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49152.2, 300 sec: 48207.8). Total num frames: 282787840. Throughput: 0: 12242.5. Samples: 70753792. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:15,767][1648981] Avg episode reward: [(0, '276.450')] [2024-06-15 13:12:15,857][1651669] Updated weights for policy 0, policy_version 138096 (0.0018) [2024-06-15 13:12:19,608][1651669] Updated weights for policy 0, policy_version 138161 (0.0110) [2024-06-15 13:12:20,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 283049984. Throughput: 0: 11923.9. Samples: 70824960. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:20,767][1648981] Avg episode reward: [(0, '273.450')] [2024-06-15 13:12:20,778][1651669] Updated weights for policy 0, policy_version 138209 (0.0011) [2024-06-15 13:12:25,018][1651669] Updated weights for policy 0, policy_version 138257 (0.0014) [2024-06-15 13:12:25,772][1648981] Fps is (10 sec: 42572.8, 60 sec: 48603.2, 300 sec: 48095.8). Total num frames: 283213824. Throughput: 0: 12252.2. Samples: 70863872. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:25,773][1648981] Avg episode reward: [(0, '282.030')] [2024-06-15 13:12:26,123][1651669] Updated weights for policy 0, policy_version 138308 (0.0015) [2024-06-15 13:12:27,306][1651669] Updated weights for policy 0, policy_version 138360 (0.0143) [2024-06-15 13:12:29,551][1651669] Updated weights for policy 0, policy_version 138400 (0.0014) [2024-06-15 13:12:30,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 283541504. Throughput: 0: 12117.3. Samples: 70941696. 
Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:30,767][1648981] Avg episode reward: [(0, '285.450')] [2024-06-15 13:12:31,617][1651669] Updated weights for policy 0, policy_version 138490 (0.0014) [2024-06-15 13:12:35,766][1648981] Fps is (10 sec: 45903.1, 60 sec: 48605.9, 300 sec: 48097.1). Total num frames: 283672576. Throughput: 0: 12333.5. Samples: 71014912. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:35,767][1648981] Avg episode reward: [(0, '290.850')] [2024-06-15 13:12:36,270][1651274] Saving new best policy, reward=290.850! [2024-06-15 13:12:36,686][1651669] Updated weights for policy 0, policy_version 138560 (0.0016) [2024-06-15 13:12:38,161][1651669] Updated weights for policy 0, policy_version 138624 (0.0012) [2024-06-15 13:12:40,784][1648981] Fps is (10 sec: 39250.9, 60 sec: 48045.3, 300 sec: 48093.8). Total num frames: 283934720. Throughput: 0: 12078.4. Samples: 71037952. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:40,785][1648981] Avg episode reward: [(0, '287.350')] [2024-06-15 13:12:41,327][1651274] Signal inference workers to stop experience collection... (7250 times) [2024-06-15 13:12:41,501][1651669] InferenceWorker_p0-w0: stopping experience collection (7250 times) [2024-06-15 13:12:41,608][1651274] Signal inference workers to resume experience collection... (7250 times) [2024-06-15 13:12:41,609][1651669] InferenceWorker_p0-w0: resuming experience collection (7250 times) [2024-06-15 13:12:41,980][1651669] Updated weights for policy 0, policy_version 138704 (0.0106) [2024-06-15 13:12:42,881][1651669] Updated weights for policy 0, policy_version 138747 (0.0011) [2024-06-15 13:12:45,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 284164096. Throughput: 0: 12094.6. Samples: 71117312. 
Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:45,767][1648981] Avg episode reward: [(0, '291.650')] [2024-06-15 13:12:45,768][1651274] Saving new best policy, reward=291.650! [2024-06-15 13:12:47,625][1651669] Updated weights for policy 0, policy_version 138800 (0.0172) [2024-06-15 13:12:49,449][1651669] Updated weights for policy 0, policy_version 138872 (0.0014) [2024-06-15 13:12:50,766][1648981] Fps is (10 sec: 49240.5, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 284426240. Throughput: 0: 12174.2. Samples: 71187456. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:50,767][1648981] Avg episode reward: [(0, '284.500')] [2024-06-15 13:12:51,903][1651669] Updated weights for policy 0, policy_version 138915 (0.0066) [2024-06-15 13:12:53,449][1651669] Updated weights for policy 0, policy_version 139002 (0.0013) [2024-06-15 13:12:55,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48059.6, 300 sec: 48430.0). Total num frames: 284688384. Throughput: 0: 11810.1. Samples: 71216640. Policy #0 lag: (min: 15.0, avg: 122.7, max: 271.0) [2024-06-15 13:12:55,767][1648981] Avg episode reward: [(0, '282.890')] [2024-06-15 13:12:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000139008_284688384.pth... [2024-06-15 13:12:55,871][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000133312_273022976.pth [2024-06-15 13:12:58,405][1651669] Updated weights for policy 0, policy_version 139042 (0.0013) [2024-06-15 13:13:00,097][1651669] Updated weights for policy 0, policy_version 139110 (0.0013) [2024-06-15 13:13:00,735][1651669] Updated weights for policy 0, policy_version 139136 (0.0012) [2024-06-15 13:13:00,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 284950528. Throughput: 0: 12026.3. Samples: 71294976. 
Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:00,767][1648981] Avg episode reward: [(0, '284.820')] [2024-06-15 13:13:02,719][1651669] Updated weights for policy 0, policy_version 139196 (0.0014) [2024-06-15 13:13:04,041][1651669] Updated weights for policy 0, policy_version 139253 (0.0018) [2024-06-15 13:13:05,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 285212672. Throughput: 0: 12128.7. Samples: 71370752. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:05,767][1648981] Avg episode reward: [(0, '283.820')] [2024-06-15 13:13:08,675][1651669] Updated weights for policy 0, policy_version 139296 (0.0015) [2024-06-15 13:13:10,095][1651669] Updated weights for policy 0, policy_version 139344 (0.0110) [2024-06-15 13:13:10,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48059.7, 300 sec: 48104.5). Total num frames: 285442048. Throughput: 0: 12107.6. Samples: 71408640. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:10,767][1648981] Avg episode reward: [(0, '282.820')] [2024-06-15 13:13:11,056][1651669] Updated weights for policy 0, policy_version 139392 (0.0016) [2024-06-15 13:13:13,282][1651669] Updated weights for policy 0, policy_version 139444 (0.0014) [2024-06-15 13:13:15,082][1651669] Updated weights for policy 0, policy_version 139514 (0.0012) [2024-06-15 13:13:15,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 49151.9, 300 sec: 48430.0). Total num frames: 285736960. Throughput: 0: 11901.1. Samples: 71477248. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:15,767][1648981] Avg episode reward: [(0, '285.010')] [2024-06-15 13:13:19,712][1651669] Updated weights for policy 0, policy_version 139555 (0.0012) [2024-06-15 13:13:20,767][1648981] Fps is (10 sec: 42598.2, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 285868032. Throughput: 0: 11889.8. Samples: 71549952. 
Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:20,767][1648981] Avg episode reward: [(0, '269.330')] [2024-06-15 13:13:21,348][1651669] Updated weights for policy 0, policy_version 139603 (0.0030) [2024-06-15 13:13:23,371][1651274] Signal inference workers to stop experience collection... (7300 times) [2024-06-15 13:13:23,446][1651669] InferenceWorker_p0-w0: stopping experience collection (7300 times) [2024-06-15 13:13:23,448][1651669] Updated weights for policy 0, policy_version 139665 (0.0011) [2024-06-15 13:13:23,722][1651274] Signal inference workers to resume experience collection... (7300 times) [2024-06-15 13:13:23,723][1651669] InferenceWorker_p0-w0: resuming experience collection (7300 times) [2024-06-15 13:13:24,987][1651669] Updated weights for policy 0, policy_version 139714 (0.0012) [2024-06-15 13:13:25,767][1648981] Fps is (10 sec: 45872.7, 60 sec: 49702.6, 300 sec: 48320.1). Total num frames: 286195712. Throughput: 0: 12190.3. Samples: 71586304. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:25,768][1648981] Avg episode reward: [(0, '268.850')] [2024-06-15 13:13:26,169][1651669] Updated weights for policy 0, policy_version 139776 (0.0014) [2024-06-15 13:13:30,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 46418.4, 300 sec: 47762.9). Total num frames: 286326784. Throughput: 0: 12025.3. Samples: 71658496. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:30,771][1648981] Avg episode reward: [(0, '271.680')] [2024-06-15 13:13:31,085][1651669] Updated weights for policy 0, policy_version 139832 (0.0014) [2024-06-15 13:13:33,357][1651669] Updated weights for policy 0, policy_version 139892 (0.0012) [2024-06-15 13:13:34,047][1651669] Updated weights for policy 0, policy_version 139920 (0.0104) [2024-06-15 13:13:35,081][1651669] Updated weights for policy 0, policy_version 139965 (0.0028) [2024-06-15 13:13:35,783][1648981] Fps is (10 sec: 45802.3, 60 sec: 49684.4, 300 sec: 48430.6). 
Total num frames: 286654464. Throughput: 0: 12021.9. Samples: 71728640. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:35,783][1648981] Avg episode reward: [(0, '272.350')] [2024-06-15 13:13:36,972][1651669] Updated weights for policy 0, policy_version 140016 (0.0014) [2024-06-15 13:13:40,766][1648981] Fps is (10 sec: 45893.6, 60 sec: 47528.0, 300 sec: 48096.8). Total num frames: 286785536. Throughput: 0: 12185.7. Samples: 71764992. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:40,767][1648981] Avg episode reward: [(0, '273.960')] [2024-06-15 13:13:40,978][1651669] Updated weights for policy 0, policy_version 140051 (0.0087) [2024-06-15 13:13:43,262][1651669] Updated weights for policy 0, policy_version 140114 (0.0012) [2024-06-15 13:13:44,569][1651669] Updated weights for policy 0, policy_version 140164 (0.0011) [2024-06-15 13:13:45,769][1648981] Fps is (10 sec: 49222.1, 60 sec: 49696.1, 300 sec: 48318.5). Total num frames: 287145984. Throughput: 0: 12082.5. Samples: 71838720. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:45,770][1648981] Avg episode reward: [(0, '277.850')] [2024-06-15 13:13:46,862][1651669] Updated weights for policy 0, policy_version 140226 (0.0013) [2024-06-15 13:13:47,924][1651669] Updated weights for policy 0, policy_version 140277 (0.0014) [2024-06-15 13:13:50,766][1648981] Fps is (10 sec: 52428.0, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 287309824. Throughput: 0: 12197.0. Samples: 71919616. 
Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:50,767][1648981] Avg episode reward: [(0, '283.830')] [2024-06-15 13:13:51,372][1651669] Updated weights for policy 0, policy_version 140320 (0.0013) [2024-06-15 13:13:53,692][1651669] Updated weights for policy 0, policy_version 140371 (0.0013) [2024-06-15 13:13:54,766][1651669] Updated weights for policy 0, policy_version 140420 (0.0016) [2024-06-15 13:13:55,766][1648981] Fps is (10 sec: 52441.3, 60 sec: 49698.3, 300 sec: 48430.0). Total num frames: 287670272. Throughput: 0: 12140.1. Samples: 71954944. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:13:55,767][1648981] Avg episode reward: [(0, '285.220')] [2024-06-15 13:13:56,071][1651669] Updated weights for policy 0, policy_version 140474 (0.0012) [2024-06-15 13:13:57,821][1651669] Updated weights for policy 0, policy_version 140501 (0.0020) [2024-06-15 13:14:00,769][1648981] Fps is (10 sec: 52413.0, 60 sec: 48057.3, 300 sec: 48429.5). Total num frames: 287834112. Throughput: 0: 12241.7. Samples: 72028160. Policy #0 lag: (min: 4.0, avg: 84.5, max: 260.0) [2024-06-15 13:14:00,770][1648981] Avg episode reward: [(0, '284.250')] [2024-06-15 13:14:02,053][1651669] Updated weights for policy 0, policy_version 140576 (0.0014) [2024-06-15 13:14:04,284][1651669] Updated weights for policy 0, policy_version 140624 (0.0166) [2024-06-15 13:14:05,275][1651274] Signal inference workers to stop experience collection... (7350 times) [2024-06-15 13:14:05,320][1651669] InferenceWorker_p0-w0: stopping experience collection (7350 times) [2024-06-15 13:14:05,437][1651274] Signal inference workers to resume experience collection... (7350 times) [2024-06-15 13:14:05,438][1651669] InferenceWorker_p0-w0: resuming experience collection (7350 times) [2024-06-15 13:14:05,610][1651669] Updated weights for policy 0, policy_version 140695 (0.0013) [2024-06-15 13:14:05,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49152.0, 300 sec: 48541.1). 
Total num frames: 288161792. Throughput: 0: 12208.4. Samples: 72099328. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:05,767][1648981] Avg episode reward: [(0, '283.370')] [2024-06-15 13:14:06,629][1651669] Updated weights for policy 0, policy_version 140736 (0.0015) [2024-06-15 13:14:09,121][1651669] Updated weights for policy 0, policy_version 140794 (0.0013) [2024-06-15 13:14:10,767][1648981] Fps is (10 sec: 52441.4, 60 sec: 48605.4, 300 sec: 48430.0). Total num frames: 288358400. Throughput: 0: 12231.1. Samples: 72136704. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:10,768][1648981] Avg episode reward: [(0, '280.640')] [2024-06-15 13:14:14,644][1651669] Updated weights for policy 0, policy_version 140865 (0.0027) [2024-06-15 13:14:15,782][1648981] Fps is (10 sec: 42531.2, 60 sec: 47501.2, 300 sec: 48427.4). Total num frames: 288587776. Throughput: 0: 12387.1. Samples: 72216064. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:15,783][1648981] Avg episode reward: [(0, '270.590')] [2024-06-15 13:14:16,409][1651669] Updated weights for policy 0, policy_version 140948 (0.0016) [2024-06-15 13:14:18,819][1651669] Updated weights for policy 0, policy_version 140994 (0.0013) [2024-06-15 13:14:20,171][1651669] Updated weights for policy 0, policy_version 141047 (0.0016) [2024-06-15 13:14:20,767][1648981] Fps is (10 sec: 52426.7, 60 sec: 50243.5, 300 sec: 48540.9). Total num frames: 288882688. Throughput: 0: 12315.0. Samples: 72282624. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:20,768][1648981] Avg episode reward: [(0, '267.860')] [2024-06-15 13:14:23,540][1651669] Updated weights for policy 0, policy_version 141077 (0.0013) [2024-06-15 13:14:25,036][1651669] Updated weights for policy 0, policy_version 141136 (0.0013) [2024-06-15 13:14:25,766][1648981] Fps is (10 sec: 49229.3, 60 sec: 48060.2, 300 sec: 48652.2). Total num frames: 289079296. Throughput: 0: 12492.7. 
Samples: 72327168. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:25,767][1648981] Avg episode reward: [(0, '274.420')] [2024-06-15 13:14:26,694][1651669] Updated weights for policy 0, policy_version 141200 (0.0055) [2024-06-15 13:14:30,495][1651669] Updated weights for policy 0, policy_version 141250 (0.0037) [2024-06-15 13:14:30,766][1648981] Fps is (10 sec: 42603.1, 60 sec: 49701.4, 300 sec: 48545.5). Total num frames: 289308672. Throughput: 0: 12288.7. Samples: 72391680. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:30,767][1648981] Avg episode reward: [(0, '267.460')] [2024-06-15 13:14:31,691][1651669] Updated weights for policy 0, policy_version 141309 (0.0086) [2024-06-15 13:14:34,426][1651669] Updated weights for policy 0, policy_version 141348 (0.0015) [2024-06-15 13:14:35,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 48072.9, 300 sec: 48652.1). Total num frames: 289538048. Throughput: 0: 12174.2. Samples: 72467456. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:35,767][1648981] Avg episode reward: [(0, '269.920')] [2024-06-15 13:14:37,040][1651669] Updated weights for policy 0, policy_version 141409 (0.0020) [2024-06-15 13:14:38,303][1651669] Updated weights for policy 0, policy_version 141457 (0.0013) [2024-06-15 13:14:40,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50244.2, 300 sec: 48652.2). Total num frames: 289800192. Throughput: 0: 12049.1. Samples: 72497152. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:40,767][1648981] Avg episode reward: [(0, '275.720')] [2024-06-15 13:14:41,848][1651669] Updated weights for policy 0, policy_version 141505 (0.0018) [2024-06-15 13:14:43,011][1651669] Updated weights for policy 0, policy_version 141554 (0.0013) [2024-06-15 13:14:45,014][1651669] Updated weights for policy 0, policy_version 141584 (0.0024) [2024-06-15 13:14:45,774][1648981] Fps is (10 sec: 49114.3, 60 sec: 48055.4, 300 sec: 48762.5). 
Total num frames: 290029568. Throughput: 0: 12104.7. Samples: 72572928. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:45,775][1648981] Avg episode reward: [(0, '267.520')] [2024-06-15 13:14:47,684][1651669] Updated weights for policy 0, policy_version 141649 (0.0015) [2024-06-15 13:14:48,138][1651274] Signal inference workers to stop experience collection... (7400 times) [2024-06-15 13:14:48,256][1651669] InferenceWorker_p0-w0: stopping experience collection (7400 times) [2024-06-15 13:14:48,496][1651274] Signal inference workers to resume experience collection... (7400 times) [2024-06-15 13:14:48,502][1651669] InferenceWorker_p0-w0: resuming experience collection (7400 times) [2024-06-15 13:14:49,179][1651669] Updated weights for policy 0, policy_version 141699 (0.0015) [2024-06-15 13:14:50,400][1651669] Updated weights for policy 0, policy_version 141756 (0.0141) [2024-06-15 13:14:50,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 50244.2, 300 sec: 48541.1). Total num frames: 290324480. Throughput: 0: 11867.0. Samples: 72633344. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:50,767][1648981] Avg episode reward: [(0, '269.930')] [2024-06-15 13:14:54,068][1651669] Updated weights for policy 0, policy_version 141808 (0.0016) [2024-06-15 13:14:55,767][1648981] Fps is (10 sec: 45909.3, 60 sec: 46967.2, 300 sec: 48541.7). Total num frames: 290488320. Throughput: 0: 11980.9. Samples: 72675840. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:14:55,768][1648981] Avg episode reward: [(0, '277.380')] [2024-06-15 13:14:56,014][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000141856_290521088.pth... 
[2024-06-15 13:14:56,177][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000136192_278921216.pth [2024-06-15 13:14:56,615][1651669] Updated weights for policy 0, policy_version 141876 (0.0014) [2024-06-15 13:14:59,840][1651669] Updated weights for policy 0, policy_version 141945 (0.0014) [2024-06-15 13:15:00,770][1648981] Fps is (10 sec: 42582.6, 60 sec: 48605.3, 300 sec: 48429.4). Total num frames: 290750464. Throughput: 0: 11779.1. Samples: 72745984. Policy #0 lag: (min: 15.0, avg: 107.1, max: 271.0) [2024-06-15 13:15:00,771][1648981] Avg episode reward: [(0, '278.070')] [2024-06-15 13:15:01,024][1651669] Updated weights for policy 0, policy_version 141984 (0.0087) [2024-06-15 13:15:04,130][1651669] Updated weights for policy 0, policy_version 142017 (0.0013) [2024-06-15 13:15:05,766][1648981] Fps is (10 sec: 49153.7, 60 sec: 46967.4, 300 sec: 48430.6). Total num frames: 290979840. Throughput: 0: 11810.4. Samples: 72814080. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:05,767][1648981] Avg episode reward: [(0, '281.880')] [2024-06-15 13:15:06,783][1651669] Updated weights for policy 0, policy_version 142082 (0.0013) [2024-06-15 13:15:10,135][1651669] Updated weights for policy 0, policy_version 142160 (0.0012) [2024-06-15 13:15:10,766][1648981] Fps is (10 sec: 42614.3, 60 sec: 46967.9, 300 sec: 48207.8). Total num frames: 291176448. Throughput: 0: 11514.3. Samples: 72845312. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:10,767][1648981] Avg episode reward: [(0, '290.140')] [2024-06-15 13:15:11,132][1651669] Updated weights for policy 0, policy_version 142208 (0.0011) [2024-06-15 13:15:12,713][1651669] Updated weights for policy 0, policy_version 142265 (0.0010) [2024-06-15 13:15:15,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 47526.0, 300 sec: 48318.9). Total num frames: 291438592. Throughput: 0: 11878.4. Samples: 72926208. 
Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:15,767][1648981] Avg episode reward: [(0, '285.340')] [2024-06-15 13:15:15,918][1651669] Updated weights for policy 0, policy_version 142305 (0.0014) [2024-06-15 13:15:17,422][1651669] Updated weights for policy 0, policy_version 142340 (0.0013) [2024-06-15 13:15:18,722][1651669] Updated weights for policy 0, policy_version 142394 (0.0013) [2024-06-15 13:15:20,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46968.2, 300 sec: 48318.9). Total num frames: 291700736. Throughput: 0: 11730.5. Samples: 72995328. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:20,767][1648981] Avg episode reward: [(0, '299.490')] [2024-06-15 13:15:21,085][1651669] Updated weights for policy 0, policy_version 142458 (0.0012) [2024-06-15 13:15:21,168][1651274] Saving new best policy, reward=299.490! [2024-06-15 13:15:23,283][1651669] Updated weights for policy 0, policy_version 142517 (0.0013) [2024-06-15 13:15:25,784][1648981] Fps is (10 sec: 45792.6, 60 sec: 46953.3, 300 sec: 48430.9). Total num frames: 291897344. Throughput: 0: 11850.9. Samples: 73030656. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:25,785][1648981] Avg episode reward: [(0, '311.080')] [2024-06-15 13:15:26,386][1651669] Updated weights for policy 0, policy_version 142558 (0.0012) [2024-06-15 13:15:26,419][1651274] Saving new best policy, reward=311.080! [2024-06-15 13:15:28,529][1651669] Updated weights for policy 0, policy_version 142624 (0.0146) [2024-06-15 13:15:30,709][1651669] Updated weights for policy 0, policy_version 142688 (0.0013) [2024-06-15 13:15:30,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48605.8, 300 sec: 48652.2). Total num frames: 292225024. Throughput: 0: 11937.4. Samples: 73110016. 
Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:30,767][1648981] Avg episode reward: [(0, '299.050')] [2024-06-15 13:15:33,094][1651274] Signal inference workers to stop experience collection... (7450 times) [2024-06-15 13:15:33,103][1651669] Updated weights for policy 0, policy_version 142721 (0.0012) [2024-06-15 13:15:33,156][1651669] InferenceWorker_p0-w0: stopping experience collection (7450 times) [2024-06-15 13:15:33,273][1651274] Signal inference workers to resume experience collection... (7450 times) [2024-06-15 13:15:33,274][1651669] InferenceWorker_p0-w0: resuming experience collection (7450 times) [2024-06-15 13:15:34,102][1651669] Updated weights for policy 0, policy_version 142780 (0.0119) [2024-06-15 13:15:35,766][1648981] Fps is (10 sec: 52523.9, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 292421632. Throughput: 0: 12379.0. Samples: 73190400. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:35,767][1648981] Avg episode reward: [(0, '302.490')] [2024-06-15 13:15:37,431][1651669] Updated weights for policy 0, policy_version 142840 (0.0014) [2024-06-15 13:15:38,518][1651669] Updated weights for policy 0, policy_version 142880 (0.0034) [2024-06-15 13:15:40,767][1648981] Fps is (10 sec: 49150.2, 60 sec: 48605.5, 300 sec: 48541.0). Total num frames: 292716544. Throughput: 0: 12174.2. Samples: 73223680. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:40,768][1648981] Avg episode reward: [(0, '295.770')] [2024-06-15 13:15:40,956][1651669] Updated weights for policy 0, policy_version 142944 (0.0015) [2024-06-15 13:15:44,353][1651669] Updated weights for policy 0, policy_version 142994 (0.0013) [2024-06-15 13:15:45,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48612.3, 300 sec: 48541.1). Total num frames: 292945920. Throughput: 0: 12186.6. Samples: 73294336. 
Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:45,767][1648981] Avg episode reward: [(0, '282.420')] [2024-06-15 13:15:48,527][1651669] Updated weights for policy 0, policy_version 143077 (0.0045) [2024-06-15 13:15:50,052][1651669] Updated weights for policy 0, policy_version 143142 (0.0014) [2024-06-15 13:15:50,766][1648981] Fps is (10 sec: 49153.8, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 293208064. Throughput: 0: 12333.5. Samples: 73369088. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:50,767][1648981] Avg episode reward: [(0, '289.480')] [2024-06-15 13:15:52,275][1651669] Updated weights for policy 0, policy_version 143201 (0.0022) [2024-06-15 13:15:54,252][1651669] Updated weights for policy 0, policy_version 143248 (0.0051) [2024-06-15 13:15:55,222][1651669] Updated weights for policy 0, policy_version 143296 (0.0013) [2024-06-15 13:15:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49698.4, 300 sec: 48653.6). Total num frames: 293470208. Throughput: 0: 12401.8. Samples: 73403392. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:15:55,767][1648981] Avg episode reward: [(0, '283.610')] [2024-06-15 13:15:59,936][1651669] Updated weights for policy 0, policy_version 143360 (0.0015) [2024-06-15 13:16:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48608.9, 300 sec: 48541.1). Total num frames: 293666816. Throughput: 0: 12413.2. Samples: 73484800. 
Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:16:00,767][1648981] Avg episode reward: [(0, '263.250')] [2024-06-15 13:16:01,236][1651669] Updated weights for policy 0, policy_version 143417 (0.0012) [2024-06-15 13:16:03,127][1651669] Updated weights for policy 0, policy_version 143456 (0.0012) [2024-06-15 13:16:04,288][1651669] Updated weights for policy 0, policy_version 143492 (0.0013) [2024-06-15 13:16:05,584][1651669] Updated weights for policy 0, policy_version 143551 (0.0109) [2024-06-15 13:16:05,767][1648981] Fps is (10 sec: 52426.0, 60 sec: 50243.8, 300 sec: 48652.1). Total num frames: 293994496. Throughput: 0: 12208.2. Samples: 73544704. Policy #0 lag: (min: 39.0, avg: 145.2, max: 295.0) [2024-06-15 13:16:05,767][1648981] Avg episode reward: [(0, '263.930')] [2024-06-15 13:16:10,306][1651669] Updated weights for policy 0, policy_version 143604 (0.0015) [2024-06-15 13:16:10,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 49152.1, 300 sec: 48430.1). Total num frames: 294125568. Throughput: 0: 12441.0. Samples: 73590272. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:10,767][1648981] Avg episode reward: [(0, '261.560')] [2024-06-15 13:16:11,745][1651669] Updated weights for policy 0, policy_version 143672 (0.0013) [2024-06-15 13:16:13,554][1651274] Signal inference workers to stop experience collection... (7500 times) [2024-06-15 13:16:13,627][1651669] InferenceWorker_p0-w0: stopping experience collection (7500 times) [2024-06-15 13:16:13,747][1651274] Signal inference workers to resume experience collection... (7500 times) [2024-06-15 13:16:13,748][1651669] InferenceWorker_p0-w0: resuming experience collection (7500 times) [2024-06-15 13:16:13,902][1651669] Updated weights for policy 0, policy_version 143715 (0.0014) [2024-06-15 13:16:14,963][1651669] Updated weights for policy 0, policy_version 143749 (0.0015) [2024-06-15 13:16:15,766][1648981] Fps is (10 sec: 45877.5, 60 sec: 50244.3, 300 sec: 48541.1). 
Total num frames: 294453248. Throughput: 0: 12299.4. Samples: 73663488. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:15,767][1648981] Avg episode reward: [(0, '262.160')] [2024-06-15 13:16:16,223][1651669] Updated weights for policy 0, policy_version 143801 (0.0014) [2024-06-15 13:16:20,766][1648981] Fps is (10 sec: 42597.7, 60 sec: 47513.7, 300 sec: 48319.4). Total num frames: 294551552. Throughput: 0: 12242.5. Samples: 73741312. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:20,767][1648981] Avg episode reward: [(0, '272.280')] [2024-06-15 13:16:21,193][1651669] Updated weights for policy 0, policy_version 143857 (0.0104) [2024-06-15 13:16:22,812][1651669] Updated weights for policy 0, policy_version 143936 (0.0096) [2024-06-15 13:16:25,330][1651669] Updated weights for policy 0, policy_version 143991 (0.0012) [2024-06-15 13:16:25,807][1648981] Fps is (10 sec: 45691.8, 60 sec: 50225.8, 300 sec: 48645.5). Total num frames: 294912000. Throughput: 0: 12186.2. Samples: 73772544. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:25,807][1648981] Avg episode reward: [(0, '279.000')] [2024-06-15 13:16:26,237][1651669] Updated weights for policy 0, policy_version 144024 (0.0012) [2024-06-15 13:16:27,091][1651669] Updated weights for policy 0, policy_version 144062 (0.0011) [2024-06-15 13:16:30,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 295043072. Throughput: 0: 12253.9. Samples: 73845760. 
Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:30,767][1648981] Avg episode reward: [(0, '277.980')] [2024-06-15 13:16:32,241][1651669] Updated weights for policy 0, policy_version 144128 (0.0014) [2024-06-15 13:16:33,398][1651669] Updated weights for policy 0, policy_version 144187 (0.0012) [2024-06-15 13:16:35,309][1651669] Updated weights for policy 0, policy_version 144225 (0.0014) [2024-06-15 13:16:35,790][1648981] Fps is (10 sec: 49232.7, 60 sec: 49678.5, 300 sec: 48648.2). Total num frames: 295403520. Throughput: 0: 12179.2. Samples: 73917440. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:35,791][1648981] Avg episode reward: [(0, '278.710')] [2024-06-15 13:16:36,575][1651669] Updated weights for policy 0, policy_version 144261 (0.0014) [2024-06-15 13:16:40,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 47513.7, 300 sec: 48430.0). Total num frames: 295567360. Throughput: 0: 12162.8. Samples: 73950720. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:40,767][1648981] Avg episode reward: [(0, '278.510')] [2024-06-15 13:16:42,428][1651669] Updated weights for policy 0, policy_version 144336 (0.0013) [2024-06-15 13:16:44,019][1651669] Updated weights for policy 0, policy_version 144416 (0.0016) [2024-06-15 13:16:45,766][1648981] Fps is (10 sec: 45984.8, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 295862272. Throughput: 0: 11889.8. Samples: 74019840. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:45,767][1648981] Avg episode reward: [(0, '278.910')] [2024-06-15 13:16:45,938][1651669] Updated weights for policy 0, policy_version 144465 (0.0017) [2024-06-15 13:16:47,061][1651669] Updated weights for policy 0, policy_version 144508 (0.0014) [2024-06-15 13:16:49,145][1651669] Updated weights for policy 0, policy_version 144563 (0.0012) [2024-06-15 13:16:50,766][1648981] Fps is (10 sec: 52430.7, 60 sec: 48059.9, 300 sec: 48430.0). Total num frames: 296091648. 
Throughput: 0: 12174.4. Samples: 74092544. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:50,767][1648981] Avg episode reward: [(0, '276.930')] [2024-06-15 13:16:53,196][1651669] Updated weights for policy 0, policy_version 144594 (0.0022) [2024-06-15 13:16:54,391][1651669] Updated weights for policy 0, policy_version 144656 (0.0013) [2024-06-15 13:16:55,331][1651669] Updated weights for policy 0, policy_version 144704 (0.0041) [2024-06-15 13:16:55,767][1648981] Fps is (10 sec: 49150.3, 60 sec: 48059.5, 300 sec: 48429.9). Total num frames: 296353792. Throughput: 0: 12026.2. Samples: 74131456. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:16:55,768][1648981] Avg episode reward: [(0, '270.520')] [2024-06-15 13:16:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000144704_296353792.pth... [2024-06-15 13:16:55,903][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000139008_284688384.pth [2024-06-15 13:16:56,076][1651274] Signal inference workers to stop experience collection... (7550 times) [2024-06-15 13:16:56,117][1651669] InferenceWorker_p0-w0: stopping experience collection (7550 times) [2024-06-15 13:16:56,317][1651274] Signal inference workers to resume experience collection... (7550 times) [2024-06-15 13:16:56,318][1651669] InferenceWorker_p0-w0: resuming experience collection (7550 times) [2024-06-15 13:16:57,299][1651669] Updated weights for policy 0, policy_version 144759 (0.0013) [2024-06-15 13:16:58,444][1651669] Updated weights for policy 0, policy_version 144786 (0.0012) [2024-06-15 13:16:59,567][1651669] Updated weights for policy 0, policy_version 144830 (0.0013) [2024-06-15 13:17:00,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49152.1, 300 sec: 48430.0). Total num frames: 296615936. Throughput: 0: 11946.7. Samples: 74201088. 
Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:17:00,767][1648981] Avg episode reward: [(0, '263.140')] [2024-06-15 13:17:05,421][1651669] Updated weights for policy 0, policy_version 144898 (0.0016) [2024-06-15 13:17:05,802][1648981] Fps is (10 sec: 42447.3, 60 sec: 46394.0, 300 sec: 48202.0). Total num frames: 296779776. Throughput: 0: 11914.4. Samples: 74277888. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:17:05,803][1648981] Avg episode reward: [(0, '268.960')] [2024-06-15 13:17:06,418][1651669] Updated weights for policy 0, policy_version 144960 (0.0013) [2024-06-15 13:17:07,720][1651669] Updated weights for policy 0, policy_version 145015 (0.0057) [2024-06-15 13:17:09,845][1651669] Updated weights for policy 0, policy_version 145058 (0.0014) [2024-06-15 13:17:10,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 50244.1, 300 sec: 48652.1). Total num frames: 297140224. Throughput: 0: 12082.6. Samples: 74315776. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:17:10,767][1648981] Avg episode reward: [(0, '267.850')] [2024-06-15 13:17:15,464][1651669] Updated weights for policy 0, policy_version 145136 (0.0012) [2024-06-15 13:17:15,766][1648981] Fps is (10 sec: 49329.1, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 297271296. Throughput: 0: 12185.6. Samples: 74394112. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:15,767][1648981] Avg episode reward: [(0, '276.340')] [2024-06-15 13:17:16,626][1651669] Updated weights for policy 0, policy_version 145200 (0.0097) [2024-06-15 13:17:17,848][1651669] Updated weights for policy 0, policy_version 145256 (0.0012) [2024-06-15 13:17:20,084][1651669] Updated weights for policy 0, policy_version 145297 (0.0011) [2024-06-15 13:17:20,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 51336.7, 300 sec: 48875.3). Total num frames: 297631744. Throughput: 0: 12180.7. Samples: 74465280. 
Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:20,767][1648981] Avg episode reward: [(0, '268.630')] [2024-06-15 13:17:20,968][1651669] Updated weights for policy 0, policy_version 145338 (0.0011) [2024-06-15 13:17:25,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47545.5, 300 sec: 48207.8). Total num frames: 297762816. Throughput: 0: 12367.7. Samples: 74507264. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:25,767][1648981] Avg episode reward: [(0, '275.460')] [2024-06-15 13:17:25,871][1651669] Updated weights for policy 0, policy_version 145405 (0.0030) [2024-06-15 13:17:27,323][1651669] Updated weights for policy 0, policy_version 145456 (0.0013) [2024-06-15 13:17:30,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 50244.4, 300 sec: 48763.3). Total num frames: 298057728. Throughput: 0: 12253.9. Samples: 74571264. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:30,767][1648981] Avg episode reward: [(0, '270.350')] [2024-06-15 13:17:30,768][1651669] Updated weights for policy 0, policy_version 145552 (0.0111) [2024-06-15 13:17:31,759][1651669] Updated weights for policy 0, policy_version 145595 (0.0011) [2024-06-15 13:17:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46986.1, 300 sec: 48433.0). Total num frames: 298221568. Throughput: 0: 12617.9. Samples: 74660352. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:35,767][1648981] Avg episode reward: [(0, '278.210')] [2024-06-15 13:17:36,138][1651669] Updated weights for policy 0, policy_version 145638 (0.0013) [2024-06-15 13:17:36,607][1651274] Signal inference workers to stop experience collection... (7600 times) [2024-06-15 13:17:36,673][1651669] InferenceWorker_p0-w0: stopping experience collection (7600 times) [2024-06-15 13:17:36,824][1651274] Signal inference workers to resume experience collection... 
(7600 times) [2024-06-15 13:17:36,825][1651669] InferenceWorker_p0-w0: resuming experience collection (7600 times) [2024-06-15 13:17:37,573][1651669] Updated weights for policy 0, policy_version 145712 (0.0082) [2024-06-15 13:17:39,493][1651669] Updated weights for policy 0, policy_version 145792 (0.0012) [2024-06-15 13:17:40,774][1648981] Fps is (10 sec: 52386.1, 60 sec: 50237.8, 300 sec: 48873.0). Total num frames: 298582016. Throughput: 0: 12354.2. Samples: 74687488. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:40,775][1648981] Avg episode reward: [(0, '273.660')] [2024-06-15 13:17:42,344][1651669] Updated weights for policy 0, policy_version 145848 (0.0013) [2024-06-15 13:17:45,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 47513.4, 300 sec: 48430.0). Total num frames: 298713088. Throughput: 0: 12492.7. Samples: 74763264. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:45,767][1648981] Avg episode reward: [(0, '285.280')] [2024-06-15 13:17:46,225][1651669] Updated weights for policy 0, policy_version 145878 (0.0012) [2024-06-15 13:17:47,110][1651669] Updated weights for policy 0, policy_version 145921 (0.0012) [2024-06-15 13:17:48,388][1651669] Updated weights for policy 0, policy_version 145984 (0.0025) [2024-06-15 13:17:50,200][1651669] Updated weights for policy 0, policy_version 146041 (0.0161) [2024-06-15 13:17:50,766][1648981] Fps is (10 sec: 52470.5, 60 sec: 50244.1, 300 sec: 48874.3). Total num frames: 299106304. Throughput: 0: 12286.4. Samples: 74830336. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:50,767][1648981] Avg episode reward: [(0, '294.270')] [2024-06-15 13:17:51,985][1651669] Updated weights for policy 0, policy_version 146081 (0.0014) [2024-06-15 13:17:55,767][1648981] Fps is (10 sec: 52425.0, 60 sec: 48059.2, 300 sec: 48429.8). Total num frames: 299237376. Throughput: 0: 12401.5. Samples: 74873856. 
Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:17:55,768][1648981] Avg episode reward: [(0, '304.010')] [2024-06-15 13:17:56,872][1651669] Updated weights for policy 0, policy_version 146144 (0.0038) [2024-06-15 13:17:58,522][1651669] Updated weights for policy 0, policy_version 146236 (0.0014) [2024-06-15 13:17:59,940][1651669] Updated weights for policy 0, policy_version 146280 (0.0012) [2024-06-15 13:18:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 299630592. Throughput: 0: 12333.5. Samples: 74949120. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:18:00,767][1648981] Avg episode reward: [(0, '303.590')] [2024-06-15 13:18:02,568][1651669] Updated weights for policy 0, policy_version 146336 (0.0014) [2024-06-15 13:18:05,766][1648981] Fps is (10 sec: 52433.7, 60 sec: 49727.9, 300 sec: 48541.1). Total num frames: 299761664. Throughput: 0: 12401.7. Samples: 75023360. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:18:05,767][1648981] Avg episode reward: [(0, '301.390')] [2024-06-15 13:18:07,401][1651669] Updated weights for policy 0, policy_version 146384 (0.0043) [2024-06-15 13:18:08,459][1651669] Updated weights for policy 0, policy_version 146429 (0.0032) [2024-06-15 13:18:09,883][1651669] Updated weights for policy 0, policy_version 146496 (0.0013) [2024-06-15 13:18:10,767][1648981] Fps is (10 sec: 45873.4, 60 sec: 49151.7, 300 sec: 48652.1). Total num frames: 300089344. Throughput: 0: 12413.0. Samples: 75065856. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:18:10,768][1648981] Avg episode reward: [(0, '302.450')] [2024-06-15 13:18:11,231][1651669] Updated weights for policy 0, policy_version 146554 (0.0019) [2024-06-15 13:18:13,008][1651669] Updated weights for policy 0, policy_version 146608 (0.0016) [2024-06-15 13:18:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 300285952. 
Throughput: 0: 12379.0. Samples: 75128320. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 13:18:15,767][1648981] Avg episode reward: [(0, '294.790')] [2024-06-15 13:18:18,423][1651274] Signal inference workers to stop experience collection... (7650 times) [2024-06-15 13:18:18,483][1651669] InferenceWorker_p0-w0: stopping experience collection (7650 times) [2024-06-15 13:18:18,626][1651274] Signal inference workers to resume experience collection... (7650 times) [2024-06-15 13:18:18,671][1651669] InferenceWorker_p0-w0: resuming experience collection (7650 times) [2024-06-15 13:18:18,673][1651669] Updated weights for policy 0, policy_version 146643 (0.0019) [2024-06-15 13:18:20,066][1651669] Updated weights for policy 0, policy_version 146708 (0.0013) [2024-06-15 13:18:20,766][1648981] Fps is (10 sec: 42600.1, 60 sec: 48059.6, 300 sec: 48541.2). Total num frames: 300515328. Throughput: 0: 12299.4. Samples: 75213824. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:20,767][1648981] Avg episode reward: [(0, '302.840')] [2024-06-15 13:18:21,633][1651669] Updated weights for policy 0, policy_version 146770 (0.0117) [2024-06-15 13:18:23,335][1651669] Updated weights for policy 0, policy_version 146848 (0.0014) [2024-06-15 13:18:24,208][1651669] Updated weights for policy 0, policy_version 146880 (0.0012) [2024-06-15 13:18:25,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50790.4, 300 sec: 49097.1). Total num frames: 300810240. Throughput: 0: 12210.5. Samples: 75236864. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:25,767][1648981] Avg episode reward: [(0, '308.270')] [2024-06-15 13:18:30,783][1648981] Fps is (10 sec: 35985.0, 60 sec: 46954.3, 300 sec: 48207.8). Total num frames: 300875776. Throughput: 0: 12385.9. Samples: 75320832. 
Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:30,784][1648981] Avg episode reward: [(0, '301.460')] [2024-06-15 13:18:31,331][1651669] Updated weights for policy 0, policy_version 146950 (0.0014) [2024-06-15 13:18:32,643][1651669] Updated weights for policy 0, policy_version 146995 (0.0026) [2024-06-15 13:18:34,284][1651669] Updated weights for policy 0, policy_version 147058 (0.0014) [2024-06-15 13:18:35,781][1648981] Fps is (10 sec: 45809.8, 60 sec: 50778.3, 300 sec: 49094.1). Total num frames: 301268992. Throughput: 0: 12090.7. Samples: 75374592. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:35,781][1648981] Avg episode reward: [(0, '303.740')] [2024-06-15 13:18:36,160][1651669] Updated weights for policy 0, policy_version 147129 (0.0012) [2024-06-15 13:18:40,767][1648981] Fps is (10 sec: 45949.1, 60 sec: 45880.9, 300 sec: 48097.1). Total num frames: 301334528. Throughput: 0: 12026.4. Samples: 75415040. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:40,768][1648981] Avg episode reward: [(0, '288.940')] [2024-06-15 13:18:42,547][1651669] Updated weights for policy 0, policy_version 147186 (0.0012) [2024-06-15 13:18:44,675][1651669] Updated weights for policy 0, policy_version 147265 (0.0014) [2024-06-15 13:18:45,766][1648981] Fps is (10 sec: 39378.1, 60 sec: 49152.2, 300 sec: 48652.2). Total num frames: 301662208. Throughput: 0: 11923.9. Samples: 75485696. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:45,767][1648981] Avg episode reward: [(0, '299.080')] [2024-06-15 13:18:46,547][1651669] Updated weights for policy 0, policy_version 147336 (0.0018) [2024-06-15 13:18:48,000][1651669] Updated weights for policy 0, policy_version 147392 (0.0014) [2024-06-15 13:18:50,766][1648981] Fps is (10 sec: 52432.2, 60 sec: 45875.3, 300 sec: 48096.8). Total num frames: 301858816. Throughput: 0: 11719.1. Samples: 75550720. 
Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:50,767][1648981] Avg episode reward: [(0, '288.990')] [2024-06-15 13:18:55,005][1651274] Signal inference workers to stop experience collection... (7700 times) [2024-06-15 13:18:55,053][1651669] InferenceWorker_p0-w0: stopping experience collection (7700 times) [2024-06-15 13:18:55,333][1651274] Signal inference workers to resume experience collection... (7700 times) [2024-06-15 13:18:55,334][1651669] InferenceWorker_p0-w0: resuming experience collection (7700 times) [2024-06-15 13:18:55,335][1651669] Updated weights for policy 0, policy_version 147488 (0.0026) [2024-06-15 13:18:55,774][1648981] Fps is (10 sec: 42563.7, 60 sec: 47508.0, 300 sec: 48318.1). Total num frames: 302088192. Throughput: 0: 11774.0. Samples: 75595776. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:18:55,775][1648981] Avg episode reward: [(0, '277.000')] [2024-06-15 13:18:56,164][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000147520_302120960.pth... [2024-06-15 13:18:56,357][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000141856_290521088.pth [2024-06-15 13:18:57,237][1651669] Updated weights for policy 0, policy_version 147553 (0.0012) [2024-06-15 13:18:59,053][1651669] Updated weights for policy 0, policy_version 147632 (0.0067) [2024-06-15 13:19:00,777][1648981] Fps is (10 sec: 52373.5, 60 sec: 45867.2, 300 sec: 48206.1). Total num frames: 302383104. Throughput: 0: 11454.8. Samples: 75643904. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:19:00,777][1648981] Avg episode reward: [(0, '279.160')] [2024-06-15 13:19:05,275][1651669] Updated weights for policy 0, policy_version 147696 (0.0014) [2024-06-15 13:19:05,766][1648981] Fps is (10 sec: 42633.0, 60 sec: 45875.2, 300 sec: 47985.8). Total num frames: 302514176. Throughput: 0: 11480.2. Samples: 75730432. 
Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:19:05,767][1648981] Avg episode reward: [(0, '280.370')] [2024-06-15 13:19:06,775][1651669] Updated weights for policy 0, policy_version 147746 (0.0012) [2024-06-15 13:19:08,680][1651669] Updated weights for policy 0, policy_version 147824 (0.0013) [2024-06-15 13:19:10,188][1651669] Updated weights for policy 0, policy_version 147880 (0.0014) [2024-06-15 13:19:10,767][1648981] Fps is (10 sec: 52482.9, 60 sec: 46967.7, 300 sec: 48543.6). Total num frames: 302907392. Throughput: 0: 11480.1. Samples: 75753472. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:19:10,767][1648981] Avg episode reward: [(0, '285.750')] [2024-06-15 13:19:15,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 43690.8, 300 sec: 47541.5). Total num frames: 302907392. Throughput: 0: 11404.8. Samples: 75833856. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:19:15,767][1648981] Avg episode reward: [(0, '282.050')] [2024-06-15 13:19:15,782][1651669] Updated weights for policy 0, policy_version 147920 (0.0017) [2024-06-15 13:19:17,123][1651669] Updated weights for policy 0, policy_version 147975 (0.0011) [2024-06-15 13:19:19,218][1651669] Updated weights for policy 0, policy_version 148064 (0.0106) [2024-06-15 13:19:20,703][1651669] Updated weights for policy 0, policy_version 148113 (0.0013) [2024-06-15 13:19:20,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 46967.5, 300 sec: 48318.9). Total num frames: 303333376. Throughput: 0: 11529.4. Samples: 75893248. Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:19:20,767][1648981] Avg episode reward: [(0, '277.170')] [2024-06-15 13:19:25,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 43690.6, 300 sec: 47874.6). Total num frames: 303431680. Throughput: 0: 11366.5. Samples: 75926528. 
Policy #0 lag: (min: 14.0, avg: 59.4, max: 215.0) [2024-06-15 13:19:25,767][1648981] Avg episode reward: [(0, '273.030')] [2024-06-15 13:19:26,875][1651669] Updated weights for policy 0, policy_version 148176 (0.0133) [2024-06-15 13:19:29,223][1651669] Updated weights for policy 0, policy_version 148272 (0.0011) [2024-06-15 13:19:30,505][1651274] Signal inference workers to stop experience collection... (7750 times) [2024-06-15 13:19:30,526][1651669] InferenceWorker_p0-w0: stopping experience collection (7750 times) [2024-06-15 13:19:30,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48073.1, 300 sec: 48207.9). Total num frames: 303759360. Throughput: 0: 11525.7. Samples: 76004352. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:19:30,767][1648981] Avg episode reward: [(0, '273.560')] [2024-06-15 13:19:30,861][1651274] Signal inference workers to resume experience collection... (7750 times) [2024-06-15 13:19:30,864][1651669] InferenceWorker_p0-w0: resuming experience collection (7750 times) [2024-06-15 13:19:30,868][1651669] Updated weights for policy 0, policy_version 148336 (0.0013) [2024-06-15 13:19:32,302][1651669] Updated weights for policy 0, policy_version 148385 (0.0013) [2024-06-15 13:19:35,798][1648981] Fps is (10 sec: 52262.6, 60 sec: 44769.8, 300 sec: 47980.5). Total num frames: 303955968. Throughput: 0: 11551.6. Samples: 76070912. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:19:35,799][1648981] Avg episode reward: [(0, '268.050')] [2024-06-15 13:19:38,482][1651669] Updated weights for policy 0, policy_version 148448 (0.0013) [2024-06-15 13:19:39,900][1651669] Updated weights for policy 0, policy_version 148497 (0.0014) [2024-06-15 13:19:40,770][1648981] Fps is (10 sec: 42582.0, 60 sec: 47511.0, 300 sec: 47986.3). Total num frames: 304185344. Throughput: 0: 11515.4. Samples: 76113920. 
Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:19:40,771][1648981] Avg episode reward: [(0, '267.340')] [2024-06-15 13:19:41,529][1651669] Updated weights for policy 0, policy_version 148562 (0.0012) [2024-06-15 13:19:42,485][1651669] Updated weights for policy 0, policy_version 148605 (0.0013) [2024-06-15 13:19:44,502][1651669] Updated weights for policy 0, policy_version 148646 (0.0012) [2024-06-15 13:19:45,766][1648981] Fps is (10 sec: 52596.1, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 304480256. Throughput: 0: 11790.1. Samples: 76174336. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:19:45,767][1648981] Avg episode reward: [(0, '266.090')] [2024-06-15 13:19:49,815][1651669] Updated weights for policy 0, policy_version 148704 (0.0108) [2024-06-15 13:19:50,766][1648981] Fps is (10 sec: 42614.9, 60 sec: 45875.2, 300 sec: 47874.7). Total num frames: 304611328. Throughput: 0: 11502.9. Samples: 76248064. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:19:50,767][1648981] Avg episode reward: [(0, '261.770')] [2024-06-15 13:19:51,819][1651669] Updated weights for policy 0, policy_version 148784 (0.0012) [2024-06-15 13:19:53,530][1651669] Updated weights for policy 0, policy_version 148856 (0.0013) [2024-06-15 13:19:55,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46973.8, 300 sec: 47986.3). Total num frames: 304906240. Throughput: 0: 11548.5. Samples: 76273152. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:19:55,767][1648981] Avg episode reward: [(0, '268.570')] [2024-06-15 13:19:56,155][1651669] Updated weights for policy 0, policy_version 148898 (0.0012) [2024-06-15 13:20:00,125][1651669] Updated weights for policy 0, policy_version 148930 (0.0014) [2024-06-15 13:20:00,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 44790.8, 300 sec: 47763.5). Total num frames: 305070080. Throughput: 0: 11571.2. Samples: 76354560. 
Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:20:00,767][1648981] Avg episode reward: [(0, '273.090')] [2024-06-15 13:20:01,687][1651669] Updated weights for policy 0, policy_version 148993 (0.0013) [2024-06-15 13:20:02,907][1651669] Updated weights for policy 0, policy_version 149043 (0.0023) [2024-06-15 13:20:04,658][1651669] Updated weights for policy 0, policy_version 149115 (0.0020) [2024-06-15 13:20:05,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48207.9). Total num frames: 305397760. Throughput: 0: 11616.7. Samples: 76416000. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:20:05,767][1648981] Avg episode reward: [(0, '273.020')] [2024-06-15 13:20:07,311][1651669] Updated weights for policy 0, policy_version 149153 (0.0013) [2024-06-15 13:20:10,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 43690.8, 300 sec: 47763.5). Total num frames: 305528832. Throughput: 0: 11639.5. Samples: 76450304. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:20:10,767][1648981] Avg episode reward: [(0, '262.750')] [2024-06-15 13:20:12,786][1651274] Signal inference workers to stop experience collection... (7800 times) [2024-06-15 13:20:12,827][1651669] InferenceWorker_p0-w0: stopping experience collection (7800 times) [2024-06-15 13:20:12,830][1651669] Updated weights for policy 0, policy_version 149234 (0.0050) [2024-06-15 13:20:13,031][1651274] Signal inference workers to resume experience collection... (7800 times) [2024-06-15 13:20:13,032][1651669] InferenceWorker_p0-w0: resuming experience collection (7800 times) [2024-06-15 13:20:14,244][1651669] Updated weights for policy 0, policy_version 149297 (0.0011) [2024-06-15 13:20:15,641][1651669] Updated weights for policy 0, policy_version 149367 (0.0013) [2024-06-15 13:20:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 48096.8). Total num frames: 305889280. Throughput: 0: 11480.2. Samples: 76520960. 
Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:20:15,767][1648981] Avg episode reward: [(0, '269.990')] [2024-06-15 13:20:18,511][1651669] Updated weights for policy 0, policy_version 149397 (0.0034) [2024-06-15 13:20:20,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 45329.1, 300 sec: 47988.6). Total num frames: 306053120. Throughput: 0: 11590.8. Samples: 76592128. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:20:20,767][1648981] Avg episode reward: [(0, '269.270')] [2024-06-15 13:20:22,996][1651669] Updated weights for policy 0, policy_version 149456 (0.0152) [2024-06-15 13:20:24,830][1651669] Updated weights for policy 0, policy_version 149522 (0.0012) [2024-06-15 13:20:25,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 306315264. Throughput: 0: 11412.8. Samples: 76627456. Policy #0 lag: (min: 4.0, avg: 66.2, max: 260.0) [2024-06-15 13:20:25,767][1648981] Avg episode reward: [(0, '269.730')] [2024-06-15 13:20:26,547][1651669] Updated weights for policy 0, policy_version 149600 (0.0012) [2024-06-15 13:20:30,596][1651669] Updated weights for policy 0, policy_version 149648 (0.0014) [2024-06-15 13:20:30,774][1648981] Fps is (10 sec: 42564.7, 60 sec: 45323.1, 300 sec: 47651.2). Total num frames: 306479104. Throughput: 0: 11444.1. Samples: 76689408. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:20:30,775][1648981] Avg episode reward: [(0, '265.670')] [2024-06-15 13:20:34,767][1651669] Updated weights for policy 0, policy_version 149729 (0.0014) [2024-06-15 13:20:35,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 45899.6, 300 sec: 47430.3). Total num frames: 306708480. Throughput: 0: 11320.9. Samples: 76757504. 
Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:20:35,767][1648981] Avg episode reward: [(0, '268.710')] [2024-06-15 13:20:36,017][1651669] Updated weights for policy 0, policy_version 149777 (0.0012) [2024-06-15 13:20:38,000][1651669] Updated weights for policy 0, policy_version 149856 (0.0014) [2024-06-15 13:20:38,811][1651669] Updated weights for policy 0, policy_version 149888 (0.0036) [2024-06-15 13:20:40,766][1648981] Fps is (10 sec: 49191.4, 60 sec: 46424.4, 300 sec: 47541.4). Total num frames: 306970624. Throughput: 0: 11423.3. Samples: 76787200. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:20:40,767][1648981] Avg episode reward: [(0, '284.450')] [2024-06-15 13:20:42,708][1651669] Updated weights for policy 0, policy_version 149952 (0.0014) [2024-06-15 13:20:45,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 307200000. Throughput: 0: 11491.5. Samples: 76871680. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:20:45,767][1648981] Avg episode reward: [(0, '282.450')] [2024-06-15 13:20:46,139][1651669] Updated weights for policy 0, policy_version 150016 (0.0013) [2024-06-15 13:20:47,541][1651669] Updated weights for policy 0, policy_version 150075 (0.0018) [2024-06-15 13:20:49,361][1651669] Updated weights for policy 0, policy_version 150136 (0.0012) [2024-06-15 13:20:50,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 307494912. Throughput: 0: 11537.1. Samples: 76935168. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:20:50,767][1648981] Avg episode reward: [(0, '284.360')] [2024-06-15 13:20:53,331][1651274] Signal inference workers to stop experience collection... 
(7850 times) [2024-06-15 13:20:53,355][1651669] Updated weights for policy 0, policy_version 150179 (0.0109) [2024-06-15 13:20:53,405][1651669] InferenceWorker_p0-w0: stopping experience collection (7850 times) [2024-06-15 13:20:53,523][1651274] Signal inference workers to resume experience collection... (7850 times) [2024-06-15 13:20:53,524][1651669] InferenceWorker_p0-w0: resuming experience collection (7850 times) [2024-06-15 13:20:55,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 45328.8, 300 sec: 47319.2). Total num frames: 307625984. Throughput: 0: 11662.1. Samples: 76975104. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:20:55,767][1648981] Avg episode reward: [(0, '270.080')] [2024-06-15 13:20:55,789][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000150208_307625984.pth... [2024-06-15 13:20:55,967][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000144704_296353792.pth [2024-06-15 13:20:57,119][1651669] Updated weights for policy 0, policy_version 150256 (0.0087) [2024-06-15 13:20:58,883][1651669] Updated weights for policy 0, policy_version 150325 (0.0139) [2024-06-15 13:21:00,112][1651669] Updated weights for policy 0, policy_version 150368 (0.0022) [2024-06-15 13:21:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49151.9, 300 sec: 47541.5). Total num frames: 308019200. Throughput: 0: 11468.8. Samples: 77037056. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:00,767][1648981] Avg episode reward: [(0, '274.960')] [2024-06-15 13:21:04,176][1651669] Updated weights for policy 0, policy_version 150432 (0.0016) [2024-06-15 13:21:05,792][1648981] Fps is (10 sec: 52296.9, 60 sec: 45855.7, 300 sec: 47537.2). Total num frames: 308150272. Throughput: 0: 11621.5. Samples: 77115392. 
Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:05,792][1648981] Avg episode reward: [(0, '285.900')] [2024-06-15 13:21:07,844][1651669] Updated weights for policy 0, policy_version 150499 (0.0014) [2024-06-15 13:21:09,331][1651669] Updated weights for policy 0, policy_version 150550 (0.0013) [2024-06-15 13:21:10,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 308445184. Throughput: 0: 11605.4. Samples: 77149696. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:10,767][1648981] Avg episode reward: [(0, '302.150')] [2024-06-15 13:21:11,081][1651669] Updated weights for policy 0, policy_version 150624 (0.0012) [2024-06-15 13:21:11,640][1651669] Updated weights for policy 0, policy_version 150653 (0.0023) [2024-06-15 13:21:15,680][1651669] Updated weights for policy 0, policy_version 150704 (0.0019) [2024-06-15 13:21:15,767][1648981] Fps is (10 sec: 49275.2, 60 sec: 45874.8, 300 sec: 47763.4). Total num frames: 308641792. Throughput: 0: 11948.6. Samples: 77227008. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:15,768][1648981] Avg episode reward: [(0, '287.760')] [2024-06-15 13:21:18,282][1651669] Updated weights for policy 0, policy_version 150752 (0.0021) [2024-06-15 13:21:19,197][1651669] Updated weights for policy 0, policy_version 150786 (0.0013) [2024-06-15 13:21:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 47436.8). Total num frames: 308903936. Throughput: 0: 11867.0. Samples: 77291520. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:20,767][1648981] Avg episode reward: [(0, '289.460')] [2024-06-15 13:21:21,285][1651669] Updated weights for policy 0, policy_version 150864 (0.0130) [2024-06-15 13:21:22,599][1651669] Updated weights for policy 0, policy_version 150908 (0.0020) [2024-06-15 13:21:25,766][1648981] Fps is (10 sec: 45877.7, 60 sec: 46421.5, 300 sec: 47652.4). Total num frames: 309100544. 
Throughput: 0: 11923.9. Samples: 77323776. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:25,767][1648981] Avg episode reward: [(0, '287.020')] [2024-06-15 13:21:26,608][1651669] Updated weights for policy 0, policy_version 150972 (0.0014) [2024-06-15 13:21:29,857][1651669] Updated weights for policy 0, policy_version 151009 (0.0012) [2024-06-15 13:21:30,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 47519.9, 300 sec: 47212.0). Total num frames: 309329920. Throughput: 0: 11787.4. Samples: 77402112. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:30,767][1648981] Avg episode reward: [(0, '287.970')] [2024-06-15 13:21:30,922][1651669] Updated weights for policy 0, policy_version 151043 (0.0013) [2024-06-15 13:21:32,462][1651669] Updated weights for policy 0, policy_version 151104 (0.0034) [2024-06-15 13:21:33,826][1651669] Updated weights for policy 0, policy_version 151159 (0.0011) [2024-06-15 13:21:35,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 309592064. Throughput: 0: 11969.4. Samples: 77473792. Policy #0 lag: (min: 15.0, avg: 129.5, max: 271.0) [2024-06-15 13:21:35,767][1648981] Avg episode reward: [(0, '307.930')] [2024-06-15 13:21:35,834][1651274] Signal inference workers to stop experience collection... (7900 times) [2024-06-15 13:21:35,882][1651669] InferenceWorker_p0-w0: stopping experience collection (7900 times) [2024-06-15 13:21:36,089][1651274] Signal inference workers to resume experience collection... (7900 times) [2024-06-15 13:21:36,090][1651669] InferenceWorker_p0-w0: resuming experience collection (7900 times) [2024-06-15 13:21:36,790][1651669] Updated weights for policy 0, policy_version 151216 (0.0015) [2024-06-15 13:21:40,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 46421.0, 300 sec: 47097.0). Total num frames: 309755904. Throughput: 0: 11844.3. Samples: 77508096. 
Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:21:40,767][1648981] Avg episode reward: [(0, '297.330')] [2024-06-15 13:21:41,511][1651669] Updated weights for policy 0, policy_version 151280 (0.0021) [2024-06-15 13:21:42,733][1651669] Updated weights for policy 0, policy_version 151328 (0.0013) [2024-06-15 13:21:44,155][1651669] Updated weights for policy 0, policy_version 151377 (0.0016) [2024-06-15 13:21:45,276][1651669] Updated weights for policy 0, policy_version 151424 (0.0016) [2024-06-15 13:21:45,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48606.0, 300 sec: 47541.4). Total num frames: 310116352. Throughput: 0: 12037.7. Samples: 77578752. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:21:45,767][1648981] Avg episode reward: [(0, '296.320')] [2024-06-15 13:21:50,766][1648981] Fps is (10 sec: 49153.9, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 310247424. Throughput: 0: 11987.6. Samples: 77654528. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:21:50,767][1648981] Avg episode reward: [(0, '303.210')] [2024-06-15 13:21:51,206][1651669] Updated weights for policy 0, policy_version 151489 (0.0012) [2024-06-15 13:21:52,496][1651669] Updated weights for policy 0, policy_version 151550 (0.0014) [2024-06-15 13:21:54,396][1651669] Updated weights for policy 0, policy_version 151615 (0.0014) [2024-06-15 13:21:55,674][1651669] Updated weights for policy 0, policy_version 151664 (0.0019) [2024-06-15 13:21:55,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49698.4, 300 sec: 47430.3). Total num frames: 310607872. Throughput: 0: 11946.7. Samples: 77687296. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:21:55,767][1648981] Avg episode reward: [(0, '312.910')] [2024-06-15 13:21:56,027][1651274] Saving new best policy, reward=312.910! 
[2024-06-15 13:21:58,339][1651669] Updated weights for policy 0, policy_version 151712 (0.0115) [2024-06-15 13:22:00,769][1648981] Fps is (10 sec: 52416.9, 60 sec: 45873.5, 300 sec: 47435.7). Total num frames: 310771712. Throughput: 0: 11821.1. Samples: 77758976. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:00,769][1648981] Avg episode reward: [(0, '309.190')] [2024-06-15 13:22:03,173][1651669] Updated weights for policy 0, policy_version 151792 (0.0015) [2024-06-15 13:22:04,619][1651669] Updated weights for policy 0, policy_version 151843 (0.0093) [2024-06-15 13:22:05,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48080.2, 300 sec: 47097.1). Total num frames: 311033856. Throughput: 0: 11855.7. Samples: 77825024. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:05,767][1648981] Avg episode reward: [(0, '304.440')] [2024-06-15 13:22:06,352][1651669] Updated weights for policy 0, policy_version 151905 (0.0017) [2024-06-15 13:22:09,544][1651669] Updated weights for policy 0, policy_version 151968 (0.0015) [2024-06-15 13:22:10,770][1648981] Fps is (10 sec: 52419.1, 60 sec: 47510.4, 300 sec: 47540.7). Total num frames: 311296000. Throughput: 0: 12093.5. Samples: 77868032. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:10,771][1648981] Avg episode reward: [(0, '299.910')] [2024-06-15 13:22:14,281][1651669] Updated weights for policy 0, policy_version 152036 (0.0033) [2024-06-15 13:22:15,495][1651669] Updated weights for policy 0, policy_version 152081 (0.0051) [2024-06-15 13:22:15,767][1648981] Fps is (10 sec: 45870.3, 60 sec: 47513.2, 300 sec: 46985.8). Total num frames: 311492608. Throughput: 0: 11957.8. Samples: 77940224. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:15,768][1648981] Avg episode reward: [(0, '305.800')] [2024-06-15 13:22:16,749][1651274] Signal inference workers to stop experience collection... 
(7950 times) [2024-06-15 13:22:16,835][1651669] InferenceWorker_p0-w0: stopping experience collection (7950 times) [2024-06-15 13:22:17,035][1651274] Signal inference workers to resume experience collection... (7950 times) [2024-06-15 13:22:17,038][1651669] InferenceWorker_p0-w0: resuming experience collection (7950 times) [2024-06-15 13:22:17,212][1651669] Updated weights for policy 0, policy_version 152146 (0.0015) [2024-06-15 13:22:20,766][1648981] Fps is (10 sec: 39337.4, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 311689216. Throughput: 0: 11946.7. Samples: 78011392. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:20,767][1648981] Avg episode reward: [(0, '317.260')] [2024-06-15 13:22:20,767][1651669] Updated weights for policy 0, policy_version 152194 (0.0011) [2024-06-15 13:22:21,396][1651274] Saving new best policy, reward=317.260! [2024-06-15 13:22:22,149][1651669] Updated weights for policy 0, policy_version 152245 (0.0013) [2024-06-15 13:22:24,768][1651669] Updated weights for policy 0, policy_version 152272 (0.0012) [2024-06-15 13:22:25,766][1648981] Fps is (10 sec: 42602.7, 60 sec: 46967.5, 300 sec: 46985.9). Total num frames: 311918592. Throughput: 0: 12071.9. Samples: 78051328. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:25,767][1648981] Avg episode reward: [(0, '317.170')] [2024-06-15 13:22:27,377][1651669] Updated weights for policy 0, policy_version 152368 (0.0099) [2024-06-15 13:22:28,859][1651669] Updated weights for policy 0, policy_version 152420 (0.0012) [2024-06-15 13:22:30,778][1648981] Fps is (10 sec: 52366.8, 60 sec: 48050.2, 300 sec: 47428.4). Total num frames: 312213504. Throughput: 0: 11636.4. Samples: 78102528. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:30,779][1648981] Avg episode reward: [(0, '319.320')] [2024-06-15 13:22:30,780][1651274] Saving new best policy, reward=319.320! 
[2024-06-15 13:22:31,812][1651669] Updated weights for policy 0, policy_version 152456 (0.0012) [2024-06-15 13:22:33,207][1651669] Updated weights for policy 0, policy_version 152512 (0.0011) [2024-06-15 13:22:35,778][1648981] Fps is (10 sec: 42548.1, 60 sec: 45866.2, 300 sec: 46652.1). Total num frames: 312344576. Throughput: 0: 11898.0. Samples: 78190080. Policy #0 lag: (min: 10.0, avg: 129.1, max: 266.0) [2024-06-15 13:22:35,779][1648981] Avg episode reward: [(0, '321.950')] [2024-06-15 13:22:36,066][1651274] Saving new best policy, reward=321.950! [2024-06-15 13:22:37,856][1651669] Updated weights for policy 0, policy_version 152595 (0.0130) [2024-06-15 13:22:39,805][1651669] Updated weights for policy 0, policy_version 152657 (0.0014) [2024-06-15 13:22:40,780][1648981] Fps is (10 sec: 49145.7, 60 sec: 49141.5, 300 sec: 47428.2). Total num frames: 312705024. Throughput: 0: 11715.7. Samples: 78214656. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:22:40,780][1648981] Avg episode reward: [(0, '327.090')] [2024-06-15 13:22:40,861][1651274] Saving new best policy, reward=327.090! [2024-06-15 13:22:43,921][1651669] Updated weights for policy 0, policy_version 152720 (0.0040) [2024-06-15 13:22:45,766][1648981] Fps is (10 sec: 52491.3, 60 sec: 45875.1, 300 sec: 46652.8). Total num frames: 312868864. Throughput: 0: 11685.6. Samples: 78284800. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:22:45,767][1648981] Avg episode reward: [(0, '329.700')] [2024-06-15 13:22:45,802][1651274] Saving new best policy, reward=329.700! [2024-06-15 13:22:46,908][1651669] Updated weights for policy 0, policy_version 152775 (0.0013) [2024-06-15 13:22:48,638][1651669] Updated weights for policy 0, policy_version 152848 (0.0143) [2024-06-15 13:22:49,985][1651669] Updated weights for policy 0, policy_version 152897 (0.0013) [2024-06-15 13:22:50,766][1648981] Fps is (10 sec: 49217.0, 60 sec: 49152.0, 300 sec: 47319.4). Total num frames: 313196544. 
Throughput: 0: 11741.9. Samples: 78353408. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:22:50,767][1648981] Avg episode reward: [(0, '323.550')] [2024-06-15 13:22:51,502][1651669] Updated weights for policy 0, policy_version 152960 (0.0013) [2024-06-15 13:22:55,530][1651669] Updated weights for policy 0, policy_version 153020 (0.0013) [2024-06-15 13:22:55,778][1648981] Fps is (10 sec: 52367.8, 60 sec: 46412.3, 300 sec: 46650.9). Total num frames: 313393152. Throughput: 0: 11592.0. Samples: 78389760. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:22:55,778][1648981] Avg episode reward: [(0, '316.880')] [2024-06-15 13:22:55,783][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000153024_313393152.pth... [2024-06-15 13:22:55,845][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000147520_302120960.pth [2024-06-15 13:22:58,556][1651669] Updated weights for policy 0, policy_version 153072 (0.0013) [2024-06-15 13:22:59,079][1651274] Signal inference workers to stop experience collection... (8000 times) [2024-06-15 13:22:59,130][1651669] InferenceWorker_p0-w0: stopping experience collection (8000 times) [2024-06-15 13:22:59,353][1651274] Signal inference workers to resume experience collection... (8000 times) [2024-06-15 13:22:59,354][1651669] InferenceWorker_p0-w0: resuming experience collection (8000 times) [2024-06-15 13:22:59,763][1651669] Updated weights for policy 0, policy_version 153120 (0.0033) [2024-06-15 13:23:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48061.5, 300 sec: 47097.1). Total num frames: 313655296. Throughput: 0: 11787.7. Samples: 78470656. 
Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:00,767][1648981] Avg episode reward: [(0, '320.490')] [2024-06-15 13:23:01,553][1651669] Updated weights for policy 0, policy_version 153188 (0.0018) [2024-06-15 13:23:05,078][1651669] Updated weights for policy 0, policy_version 153237 (0.0014) [2024-06-15 13:23:05,766][1648981] Fps is (10 sec: 49208.6, 60 sec: 47513.5, 300 sec: 46763.9). Total num frames: 313884672. Throughput: 0: 11855.6. Samples: 78544896. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:05,767][1648981] Avg episode reward: [(0, '321.580')] [2024-06-15 13:23:08,181][1651669] Updated weights for policy 0, policy_version 153281 (0.0014) [2024-06-15 13:23:09,591][1651669] Updated weights for policy 0, policy_version 153341 (0.0018) [2024-06-15 13:23:10,641][1651669] Updated weights for policy 0, policy_version 153380 (0.0013) [2024-06-15 13:23:10,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46970.6, 300 sec: 46874.9). Total num frames: 314114048. Throughput: 0: 11901.1. Samples: 78586880. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:10,767][1648981] Avg episode reward: [(0, '321.890')] [2024-06-15 13:23:12,334][1651669] Updated weights for policy 0, policy_version 153458 (0.0013) [2024-06-15 13:23:15,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46968.2, 300 sec: 46763.8). Total num frames: 314310656. Throughput: 0: 12166.0. Samples: 78649856. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:15,767][1648981] Avg episode reward: [(0, '317.510')] [2024-06-15 13:23:15,951][1651669] Updated weights for policy 0, policy_version 153488 (0.0015) [2024-06-15 13:23:19,683][1651669] Updated weights for policy 0, policy_version 153538 (0.0012) [2024-06-15 13:23:20,774][1648981] Fps is (10 sec: 42565.5, 60 sec: 47507.5, 300 sec: 46540.4). Total num frames: 314540032. Throughput: 0: 11981.9. Samples: 78729216. 
Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:20,775][1648981] Avg episode reward: [(0, '320.970')] [2024-06-15 13:23:21,721][1651669] Updated weights for policy 0, policy_version 153623 (0.0013) [2024-06-15 13:23:23,562][1651669] Updated weights for policy 0, policy_version 153687 (0.0021) [2024-06-15 13:23:25,767][1648981] Fps is (10 sec: 52424.5, 60 sec: 48605.2, 300 sec: 47321.7). Total num frames: 314834944. Throughput: 0: 11881.6. Samples: 78749184. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:25,768][1648981] Avg episode reward: [(0, '317.750')] [2024-06-15 13:23:28,356][1651669] Updated weights for policy 0, policy_version 153760 (0.0013) [2024-06-15 13:23:30,766][1648981] Fps is (10 sec: 42631.7, 60 sec: 45884.3, 300 sec: 46432.8). Total num frames: 314966016. Throughput: 0: 12083.2. Samples: 78828544. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:30,767][1648981] Avg episode reward: [(0, '310.540')] [2024-06-15 13:23:31,259][1651669] Updated weights for policy 0, policy_version 153797 (0.0010) [2024-06-15 13:23:32,712][1651669] Updated weights for policy 0, policy_version 153857 (0.0013) [2024-06-15 13:23:34,486][1651669] Updated weights for policy 0, policy_version 153923 (0.0127) [2024-06-15 13:23:35,766][1648981] Fps is (10 sec: 52433.2, 60 sec: 50254.2, 300 sec: 47541.5). Total num frames: 315359232. Throughput: 0: 11923.9. Samples: 78889984. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:35,767][1648981] Avg episode reward: [(0, '310.960')] [2024-06-15 13:23:39,271][1651669] Updated weights for policy 0, policy_version 153985 (0.0035) [2024-06-15 13:23:39,571][1651274] Signal inference workers to stop experience collection... (8050 times) [2024-06-15 13:23:39,625][1651669] InferenceWorker_p0-w0: stopping experience collection (8050 times) [2024-06-15 13:23:39,746][1651274] Signal inference workers to resume experience collection... 
(8050 times) [2024-06-15 13:23:39,750][1651669] InferenceWorker_p0-w0: resuming experience collection (8050 times) [2024-06-15 13:23:40,386][1651669] Updated weights for policy 0, policy_version 154048 (0.0015) [2024-06-15 13:23:40,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 46431.5, 300 sec: 46874.9). Total num frames: 315490304. Throughput: 0: 11983.9. Samples: 78928896. Policy #0 lag: (min: 93.0, avg: 196.8, max: 314.0) [2024-06-15 13:23:40,767][1648981] Avg episode reward: [(0, '325.420')] [2024-06-15 13:23:43,895][1651669] Updated weights for policy 0, policy_version 154113 (0.0013) [2024-06-15 13:23:45,696][1651669] Updated weights for policy 0, policy_version 154178 (0.0125) [2024-06-15 13:23:45,772][1648981] Fps is (10 sec: 39300.8, 60 sec: 48055.4, 300 sec: 47096.2). Total num frames: 315752448. Throughput: 0: 11797.3. Samples: 79001600. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:23:45,772][1648981] Avg episode reward: [(0, '324.200')] [2024-06-15 13:23:46,817][1651669] Updated weights for policy 0, policy_version 154235 (0.0019) [2024-06-15 13:23:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 45329.0, 300 sec: 46876.2). Total num frames: 315916288. Throughput: 0: 11764.6. Samples: 79074304. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:23:50,767][1648981] Avg episode reward: [(0, '320.450')] [2024-06-15 13:23:51,486][1651669] Updated weights for policy 0, policy_version 154299 (0.0013) [2024-06-15 13:23:55,717][1651669] Updated weights for policy 0, policy_version 154385 (0.0033) [2024-06-15 13:23:55,766][1648981] Fps is (10 sec: 42621.3, 60 sec: 46430.3, 300 sec: 46765.5). Total num frames: 316178432. Throughput: 0: 11673.6. Samples: 79112192. 
Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:23:55,767][1648981] Avg episode reward: [(0, '307.180')] [2024-06-15 13:23:57,231][1651669] Updated weights for policy 0, policy_version 154453 (0.0013) [2024-06-15 13:23:58,217][1651669] Updated weights for policy 0, policy_version 154490 (0.0012) [2024-06-15 13:24:00,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 316407808. Throughput: 0: 11468.8. Samples: 79165952. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:00,767][1648981] Avg episode reward: [(0, '306.270')] [2024-06-15 13:24:03,732][1651669] Updated weights for policy 0, policy_version 154544 (0.0013) [2024-06-15 13:24:05,566][1651669] Updated weights for policy 0, policy_version 154580 (0.0017) [2024-06-15 13:24:05,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 44783.0, 300 sec: 46319.5). Total num frames: 316571648. Throughput: 0: 11345.6. Samples: 79239680. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:05,767][1648981] Avg episode reward: [(0, '291.120')] [2024-06-15 13:24:06,991][1651669] Updated weights for policy 0, policy_version 154640 (0.0011) [2024-06-15 13:24:08,579][1651669] Updated weights for policy 0, policy_version 154707 (0.0016) [2024-06-15 13:24:09,506][1651669] Updated weights for policy 0, policy_version 154746 (0.0017) [2024-06-15 13:24:10,770][1648981] Fps is (10 sec: 52408.7, 60 sec: 46964.5, 300 sec: 47540.7). Total num frames: 316932096. Throughput: 0: 11490.8. Samples: 79266304. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:10,771][1648981] Avg episode reward: [(0, '288.260')] [2024-06-15 13:24:14,705][1651669] Updated weights for policy 0, policy_version 154800 (0.0013) [2024-06-15 13:24:15,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46541.7). Total num frames: 317063168. Throughput: 0: 11468.8. Samples: 79344640. 
Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:15,767][1648981] Avg episode reward: [(0, '299.830')] [2024-06-15 13:24:16,852][1651669] Updated weights for policy 0, policy_version 154832 (0.0016) [2024-06-15 13:24:19,009][1651669] Updated weights for policy 0, policy_version 154912 (0.0013) [2024-06-15 13:24:19,155][1651274] Signal inference workers to stop experience collection... (8100 times) [2024-06-15 13:24:19,206][1651669] InferenceWorker_p0-w0: stopping experience collection (8100 times) [2024-06-15 13:24:19,376][1651274] Signal inference workers to resume experience collection... (8100 times) [2024-06-15 13:24:19,378][1651669] InferenceWorker_p0-w0: resuming experience collection (8100 times) [2024-06-15 13:24:20,366][1651669] Updated weights for policy 0, policy_version 154976 (0.0011) [2024-06-15 13:24:20,766][1648981] Fps is (10 sec: 49170.2, 60 sec: 48065.9, 300 sec: 47430.3). Total num frames: 317423616. Throughput: 0: 11366.4. Samples: 79401472. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:20,767][1648981] Avg episode reward: [(0, '298.530')] [2024-06-15 13:24:21,065][1651669] Updated weights for policy 0, policy_version 155005 (0.0012) [2024-06-15 13:24:25,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 44237.4, 300 sec: 46541.7). Total num frames: 317489152. Throughput: 0: 11514.3. Samples: 79447040. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:25,767][1648981] Avg episode reward: [(0, '294.590')] [2024-06-15 13:24:26,240][1651669] Updated weights for policy 0, policy_version 155056 (0.0012) [2024-06-15 13:24:28,725][1651669] Updated weights for policy 0, policy_version 155105 (0.0013) [2024-06-15 13:24:30,699][1651669] Updated weights for policy 0, policy_version 155185 (0.0138) [2024-06-15 13:24:30,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 47513.6, 300 sec: 46991.1). Total num frames: 317816832. Throughput: 0: 11436.0. Samples: 79516160. 
Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:30,767][1648981] Avg episode reward: [(0, '289.540')] [2024-06-15 13:24:32,167][1651669] Updated weights for policy 0, policy_version 155262 (0.0132) [2024-06-15 13:24:35,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 43690.7, 300 sec: 46764.4). Total num frames: 317980672. Throughput: 0: 11582.6. Samples: 79595520. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:35,767][1648981] Avg episode reward: [(0, '295.730')] [2024-06-15 13:24:37,137][1651669] Updated weights for policy 0, policy_version 155328 (0.0138) [2024-06-15 13:24:39,864][1651669] Updated weights for policy 0, policy_version 155379 (0.0012) [2024-06-15 13:24:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 318275584. Throughput: 0: 11525.7. Samples: 79630848. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:40,767][1648981] Avg episode reward: [(0, '291.440')] [2024-06-15 13:24:41,661][1651669] Updated weights for policy 0, policy_version 155445 (0.0013) [2024-06-15 13:24:42,788][1651669] Updated weights for policy 0, policy_version 155504 (0.0013) [2024-06-15 13:24:45,774][1648981] Fps is (10 sec: 52386.8, 60 sec: 45873.1, 300 sec: 47095.8). Total num frames: 318504960. Throughput: 0: 11808.0. Samples: 79697408. Policy #0 lag: (min: 79.0, avg: 161.4, max: 335.0) [2024-06-15 13:24:45,775][1648981] Avg episode reward: [(0, '282.490')] [2024-06-15 13:24:47,660][1651669] Updated weights for policy 0, policy_version 155552 (0.0012) [2024-06-15 13:24:49,633][1651669] Updated weights for policy 0, policy_version 155616 (0.0024) [2024-06-15 13:24:50,767][1648981] Fps is (10 sec: 49150.8, 60 sec: 47513.4, 300 sec: 46985.9). Total num frames: 318767104. Throughput: 0: 11878.3. Samples: 79774208. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:24:50,767][1648981] Avg episode reward: [(0, '280.800')] [2024-06-15 13:24:51,815][1651669] Updated weights for policy 0, policy_version 155697 (0.0119) [2024-06-15 13:24:53,501][1651669] Updated weights for policy 0, policy_version 155770 (0.0013) [2024-06-15 13:24:55,766][1648981] Fps is (10 sec: 52470.6, 60 sec: 47513.5, 300 sec: 47319.2). Total num frames: 319029248. Throughput: 0: 11799.7. Samples: 79797248. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:24:55,767][1648981] Avg episode reward: [(0, '277.170')] [2024-06-15 13:24:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000155776_319029248.pth... [2024-06-15 13:24:55,846][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000150208_307625984.pth [2024-06-15 13:24:58,488][1651669] Updated weights for policy 0, policy_version 155824 (0.0013) [2024-06-15 13:25:00,399][1651274] Signal inference workers to stop experience collection... (8150 times) [2024-06-15 13:25:00,425][1651669] InferenceWorker_p0-w0: stopping experience collection (8150 times) [2024-06-15 13:25:00,692][1651274] Signal inference workers to resume experience collection... (8150 times) [2024-06-15 13:25:00,693][1651669] InferenceWorker_p0-w0: resuming experience collection (8150 times) [2024-06-15 13:25:00,778][1648981] Fps is (10 sec: 42549.3, 60 sec: 46412.2, 300 sec: 46762.0). Total num frames: 319193088. Throughput: 0: 12068.7. Samples: 79887872. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:00,779][1648981] Avg episode reward: [(0, '277.880')] [2024-06-15 13:25:01,140][1651669] Updated weights for policy 0, policy_version 155877 (0.0012) [2024-06-15 13:25:02,777][1651669] Updated weights for policy 0, policy_version 155956 (0.0017) [2024-06-15 13:25:03,947][1651669] Updated weights for policy 0, policy_version 156016 (0.0014) [2024-06-15 13:25:05,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 49698.2, 300 sec: 47541.4). Total num frames: 319553536. Throughput: 0: 12299.4. Samples: 79954944. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:05,767][1648981] Avg episode reward: [(0, '281.900')] [2024-06-15 13:25:08,486][1651669] Updated weights for policy 0, policy_version 156050 (0.0015) [2024-06-15 13:25:09,407][1651669] Updated weights for policy 0, policy_version 156090 (0.0011) [2024-06-15 13:25:10,767][1648981] Fps is (10 sec: 49209.5, 60 sec: 45878.1, 300 sec: 46763.8). Total num frames: 319684608. Throughput: 0: 12299.4. Samples: 80000512. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:10,767][1648981] Avg episode reward: [(0, '292.550')] [2024-06-15 13:25:12,127][1651669] Updated weights for policy 0, policy_version 156144 (0.0150) [2024-06-15 13:25:13,825][1651669] Updated weights for policy 0, policy_version 156224 (0.0018) [2024-06-15 13:25:15,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 320077824. Throughput: 0: 12094.6. Samples: 80060416. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:15,769][1648981] Avg episode reward: [(0, '304.280')] [2024-06-15 13:25:18,595][1651669] Updated weights for policy 0, policy_version 156289 (0.0014) [2024-06-15 13:25:19,415][1651669] Updated weights for policy 0, policy_version 156341 (0.0012) [2024-06-15 13:25:20,774][1648981] Fps is (10 sec: 52388.8, 60 sec: 46415.4, 300 sec: 47095.8). Total num frames: 320208896. 
Throughput: 0: 12354.2. Samples: 80151552. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:20,775][1648981] Avg episode reward: [(0, '314.150')] [2024-06-15 13:25:21,544][1651669] Updated weights for policy 0, policy_version 156384 (0.0012) [2024-06-15 13:25:23,858][1651669] Updated weights for policy 0, policy_version 156466 (0.0015) [2024-06-15 13:25:25,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 51882.5, 300 sec: 47875.8). Total num frames: 320602112. Throughput: 0: 12140.0. Samples: 80177152. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:25,768][1648981] Avg episode reward: [(0, '311.570')] [2024-06-15 13:25:30,062][1651669] Updated weights for policy 0, policy_version 156560 (0.0017) [2024-06-15 13:25:30,766][1648981] Fps is (10 sec: 49190.3, 60 sec: 48059.8, 300 sec: 47430.3). Total num frames: 320700416. Throughput: 0: 12290.2. Samples: 80250368. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:30,767][1648981] Avg episode reward: [(0, '312.020')] [2024-06-15 13:25:32,229][1651669] Updated weights for policy 0, policy_version 156624 (0.0012) [2024-06-15 13:25:35,193][1651669] Updated weights for policy 0, policy_version 156736 (0.0102) [2024-06-15 13:25:35,296][1651274] Signal inference workers to stop experience collection... (8200 times) [2024-06-15 13:25:35,359][1651669] InferenceWorker_p0-w0: stopping experience collection (8200 times) [2024-06-15 13:25:35,460][1651274] Signal inference workers to resume experience collection... (8200 times) [2024-06-15 13:25:35,461][1651669] InferenceWorker_p0-w0: resuming experience collection (8200 times) [2024-06-15 13:25:35,782][1648981] Fps is (10 sec: 45803.7, 60 sec: 51323.0, 300 sec: 47760.9). Total num frames: 321060864. Throughput: 0: 11988.0. Samples: 80313856. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:35,783][1648981] Avg episode reward: [(0, '307.070')] [2024-06-15 13:25:40,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 321126400. Throughput: 0: 12208.4. Samples: 80346624. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:40,767][1648981] Avg episode reward: [(0, '306.880')] [2024-06-15 13:25:41,158][1651669] Updated weights for policy 0, policy_version 156817 (0.0083) [2024-06-15 13:25:43,619][1651669] Updated weights for policy 0, policy_version 156866 (0.0020) [2024-06-15 13:25:45,097][1651669] Updated weights for policy 0, policy_version 156928 (0.0013) [2024-06-15 13:25:45,768][1648981] Fps is (10 sec: 36097.1, 60 sec: 48611.3, 300 sec: 47207.9). Total num frames: 321421312. Throughput: 0: 12074.6. Samples: 80431104. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:45,768][1648981] Avg episode reward: [(0, '307.670')] [2024-06-15 13:25:46,345][1651669] Updated weights for policy 0, policy_version 156981 (0.0013) [2024-06-15 13:25:47,680][1651669] Updated weights for policy 0, policy_version 157051 (0.0061) [2024-06-15 13:25:50,789][1648981] Fps is (10 sec: 52310.9, 60 sec: 48041.9, 300 sec: 47537.8). Total num frames: 321650688. Throughput: 0: 12111.2. Samples: 80500224. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 13:25:50,789][1648981] Avg episode reward: [(0, '316.470')] [2024-06-15 13:25:52,288][1651669] Updated weights for policy 0, policy_version 157120 (0.0013) [2024-06-15 13:25:55,778][1648981] Fps is (10 sec: 45826.9, 60 sec: 47504.3, 300 sec: 46984.1). Total num frames: 321880064. Throughput: 0: 11818.4. Samples: 80532480. 
Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:25:55,779][1648981] Avg episode reward: [(0, '305.300')] [2024-06-15 13:25:56,215][1651669] Updated weights for policy 0, policy_version 157188 (0.0012) [2024-06-15 13:25:57,719][1651669] Updated weights for policy 0, policy_version 157250 (0.0016) [2024-06-15 13:25:58,820][1651669] Updated weights for policy 0, policy_version 157298 (0.0012) [2024-06-15 13:26:00,770][1648981] Fps is (10 sec: 52527.3, 60 sec: 49704.7, 300 sec: 47544.9). Total num frames: 322174976. Throughput: 0: 12013.9. Samples: 80601088. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:00,771][1648981] Avg episode reward: [(0, '302.760')] [2024-06-15 13:26:02,339][1651669] Updated weights for policy 0, policy_version 157319 (0.0012) [2024-06-15 13:26:03,312][1651669] Updated weights for policy 0, policy_version 157370 (0.0014) [2024-06-15 13:26:05,766][1648981] Fps is (10 sec: 45929.8, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 322338816. Throughput: 0: 11743.9. Samples: 80679936. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:05,767][1648981] Avg episode reward: [(0, '308.940')] [2024-06-15 13:26:06,076][1651669] Updated weights for policy 0, policy_version 157408 (0.0014) [2024-06-15 13:26:08,099][1651669] Updated weights for policy 0, policy_version 157498 (0.0014) [2024-06-15 13:26:09,639][1651669] Updated weights for policy 0, policy_version 157552 (0.0013) [2024-06-15 13:26:10,768][1648981] Fps is (10 sec: 52438.0, 60 sec: 50242.6, 300 sec: 47652.2). Total num frames: 322699264. Throughput: 0: 11741.4. Samples: 80705536. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:10,769][1648981] Avg episode reward: [(0, '312.220')] [2024-06-15 13:26:13,511][1651669] Updated weights for policy 0, policy_version 157605 (0.0013) [2024-06-15 13:26:15,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 47208.1). Total num frames: 322830336. 
Throughput: 0: 11901.2. Samples: 80785920. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:15,767][1648981] Avg episode reward: [(0, '325.050')] [2024-06-15 13:26:16,659][1651669] Updated weights for policy 0, policy_version 157653 (0.0016) [2024-06-15 13:26:16,987][1651274] Signal inference workers to stop experience collection... (8250 times) [2024-06-15 13:26:17,048][1651669] InferenceWorker_p0-w0: stopping experience collection (8250 times) [2024-06-15 13:26:17,158][1651274] Signal inference workers to resume experience collection... (8250 times) [2024-06-15 13:26:17,170][1651669] InferenceWorker_p0-w0: resuming experience collection (8250 times) [2024-06-15 13:26:18,099][1651669] Updated weights for policy 0, policy_version 157716 (0.0015) [2024-06-15 13:26:19,024][1651669] Updated weights for policy 0, policy_version 157760 (0.0013) [2024-06-15 13:26:20,766][1648981] Fps is (10 sec: 45884.7, 60 sec: 49158.4, 300 sec: 47652.4). Total num frames: 323158016. Throughput: 0: 11973.6. Samples: 80852480. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:20,767][1648981] Avg episode reward: [(0, '324.510')] [2024-06-15 13:26:21,080][1651669] Updated weights for policy 0, policy_version 157822 (0.0013) [2024-06-15 13:26:24,653][1651669] Updated weights for policy 0, policy_version 157882 (0.0047) [2024-06-15 13:26:25,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 45875.2, 300 sec: 47541.3). Total num frames: 323354624. Throughput: 0: 12071.8. Samples: 80889856. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:25,767][1648981] Avg episode reward: [(0, '311.960')] [2024-06-15 13:26:28,673][1651669] Updated weights for policy 0, policy_version 157955 (0.0012) [2024-06-15 13:26:30,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 323616768. Throughput: 0: 11696.7. Samples: 80957440. 
Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:30,767][1648981] Avg episode reward: [(0, '321.510')] [2024-06-15 13:26:31,350][1651669] Updated weights for policy 0, policy_version 158018 (0.0119) [2024-06-15 13:26:34,804][1651669] Updated weights for policy 0, policy_version 158082 (0.0014) [2024-06-15 13:26:35,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 46433.6, 300 sec: 47763.6). Total num frames: 323846144. Throughput: 0: 11725.0. Samples: 81027584. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:35,767][1648981] Avg episode reward: [(0, '320.770')] [2024-06-15 13:26:35,959][1651669] Updated weights for policy 0, policy_version 158143 (0.0034) [2024-06-15 13:26:39,492][1651669] Updated weights for policy 0, policy_version 158208 (0.0017) [2024-06-15 13:26:40,732][1651669] Updated weights for policy 0, policy_version 158265 (0.0011) [2024-06-15 13:26:40,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 324108288. Throughput: 0: 11961.2. Samples: 81070592. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:40,767][1648981] Avg episode reward: [(0, '329.320')] [2024-06-15 13:26:43,466][1651669] Updated weights for policy 0, policy_version 158306 (0.0015) [2024-06-15 13:26:45,181][1651669] Updated weights for policy 0, policy_version 158337 (0.0015) [2024-06-15 13:26:45,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 48060.6, 300 sec: 47652.4). Total num frames: 324304896. Throughput: 0: 11959.0. Samples: 81139200. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:45,767][1648981] Avg episode reward: [(0, '329.030')] [2024-06-15 13:26:46,633][1651669] Updated weights for policy 0, policy_version 158399 (0.0011) [2024-06-15 13:26:49,896][1651669] Updated weights for policy 0, policy_version 158453 (0.0013) [2024-06-15 13:26:50,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 48077.6, 300 sec: 47208.1). Total num frames: 324534272. 
Throughput: 0: 11821.5. Samples: 81211904. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:50,767][1648981] Avg episode reward: [(0, '335.510')] [2024-06-15 13:26:51,263][1651274] Saving new best policy, reward=335.510! [2024-06-15 13:26:51,946][1651669] Updated weights for policy 0, policy_version 158520 (0.0012) [2024-06-15 13:26:55,420][1651669] Updated weights for policy 0, policy_version 158576 (0.0013) [2024-06-15 13:26:55,774][1648981] Fps is (10 sec: 49115.1, 60 sec: 48609.2, 300 sec: 47540.5). Total num frames: 324796416. Throughput: 0: 11956.5. Samples: 81243648. Policy #0 lag: (min: 63.0, avg: 173.6, max: 319.0) [2024-06-15 13:26:55,775][1648981] Avg episode reward: [(0, '335.030')] [2024-06-15 13:26:55,783][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000158592_324796416.pth... [2024-06-15 13:26:55,864][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000153024_313393152.pth [2024-06-15 13:26:57,868][1651669] Updated weights for policy 0, policy_version 158648 (0.0013) [2024-06-15 13:27:00,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 46424.3, 300 sec: 47208.1). Total num frames: 324960256. Throughput: 0: 11707.7. Samples: 81312768. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:00,767][1648981] Avg episode reward: [(0, '331.040')] [2024-06-15 13:27:00,767][1651274] Signal inference workers to stop experience collection... (8300 times) [2024-06-15 13:27:00,870][1651669] InferenceWorker_p0-w0: stopping experience collection (8300 times) [2024-06-15 13:27:00,988][1651274] Signal inference workers to resume experience collection... 
(8300 times) [2024-06-15 13:27:00,990][1651669] InferenceWorker_p0-w0: resuming experience collection (8300 times) [2024-06-15 13:27:00,993][1651669] Updated weights for policy 0, policy_version 158688 (0.0013) [2024-06-15 13:27:02,533][1651669] Updated weights for policy 0, policy_version 158752 (0.0014) [2024-06-15 13:27:05,782][1648981] Fps is (10 sec: 39290.2, 60 sec: 47501.1, 300 sec: 47095.2). Total num frames: 325189632. Throughput: 0: 11806.0. Samples: 81383936. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:05,783][1648981] Avg episode reward: [(0, '329.130')] [2024-06-15 13:27:06,236][1651669] Updated weights for policy 0, policy_version 158801 (0.0013) [2024-06-15 13:27:07,300][1651669] Updated weights for policy 0, policy_version 158845 (0.0012) [2024-06-15 13:27:09,094][1651669] Updated weights for policy 0, policy_version 158912 (0.0012) [2024-06-15 13:27:10,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 45876.7, 300 sec: 47319.4). Total num frames: 325451776. Throughput: 0: 11741.9. Samples: 81418240. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:10,767][1648981] Avg episode reward: [(0, '334.110')] [2024-06-15 13:27:12,270][1651669] Updated weights for policy 0, policy_version 158975 (0.0015) [2024-06-15 13:27:13,909][1651669] Updated weights for policy 0, policy_version 159033 (0.0034) [2024-06-15 13:27:15,766][1648981] Fps is (10 sec: 52511.3, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 325713920. Throughput: 0: 11719.1. Samples: 81484800. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:15,767][1648981] Avg episode reward: [(0, '335.560')] [2024-06-15 13:27:15,768][1651274] Saving new best policy, reward=335.560! 
[2024-06-15 13:27:17,871][1651669] Updated weights for policy 0, policy_version 159072 (0.0068) [2024-06-15 13:27:19,688][1651669] Updated weights for policy 0, policy_version 159123 (0.0011) [2024-06-15 13:27:20,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 325976064. Throughput: 0: 11753.3. Samples: 81556480. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:20,767][1648981] Avg episode reward: [(0, '321.810')] [2024-06-15 13:27:22,086][1651669] Updated weights for policy 0, policy_version 159171 (0.0019) [2024-06-15 13:27:23,292][1651669] Updated weights for policy 0, policy_version 159229 (0.0067) [2024-06-15 13:27:25,094][1651669] Updated weights for policy 0, policy_version 159288 (0.0014) [2024-06-15 13:27:25,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.9, 300 sec: 47543.3). Total num frames: 326238208. Throughput: 0: 11639.5. Samples: 81594368. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:25,767][1648981] Avg episode reward: [(0, '322.120')] [2024-06-15 13:27:28,396][1651669] Updated weights for policy 0, policy_version 159314 (0.0041) [2024-06-15 13:27:29,896][1651669] Updated weights for policy 0, policy_version 159376 (0.0013) [2024-06-15 13:27:30,669][1651669] Updated weights for policy 0, policy_version 159423 (0.0013) [2024-06-15 13:27:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47987.6). Total num frames: 326500352. Throughput: 0: 11833.0. Samples: 81671680. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:30,767][1648981] Avg episode reward: [(0, '312.260')] [2024-06-15 13:27:34,279][1651669] Updated weights for policy 0, policy_version 159488 (0.0013) [2024-06-15 13:27:35,767][1648981] Fps is (10 sec: 49150.2, 60 sec: 48059.4, 300 sec: 47543.4). Total num frames: 326729728. Throughput: 0: 11696.3. Samples: 81738240. 
Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:35,767][1648981] Avg episode reward: [(0, '326.370')] [2024-06-15 13:27:35,788][1651669] Updated weights for policy 0, policy_version 159552 (0.0019) [2024-06-15 13:27:38,924][1651669] Updated weights for policy 0, policy_version 159600 (0.0014) [2024-06-15 13:27:40,344][1651669] Updated weights for policy 0, policy_version 159650 (0.0017) [2024-06-15 13:27:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 326991872. Throughput: 0: 12062.5. Samples: 81786368. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:40,767][1648981] Avg episode reward: [(0, '326.760')] [2024-06-15 13:27:40,883][1651669] Updated weights for policy 0, policy_version 159680 (0.0021) [2024-06-15 13:27:43,737][1651274] Signal inference workers to stop experience collection... (8350 times) [2024-06-15 13:27:43,821][1651669] InferenceWorker_p0-w0: stopping experience collection (8350 times) [2024-06-15 13:27:43,982][1651274] Signal inference workers to resume experience collection... (8350 times) [2024-06-15 13:27:43,983][1651669] InferenceWorker_p0-w0: resuming experience collection (8350 times) [2024-06-15 13:27:45,002][1651669] Updated weights for policy 0, policy_version 159733 (0.0014) [2024-06-15 13:27:45,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.6, 300 sec: 47430.2). Total num frames: 327188480. Throughput: 0: 12174.1. Samples: 81860608. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:45,768][1648981] Avg episode reward: [(0, '316.890')] [2024-06-15 13:27:46,751][1651669] Updated weights for policy 0, policy_version 159797 (0.0013) [2024-06-15 13:27:49,644][1651669] Updated weights for policy 0, policy_version 159845 (0.0018) [2024-06-15 13:27:50,540][1651669] Updated weights for policy 0, policy_version 159876 (0.0014) [2024-06-15 13:27:50,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48606.0, 300 sec: 47654.3). 
Total num frames: 327450624. Throughput: 0: 12178.5. Samples: 81931776. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:50,767][1648981] Avg episode reward: [(0, '306.460')] [2024-06-15 13:27:51,700][1651669] Updated weights for policy 0, policy_version 159936 (0.0149) [2024-06-15 13:27:55,650][1651669] Updated weights for policy 0, policy_version 159998 (0.0100) [2024-06-15 13:27:55,770][1648981] Fps is (10 sec: 45862.5, 60 sec: 47517.2, 300 sec: 47429.8). Total num frames: 327647232. Throughput: 0: 12241.7. Samples: 81969152. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:27:55,770][1648981] Avg episode reward: [(0, '315.770')] [2024-06-15 13:27:57,395][1651669] Updated weights for policy 0, policy_version 160064 (0.0023) [2024-06-15 13:28:00,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 327876608. Throughput: 0: 12333.5. Samples: 82039808. Policy #0 lag: (min: 15.0, avg: 139.3, max: 271.0) [2024-06-15 13:28:00,767][1648981] Avg episode reward: [(0, '317.360')] [2024-06-15 13:28:01,685][1651669] Updated weights for policy 0, policy_version 160144 (0.0012) [2024-06-15 13:28:02,601][1651669] Updated weights for policy 0, policy_version 160190 (0.0012) [2024-06-15 13:28:05,766][1648981] Fps is (10 sec: 45889.8, 60 sec: 48618.6, 300 sec: 47430.3). Total num frames: 328105984. Throughput: 0: 12424.5. Samples: 82115584. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:05,767][1648981] Avg episode reward: [(0, '322.610')] [2024-06-15 13:28:07,163][1651669] Updated weights for policy 0, policy_version 160272 (0.0013) [2024-06-15 13:28:10,793][1648981] Fps is (10 sec: 45754.2, 60 sec: 48038.7, 300 sec: 47537.1). Total num frames: 328335360. Throughput: 0: 12178.5. Samples: 82142720. 
Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:10,793][1648981] Avg episode reward: [(0, '333.610')] [2024-06-15 13:28:11,647][1651669] Updated weights for policy 0, policy_version 160336 (0.0013) [2024-06-15 13:28:13,402][1651669] Updated weights for policy 0, policy_version 160404 (0.0012) [2024-06-15 13:28:15,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.8, 300 sec: 47653.7). Total num frames: 328597504. Throughput: 0: 11992.2. Samples: 82211328. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:15,767][1648981] Avg episode reward: [(0, '327.560')] [2024-06-15 13:28:16,117][1651669] Updated weights for policy 0, policy_version 160451 (0.0011) [2024-06-15 13:28:17,044][1651669] Updated weights for policy 0, policy_version 160501 (0.0096) [2024-06-15 13:28:18,661][1651669] Updated weights for policy 0, policy_version 160572 (0.0014) [2024-06-15 13:28:20,766][1648981] Fps is (10 sec: 52567.3, 60 sec: 48059.7, 300 sec: 47541.5). Total num frames: 328859648. Throughput: 0: 12345.0. Samples: 82293760. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:20,767][1648981] Avg episode reward: [(0, '325.360')] [2024-06-15 13:28:23,360][1651274] Signal inference workers to stop experience collection... (8400 times) [2024-06-15 13:28:23,417][1651669] InferenceWorker_p0-w0: stopping experience collection (8400 times) [2024-06-15 13:28:23,584][1651274] Signal inference workers to resume experience collection... (8400 times) [2024-06-15 13:28:23,594][1651669] InferenceWorker_p0-w0: resuming experience collection (8400 times) [2024-06-15 13:28:23,787][1651669] Updated weights for policy 0, policy_version 160631 (0.0012) [2024-06-15 13:28:24,997][1651669] Updated weights for policy 0, policy_version 160688 (0.0013) [2024-06-15 13:28:25,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 48059.6, 300 sec: 47985.6). Total num frames: 329121792. Throughput: 0: 12049.0. Samples: 82328576. 
Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:25,768][1648981] Avg episode reward: [(0, '329.660')] [2024-06-15 13:28:27,477][1651669] Updated weights for policy 0, policy_version 160736 (0.0012) [2024-06-15 13:28:28,729][1651669] Updated weights for policy 0, policy_version 160789 (0.0012) [2024-06-15 13:28:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 329383936. Throughput: 0: 11946.8. Samples: 82398208. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:30,767][1648981] Avg episode reward: [(0, '335.050')] [2024-06-15 13:28:34,087][1651669] Updated weights for policy 0, policy_version 160864 (0.0029) [2024-06-15 13:28:35,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 47513.9, 300 sec: 47763.5). Total num frames: 329580544. Throughput: 0: 11923.9. Samples: 82468352. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:35,767][1648981] Avg episode reward: [(0, '351.490')] [2024-06-15 13:28:35,845][1651669] Updated weights for policy 0, policy_version 160932 (0.0014) [2024-06-15 13:28:36,025][1651274] Saving new best policy, reward=351.490! [2024-06-15 13:28:38,648][1651669] Updated weights for policy 0, policy_version 160992 (0.0013) [2024-06-15 13:28:40,486][1651669] Updated weights for policy 0, policy_version 161072 (0.0013) [2024-06-15 13:28:40,774][1648981] Fps is (10 sec: 49113.0, 60 sec: 48053.4, 300 sec: 47874.2). Total num frames: 329875456. Throughput: 0: 12013.7. Samples: 82509824. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:40,775][1648981] Avg episode reward: [(0, '351.120')] [2024-06-15 13:28:44,268][1651669] Updated weights for policy 0, policy_version 161093 (0.0021) [2024-06-15 13:28:45,774][1648981] Fps is (10 sec: 45839.4, 60 sec: 47507.7, 300 sec: 47873.3). Total num frames: 330039296. Throughput: 0: 12092.5. Samples: 82584064. 
Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:45,775][1648981] Avg episode reward: [(0, '340.610')] [2024-06-15 13:28:46,187][1651669] Updated weights for policy 0, policy_version 161168 (0.0011) [2024-06-15 13:28:47,209][1651669] Updated weights for policy 0, policy_version 161215 (0.0015) [2024-06-15 13:28:50,766][1648981] Fps is (10 sec: 42631.8, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 330301440. Throughput: 0: 11832.9. Samples: 82648064. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:50,767][1648981] Avg episode reward: [(0, '354.950')] [2024-06-15 13:28:50,785][1651669] Updated weights for policy 0, policy_version 161296 (0.0124) [2024-06-15 13:28:51,144][1651274] Saving new best policy, reward=354.950! [2024-06-15 13:28:55,767][1648981] Fps is (10 sec: 39351.1, 60 sec: 46423.6, 300 sec: 47541.3). Total num frames: 330432512. Throughput: 0: 11965.0. Samples: 82680832. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:28:55,767][1648981] Avg episode reward: [(0, '364.480')] [2024-06-15 13:28:55,797][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000161344_330432512.pth... [2024-06-15 13:28:55,921][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000155776_319029248.pth [2024-06-15 13:28:55,925][1651274] Saving new best policy, reward=364.480! [2024-06-15 13:28:56,463][1651669] Updated weights for policy 0, policy_version 161362 (0.0038) [2024-06-15 13:28:57,921][1651669] Updated weights for policy 0, policy_version 161424 (0.0015) [2024-06-15 13:28:59,015][1651669] Updated weights for policy 0, policy_version 161469 (0.0012) [2024-06-15 13:29:00,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 46967.4, 300 sec: 47874.6). Total num frames: 330694656. Throughput: 0: 11878.4. Samples: 82745856. 
Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:29:00,767][1648981] Avg episode reward: [(0, '368.900')] [2024-06-15 13:29:01,269][1651274] Saving new best policy, reward=368.900! [2024-06-15 13:29:01,810][1651274] Signal inference workers to stop experience collection... (8450 times) [2024-06-15 13:29:01,898][1651669] InferenceWorker_p0-w0: stopping experience collection (8450 times) [2024-06-15 13:29:02,038][1651274] Signal inference workers to resume experience collection... (8450 times) [2024-06-15 13:29:02,063][1651669] InferenceWorker_p0-w0: resuming experience collection (8450 times) [2024-06-15 13:29:02,487][1651669] Updated weights for policy 0, policy_version 161554 (0.0092) [2024-06-15 13:29:05,767][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.4, 300 sec: 47541.9). Total num frames: 330956800. Throughput: 0: 11684.9. Samples: 82819584. Policy #0 lag: (min: 54.0, avg: 155.7, max: 310.0) [2024-06-15 13:29:05,767][1648981] Avg episode reward: [(0, '366.470')] [2024-06-15 13:29:06,831][1651669] Updated weights for policy 0, policy_version 161601 (0.0014) [2024-06-15 13:29:09,210][1651669] Updated weights for policy 0, policy_version 161683 (0.0013) [2024-06-15 13:29:10,299][1651669] Updated weights for policy 0, policy_version 161728 (0.0013) [2024-06-15 13:29:10,830][1648981] Fps is (10 sec: 52096.7, 60 sec: 48029.8, 300 sec: 47975.3). Total num frames: 331218944. Throughput: 0: 11566.2. Samples: 82849792. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:10,831][1648981] Avg episode reward: [(0, '367.080')] [2024-06-15 13:29:14,193][1651669] Updated weights for policy 0, policy_version 161814 (0.0018) [2024-06-15 13:29:14,915][1651669] Updated weights for policy 0, policy_version 161854 (0.0029) [2024-06-15 13:29:15,785][1648981] Fps is (10 sec: 52335.2, 60 sec: 48045.1, 300 sec: 47649.5). Total num frames: 331481088. Throughput: 0: 11555.1. Samples: 82918400. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:15,785][1648981] Avg episode reward: [(0, '367.530')] [2024-06-15 13:29:19,763][1651669] Updated weights for policy 0, policy_version 161912 (0.0012) [2024-06-15 13:29:20,766][1648981] Fps is (10 sec: 42871.8, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 331644928. Throughput: 0: 11559.8. Samples: 82988544. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:20,767][1648981] Avg episode reward: [(0, '358.300')] [2024-06-15 13:29:20,976][1651669] Updated weights for policy 0, policy_version 161954 (0.0115) [2024-06-15 13:29:24,180][1651669] Updated weights for policy 0, policy_version 162017 (0.0017) [2024-06-15 13:29:25,360][1651669] Updated weights for policy 0, policy_version 162080 (0.0024) [2024-06-15 13:29:25,766][1648981] Fps is (10 sec: 49241.3, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 331972608. Throughput: 0: 11459.4. Samples: 83025408. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:25,767][1648981] Avg episode reward: [(0, '367.150')] [2024-06-15 13:29:30,798][1648981] Fps is (10 sec: 42463.1, 60 sec: 44759.1, 300 sec: 47758.4). Total num frames: 332070912. Throughput: 0: 11462.7. Samples: 83100160. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:30,799][1648981] Avg episode reward: [(0, '352.610')] [2024-06-15 13:29:30,818][1651669] Updated weights for policy 0, policy_version 162160 (0.0012) [2024-06-15 13:29:31,554][1651669] Updated weights for policy 0, policy_version 162181 (0.0011) [2024-06-15 13:29:32,836][1651669] Updated weights for policy 0, policy_version 162232 (0.0015) [2024-06-15 13:29:34,882][1651669] Updated weights for policy 0, policy_version 162272 (0.0174) [2024-06-15 13:29:35,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 332431360. Throughput: 0: 11559.8. Samples: 83168256. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:35,767][1648981] Avg episode reward: [(0, '356.120')] [2024-06-15 13:29:36,178][1651669] Updated weights for policy 0, policy_version 162339 (0.0102) [2024-06-15 13:29:40,694][1651669] Updated weights for policy 0, policy_version 162373 (0.0013) [2024-06-15 13:29:40,766][1648981] Fps is (10 sec: 46021.8, 60 sec: 44242.6, 300 sec: 47542.7). Total num frames: 332529664. Throughput: 0: 11719.2. Samples: 83208192. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:40,767][1648981] Avg episode reward: [(0, '335.240')] [2024-06-15 13:29:41,955][1651669] Updated weights for policy 0, policy_version 162427 (0.0016) [2024-06-15 13:29:42,523][1651274] Signal inference workers to stop experience collection... (8500 times) [2024-06-15 13:29:42,572][1651669] InferenceWorker_p0-w0: stopping experience collection (8500 times) [2024-06-15 13:29:42,764][1651274] Signal inference workers to resume experience collection... (8500 times) [2024-06-15 13:29:42,765][1651669] InferenceWorker_p0-w0: resuming experience collection (8500 times) [2024-06-15 13:29:43,196][1651669] Updated weights for policy 0, policy_version 162480 (0.0012) [2024-06-15 13:29:45,043][1651669] Updated weights for policy 0, policy_version 162497 (0.0013) [2024-06-15 13:29:45,784][1648981] Fps is (10 sec: 42521.9, 60 sec: 46959.4, 300 sec: 47760.6). Total num frames: 332857344. Throughput: 0: 11873.6. Samples: 83280384. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:45,785][1648981] Avg episode reward: [(0, '320.830')] [2024-06-15 13:29:46,777][1651669] Updated weights for policy 0, policy_version 162564 (0.0012) [2024-06-15 13:29:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 333053952. Throughput: 0: 11889.8. Samples: 83354624. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:50,767][1648981] Avg episode reward: [(0, '316.280')] [2024-06-15 13:29:51,114][1651669] Updated weights for policy 0, policy_version 162626 (0.0016) [2024-06-15 13:29:52,648][1651669] Updated weights for policy 0, policy_version 162688 (0.0013) [2024-06-15 13:29:54,549][1651669] Updated weights for policy 0, policy_version 162750 (0.0013) [2024-06-15 13:29:55,767][1648981] Fps is (10 sec: 45957.6, 60 sec: 48059.8, 300 sec: 47876.5). Total num frames: 333316096. Throughput: 0: 11952.2. Samples: 83386880. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:29:55,767][1648981] Avg episode reward: [(0, '311.380')] [2024-06-15 13:29:58,326][1651669] Updated weights for policy 0, policy_version 162848 (0.0011) [2024-06-15 13:30:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 333578240. Throughput: 0: 11689.7. Samples: 83444224. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:30:00,767][1648981] Avg episode reward: [(0, '304.730')] [2024-06-15 13:30:04,769][1651669] Updated weights for policy 0, policy_version 162928 (0.0015) [2024-06-15 13:30:05,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 46967.7, 300 sec: 47763.5). Total num frames: 333774848. Throughput: 0: 11776.0. Samples: 83518464. Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:30:05,767][1648981] Avg episode reward: [(0, '299.990')] [2024-06-15 13:30:05,872][1651669] Updated weights for policy 0, policy_version 162979 (0.0018) [2024-06-15 13:30:08,535][1651669] Updated weights for policy 0, policy_version 163042 (0.0033) [2024-06-15 13:30:10,302][1651669] Updated weights for policy 0, policy_version 163105 (0.0145) [2024-06-15 13:30:10,770][1648981] Fps is (10 sec: 49133.4, 60 sec: 47561.1, 300 sec: 47429.7). Total num frames: 334069760. Throughput: 0: 11752.3. Samples: 83554304. 
Policy #0 lag: (min: 15.0, avg: 99.4, max: 271.0) [2024-06-15 13:30:10,771][1648981] Avg episode reward: [(0, '296.210')] [2024-06-15 13:30:15,579][1651669] Updated weights for policy 0, policy_version 163168 (0.0015) [2024-06-15 13:30:15,767][1648981] Fps is (10 sec: 39321.0, 60 sec: 44796.4, 300 sec: 47320.4). Total num frames: 334168064. Throughput: 0: 11761.5. Samples: 83629056. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:15,767][1648981] Avg episode reward: [(0, '289.790')] [2024-06-15 13:30:17,193][1651669] Updated weights for policy 0, policy_version 163234 (0.0015) [2024-06-15 13:30:20,527][1651669] Updated weights for policy 0, policy_version 163299 (0.0015) [2024-06-15 13:30:20,766][1648981] Fps is (10 sec: 36058.6, 60 sec: 46421.4, 300 sec: 46874.9). Total num frames: 334430208. Throughput: 0: 11571.2. Samples: 83688960. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:20,767][1648981] Avg episode reward: [(0, '290.890')] [2024-06-15 13:30:22,185][1651669] Updated weights for policy 0, policy_version 163364 (0.0021) [2024-06-15 13:30:25,794][1648981] Fps is (10 sec: 45749.1, 60 sec: 44216.4, 300 sec: 47203.7). Total num frames: 334626816. Throughput: 0: 11348.0. Samples: 83719168. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:25,795][1648981] Avg episode reward: [(0, '301.790')] [2024-06-15 13:30:26,005][1651274] Signal inference workers to stop experience collection... (8550 times) [2024-06-15 13:30:26,044][1651669] InferenceWorker_p0-w0: stopping experience collection (8550 times) [2024-06-15 13:30:26,252][1651274] Signal inference workers to resume experience collection... 
(8550 times) [2024-06-15 13:30:26,253][1651669] InferenceWorker_p0-w0: resuming experience collection (8550 times) [2024-06-15 13:30:26,256][1651669] Updated weights for policy 0, policy_version 163408 (0.0013) [2024-06-15 13:30:27,972][1651669] Updated weights for policy 0, policy_version 163491 (0.0014) [2024-06-15 13:30:30,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46992.4, 300 sec: 46877.4). Total num frames: 334888960. Throughput: 0: 11371.0. Samples: 83791872. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:30,767][1648981] Avg episode reward: [(0, '290.630')] [2024-06-15 13:30:31,766][1651669] Updated weights for policy 0, policy_version 163538 (0.0013) [2024-06-15 13:30:33,375][1651669] Updated weights for policy 0, policy_version 163605 (0.0013) [2024-06-15 13:30:34,368][1651669] Updated weights for policy 0, policy_version 163646 (0.0011) [2024-06-15 13:30:35,766][1648981] Fps is (10 sec: 52575.2, 60 sec: 45329.2, 300 sec: 47541.4). Total num frames: 335151104. Throughput: 0: 11229.9. Samples: 83859968. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:35,767][1648981] Avg episode reward: [(0, '305.780')] [2024-06-15 13:30:39,392][1651669] Updated weights for policy 0, policy_version 163728 (0.0121) [2024-06-15 13:30:40,782][1648981] Fps is (10 sec: 52345.8, 60 sec: 48047.0, 300 sec: 47428.0). Total num frames: 335413248. Throughput: 0: 11362.5. Samples: 83898368. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:40,783][1648981] Avg episode reward: [(0, '318.200')] [2024-06-15 13:30:42,954][1651669] Updated weights for policy 0, policy_version 163793 (0.0013) [2024-06-15 13:30:45,150][1651669] Updated weights for policy 0, policy_version 163875 (0.0013) [2024-06-15 13:30:45,770][1648981] Fps is (10 sec: 52407.4, 60 sec: 46978.5, 300 sec: 47544.4). Total num frames: 335675392. Throughput: 0: 11445.0. Samples: 83959296. 
Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:45,771][1648981] Avg episode reward: [(0, '319.460')] [2024-06-15 13:30:48,921][1651669] Updated weights for policy 0, policy_version 163906 (0.0014) [2024-06-15 13:30:49,939][1651669] Updated weights for policy 0, policy_version 163962 (0.0013) [2024-06-15 13:30:50,766][1648981] Fps is (10 sec: 42665.8, 60 sec: 46421.3, 300 sec: 47321.1). Total num frames: 335839232. Throughput: 0: 11514.3. Samples: 84036608. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:50,767][1648981] Avg episode reward: [(0, '325.310')] [2024-06-15 13:30:50,920][1651669] Updated weights for policy 0, policy_version 164000 (0.0015) [2024-06-15 13:30:53,660][1651669] Updated weights for policy 0, policy_version 164048 (0.0012) [2024-06-15 13:30:54,920][1651669] Updated weights for policy 0, policy_version 164100 (0.0095) [2024-06-15 13:30:55,766][1648981] Fps is (10 sec: 45893.0, 60 sec: 46967.5, 300 sec: 47319.8). Total num frames: 336134144. Throughput: 0: 11606.3. Samples: 84076544. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:30:55,767][1648981] Avg episode reward: [(0, '330.510')] [2024-06-15 13:30:56,372][1651669] Updated weights for policy 0, policy_version 164156 (0.0012) [2024-06-15 13:30:56,434][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000164160_336199680.pth... [2024-06-15 13:30:56,475][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000158592_324796416.pth [2024-06-15 13:30:56,480][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000164160_336199680.pth [2024-06-15 13:31:00,476][1651669] Updated weights for policy 0, policy_version 164208 (0.0014) [2024-06-15 13:31:00,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 45329.1, 300 sec: 47319.2). Total num frames: 336297984. Throughput: 0: 11707.8. Samples: 84155904. 
Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:31:00,767][1648981] Avg episode reward: [(0, '332.020')] [2024-06-15 13:31:02,087][1651669] Updated weights for policy 0, policy_version 164288 (0.0014) [2024-06-15 13:31:04,433][1651274] Signal inference workers to stop experience collection... (8600 times) [2024-06-15 13:31:04,529][1651669] InferenceWorker_p0-w0: stopping experience collection (8600 times) [2024-06-15 13:31:04,619][1651274] Signal inference workers to resume experience collection... (8600 times) [2024-06-15 13:31:04,621][1651669] InferenceWorker_p0-w0: resuming experience collection (8600 times) [2024-06-15 13:31:05,186][1651669] Updated weights for policy 0, policy_version 164352 (0.0221) [2024-06-15 13:31:05,766][1648981] Fps is (10 sec: 49152.9, 60 sec: 47513.7, 300 sec: 47208.5). Total num frames: 336625664. Throughput: 0: 11832.9. Samples: 84221440. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:31:05,767][1648981] Avg episode reward: [(0, '318.820')] [2024-06-15 13:31:06,484][1651669] Updated weights for policy 0, policy_version 164411 (0.0013) [2024-06-15 13:31:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45332.0, 300 sec: 47319.2). Total num frames: 336789504. Throughput: 0: 12022.4. Samples: 84259840. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:31:10,767][1648981] Avg episode reward: [(0, '314.510')] [2024-06-15 13:31:11,210][1651669] Updated weights for policy 0, policy_version 164480 (0.0013) [2024-06-15 13:31:12,376][1651669] Updated weights for policy 0, policy_version 164528 (0.0013) [2024-06-15 13:31:14,855][1651669] Updated weights for policy 0, policy_version 164577 (0.0015) [2024-06-15 13:31:15,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49152.2, 300 sec: 47319.2). Total num frames: 337117184. Throughput: 0: 12060.4. Samples: 84334592. 
Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:31:15,767][1648981] Avg episode reward: [(0, '326.140')] [2024-06-15 13:31:16,137][1651669] Updated weights for policy 0, policy_version 164626 (0.0013) [2024-06-15 13:31:20,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 337248256. Throughput: 0: 12151.5. Samples: 84406784. Policy #0 lag: (min: 15.0, avg: 93.8, max: 271.0) [2024-06-15 13:31:20,767][1648981] Avg episode reward: [(0, '335.320')] [2024-06-15 13:31:21,239][1651669] Updated weights for policy 0, policy_version 164676 (0.0107) [2024-06-15 13:31:23,326][1651669] Updated weights for policy 0, policy_version 164755 (0.0013) [2024-06-15 13:31:24,291][1651669] Updated weights for policy 0, policy_version 164797 (0.0032) [2024-06-15 13:31:25,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48081.9, 300 sec: 47097.0). Total num frames: 337510400. Throughput: 0: 11928.1. Samples: 84434944. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:31:25,767][1648981] Avg episode reward: [(0, '341.080')] [2024-06-15 13:31:26,764][1651669] Updated weights for policy 0, policy_version 164838 (0.0012) [2024-06-15 13:31:28,821][1651669] Updated weights for policy 0, policy_version 164920 (0.0102) [2024-06-15 13:31:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 337772544. Throughput: 0: 11947.8. Samples: 84496896. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:31:30,767][1648981] Avg episode reward: [(0, '334.490')] [2024-06-15 13:31:34,717][1651669] Updated weights for policy 0, policy_version 164992 (0.0012) [2024-06-15 13:31:35,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 46967.3, 300 sec: 46986.0). Total num frames: 337969152. Throughput: 0: 11867.0. Samples: 84570624. 
Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:31:35,767][1648981] Avg episode reward: [(0, '333.580')] [2024-06-15 13:31:36,275][1651669] Updated weights for policy 0, policy_version 165056 (0.0012) [2024-06-15 13:31:38,736][1651669] Updated weights for policy 0, policy_version 165124 (0.0142) [2024-06-15 13:31:40,775][1648981] Fps is (10 sec: 52384.6, 60 sec: 48065.7, 300 sec: 47429.0). Total num frames: 338296832. Throughput: 0: 11603.2. Samples: 84598784. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:31:40,775][1648981] Avg episode reward: [(0, '338.040')] [2024-06-15 13:31:45,560][1651669] Updated weights for policy 0, policy_version 165189 (0.0012) [2024-06-15 13:31:45,766][1648981] Fps is (10 sec: 32768.6, 60 sec: 43693.7, 300 sec: 46652.8). Total num frames: 338296832. Throughput: 0: 11491.6. Samples: 84673024. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:31:45,767][1648981] Avg episode reward: [(0, '319.700')] [2024-06-15 13:31:46,731][1651274] Signal inference workers to stop experience collection... (8650 times) [2024-06-15 13:31:46,778][1651669] InferenceWorker_p0-w0: stopping experience collection (8650 times) [2024-06-15 13:31:47,014][1651274] Signal inference workers to resume experience collection... (8650 times) [2024-06-15 13:31:47,017][1651669] InferenceWorker_p0-w0: resuming experience collection (8650 times) [2024-06-15 13:31:47,134][1651669] Updated weights for policy 0, policy_version 165249 (0.0012) [2024-06-15 13:31:48,326][1651669] Updated weights for policy 0, policy_version 165301 (0.0014) [2024-06-15 13:31:49,289][1651669] Updated weights for policy 0, policy_version 165344 (0.0044) [2024-06-15 13:31:50,465][1651669] Updated weights for policy 0, policy_version 165396 (0.0014) [2024-06-15 13:31:50,766][1648981] Fps is (10 sec: 45914.0, 60 sec: 48606.0, 300 sec: 47320.5). Total num frames: 338755584. Throughput: 0: 11537.1. Samples: 84740608. 
Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:31:50,767][1648981] Avg episode reward: [(0, '324.910')] [2024-06-15 13:31:55,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 44783.0, 300 sec: 46986.0). Total num frames: 338821120. Throughput: 0: 11446.0. Samples: 84774912. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:31:55,767][1648981] Avg episode reward: [(0, '323.250')] [2024-06-15 13:31:56,902][1651669] Updated weights for policy 0, policy_version 165456 (0.0103) [2024-06-15 13:31:58,471][1651669] Updated weights for policy 0, policy_version 165520 (0.0020) [2024-06-15 13:31:59,729][1651669] Updated weights for policy 0, policy_version 165565 (0.0015) [2024-06-15 13:32:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48059.8, 300 sec: 47432.8). Total num frames: 339181568. Throughput: 0: 11286.8. Samples: 84842496. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:32:00,767][1648981] Avg episode reward: [(0, '333.590')] [2024-06-15 13:32:00,833][1651669] Updated weights for policy 0, policy_version 165618 (0.0029) [2024-06-15 13:32:02,252][1651669] Updated weights for policy 0, policy_version 165695 (0.0014) [2024-06-15 13:32:05,767][1648981] Fps is (10 sec: 52427.1, 60 sec: 45328.8, 300 sec: 47097.0). Total num frames: 339345408. Throughput: 0: 11377.7. Samples: 84918784. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:32:05,767][1648981] Avg episode reward: [(0, '334.360')] [2024-06-15 13:32:09,521][1651669] Updated weights for policy 0, policy_version 165760 (0.0012) [2024-06-15 13:32:10,768][1648981] Fps is (10 sec: 36039.4, 60 sec: 45874.1, 300 sec: 46874.7). Total num frames: 339542016. Throughput: 0: 11639.1. Samples: 84958720. 
Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:32:10,768][1648981] Avg episode reward: [(0, '326.900')] [2024-06-15 13:32:11,172][1651669] Updated weights for policy 0, policy_version 165822 (0.0128) [2024-06-15 13:32:12,812][1651669] Updated weights for policy 0, policy_version 165893 (0.0012) [2024-06-15 13:32:13,864][1651669] Updated weights for policy 0, policy_version 165951 (0.0013) [2024-06-15 13:32:15,767][1648981] Fps is (10 sec: 52429.6, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 339869696. Throughput: 0: 11354.9. Samples: 85007872. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:32:15,767][1648981] Avg episode reward: [(0, '335.380')] [2024-06-15 13:32:20,773][1648981] Fps is (10 sec: 39300.1, 60 sec: 44777.7, 300 sec: 46429.5). Total num frames: 339935232. Throughput: 0: 11615.0. Samples: 85093376. Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:32:20,774][1648981] Avg episode reward: [(0, '341.420')] [2024-06-15 13:32:21,076][1651669] Updated weights for policy 0, policy_version 166004 (0.0014) [2024-06-15 13:32:22,281][1651669] Updated weights for policy 0, policy_version 166064 (0.0069) [2024-06-15 13:32:23,051][1651274] Signal inference workers to stop experience collection... (8700 times) [2024-06-15 13:32:23,102][1651669] InferenceWorker_p0-w0: stopping experience collection (8700 times) [2024-06-15 13:32:23,229][1651274] Signal inference workers to resume experience collection... (8700 times) [2024-06-15 13:32:23,230][1651669] InferenceWorker_p0-w0: resuming experience collection (8700 times) [2024-06-15 13:32:24,028][1651669] Updated weights for policy 0, policy_version 166144 (0.0110) [2024-06-15 13:32:25,358][1651669] Updated weights for policy 0, policy_version 166202 (0.0138) [2024-06-15 13:32:25,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 340393984. Throughput: 0: 11641.6. Samples: 85122560. 
Policy #0 lag: (min: 7.0, avg: 89.5, max: 263.0) [2024-06-15 13:32:25,767][1648981] Avg episode reward: [(0, '349.290')] [2024-06-15 13:32:30,766][1648981] Fps is (10 sec: 49186.0, 60 sec: 44236.8, 300 sec: 46430.6). Total num frames: 340426752. Throughput: 0: 11685.0. Samples: 85198848. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:32:30,767][1648981] Avg episode reward: [(0, '345.870')] [2024-06-15 13:32:31,522][1651669] Updated weights for policy 0, policy_version 166256 (0.0113) [2024-06-15 13:32:33,322][1651669] Updated weights for policy 0, policy_version 166328 (0.0013) [2024-06-15 13:32:35,340][1651669] Updated weights for policy 0, policy_version 166393 (0.0013) [2024-06-15 13:32:35,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 46967.6, 300 sec: 46763.8). Total num frames: 340787200. Throughput: 0: 11434.7. Samples: 85255168. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:32:35,767][1648981] Avg episode reward: [(0, '337.650')] [2024-06-15 13:32:37,173][1651669] Updated weights for policy 0, policy_version 166457 (0.0013) [2024-06-15 13:32:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 43696.7, 300 sec: 46541.7). Total num frames: 340918272. Throughput: 0: 11366.4. Samples: 85286400. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:32:40,767][1648981] Avg episode reward: [(0, '339.130')] [2024-06-15 13:32:44,149][1651669] Updated weights for policy 0, policy_version 166536 (0.0015) [2024-06-15 13:32:45,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 341180416. Throughput: 0: 11446.0. Samples: 85357568. 
Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:32:45,767][1648981] Avg episode reward: [(0, '339.600')] [2024-06-15 13:32:45,944][1651669] Updated weights for policy 0, policy_version 166593 (0.0014) [2024-06-15 13:32:47,509][1651669] Updated weights for policy 0, policy_version 166656 (0.0102) [2024-06-15 13:32:49,050][1651669] Updated weights for policy 0, policy_version 166712 (0.0012) [2024-06-15 13:32:50,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 44782.7, 300 sec: 46764.3). Total num frames: 341442560. Throughput: 0: 11207.1. Samples: 85423104. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:32:50,767][1648981] Avg episode reward: [(0, '338.850')] [2024-06-15 13:32:54,748][1651669] Updated weights for policy 0, policy_version 166752 (0.0021) [2024-06-15 13:32:55,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 341573632. Throughput: 0: 11207.5. Samples: 85463040. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:32:55,767][1648981] Avg episode reward: [(0, '345.010')] [2024-06-15 13:32:56,303][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000166816_341639168.pth... [2024-06-15 13:32:56,420][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000161344_330432512.pth [2024-06-15 13:32:56,489][1651669] Updated weights for policy 0, policy_version 166817 (0.0013) [2024-06-15 13:32:57,801][1651669] Updated weights for policy 0, policy_version 166865 (0.0013) [2024-06-15 13:32:59,727][1651669] Updated weights for policy 0, policy_version 166932 (0.0013) [2024-06-15 13:33:00,672][1651669] Updated weights for policy 0, policy_version 166970 (0.0011) [2024-06-15 13:33:00,766][1648981] Fps is (10 sec: 49153.6, 60 sec: 45875.2, 300 sec: 46874.9). Total num frames: 341934080. Throughput: 0: 11400.6. Samples: 85520896. 
Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:00,767][1648981] Avg episode reward: [(0, '339.910')] [2024-06-15 13:33:05,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 44237.1, 300 sec: 46323.6). Total num frames: 341999616. Throughput: 0: 11447.8. Samples: 85608448. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:05,767][1648981] Avg episode reward: [(0, '345.340')] [2024-06-15 13:33:05,882][1651274] Signal inference workers to stop experience collection... (8750 times) [2024-06-15 13:33:05,983][1651669] InferenceWorker_p0-w0: stopping experience collection (8750 times) [2024-06-15 13:33:06,119][1651274] Signal inference workers to resume experience collection... (8750 times) [2024-06-15 13:33:06,120][1651669] InferenceWorker_p0-w0: resuming experience collection (8750 times) [2024-06-15 13:33:06,122][1651669] Updated weights for policy 0, policy_version 167024 (0.0096) [2024-06-15 13:33:07,576][1651669] Updated weights for policy 0, policy_version 167076 (0.0012) [2024-06-15 13:33:09,579][1651669] Updated weights for policy 0, policy_version 167154 (0.0013) [2024-06-15 13:33:10,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 48060.8, 300 sec: 46874.9). Total num frames: 342425600. Throughput: 0: 11480.1. Samples: 85639168. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:10,767][1648981] Avg episode reward: [(0, '346.940')] [2024-06-15 13:33:11,332][1651669] Updated weights for policy 0, policy_version 167218 (0.0018) [2024-06-15 13:33:15,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 342491136. Throughput: 0: 11480.2. Samples: 85715456. 
Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:15,767][1648981] Avg episode reward: [(0, '347.050')] [2024-06-15 13:33:16,583][1651669] Updated weights for policy 0, policy_version 167256 (0.0040) [2024-06-15 13:33:18,060][1651669] Updated weights for policy 0, policy_version 167312 (0.0014) [2024-06-15 13:33:19,693][1651669] Updated weights for policy 0, policy_version 167392 (0.0013) [2024-06-15 13:33:20,767][1648981] Fps is (10 sec: 45874.8, 60 sec: 49157.5, 300 sec: 46652.7). Total num frames: 342884352. Throughput: 0: 11593.9. Samples: 85776896. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:20,768][1648981] Avg episode reward: [(0, '327.710')] [2024-06-15 13:33:21,730][1651669] Updated weights for policy 0, policy_version 167472 (0.0013) [2024-06-15 13:33:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 343015424. Throughput: 0: 11787.4. Samples: 85816832. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:25,767][1648981] Avg episode reward: [(0, '328.520')] [2024-06-15 13:33:27,265][1651669] Updated weights for policy 0, policy_version 167504 (0.0012) [2024-06-15 13:33:29,170][1651669] Updated weights for policy 0, policy_version 167588 (0.0013) [2024-06-15 13:33:30,761][1651669] Updated weights for policy 0, policy_version 167664 (0.0013) [2024-06-15 13:33:30,771][1648981] Fps is (10 sec: 49133.0, 60 sec: 49148.6, 300 sec: 46763.2). Total num frames: 343375872. Throughput: 0: 11922.8. Samples: 85894144. Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:30,771][1648981] Avg episode reward: [(0, '323.880')] [2024-06-15 13:33:32,507][1651669] Updated weights for policy 0, policy_version 167728 (0.0014) [2024-06-15 13:33:35,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46320.7). Total num frames: 343539712. Throughput: 0: 12151.5. Samples: 85969920. 
Policy #0 lag: (min: 5.0, avg: 68.0, max: 261.0) [2024-06-15 13:33:35,767][1648981] Avg episode reward: [(0, '328.880')] [2024-06-15 13:33:37,962][1651669] Updated weights for policy 0, policy_version 167760 (0.0026) [2024-06-15 13:33:39,712][1651669] Updated weights for policy 0, policy_version 167831 (0.0013) [2024-06-15 13:33:40,766][1648981] Fps is (10 sec: 42616.1, 60 sec: 48059.8, 300 sec: 46654.0). Total num frames: 343801856. Throughput: 0: 12231.1. Samples: 86013440. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:33:40,767][1648981] Avg episode reward: [(0, '327.650')] [2024-06-15 13:33:41,021][1651274] Signal inference workers to stop experience collection... (8800 times) [2024-06-15 13:33:41,072][1651669] InferenceWorker_p0-w0: stopping experience collection (8800 times) [2024-06-15 13:33:41,250][1651274] Signal inference workers to resume experience collection... (8800 times) [2024-06-15 13:33:41,251][1651669] InferenceWorker_p0-w0: resuming experience collection (8800 times) [2024-06-15 13:33:42,145][1651669] Updated weights for policy 0, policy_version 167936 (0.0017) [2024-06-15 13:33:43,469][1651669] Updated weights for policy 0, policy_version 167994 (0.0013) [2024-06-15 13:33:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 344064000. Throughput: 0: 12094.6. Samples: 86065152. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:33:45,767][1648981] Avg episode reward: [(0, '350.830')] [2024-06-15 13:33:49,798][1651669] Updated weights for policy 0, policy_version 168038 (0.0013) [2024-06-15 13:33:50,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 46421.7, 300 sec: 46763.9). Total num frames: 344227840. Throughput: 0: 12015.0. Samples: 86149120. 
Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:33:50,767][1648981] Avg episode reward: [(0, '334.550')] [2024-06-15 13:33:51,146][1651669] Updated weights for policy 0, policy_version 168112 (0.0013) [2024-06-15 13:33:52,701][1651669] Updated weights for policy 0, policy_version 168180 (0.0012) [2024-06-15 13:33:54,089][1651669] Updated weights for policy 0, policy_version 168253 (0.0018) [2024-06-15 13:33:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 344588288. Throughput: 0: 11878.4. Samples: 86173696. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:33:55,767][1648981] Avg episode reward: [(0, '329.430')] [2024-06-15 13:34:00,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 45329.1, 300 sec: 46430.6). Total num frames: 344653824. Throughput: 0: 12003.6. Samples: 86255616. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:00,767][1648981] Avg episode reward: [(0, '327.170')] [2024-06-15 13:34:01,134][1651669] Updated weights for policy 0, policy_version 168316 (0.0095) [2024-06-15 13:34:02,361][1651669] Updated weights for policy 0, policy_version 168368 (0.0146) [2024-06-15 13:34:03,938][1651669] Updated weights for policy 0, policy_version 168433 (0.0013) [2024-06-15 13:34:05,480][1651669] Updated weights for policy 0, policy_version 168511 (0.0099) [2024-06-15 13:34:05,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 51882.7, 300 sec: 47107.2). Total num frames: 345112576. Throughput: 0: 11844.3. Samples: 86309888. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:05,767][1648981] Avg episode reward: [(0, '319.760')] [2024-06-15 13:34:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 44783.1, 300 sec: 46211.3). Total num frames: 345112576. Throughput: 0: 11867.0. Samples: 86350848. 
Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:10,767][1648981] Avg episode reward: [(0, '313.010')] [2024-06-15 13:34:12,449][1651669] Updated weights for policy 0, policy_version 168566 (0.0012) [2024-06-15 13:34:13,861][1651669] Updated weights for policy 0, policy_version 168624 (0.0012) [2024-06-15 13:34:15,703][1651669] Updated weights for policy 0, policy_version 168690 (0.0146) [2024-06-15 13:34:15,782][1648981] Fps is (10 sec: 35987.5, 60 sec: 49685.0, 300 sec: 46872.4). Total num frames: 345473024. Throughput: 0: 11704.7. Samples: 86420992. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:15,783][1648981] Avg episode reward: [(0, '313.480')] [2024-06-15 13:34:17,141][1651669] Updated weights for policy 0, policy_version 168766 (0.0029) [2024-06-15 13:34:20,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45875.4, 300 sec: 46319.5). Total num frames: 345636864. Throughput: 0: 11559.8. Samples: 86490112. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:20,767][1648981] Avg episode reward: [(0, '313.230')] [2024-06-15 13:34:23,338][1651274] Signal inference workers to stop experience collection... (8850 times) [2024-06-15 13:34:23,384][1651669] InferenceWorker_p0-w0: stopping experience collection (8850 times) [2024-06-15 13:34:23,503][1651274] Signal inference workers to resume experience collection... (8850 times) [2024-06-15 13:34:23,504][1651669] InferenceWorker_p0-w0: resuming experience collection (8850 times) [2024-06-15 13:34:23,967][1651669] Updated weights for policy 0, policy_version 168832 (0.0013) [2024-06-15 13:34:25,767][1648981] Fps is (10 sec: 42665.8, 60 sec: 48059.6, 300 sec: 46879.9). Total num frames: 345899008. Throughput: 0: 11434.6. Samples: 86528000. 
Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:25,767][1648981] Avg episode reward: [(0, '334.380')] [2024-06-15 13:34:26,040][1651669] Updated weights for policy 0, policy_version 168901 (0.0013) [2024-06-15 13:34:27,495][1651669] Updated weights for policy 0, policy_version 168963 (0.0107) [2024-06-15 13:34:28,774][1651669] Updated weights for policy 0, policy_version 169020 (0.0014) [2024-06-15 13:34:30,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46424.6, 300 sec: 46541.7). Total num frames: 346161152. Throughput: 0: 11650.9. Samples: 86589440. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:30,767][1648981] Avg episode reward: [(0, '331.070')] [2024-06-15 13:34:34,483][1651669] Updated weights for policy 0, policy_version 169080 (0.0014) [2024-06-15 13:34:35,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 346324992. Throughput: 0: 11525.7. Samples: 86667776. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:35,767][1648981] Avg episode reward: [(0, '344.260')] [2024-06-15 13:34:36,417][1651669] Updated weights for policy 0, policy_version 169147 (0.0013) [2024-06-15 13:34:37,877][1651669] Updated weights for policy 0, policy_version 169200 (0.0019) [2024-06-15 13:34:39,287][1651669] Updated weights for policy 0, policy_version 169234 (0.0011) [2024-06-15 13:34:40,148][1651669] Updated weights for policy 0, policy_version 169274 (0.0018) [2024-06-15 13:34:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 46877.8). Total num frames: 346685440. Throughput: 0: 11685.0. Samples: 86699520. Policy #0 lag: (min: 15.0, avg: 67.3, max: 271.0) [2024-06-15 13:34:40,767][1648981] Avg episode reward: [(0, '339.170')] [2024-06-15 13:34:45,501][1651669] Updated weights for policy 0, policy_version 169342 (0.0015) [2024-06-15 13:34:45,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 346816512. 
Throughput: 0: 11650.8. Samples: 86779904. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:34:45,767][1648981] Avg episode reward: [(0, '342.240')] [2024-06-15 13:34:47,342][1651669] Updated weights for policy 0, policy_version 169397 (0.0012) [2024-06-15 13:34:48,691][1651669] Updated weights for policy 0, policy_version 169426 (0.0012) [2024-06-15 13:34:50,423][1651669] Updated weights for policy 0, policy_version 169488 (0.0013) [2024-06-15 13:34:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.6, 300 sec: 46763.9). Total num frames: 347111424. Throughput: 0: 11810.1. Samples: 86841344. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:34:50,767][1648981] Avg episode reward: [(0, '326.110')] [2024-06-15 13:34:51,581][1651669] Updated weights for policy 0, policy_version 169536 (0.0037) [2024-06-15 13:34:55,768][1648981] Fps is (10 sec: 39315.9, 60 sec: 43689.6, 300 sec: 46208.2). Total num frames: 347209728. Throughput: 0: 11741.5. Samples: 86879232. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:34:55,768][1648981] Avg episode reward: [(0, '323.390')] [2024-06-15 13:34:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000169536_347209728.pth... [2024-06-15 13:34:55,982][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000164160_336199680.pth [2024-06-15 13:34:57,840][1651669] Updated weights for policy 0, policy_version 169616 (0.0014) [2024-06-15 13:34:58,838][1651669] Updated weights for policy 0, policy_version 169654 (0.0013) [2024-06-15 13:35:00,629][1651669] Updated weights for policy 0, policy_version 169717 (0.0111) [2024-06-15 13:35:00,774][1648981] Fps is (10 sec: 45838.7, 60 sec: 48599.4, 300 sec: 46762.6). Total num frames: 347570176. Throughput: 0: 11755.3. Samples: 86949888. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:00,775][1648981] Avg episode reward: [(0, '320.150')] [2024-06-15 13:35:01,322][1651669] Updated weights for policy 0, policy_version 169744 (0.0013) [2024-06-15 13:35:02,333][1651669] Updated weights for policy 0, policy_version 169788 (0.0029) [2024-06-15 13:35:05,766][1648981] Fps is (10 sec: 52436.3, 60 sec: 43690.7, 300 sec: 46320.1). Total num frames: 347734016. Throughput: 0: 11753.2. Samples: 87019008. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:05,767][1648981] Avg episode reward: [(0, '319.150')] [2024-06-15 13:35:08,091][1651274] Signal inference workers to stop experience collection... (8900 times) [2024-06-15 13:35:08,105][1651669] Updated weights for policy 0, policy_version 169825 (0.0014) [2024-06-15 13:35:08,145][1651669] InferenceWorker_p0-w0: stopping experience collection (8900 times) [2024-06-15 13:35:08,370][1651274] Signal inference workers to resume experience collection... (8900 times) [2024-06-15 13:35:08,386][1651669] InferenceWorker_p0-w0: resuming experience collection (8900 times) [2024-06-15 13:35:09,878][1651669] Updated weights for policy 0, policy_version 169904 (0.0089) [2024-06-15 13:35:10,766][1648981] Fps is (10 sec: 42632.2, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 347996160. Throughput: 0: 11719.1. Samples: 87055360. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:10,767][1648981] Avg episode reward: [(0, '331.050')] [2024-06-15 13:35:10,974][1651669] Updated weights for policy 0, policy_version 169922 (0.0019) [2024-06-15 13:35:12,923][1651669] Updated weights for policy 0, policy_version 170004 (0.0105) [2024-06-15 13:35:15,774][1648981] Fps is (10 sec: 52387.7, 60 sec: 46427.6, 300 sec: 46873.6). Total num frames: 348258304. Throughput: 0: 11626.0. Samples: 87112704. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:15,775][1648981] Avg episode reward: [(0, '339.550')] [2024-06-15 13:35:19,637][1651669] Updated weights for policy 0, policy_version 170075 (0.0014) [2024-06-15 13:35:20,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 45875.2, 300 sec: 46657.1). Total num frames: 348389376. Throughput: 0: 11639.5. Samples: 87191552. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:20,767][1648981] Avg episode reward: [(0, '332.100')] [2024-06-15 13:35:21,478][1651669] Updated weights for policy 0, policy_version 170144 (0.0349) [2024-06-15 13:35:22,657][1651669] Updated weights for policy 0, policy_version 170192 (0.0122) [2024-06-15 13:35:24,549][1651669] Updated weights for policy 0, policy_version 170262 (0.0118) [2024-06-15 13:35:25,766][1648981] Fps is (10 sec: 52469.4, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 348782592. Throughput: 0: 11446.0. Samples: 87214592. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:25,767][1648981] Avg episode reward: [(0, '333.210')] [2024-06-15 13:35:30,797][1648981] Fps is (10 sec: 39200.0, 60 sec: 43668.1, 300 sec: 46203.6). Total num frames: 348782592. Throughput: 0: 11460.9. Samples: 87296000. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:30,798][1648981] Avg episode reward: [(0, '330.880')] [2024-06-15 13:35:31,166][1651669] Updated weights for policy 0, policy_version 170327 (0.0013) [2024-06-15 13:35:33,028][1651669] Updated weights for policy 0, policy_version 170400 (0.0012) [2024-06-15 13:35:34,648][1651669] Updated weights for policy 0, policy_version 170464 (0.0013) [2024-06-15 13:35:35,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 47513.6, 300 sec: 46655.3). Total num frames: 349175808. Throughput: 0: 11332.3. Samples: 87351296. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:35,767][1648981] Avg episode reward: [(0, '327.410')] [2024-06-15 13:35:36,855][1651669] Updated weights for policy 0, policy_version 170550 (0.0140) [2024-06-15 13:35:40,766][1648981] Fps is (10 sec: 52591.4, 60 sec: 43690.6, 300 sec: 46209.1). Total num frames: 349306880. Throughput: 0: 11298.5. Samples: 87387648. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:40,767][1648981] Avg episode reward: [(0, '333.340')] [2024-06-15 13:35:42,752][1651669] Updated weights for policy 0, policy_version 170593 (0.0016) [2024-06-15 13:35:43,886][1651669] Updated weights for policy 0, policy_version 170628 (0.0053) [2024-06-15 13:35:45,413][1651669] Updated weights for policy 0, policy_version 170689 (0.0013) [2024-06-15 13:35:45,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 46652.8). Total num frames: 349601792. Throughput: 0: 11482.2. Samples: 87466496. Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:45,767][1648981] Avg episode reward: [(0, '335.300')] [2024-06-15 13:35:45,821][1651274] Signal inference workers to stop experience collection... (8950 times) [2024-06-15 13:35:45,889][1651669] InferenceWorker_p0-w0: stopping experience collection (8950 times) [2024-06-15 13:35:46,085][1651274] Signal inference workers to resume experience collection... (8950 times) [2024-06-15 13:35:46,086][1651669] InferenceWorker_p0-w0: resuming experience collection (8950 times) [2024-06-15 13:35:46,735][1651669] Updated weights for policy 0, policy_version 170743 (0.0106) [2024-06-15 13:35:48,217][1651669] Updated weights for policy 0, policy_version 170810 (0.0014) [2024-06-15 13:35:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 349831168. Throughput: 0: 11400.5. Samples: 87532032. 
Policy #0 lag: (min: 15.0, avg: 91.2, max: 271.0) [2024-06-15 13:35:50,767][1648981] Avg episode reward: [(0, '332.590')] [2024-06-15 13:35:53,890][1651669] Updated weights for policy 0, policy_version 170864 (0.0016) [2024-06-15 13:35:55,055][1651669] Updated weights for policy 0, policy_version 170897 (0.0016) [2024-06-15 13:35:55,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47514.8, 300 sec: 46652.7). Total num frames: 350060544. Throughput: 0: 11719.1. Samples: 87582720. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:35:55,767][1648981] Avg episode reward: [(0, '338.430')] [2024-06-15 13:35:56,825][1651669] Updated weights for policy 0, policy_version 170976 (0.0012) [2024-06-15 13:35:58,780][1651669] Updated weights for policy 0, policy_version 171040 (0.0014) [2024-06-15 13:36:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46427.5, 300 sec: 46541.7). Total num frames: 350355456. Throughput: 0: 11561.9. Samples: 87632896. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:00,767][1648981] Avg episode reward: [(0, '336.730')] [2024-06-15 13:36:05,410][1651669] Updated weights for policy 0, policy_version 171120 (0.0013) [2024-06-15 13:36:05,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 45329.1, 300 sec: 46319.5). Total num frames: 350453760. Throughput: 0: 11776.0. Samples: 87721472. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:05,767][1648981] Avg episode reward: [(0, '330.690')] [2024-06-15 13:36:06,950][1651669] Updated weights for policy 0, policy_version 171170 (0.0013) [2024-06-15 13:36:08,871][1651669] Updated weights for policy 0, policy_version 171248 (0.0140) [2024-06-15 13:36:10,385][1651669] Updated weights for policy 0, policy_version 171317 (0.0017) [2024-06-15 13:36:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 46652.7). Total num frames: 350879744. Throughput: 0: 11844.3. Samples: 87747584. 
Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:10,767][1648981] Avg episode reward: [(0, '327.480')] [2024-06-15 13:36:15,774][1648981] Fps is (10 sec: 42565.2, 60 sec: 43690.7, 300 sec: 46207.2). Total num frames: 350879744. Throughput: 0: 11770.7. Samples: 87825408. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:15,775][1648981] Avg episode reward: [(0, '321.360')] [2024-06-15 13:36:16,553][1651669] Updated weights for policy 0, policy_version 171376 (0.0024) [2024-06-15 13:36:18,611][1651669] Updated weights for policy 0, policy_version 171441 (0.0017) [2024-06-15 13:36:19,841][1651669] Updated weights for policy 0, policy_version 171491 (0.0016) [2024-06-15 13:36:20,767][1648981] Fps is (10 sec: 39321.1, 60 sec: 48059.6, 300 sec: 46652.7). Total num frames: 351272960. Throughput: 0: 11923.9. Samples: 87887872. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:20,767][1648981] Avg episode reward: [(0, '338.120')] [2024-06-15 13:36:21,531][1651274] Signal inference workers to stop experience collection... (9000 times) [2024-06-15 13:36:21,561][1651669] InferenceWorker_p0-w0: stopping experience collection (9000 times) [2024-06-15 13:36:21,784][1651274] Signal inference workers to resume experience collection... (9000 times) [2024-06-15 13:36:21,785][1651669] InferenceWorker_p0-w0: resuming experience collection (9000 times) [2024-06-15 13:36:22,012][1651669] Updated weights for policy 0, policy_version 171582 (0.0013) [2024-06-15 13:36:25,770][1648981] Fps is (10 sec: 52451.3, 60 sec: 43688.2, 300 sec: 46207.9). Total num frames: 351404032. Throughput: 0: 11741.0. Samples: 87916032. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:25,771][1648981] Avg episode reward: [(0, '340.250')] [2024-06-15 13:36:29,690][1651669] Updated weights for policy 0, policy_version 171680 (0.0015) [2024-06-15 13:36:30,766][1648981] Fps is (10 sec: 39322.1, 60 sec: 48084.6, 300 sec: 46430.6). 
Total num frames: 351666176. Throughput: 0: 11673.6. Samples: 87991808. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:30,767][1648981] Avg episode reward: [(0, '341.530')] [2024-06-15 13:36:31,350][1651669] Updated weights for policy 0, policy_version 171744 (0.0040) [2024-06-15 13:36:33,305][1651669] Updated weights for policy 0, policy_version 171810 (0.0079) [2024-06-15 13:36:35,766][1648981] Fps is (10 sec: 52446.5, 60 sec: 45875.1, 300 sec: 46209.7). Total num frames: 351928320. Throughput: 0: 11411.9. Samples: 88045568. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:35,767][1648981] Avg episode reward: [(0, '343.150')] [2024-06-15 13:36:39,949][1651669] Updated weights for policy 0, policy_version 171873 (0.0013) [2024-06-15 13:36:40,798][1648981] Fps is (10 sec: 39196.7, 60 sec: 45850.9, 300 sec: 46647.7). Total num frames: 352059392. Throughput: 0: 11324.2. Samples: 88092672. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:40,799][1648981] Avg episode reward: [(0, '342.700')] [2024-06-15 13:36:42,048][1651669] Updated weights for policy 0, policy_version 171954 (0.0098) [2024-06-15 13:36:44,695][1651669] Updated weights for policy 0, policy_version 172064 (0.0013) [2024-06-15 13:36:45,767][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.5, 300 sec: 46430.6). Total num frames: 352452608. Throughput: 0: 11275.3. Samples: 88140288. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:45,767][1648981] Avg episode reward: [(0, '355.850')] [2024-06-15 13:36:50,766][1648981] Fps is (10 sec: 39447.0, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 352452608. Throughput: 0: 11127.5. Samples: 88222208. 
Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:50,767][1648981] Avg episode reward: [(0, '348.380')] [2024-06-15 13:36:51,772][1651669] Updated weights for policy 0, policy_version 172114 (0.0012) [2024-06-15 13:36:53,656][1651669] Updated weights for policy 0, policy_version 172192 (0.0013) [2024-06-15 13:36:55,087][1651669] Updated weights for policy 0, policy_version 172256 (0.0013) [2024-06-15 13:36:55,769][1648981] Fps is (10 sec: 36037.1, 60 sec: 45873.5, 300 sec: 46208.1). Total num frames: 352813056. Throughput: 0: 11343.1. Samples: 88258048. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:36:55,769][1648981] Avg episode reward: [(0, '362.110')] [2024-06-15 13:36:56,173][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000172304_352878592.pth... [2024-06-15 13:36:56,314][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000166816_341639168.pth [2024-06-15 13:36:56,836][1651669] Updated weights for policy 0, policy_version 172336 (0.0014) [2024-06-15 13:37:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 352976896. Throughput: 0: 11129.4. Samples: 88326144. Policy #0 lag: (min: 15.0, avg: 78.7, max: 271.0) [2024-06-15 13:37:00,767][1648981] Avg episode reward: [(0, '352.380')] [2024-06-15 13:37:03,388][1651669] Updated weights for policy 0, policy_version 172385 (0.0013) [2024-06-15 13:37:04,139][1651274] Signal inference workers to stop experience collection... (9050 times) [2024-06-15 13:37:04,214][1651669] InferenceWorker_p0-w0: stopping experience collection (9050 times) [2024-06-15 13:37:04,344][1651274] Signal inference workers to resume experience collection... 
(9050 times) [2024-06-15 13:37:04,344][1651669] InferenceWorker_p0-w0: resuming experience collection (9050 times) [2024-06-15 13:37:05,242][1651669] Updated weights for policy 0, policy_version 172464 (0.0012) [2024-06-15 13:37:05,766][1648981] Fps is (10 sec: 42608.1, 60 sec: 46421.4, 300 sec: 46430.8). Total num frames: 353239040. Throughput: 0: 11309.5. Samples: 88396800. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:05,767][1648981] Avg episode reward: [(0, '349.080')] [2024-06-15 13:37:06,463][1651669] Updated weights for policy 0, policy_version 172517 (0.0011) [2024-06-15 13:37:08,152][1651669] Updated weights for policy 0, policy_version 172594 (0.0106) [2024-06-15 13:37:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 43690.6, 300 sec: 46208.5). Total num frames: 353501184. Throughput: 0: 11299.0. Samples: 88424448. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:10,767][1648981] Avg episode reward: [(0, '356.660')] [2024-06-15 13:37:14,752][1651669] Updated weights for policy 0, policy_version 172659 (0.0014) [2024-06-15 13:37:15,767][1648981] Fps is (10 sec: 42597.0, 60 sec: 46427.2, 300 sec: 46542.7). Total num frames: 353665024. Throughput: 0: 11423.2. Samples: 88505856. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:15,767][1648981] Avg episode reward: [(0, '344.660')] [2024-06-15 13:37:16,534][1651669] Updated weights for policy 0, policy_version 172721 (0.0013) [2024-06-15 13:37:18,585][1651669] Updated weights for policy 0, policy_version 172804 (0.0013) [2024-06-15 13:37:19,568][1651669] Updated weights for policy 0, policy_version 172859 (0.0012) [2024-06-15 13:37:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46208.4). Total num frames: 354025472. Throughput: 0: 11366.4. Samples: 88557056. 
Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:20,767][1648981] Avg episode reward: [(0, '339.100')] [2024-06-15 13:37:25,766][1648981] Fps is (10 sec: 39322.8, 60 sec: 44239.4, 300 sec: 46208.4). Total num frames: 354058240. Throughput: 0: 11328.9. Samples: 88602112. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:25,767][1648981] Avg episode reward: [(0, '357.160')] [2024-06-15 13:37:26,632][1651669] Updated weights for policy 0, policy_version 172928 (0.0013) [2024-06-15 13:37:28,338][1651669] Updated weights for policy 0, policy_version 172997 (0.0013) [2024-06-15 13:37:30,147][1651669] Updated weights for policy 0, policy_version 173088 (0.0013) [2024-06-15 13:37:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 354549760. Throughput: 0: 11571.2. Samples: 88660992. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:30,767][1648981] Avg episode reward: [(0, '361.210')] [2024-06-15 13:37:35,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 43690.8, 300 sec: 46208.5). Total num frames: 354549760. Throughput: 0: 11628.1. Samples: 88745472. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:35,767][1648981] Avg episode reward: [(0, '356.400')] [2024-06-15 13:37:36,730][1651669] Updated weights for policy 0, policy_version 173153 (0.0020) [2024-06-15 13:37:38,358][1651669] Updated weights for policy 0, policy_version 173233 (0.0013) [2024-06-15 13:37:39,565][1651669] Updated weights for policy 0, policy_version 173266 (0.0012) [2024-06-15 13:37:39,896][1651274] Signal inference workers to stop experience collection... (9100 times) [2024-06-15 13:37:39,946][1651669] InferenceWorker_p0-w0: stopping experience collection (9100 times) [2024-06-15 13:37:40,162][1651274] Signal inference workers to resume experience collection... 
(9100 times) [2024-06-15 13:37:40,163][1651669] InferenceWorker_p0-w0: resuming experience collection (9100 times) [2024-06-15 13:37:40,775][1648981] Fps is (10 sec: 42563.9, 60 sec: 48625.1, 300 sec: 46762.5). Total num frames: 354975744. Throughput: 0: 11433.2. Samples: 88772608. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:40,775][1648981] Avg episode reward: [(0, '351.250')] [2024-06-15 13:37:41,235][1651669] Updated weights for policy 0, policy_version 173344 (0.0013) [2024-06-15 13:37:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 355074048. Throughput: 0: 11548.4. Samples: 88845824. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:45,767][1648981] Avg episode reward: [(0, '374.040')] [2024-06-15 13:37:45,768][1651274] Saving new best policy, reward=374.040! [2024-06-15 13:37:46,722][1651669] Updated weights for policy 0, policy_version 173380 (0.0013) [2024-06-15 13:37:48,260][1651669] Updated weights for policy 0, policy_version 173456 (0.0012) [2024-06-15 13:37:50,533][1651669] Updated weights for policy 0, policy_version 173526 (0.0016) [2024-06-15 13:37:50,766][1648981] Fps is (10 sec: 42632.7, 60 sec: 49152.0, 300 sec: 46874.9). Total num frames: 355401728. Throughput: 0: 11593.9. Samples: 88918528. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:50,767][1648981] Avg episode reward: [(0, '374.560')] [2024-06-15 13:37:50,946][1651274] Saving new best policy, reward=374.560! [2024-06-15 13:37:51,922][1651669] Updated weights for policy 0, policy_version 173585 (0.0014) [2024-06-15 13:37:52,906][1651669] Updated weights for policy 0, policy_version 173628 (0.0046) [2024-06-15 13:37:55,768][1648981] Fps is (10 sec: 52420.7, 60 sec: 46421.9, 300 sec: 46319.3). Total num frames: 355598336. Throughput: 0: 11661.8. Samples: 88949248. 
Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:37:55,768][1648981] Avg episode reward: [(0, '369.770')] [2024-06-15 13:37:58,546][1651669] Updated weights for policy 0, policy_version 173686 (0.0013) [2024-06-15 13:38:00,768][1648981] Fps is (10 sec: 45866.9, 60 sec: 48058.2, 300 sec: 46985.7). Total num frames: 355860480. Throughput: 0: 11604.9. Samples: 89028096. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:38:00,769][1648981] Avg episode reward: [(0, '374.030')] [2024-06-15 13:38:01,200][1651669] Updated weights for policy 0, policy_version 173761 (0.0022) [2024-06-15 13:38:02,656][1651669] Updated weights for policy 0, policy_version 173827 (0.0091) [2024-06-15 13:38:04,085][1651669] Updated weights for policy 0, policy_version 173885 (0.0029) [2024-06-15 13:38:05,774][1648981] Fps is (10 sec: 52395.7, 60 sec: 48053.4, 300 sec: 46429.4). Total num frames: 356122624. Throughput: 0: 11876.3. Samples: 89091584. Policy #0 lag: (min: 15.0, avg: 62.8, max: 269.0) [2024-06-15 13:38:05,775][1648981] Avg episode reward: [(0, '371.260')] [2024-06-15 13:38:09,756][1651669] Updated weights for policy 0, policy_version 173937 (0.0013) [2024-06-15 13:38:10,766][1648981] Fps is (10 sec: 45883.5, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 356319232. Throughput: 0: 11821.5. Samples: 89134080. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:10,767][1648981] Avg episode reward: [(0, '390.070')] [2024-06-15 13:38:11,065][1651669] Updated weights for policy 0, policy_version 174011 (0.0014) [2024-06-15 13:38:11,126][1651274] Saving new best policy, reward=390.070! [2024-06-15 13:38:13,663][1651669] Updated weights for policy 0, policy_version 174080 (0.0126) [2024-06-15 13:38:15,348][1651669] Updated weights for policy 0, policy_version 174141 (0.0011) [2024-06-15 13:38:15,766][1648981] Fps is (10 sec: 52469.8, 60 sec: 49698.3, 300 sec: 46652.8). Total num frames: 356646912. Throughput: 0: 11980.8. 
Samples: 89200128. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:15,767][1648981] Avg episode reward: [(0, '398.800')] [2024-06-15 13:38:15,768][1651274] Saving new best policy, reward=398.800! [2024-06-15 13:38:20,774][1648981] Fps is (10 sec: 36017.1, 60 sec: 44231.1, 300 sec: 46318.3). Total num frames: 356679680. Throughput: 0: 11853.6. Samples: 89278976. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:20,775][1648981] Avg episode reward: [(0, '384.960')] [2024-06-15 13:38:21,124][1651274] Signal inference workers to stop experience collection... (9150 times) [2024-06-15 13:38:21,214][1651669] InferenceWorker_p0-w0: stopping experience collection (9150 times) [2024-06-15 13:38:21,229][1651669] Updated weights for policy 0, policy_version 174201 (0.0012) [2024-06-15 13:38:21,309][1651274] Signal inference workers to resume experience collection... (9150 times) [2024-06-15 13:38:21,309][1651669] InferenceWorker_p0-w0: resuming experience collection (9150 times) [2024-06-15 13:38:22,508][1651669] Updated weights for policy 0, policy_version 174272 (0.0080) [2024-06-15 13:38:25,290][1651669] Updated weights for policy 0, policy_version 174353 (0.0013) [2024-06-15 13:38:25,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 50790.3, 300 sec: 46542.3). Total num frames: 357105664. Throughput: 0: 12017.1. Samples: 89313280. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:25,767][1648981] Avg episode reward: [(0, '387.460')] [2024-06-15 13:38:30,766][1648981] Fps is (10 sec: 49190.2, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 357171200. Throughput: 0: 11923.9. Samples: 89382400. 
Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:30,767][1648981] Avg episode reward: [(0, '377.710')] [2024-06-15 13:38:31,399][1651669] Updated weights for policy 0, policy_version 174418 (0.0013) [2024-06-15 13:38:33,058][1651669] Updated weights for policy 0, policy_version 174496 (0.0089) [2024-06-15 13:38:35,175][1651669] Updated weights for policy 0, policy_version 174562 (0.0013) [2024-06-15 13:38:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 50244.2, 300 sec: 46652.7). Total num frames: 357564416. Throughput: 0: 11832.9. Samples: 89451008. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:35,767][1648981] Avg episode reward: [(0, '372.410')] [2024-06-15 13:38:36,259][1651669] Updated weights for policy 0, policy_version 174624 (0.0013) [2024-06-15 13:38:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45335.2, 300 sec: 46208.4). Total num frames: 357695488. Throughput: 0: 12015.4. Samples: 89489920. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:40,767][1648981] Avg episode reward: [(0, '369.710')] [2024-06-15 13:38:42,565][1651669] Updated weights for policy 0, policy_version 174688 (0.0012) [2024-06-15 13:38:44,496][1651669] Updated weights for policy 0, policy_version 174777 (0.0129) [2024-06-15 13:38:45,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 49698.2, 300 sec: 46874.9). Total num frames: 358055936. Throughput: 0: 11844.8. Samples: 89561088. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:45,767][1648981] Avg episode reward: [(0, '367.870')] [2024-06-15 13:38:46,127][1651669] Updated weights for policy 0, policy_version 174848 (0.0013) [2024-06-15 13:38:47,160][1651669] Updated weights for policy 0, policy_version 174896 (0.0091) [2024-06-15 13:38:50,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 46208.4). Total num frames: 358219776. Throughput: 0: 12130.8. Samples: 89637376. 
Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:50,767][1648981] Avg episode reward: [(0, '356.070')] [2024-06-15 13:38:53,214][1651669] Updated weights for policy 0, policy_version 174928 (0.0021) [2024-06-15 13:38:55,304][1651669] Updated weights for policy 0, policy_version 175008 (0.0012) [2024-06-15 13:38:55,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 47514.8, 300 sec: 46763.8). Total num frames: 358449152. Throughput: 0: 12174.2. Samples: 89681920. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:38:55,767][1648981] Avg episode reward: [(0, '343.460')] [2024-06-15 13:38:56,301][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000175056_358514688.pth... [2024-06-15 13:38:56,422][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000169536_347209728.pth [2024-06-15 13:38:57,033][1651669] Updated weights for policy 0, policy_version 175088 (0.0013) [2024-06-15 13:38:57,137][1651274] Signal inference workers to stop experience collection... (9200 times) [2024-06-15 13:38:57,165][1651669] InferenceWorker_p0-w0: stopping experience collection (9200 times) [2024-06-15 13:38:57,377][1651274] Signal inference workers to resume experience collection... (9200 times) [2024-06-15 13:38:57,378][1651669] InferenceWorker_p0-w0: resuming experience collection (9200 times) [2024-06-15 13:38:58,686][1651669] Updated weights for policy 0, policy_version 175159 (0.0012) [2024-06-15 13:39:00,767][1648981] Fps is (10 sec: 52424.8, 60 sec: 48060.6, 300 sec: 46208.3). Total num frames: 358744064. Throughput: 0: 11855.5. Samples: 89733632. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:39:00,768][1648981] Avg episode reward: [(0, '336.090')] [2024-06-15 13:39:05,632][1651669] Updated weights for policy 0, policy_version 175216 (0.0013) [2024-06-15 13:39:05,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 45335.0, 300 sec: 46541.7). Total num frames: 358842368. 
Throughput: 0: 12017.0. Samples: 89819648. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:39:05,767][1648981] Avg episode reward: [(0, '341.060')] [2024-06-15 13:39:07,370][1651669] Updated weights for policy 0, policy_version 175281 (0.0014) [2024-06-15 13:39:08,577][1651669] Updated weights for policy 0, policy_version 175344 (0.0116) [2024-06-15 13:39:10,281][1651669] Updated weights for policy 0, policy_version 175409 (0.0014) [2024-06-15 13:39:10,769][1648981] Fps is (10 sec: 52420.8, 60 sec: 49150.2, 300 sec: 46766.0). Total num frames: 359268352. Throughput: 0: 11798.2. Samples: 89844224. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:39:10,770][1648981] Avg episode reward: [(0, '335.520')] [2024-06-15 13:39:15,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 359268352. Throughput: 0: 11901.2. Samples: 89917952. Policy #0 lag: (min: 15.0, avg: 82.5, max: 271.0) [2024-06-15 13:39:15,767][1648981] Avg episode reward: [(0, '339.790')] [2024-06-15 13:39:16,712][1651669] Updated weights for policy 0, policy_version 175456 (0.0015) [2024-06-15 13:39:18,233][1651669] Updated weights for policy 0, policy_version 175520 (0.0014) [2024-06-15 13:39:19,639][1651669] Updated weights for policy 0, policy_version 175584 (0.0012) [2024-06-15 13:39:20,766][1648981] Fps is (10 sec: 39331.0, 60 sec: 49704.6, 300 sec: 46652.8). Total num frames: 359661568. Throughput: 0: 11889.8. Samples: 89986048. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:20,767][1648981] Avg episode reward: [(0, '323.440')] [2024-06-15 13:39:21,467][1651669] Updated weights for policy 0, policy_version 175651 (0.0013) [2024-06-15 13:39:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 359792640. Throughput: 0: 11776.0. Samples: 90019840. 
Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:25,767][1648981] Avg episode reward: [(0, '320.000')] [2024-06-15 13:39:27,825][1651669] Updated weights for policy 0, policy_version 175728 (0.0015) [2024-06-15 13:39:29,400][1651669] Updated weights for policy 0, policy_version 175797 (0.0020) [2024-06-15 13:39:30,775][1648981] Fps is (10 sec: 45837.8, 60 sec: 49145.3, 300 sec: 46762.5). Total num frames: 360120320. Throughput: 0: 11785.2. Samples: 90091520. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:30,775][1648981] Avg episode reward: [(0, '337.470')] [2024-06-15 13:39:30,948][1651669] Updated weights for policy 0, policy_version 175860 (0.0013) [2024-06-15 13:39:32,379][1651669] Updated weights for policy 0, policy_version 175929 (0.0012) [2024-06-15 13:39:35,782][1648981] Fps is (10 sec: 52346.0, 60 sec: 45863.2, 300 sec: 46206.0). Total num frames: 360316928. Throughput: 0: 11680.9. Samples: 90163200. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:35,783][1648981] Avg episode reward: [(0, '339.220')] [2024-06-15 13:39:38,173][1651274] Signal inference workers to stop experience collection... (9250 times) [2024-06-15 13:39:38,254][1651669] InferenceWorker_p0-w0: stopping experience collection (9250 times) [2024-06-15 13:39:38,357][1651274] Signal inference workers to resume experience collection... (9250 times) [2024-06-15 13:39:38,359][1651669] InferenceWorker_p0-w0: resuming experience collection (9250 times) [2024-06-15 13:39:38,552][1651669] Updated weights for policy 0, policy_version 175971 (0.0109) [2024-06-15 13:39:39,600][1651669] Updated weights for policy 0, policy_version 176032 (0.0013) [2024-06-15 13:39:40,766][1648981] Fps is (10 sec: 49192.2, 60 sec: 48605.9, 300 sec: 46763.8). Total num frames: 360611840. Throughput: 0: 11594.0. Samples: 90203648. 
Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:40,767][1648981] Avg episode reward: [(0, '349.700')] [2024-06-15 13:39:41,481][1651669] Updated weights for policy 0, policy_version 176112 (0.0013) [2024-06-15 13:39:43,584][1651669] Updated weights for policy 0, policy_version 176187 (0.0014) [2024-06-15 13:39:45,766][1648981] Fps is (10 sec: 52511.7, 60 sec: 46421.3, 300 sec: 46541.7). Total num frames: 360841216. Throughput: 0: 11685.2. Samples: 90259456. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:45,767][1648981] Avg episode reward: [(0, '354.320')] [2024-06-15 13:39:50,067][1651669] Updated weights for policy 0, policy_version 176240 (0.0012) [2024-06-15 13:39:50,767][1648981] Fps is (10 sec: 36043.4, 60 sec: 45875.0, 300 sec: 46652.9). Total num frames: 360972288. Throughput: 0: 11525.6. Samples: 90338304. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:50,767][1648981] Avg episode reward: [(0, '361.770')] [2024-06-15 13:39:52,191][1651669] Updated weights for policy 0, policy_version 176306 (0.0013) [2024-06-15 13:39:54,156][1651669] Updated weights for policy 0, policy_version 176384 (0.0102) [2024-06-15 13:39:55,774][1648981] Fps is (10 sec: 49113.5, 60 sec: 48053.4, 300 sec: 46652.8). Total num frames: 361332736. Throughput: 0: 11581.1. Samples: 90365440. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:39:55,775][1648981] Avg episode reward: [(0, '359.760')] [2024-06-15 13:40:00,592][1651669] Updated weights for policy 0, policy_version 176464 (0.0136) [2024-06-15 13:40:00,766][1648981] Fps is (10 sec: 42600.0, 60 sec: 44237.4, 300 sec: 46319.5). Total num frames: 361398272. Throughput: 0: 11537.1. Samples: 90437120. 
Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:40:00,767][1648981] Avg episode reward: [(0, '349.900')] [2024-06-15 13:40:02,528][1651669] Updated weights for policy 0, policy_version 176513 (0.0011) [2024-06-15 13:40:04,310][1651669] Updated weights for policy 0, policy_version 176592 (0.0012) [2024-06-15 13:40:05,767][1648981] Fps is (10 sec: 42628.6, 60 sec: 48605.2, 300 sec: 46652.6). Total num frames: 361758720. Throughput: 0: 11491.3. Samples: 90503168. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:40:05,768][1648981] Avg episode reward: [(0, '351.340')] [2024-06-15 13:40:05,851][1651669] Updated weights for policy 0, policy_version 176644 (0.0014) [2024-06-15 13:40:07,147][1651669] Updated weights for policy 0, policy_version 176698 (0.0014) [2024-06-15 13:40:10,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 43692.3, 300 sec: 46209.7). Total num frames: 361889792. Throughput: 0: 11457.4. Samples: 90535424. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:40:10,767][1648981] Avg episode reward: [(0, '348.340')] [2024-06-15 13:40:12,645][1651669] Updated weights for policy 0, policy_version 176739 (0.0017) [2024-06-15 13:40:14,003][1651669] Updated weights for policy 0, policy_version 176787 (0.0031) [2024-06-15 13:40:15,472][1651669] Updated weights for policy 0, policy_version 176850 (0.0128) [2024-06-15 13:40:15,767][1648981] Fps is (10 sec: 45878.0, 60 sec: 49151.8, 300 sec: 46874.9). Total num frames: 362217472. Throughput: 0: 11721.2. Samples: 90618880. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:40:15,767][1648981] Avg episode reward: [(0, '349.170')] [2024-06-15 13:40:15,857][1651274] Signal inference workers to stop experience collection... (9300 times) [2024-06-15 13:40:15,933][1651669] InferenceWorker_p0-w0: stopping experience collection (9300 times) [2024-06-15 13:40:16,211][1651274] Signal inference workers to resume experience collection... 
(9300 times) [2024-06-15 13:40:16,212][1651669] InferenceWorker_p0-w0: resuming experience collection (9300 times) [2024-06-15 13:40:17,224][1651669] Updated weights for policy 0, policy_version 176913 (0.0012) [2024-06-15 13:40:18,117][1651669] Updated weights for policy 0, policy_version 176955 (0.0020) [2024-06-15 13:40:20,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 362414080. Throughput: 0: 11563.9. Samples: 90683392. Policy #0 lag: (min: 15.0, avg: 59.9, max: 271.0) [2024-06-15 13:40:20,767][1648981] Avg episode reward: [(0, '361.490')] [2024-06-15 13:40:23,900][1651669] Updated weights for policy 0, policy_version 177008 (0.0017) [2024-06-15 13:40:25,767][1648981] Fps is (10 sec: 39321.7, 60 sec: 46967.3, 300 sec: 46879.8). Total num frames: 362610688. Throughput: 0: 11537.0. Samples: 90722816. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:40:25,767][1648981] Avg episode reward: [(0, '348.380')] [2024-06-15 13:40:25,835][1651669] Updated weights for policy 0, policy_version 177056 (0.0124) [2024-06-15 13:40:27,463][1651669] Updated weights for policy 0, policy_version 177124 (0.0014) [2024-06-15 13:40:29,375][1651669] Updated weights for policy 0, policy_version 177208 (0.0073) [2024-06-15 13:40:30,798][1648981] Fps is (10 sec: 52263.1, 60 sec: 46948.9, 300 sec: 46647.7). Total num frames: 362938368. Throughput: 0: 11506.2. Samples: 90777600. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:40:30,799][1648981] Avg episode reward: [(0, '343.760')] [2024-06-15 13:40:35,123][1651669] Updated weights for policy 0, policy_version 177248 (0.0014) [2024-06-15 13:40:35,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 45887.3, 300 sec: 46652.8). Total num frames: 363069440. Throughput: 0: 11605.4. Samples: 90860544. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:40:35,767][1648981] Avg episode reward: [(0, '326.030')] [2024-06-15 13:40:35,785][1651669] Updated weights for policy 0, policy_version 177280 (0.0012) [2024-06-15 13:40:38,014][1651669] Updated weights for policy 0, policy_version 177346 (0.0013) [2024-06-15 13:40:39,251][1651669] Updated weights for policy 0, policy_version 177394 (0.0041) [2024-06-15 13:40:40,712][1651669] Updated weights for policy 0, policy_version 177456 (0.0017) [2024-06-15 13:40:40,766][1648981] Fps is (10 sec: 49308.4, 60 sec: 46967.4, 300 sec: 46874.9). Total num frames: 363429888. Throughput: 0: 11743.9. Samples: 90893824. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:40:40,767][1648981] Avg episode reward: [(0, '338.920')] [2024-06-15 13:40:45,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 363462656. Throughput: 0: 11844.3. Samples: 90970112. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:40:45,767][1648981] Avg episode reward: [(0, '340.310')] [2024-06-15 13:40:46,612][1651669] Updated weights for policy 0, policy_version 177520 (0.0116) [2024-06-15 13:40:47,842][1651669] Updated weights for policy 0, policy_version 177569 (0.0011) [2024-06-15 13:40:50,172][1651669] Updated weights for policy 0, policy_version 177664 (0.0016) [2024-06-15 13:40:50,770][1648981] Fps is (10 sec: 45857.8, 60 sec: 48603.0, 300 sec: 46874.3). Total num frames: 363888640. Throughput: 0: 11559.0. Samples: 91023360. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:40:50,771][1648981] Avg episode reward: [(0, '336.310')] [2024-06-15 13:40:51,528][1651669] Updated weights for policy 0, policy_version 177728 (0.0014) [2024-06-15 13:40:55,767][1648981] Fps is (10 sec: 52426.4, 60 sec: 44242.3, 300 sec: 46208.4). Total num frames: 363986944. Throughput: 0: 11650.7. Samples: 91059712. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:40:55,768][1648981] Avg episode reward: [(0, '338.160')] [2024-06-15 13:40:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000177728_363986944.pth... [2024-06-15 13:40:55,831][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000172304_352878592.pth [2024-06-15 13:40:57,819][1651274] Signal inference workers to stop experience collection... (9350 times) [2024-06-15 13:40:57,853][1651669] InferenceWorker_p0-w0: stopping experience collection (9350 times) [2024-06-15 13:40:58,113][1651274] Signal inference workers to resume experience collection... (9350 times) [2024-06-15 13:40:58,114][1651669] InferenceWorker_p0-w0: resuming experience collection (9350 times) [2024-06-15 13:40:58,684][1651669] Updated weights for policy 0, policy_version 177793 (0.0012) [2024-06-15 13:41:00,188][1651669] Updated weights for policy 0, policy_version 177856 (0.0012) [2024-06-15 13:41:00,766][1648981] Fps is (10 sec: 39337.0, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 364281856. Throughput: 0: 11457.5. Samples: 91134464. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:41:00,767][1648981] Avg episode reward: [(0, '331.440')] [2024-06-15 13:41:01,505][1651669] Updated weights for policy 0, policy_version 177907 (0.0053) [2024-06-15 13:41:02,947][1651669] Updated weights for policy 0, policy_version 177971 (0.0135) [2024-06-15 13:41:05,766][1648981] Fps is (10 sec: 52430.9, 60 sec: 45875.8, 300 sec: 46208.4). Total num frames: 364511232. Throughput: 0: 11491.6. Samples: 91200512. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:41:05,767][1648981] Avg episode reward: [(0, '339.920')] [2024-06-15 13:41:09,178][1651669] Updated weights for policy 0, policy_version 178008 (0.0014) [2024-06-15 13:41:10,817][1648981] Fps is (10 sec: 42384.9, 60 sec: 46928.1, 300 sec: 46868.2). Total num frames: 364707840. 
Throughput: 0: 11581.1. Samples: 91244544. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:41:10,818][1648981] Avg episode reward: [(0, '351.290')] [2024-06-15 13:41:11,263][1651669] Updated weights for policy 0, policy_version 178101 (0.0013) [2024-06-15 13:41:13,133][1651669] Updated weights for policy 0, policy_version 178176 (0.0101) [2024-06-15 13:41:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46967.7, 300 sec: 46652.8). Total num frames: 365035520. Throughput: 0: 11511.1. Samples: 91295232. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:41:15,767][1648981] Avg episode reward: [(0, '355.280')] [2024-06-15 13:41:20,710][1651669] Updated weights for policy 0, policy_version 178260 (0.0013) [2024-06-15 13:41:20,766][1648981] Fps is (10 sec: 36227.0, 60 sec: 44236.8, 300 sec: 46320.1). Total num frames: 365068288. Throughput: 0: 11662.2. Samples: 91385344. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:41:20,767][1648981] Avg episode reward: [(0, '353.990')] [2024-06-15 13:41:22,316][1651669] Updated weights for policy 0, policy_version 178336 (0.0013) [2024-06-15 13:41:24,692][1651669] Updated weights for policy 0, policy_version 178417 (0.0013) [2024-06-15 13:41:25,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48059.8, 300 sec: 46874.9). Total num frames: 365494272. Throughput: 0: 11377.8. Samples: 91405824. Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:41:25,767][1648981] Avg episode reward: [(0, '363.790')] [2024-06-15 13:41:26,316][1651669] Updated weights for policy 0, policy_version 178490 (0.0098) [2024-06-15 13:41:30,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 43713.8, 300 sec: 46208.5). Total num frames: 365559808. Throughput: 0: 11150.2. Samples: 91471872. 
Policy #0 lag: (min: 15.0, avg: 83.4, max: 271.0) [2024-06-15 13:41:30,767][1648981] Avg episode reward: [(0, '360.150')] [2024-06-15 13:41:32,737][1651669] Updated weights for policy 0, policy_version 178544 (0.0092) [2024-06-15 13:41:34,463][1651274] Signal inference workers to stop experience collection... (9400 times) [2024-06-15 13:41:34,507][1651669] InferenceWorker_p0-w0: stopping experience collection (9400 times) [2024-06-15 13:41:34,673][1651274] Signal inference workers to resume experience collection... (9400 times) [2024-06-15 13:41:34,675][1651669] InferenceWorker_p0-w0: resuming experience collection (9400 times) [2024-06-15 13:41:35,298][1651669] Updated weights for policy 0, policy_version 178643 (0.0370) [2024-06-15 13:41:35,774][1648981] Fps is (10 sec: 39291.2, 60 sec: 46961.4, 300 sec: 46878.7). Total num frames: 365887488. Throughput: 0: 11490.6. Samples: 91540480. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:41:35,775][1648981] Avg episode reward: [(0, '357.010')] [2024-06-15 13:41:37,257][1651669] Updated weights for policy 0, policy_version 178723 (0.0102) [2024-06-15 13:41:37,683][1651669] Updated weights for policy 0, policy_version 178752 (0.0017) [2024-06-15 13:41:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 44236.8, 300 sec: 46208.4). Total num frames: 366084096. Throughput: 0: 11355.1. Samples: 91570688. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:41:40,767][1648981] Avg episode reward: [(0, '360.420')] [2024-06-15 13:41:44,267][1651669] Updated weights for policy 0, policy_version 178816 (0.0011) [2024-06-15 13:41:45,767][1648981] Fps is (10 sec: 42630.8, 60 sec: 47513.4, 300 sec: 46986.0). Total num frames: 366313472. Throughput: 0: 11548.4. Samples: 91654144. 
Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:41:45,767][1648981] Avg episode reward: [(0, '363.060')] [2024-06-15 13:41:46,040][1651669] Updated weights for policy 0, policy_version 178887 (0.0247) [2024-06-15 13:41:47,318][1651669] Updated weights for policy 0, policy_version 178944 (0.0011) [2024-06-15 13:41:48,597][1651669] Updated weights for policy 0, policy_version 179001 (0.0040) [2024-06-15 13:41:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45332.0, 300 sec: 46764.2). Total num frames: 366608384. Throughput: 0: 11502.9. Samples: 91718144. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:41:50,767][1648981] Avg episode reward: [(0, '363.030')] [2024-06-15 13:41:55,015][1651669] Updated weights for policy 0, policy_version 179068 (0.0013) [2024-06-15 13:41:55,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 46421.6, 300 sec: 46763.8). Total num frames: 366772224. Throughput: 0: 11572.7. Samples: 91764736. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:41:55,767][1648981] Avg episode reward: [(0, '359.220')] [2024-06-15 13:41:56,515][1651669] Updated weights for policy 0, policy_version 179136 (0.0084) [2024-06-15 13:41:57,638][1651669] Updated weights for policy 0, policy_version 179186 (0.0017) [2024-06-15 13:41:58,987][1651669] Updated weights for policy 0, policy_version 179248 (0.0015) [2024-06-15 13:42:00,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 47513.5, 300 sec: 47097.0). Total num frames: 367132672. Throughput: 0: 11810.1. Samples: 91826688. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:00,767][1648981] Avg episode reward: [(0, '347.540')] [2024-06-15 13:42:04,439][1651669] Updated weights for policy 0, policy_version 179267 (0.0023) [2024-06-15 13:42:05,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 367230976. Throughput: 0: 11628.1. Samples: 91908608. 
Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:05,767][1648981] Avg episode reward: [(0, '345.420')] [2024-06-15 13:42:05,892][1651669] Updated weights for policy 0, policy_version 179323 (0.0014) [2024-06-15 13:42:07,127][1651669] Updated weights for policy 0, policy_version 179377 (0.0011) [2024-06-15 13:42:08,863][1651274] Signal inference workers to stop experience collection... (9450 times) [2024-06-15 13:42:08,888][1651669] Updated weights for policy 0, policy_version 179458 (0.0236) [2024-06-15 13:42:08,920][1651669] InferenceWorker_p0-w0: stopping experience collection (9450 times) [2024-06-15 13:42:09,059][1651274] Signal inference workers to resume experience collection... (9450 times) [2024-06-15 13:42:09,060][1651669] InferenceWorker_p0-w0: resuming experience collection (9450 times) [2024-06-15 13:42:10,007][1651669] Updated weights for policy 0, policy_version 179517 (0.0011) [2024-06-15 13:42:10,774][1648981] Fps is (10 sec: 52388.0, 60 sec: 49186.8, 300 sec: 47429.1). Total num frames: 367656960. Throughput: 0: 11796.7. Samples: 91936768. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:10,775][1648981] Avg episode reward: [(0, '353.240')] [2024-06-15 13:42:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 367722496. Throughput: 0: 12231.1. Samples: 92022272. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:15,767][1648981] Avg episode reward: [(0, '343.740')] [2024-06-15 13:42:16,270][1651669] Updated weights for policy 0, policy_version 179577 (0.0014) [2024-06-15 13:42:18,356][1651669] Updated weights for policy 0, policy_version 179634 (0.0012) [2024-06-15 13:42:19,785][1651669] Updated weights for policy 0, policy_version 179713 (0.0106) [2024-06-15 13:42:20,766][1648981] Fps is (10 sec: 52470.1, 60 sec: 51882.7, 300 sec: 47874.6). Total num frames: 368181248. Throughput: 0: 12028.4. Samples: 92081664. 
Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:20,767][1648981] Avg episode reward: [(0, '337.200')] [2024-06-15 13:42:25,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 44783.0, 300 sec: 46208.4). Total num frames: 368181248. Throughput: 0: 12242.5. Samples: 92121600. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:25,767][1648981] Avg episode reward: [(0, '330.510')] [2024-06-15 13:42:26,472][1651669] Updated weights for policy 0, policy_version 179779 (0.0014) [2024-06-15 13:42:27,999][1651669] Updated weights for policy 0, policy_version 179841 (0.0014) [2024-06-15 13:42:29,185][1651669] Updated weights for policy 0, policy_version 179888 (0.0101) [2024-06-15 13:42:30,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 368574464. Throughput: 0: 11912.6. Samples: 92190208. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:30,767][1648981] Avg episode reward: [(0, '334.780')] [2024-06-15 13:42:30,911][1651669] Updated weights for policy 0, policy_version 179972 (0.0098) [2024-06-15 13:42:31,917][1651669] Updated weights for policy 0, policy_version 180026 (0.0014) [2024-06-15 13:42:35,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 46973.4, 300 sec: 46542.9). Total num frames: 368705536. Throughput: 0: 12128.6. Samples: 92263936. Policy #0 lag: (min: 4.0, avg: 56.3, max: 260.0) [2024-06-15 13:42:35,768][1648981] Avg episode reward: [(0, '332.750')] [2024-06-15 13:42:38,516][1651669] Updated weights for policy 0, policy_version 180067 (0.0042) [2024-06-15 13:42:40,738][1651669] Updated weights for policy 0, policy_version 180144 (0.0015) [2024-06-15 13:42:40,766][1648981] Fps is (10 sec: 36045.0, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 368934912. Throughput: 0: 11969.4. Samples: 92303360. 
Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:42:40,767][1648981] Avg episode reward: [(0, '331.610')] [2024-06-15 13:42:42,329][1651669] Updated weights for policy 0, policy_version 180224 (0.0017) [2024-06-15 13:42:43,587][1651669] Updated weights for policy 0, policy_version 180285 (0.0100) [2024-06-15 13:42:45,767][1648981] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 46874.9). Total num frames: 369229824. Throughput: 0: 11855.6. Samples: 92360192. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:42:45,767][1648981] Avg episode reward: [(0, '314.430')] [2024-06-15 13:42:49,724][1651669] Updated weights for policy 0, policy_version 180343 (0.0012) [2024-06-15 13:42:50,550][1651274] Signal inference workers to stop experience collection... (9500 times) [2024-06-15 13:42:50,609][1651669] InferenceWorker_p0-w0: stopping experience collection (9500 times) [2024-06-15 13:42:50,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 46653.0). Total num frames: 369360896. Throughput: 0: 11776.0. Samples: 92438528. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:42:50,767][1648981] Avg episode reward: [(0, '327.610')] [2024-06-15 13:42:50,809][1651274] Signal inference workers to resume experience collection... (9500 times) [2024-06-15 13:42:50,810][1651669] InferenceWorker_p0-w0: resuming experience collection (9500 times) [2024-06-15 13:42:51,147][1651669] Updated weights for policy 0, policy_version 180384 (0.0014) [2024-06-15 13:42:52,730][1651669] Updated weights for policy 0, policy_version 180452 (0.0013) [2024-06-15 13:42:54,631][1651669] Updated weights for policy 0, policy_version 180538 (0.0042) [2024-06-15 13:42:55,768][1648981] Fps is (10 sec: 52419.9, 60 sec: 49696.6, 300 sec: 47097.1). Total num frames: 369754112. Throughput: 0: 11743.4. Samples: 92465152. 
Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:42:55,769][1648981] Avg episode reward: [(0, '331.680')] [2024-06-15 13:42:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000180544_369754112.pth... [2024-06-15 13:42:55,853][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000175056_358514688.pth [2024-06-15 13:43:00,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 44783.1, 300 sec: 46431.8). Total num frames: 369819648. Throughput: 0: 11673.6. Samples: 92547584. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:00,767][1648981] Avg episode reward: [(0, '341.200')] [2024-06-15 13:43:01,073][1651669] Updated weights for policy 0, policy_version 180597 (0.0013) [2024-06-15 13:43:02,960][1651669] Updated weights for policy 0, policy_version 180664 (0.0013) [2024-06-15 13:43:05,272][1651669] Updated weights for policy 0, policy_version 180752 (0.0012) [2024-06-15 13:43:05,766][1648981] Fps is (10 sec: 45883.8, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 370212864. Throughput: 0: 11548.4. Samples: 92601344. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:05,767][1648981] Avg episode reward: [(0, '344.130')] [2024-06-15 13:43:10,787][1648981] Fps is (10 sec: 45780.9, 60 sec: 43681.5, 300 sec: 46205.2). Total num frames: 370278400. Throughput: 0: 11554.6. Samples: 92641792. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:10,787][1648981] Avg episode reward: [(0, '343.970')] [2024-06-15 13:43:12,076][1651669] Updated weights for policy 0, policy_version 180816 (0.0060) [2024-06-15 13:43:14,414][1651669] Updated weights for policy 0, policy_version 180899 (0.0104) [2024-06-15 13:43:15,766][1648981] Fps is (10 sec: 32768.1, 60 sec: 46967.5, 300 sec: 46987.2). Total num frames: 370540544. Throughput: 0: 11377.8. Samples: 92702208. 
Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:15,767][1648981] Avg episode reward: [(0, '332.050')] [2024-06-15 13:43:16,399][1651669] Updated weights for policy 0, policy_version 180960 (0.0012) [2024-06-15 13:43:18,022][1651669] Updated weights for policy 0, policy_version 181040 (0.0014) [2024-06-15 13:43:20,766][1648981] Fps is (10 sec: 52536.1, 60 sec: 43690.6, 300 sec: 46430.6). Total num frames: 370802688. Throughput: 0: 11366.5. Samples: 92775424. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:20,767][1648981] Avg episode reward: [(0, '332.690')] [2024-06-15 13:43:25,244][1651669] Updated weights for policy 0, policy_version 181123 (0.0015) [2024-06-15 13:43:25,767][1648981] Fps is (10 sec: 45873.3, 60 sec: 46967.2, 300 sec: 46874.8). Total num frames: 370999296. Throughput: 0: 11354.9. Samples: 92814336. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:25,767][1648981] Avg episode reward: [(0, '333.640')] [2024-06-15 13:43:26,141][1651669] Updated weights for policy 0, policy_version 181174 (0.0013) [2024-06-15 13:43:28,105][1651669] Updated weights for policy 0, policy_version 181236 (0.0013) [2024-06-15 13:43:28,595][1651274] Signal inference workers to stop experience collection... (9550 times) [2024-06-15 13:43:28,619][1651669] InferenceWorker_p0-w0: stopping experience collection (9550 times) [2024-06-15 13:43:28,777][1651274] Signal inference workers to resume experience collection... (9550 times) [2024-06-15 13:43:28,778][1651669] InferenceWorker_p0-w0: resuming experience collection (9550 times) [2024-06-15 13:43:29,250][1651669] Updated weights for policy 0, policy_version 181310 (0.0017) [2024-06-15 13:43:30,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 371326976. Throughput: 0: 11468.8. Samples: 92876288. 
Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:30,767][1648981] Avg episode reward: [(0, '328.080')] [2024-06-15 13:43:35,134][1651669] Updated weights for policy 0, policy_version 181371 (0.0013) [2024-06-15 13:43:35,774][1648981] Fps is (10 sec: 45841.1, 60 sec: 45869.4, 300 sec: 46651.5). Total num frames: 371458048. Throughput: 0: 11489.6. Samples: 92955648. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:35,775][1648981] Avg episode reward: [(0, '332.660')] [2024-06-15 13:43:37,237][1651669] Updated weights for policy 0, policy_version 181430 (0.0013) [2024-06-15 13:43:38,497][1651669] Updated weights for policy 0, policy_version 181472 (0.0014) [2024-06-15 13:43:39,813][1651669] Updated weights for policy 0, policy_version 181536 (0.0014) [2024-06-15 13:43:40,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48605.8, 300 sec: 46763.8). Total num frames: 371851264. Throughput: 0: 11583.0. Samples: 92986368. Policy #0 lag: (min: 15.0, avg: 76.0, max: 271.0) [2024-06-15 13:43:40,767][1648981] Avg episode reward: [(0, '333.030')] [2024-06-15 13:43:44,265][1651669] Updated weights for policy 0, policy_version 181572 (0.0012) [2024-06-15 13:43:45,572][1651669] Updated weights for policy 0, policy_version 181624 (0.0014) [2024-06-15 13:43:45,766][1648981] Fps is (10 sec: 52469.1, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 371982336. Throughput: 0: 11559.8. Samples: 93067776. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:43:45,767][1648981] Avg episode reward: [(0, '345.170')] [2024-06-15 13:43:47,942][1651669] Updated weights for policy 0, policy_version 181686 (0.0030) [2024-06-15 13:43:48,828][1651669] Updated weights for policy 0, policy_version 181712 (0.0013) [2024-06-15 13:43:50,181][1651669] Updated weights for policy 0, policy_version 181777 (0.0013) [2024-06-15 13:43:50,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 47097.1). Total num frames: 372342784. 
Throughput: 0: 11776.0. Samples: 93131264. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:43:50,767][1648981] Avg episode reward: [(0, '344.660')] [2024-06-15 13:43:51,047][1651669] Updated weights for policy 0, policy_version 181824 (0.0013) [2024-06-15 13:43:55,800][1648981] Fps is (10 sec: 42499.6, 60 sec: 44220.9, 300 sec: 46316.0). Total num frames: 372408320. Throughput: 0: 11763.9. Samples: 93171200. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:43:55,801][1648981] Avg episode reward: [(0, '354.240')] [2024-06-15 13:43:56,575][1651669] Updated weights for policy 0, policy_version 181872 (0.0016) [2024-06-15 13:43:57,917][1651669] Updated weights for policy 0, policy_version 181920 (0.0011) [2024-06-15 13:43:59,738][1651669] Updated weights for policy 0, policy_version 181969 (0.0034) [2024-06-15 13:44:00,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 48605.5, 300 sec: 47097.0). Total num frames: 372736000. Throughput: 0: 12049.0. Samples: 93244416. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:00,768][1648981] Avg episode reward: [(0, '356.400')] [2024-06-15 13:44:01,401][1651669] Updated weights for policy 0, policy_version 182033 (0.0111) [2024-06-15 13:44:05,766][1648981] Fps is (10 sec: 49267.3, 60 sec: 44783.0, 300 sec: 46208.8). Total num frames: 372899840. Throughput: 0: 11992.2. Samples: 93315072. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:05,767][1648981] Avg episode reward: [(0, '365.770')] [2024-06-15 13:44:06,887][1651669] Updated weights for policy 0, policy_version 182112 (0.0016) [2024-06-15 13:44:08,566][1651669] Updated weights for policy 0, policy_version 182160 (0.0012) [2024-06-15 13:44:10,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 48076.1, 300 sec: 47097.0). Total num frames: 373161984. Throughput: 0: 11901.3. Samples: 93349888. 
Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:10,767][1648981] Avg episode reward: [(0, '359.870')] [2024-06-15 13:44:10,820][1651669] Updated weights for policy 0, policy_version 182209 (0.0018) [2024-06-15 13:44:12,287][1651274] Signal inference workers to stop experience collection... (9600 times) [2024-06-15 13:44:12,314][1651274] Signal inference workers to resume experience collection... (9600 times) [2024-06-15 13:44:12,326][1651669] InferenceWorker_p0-w0: stopping experience collection (9600 times) [2024-06-15 13:44:12,328][1651669] Updated weights for policy 0, policy_version 182272 (0.0013) [2024-06-15 13:44:12,358][1651669] InferenceWorker_p0-w0: resuming experience collection (9600 times) [2024-06-15 13:44:13,721][1651669] Updated weights for policy 0, policy_version 182324 (0.0013) [2024-06-15 13:44:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 373424128. Throughput: 0: 11901.2. Samples: 93411840. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:15,767][1648981] Avg episode reward: [(0, '351.010')] [2024-06-15 13:44:17,817][1651669] Updated weights for policy 0, policy_version 182354 (0.0012) [2024-06-15 13:44:19,104][1651669] Updated weights for policy 0, policy_version 182401 (0.0129) [2024-06-15 13:44:20,635][1651669] Updated weights for policy 0, policy_version 182455 (0.0158) [2024-06-15 13:44:20,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 373653504. Throughput: 0: 11652.8. Samples: 93479936. 
Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:20,768][1648981] Avg episode reward: [(0, '355.290')] [2024-06-15 13:44:23,001][1651669] Updated weights for policy 0, policy_version 182496 (0.0014) [2024-06-15 13:44:24,061][1651669] Updated weights for policy 0, policy_version 182544 (0.0015) [2024-06-15 13:44:25,222][1651669] Updated weights for policy 0, policy_version 182588 (0.0011) [2024-06-15 13:44:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.3, 300 sec: 46876.2). Total num frames: 373948416. Throughput: 0: 11832.9. Samples: 93518848. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:25,767][1648981] Avg episode reward: [(0, '346.400')] [2024-06-15 13:44:29,695][1651669] Updated weights for policy 0, policy_version 182649 (0.0014) [2024-06-15 13:44:30,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 45875.3, 300 sec: 46655.3). Total num frames: 374079488. Throughput: 0: 11662.3. Samples: 93592576. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:30,767][1648981] Avg episode reward: [(0, '356.910')] [2024-06-15 13:44:31,079][1651669] Updated weights for policy 0, policy_version 182675 (0.0010) [2024-06-15 13:44:33,193][1651669] Updated weights for policy 0, policy_version 182722 (0.0011) [2024-06-15 13:44:34,555][1651669] Updated weights for policy 0, policy_version 182782 (0.0011) [2024-06-15 13:44:35,794][1648981] Fps is (10 sec: 42479.7, 60 sec: 48589.5, 300 sec: 46648.3). Total num frames: 374374400. Throughput: 0: 11791.4. Samples: 93662208. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:35,795][1648981] Avg episode reward: [(0, '387.440')] [2024-06-15 13:44:36,552][1651669] Updated weights for policy 0, policy_version 182842 (0.0012) [2024-06-15 13:44:40,091][1651669] Updated weights for policy 0, policy_version 182887 (0.0013) [2024-06-15 13:44:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 374603776. 
Throughput: 0: 11770.7. Samples: 93700608. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:40,767][1648981] Avg episode reward: [(0, '371.380')] [2024-06-15 13:44:41,601][1651669] Updated weights for policy 0, policy_version 182931 (0.0013) [2024-06-15 13:44:44,135][1651669] Updated weights for policy 0, policy_version 182995 (0.0013) [2024-06-15 13:44:45,766][1648981] Fps is (10 sec: 49289.9, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 374865920. Throughput: 0: 11707.8. Samples: 93771264. Policy #0 lag: (min: 15.0, avg: 92.4, max: 271.0) [2024-06-15 13:44:45,767][1648981] Avg episode reward: [(0, '364.590')] [2024-06-15 13:44:46,541][1651669] Updated weights for policy 0, policy_version 183042 (0.0011) [2024-06-15 13:44:47,720][1651669] Updated weights for policy 0, policy_version 183102 (0.0015) [2024-06-15 13:44:50,717][1651669] Updated weights for policy 0, policy_version 183164 (0.0012) [2024-06-15 13:44:50,769][1648981] Fps is (10 sec: 52414.5, 60 sec: 46419.2, 300 sec: 46764.6). Total num frames: 375128064. Throughput: 0: 11843.5. Samples: 93848064. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:44:50,770][1648981] Avg episode reward: [(0, '362.280')] [2024-06-15 13:44:53,347][1651669] Updated weights for policy 0, policy_version 183222 (0.0013) [2024-06-15 13:44:55,018][1651669] Updated weights for policy 0, policy_version 183264 (0.0016) [2024-06-15 13:44:55,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49171.1, 300 sec: 47319.2). Total num frames: 375357440. Throughput: 0: 11787.4. Samples: 93880320. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:44:55,767][1648981] Avg episode reward: [(0, '374.820')] [2024-06-15 13:44:55,785][1651669] Updated weights for policy 0, policy_version 183292 (0.0011) [2024-06-15 13:44:55,876][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000183296_375390208.pth... 
[2024-06-15 13:44:55,916][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000177728_363986944.pth [2024-06-15 13:44:57,558][1651274] Signal inference workers to stop experience collection... (9650 times) [2024-06-15 13:44:57,596][1651669] InferenceWorker_p0-w0: stopping experience collection (9650 times) [2024-06-15 13:44:57,799][1651274] Signal inference workers to resume experience collection... (9650 times) [2024-06-15 13:44:57,800][1651669] InferenceWorker_p0-w0: resuming experience collection (9650 times) [2024-06-15 13:44:58,834][1651669] Updated weights for policy 0, policy_version 183360 (0.0142) [2024-06-15 13:45:00,770][1648981] Fps is (10 sec: 39318.5, 60 sec: 46418.9, 300 sec: 46652.3). Total num frames: 375521280. Throughput: 0: 12059.5. Samples: 93954560. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:00,770][1648981] Avg episode reward: [(0, '375.220')] [2024-06-15 13:45:02,165][1651669] Updated weights for policy 0, policy_version 183422 (0.0014) [2024-06-15 13:45:04,183][1651669] Updated weights for policy 0, policy_version 183476 (0.0022) [2024-06-15 13:45:05,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 375816192. Throughput: 0: 12060.5. Samples: 94022656. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:05,767][1648981] Avg episode reward: [(0, '364.720')] [2024-06-15 13:45:06,028][1651669] Updated weights for policy 0, policy_version 183520 (0.0013) [2024-06-15 13:45:08,984][1651669] Updated weights for policy 0, policy_version 183568 (0.0013) [2024-06-15 13:45:10,005][1651669] Updated weights for policy 0, policy_version 183612 (0.0011) [2024-06-15 13:45:10,766][1648981] Fps is (10 sec: 52447.0, 60 sec: 48059.7, 300 sec: 46874.9). Total num frames: 376045568. Throughput: 0: 12037.7. Samples: 94060544. 
Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:10,767][1648981] Avg episode reward: [(0, '377.350')] [2024-06-15 13:45:13,040][1651669] Updated weights for policy 0, policy_version 183673 (0.0013) [2024-06-15 13:45:15,067][1651669] Updated weights for policy 0, policy_version 183737 (0.0019) [2024-06-15 13:45:15,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 376307712. Throughput: 0: 12083.2. Samples: 94136320. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:15,767][1648981] Avg episode reward: [(0, '376.280')] [2024-06-15 13:45:16,652][1651669] Updated weights for policy 0, policy_version 183792 (0.0012) [2024-06-15 13:45:20,294][1651669] Updated weights for policy 0, policy_version 183844 (0.0025) [2024-06-15 13:45:20,776][1648981] Fps is (10 sec: 52378.0, 60 sec: 48598.1, 300 sec: 47317.7). Total num frames: 376569856. Throughput: 0: 12122.2. Samples: 94207488. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:20,784][1648981] Avg episode reward: [(0, '368.100')] [2024-06-15 13:45:22,973][1651669] Updated weights for policy 0, policy_version 183906 (0.0122) [2024-06-15 13:45:25,410][1651669] Updated weights for policy 0, policy_version 183952 (0.0015) [2024-06-15 13:45:25,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 46768.9). Total num frames: 376733696. Throughput: 0: 12105.9. Samples: 94245376. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:25,767][1648981] Avg episode reward: [(0, '349.620')] [2024-06-15 13:45:27,764][1651669] Updated weights for policy 0, policy_version 184063 (0.0013) [2024-06-15 13:45:30,766][1648981] Fps is (10 sec: 39359.9, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 376963072. Throughput: 0: 12060.4. Samples: 94313984. 
Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:30,767][1648981] Avg episode reward: [(0, '337.370')] [2024-06-15 13:45:32,369][1651669] Updated weights for policy 0, policy_version 184128 (0.0030) [2024-06-15 13:45:34,694][1651669] Updated weights for policy 0, policy_version 184191 (0.0013) [2024-06-15 13:45:35,810][1648981] Fps is (10 sec: 48937.6, 60 sec: 47501.0, 300 sec: 46756.9). Total num frames: 377225216. Throughput: 0: 11947.1. Samples: 94386176. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:35,811][1648981] Avg episode reward: [(0, '348.980')] [2024-06-15 13:45:37,698][1651669] Updated weights for policy 0, policy_version 184240 (0.0012) [2024-06-15 13:45:39,198][1651669] Updated weights for policy 0, policy_version 184311 (0.0014) [2024-06-15 13:45:40,794][1648981] Fps is (10 sec: 52283.3, 60 sec: 48037.4, 300 sec: 47536.9). Total num frames: 377487360. Throughput: 0: 11893.8. Samples: 94415872. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:40,795][1648981] Avg episode reward: [(0, '348.310')] [2024-06-15 13:45:42,730][1651274] Signal inference workers to stop experience collection... (9700 times) [2024-06-15 13:45:42,789][1651669] InferenceWorker_p0-w0: stopping experience collection (9700 times) [2024-06-15 13:45:42,915][1651274] Signal inference workers to resume experience collection... (9700 times) [2024-06-15 13:45:42,916][1651669] InferenceWorker_p0-w0: resuming experience collection (9700 times) [2024-06-15 13:45:43,675][1651669] Updated weights for policy 0, policy_version 184368 (0.0152) [2024-06-15 13:45:45,159][1651669] Updated weights for policy 0, policy_version 184418 (0.0019) [2024-06-15 13:45:45,766][1648981] Fps is (10 sec: 52659.0, 60 sec: 48059.6, 300 sec: 46986.6). Total num frames: 377749504. Throughput: 0: 11958.9. Samples: 94492672. 
Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:45,767][1648981] Avg episode reward: [(0, '357.550')] [2024-06-15 13:45:47,513][1651669] Updated weights for policy 0, policy_version 184467 (0.0012) [2024-06-15 13:45:49,159][1651669] Updated weights for policy 0, policy_version 184529 (0.0011) [2024-06-15 13:45:49,902][1651669] Updated weights for policy 0, policy_version 184569 (0.0032) [2024-06-15 13:45:50,766][1648981] Fps is (10 sec: 52575.1, 60 sec: 48061.9, 300 sec: 47541.4). Total num frames: 378011648. Throughput: 0: 11980.8. Samples: 94561792. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:50,767][1648981] Avg episode reward: [(0, '344.030')] [2024-06-15 13:45:54,144][1651669] Updated weights for policy 0, policy_version 184614 (0.0012) [2024-06-15 13:45:54,885][1651669] Updated weights for policy 0, policy_version 184656 (0.0013) [2024-06-15 13:45:55,768][1648981] Fps is (10 sec: 49145.5, 60 sec: 48058.6, 300 sec: 47319.0). Total num frames: 378241024. Throughput: 0: 12185.2. Samples: 94608896. Policy #0 lag: (min: 47.0, avg: 143.9, max: 303.0) [2024-06-15 13:45:55,773][1648981] Avg episode reward: [(0, '343.220')] [2024-06-15 13:45:55,930][1651669] Updated weights for policy 0, policy_version 184702 (0.0016) [2024-06-15 13:45:58,479][1651669] Updated weights for policy 0, policy_version 184768 (0.0012) [2024-06-15 13:45:59,960][1651669] Updated weights for policy 0, policy_version 184823 (0.0094) [2024-06-15 13:46:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50247.2, 300 sec: 47541.4). Total num frames: 378535936. Throughput: 0: 11969.4. Samples: 94674944. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:00,767][1648981] Avg episode reward: [(0, '342.870')] [2024-06-15 13:46:04,607][1651669] Updated weights for policy 0, policy_version 184880 (0.0020) [2024-06-15 13:46:05,768][1648981] Fps is (10 sec: 42604.2, 60 sec: 47513.5, 300 sec: 47327.3). Total num frames: 378667008. 
Throughput: 0: 12119.9. Samples: 94752768. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:05,768][1648981] Avg episode reward: [(0, '328.220')] [2024-06-15 13:46:06,279][1651669] Updated weights for policy 0, policy_version 184928 (0.0019) [2024-06-15 13:46:08,683][1651669] Updated weights for policy 0, policy_version 184976 (0.0012) [2024-06-15 13:46:10,236][1651669] Updated weights for policy 0, policy_version 185040 (0.0016) [2024-06-15 13:46:10,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49152.1, 300 sec: 47319.2). Total num frames: 378994688. Throughput: 0: 12083.2. Samples: 94789120. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:10,767][1648981] Avg episode reward: [(0, '326.560')] [2024-06-15 13:46:14,816][1651669] Updated weights for policy 0, policy_version 185089 (0.0012) [2024-06-15 13:46:15,753][1651669] Updated weights for policy 0, policy_version 185144 (0.0013) [2024-06-15 13:46:15,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 47513.5, 300 sec: 47763.5). Total num frames: 379158528. Throughput: 0: 12128.7. Samples: 94859776. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:15,767][1648981] Avg episode reward: [(0, '328.720')] [2024-06-15 13:46:17,578][1651669] Updated weights for policy 0, policy_version 185200 (0.0014) [2024-06-15 13:46:20,016][1651669] Updated weights for policy 0, policy_version 185251 (0.0013) [2024-06-15 13:46:20,715][1651274] Signal inference workers to stop experience collection... (9750 times) [2024-06-15 13:46:20,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48067.6, 300 sec: 47319.2). Total num frames: 379453440. Throughput: 0: 12083.6. Samples: 94929408. 
Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:20,767][1648981] Avg episode reward: [(0, '330.040')] [2024-06-15 13:46:20,796][1651669] InferenceWorker_p0-w0: stopping experience collection (9750 times) [2024-06-15 13:46:20,942][1651274] Signal inference workers to resume experience collection... (9750 times) [2024-06-15 13:46:20,950][1651669] InferenceWorker_p0-w0: resuming experience collection (9750 times) [2024-06-15 13:46:21,649][1651669] Updated weights for policy 0, policy_version 185328 (0.0012) [2024-06-15 13:46:25,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 379617280. Throughput: 0: 12307.0. Samples: 94969344. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:25,767][1648981] Avg episode reward: [(0, '339.500')] [2024-06-15 13:46:25,806][1651669] Updated weights for policy 0, policy_version 185366 (0.0021) [2024-06-15 13:46:26,666][1651669] Updated weights for policy 0, policy_version 185407 (0.0016) [2024-06-15 13:46:28,335][1651669] Updated weights for policy 0, policy_version 185456 (0.0012) [2024-06-15 13:46:30,767][1648981] Fps is (10 sec: 45873.6, 60 sec: 49151.8, 300 sec: 47542.6). Total num frames: 379912192. Throughput: 0: 12196.9. Samples: 95041536. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:30,767][1648981] Avg episode reward: [(0, '324.050')] [2024-06-15 13:46:31,332][1651669] Updated weights for policy 0, policy_version 185524 (0.0025) [2024-06-15 13:46:32,488][1651669] Updated weights for policy 0, policy_version 185569 (0.0098) [2024-06-15 13:46:35,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48094.9, 300 sec: 47541.4). Total num frames: 380108800. Throughput: 0: 12526.9. Samples: 95125504. 
Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:35,767][1648981] Avg episode reward: [(0, '320.730')] [2024-06-15 13:46:35,767][1651669] Updated weights for policy 0, policy_version 185601 (0.0029) [2024-06-15 13:46:37,604][1651669] Updated weights for policy 0, policy_version 185680 (0.0224) [2024-06-15 13:46:38,398][1651669] Updated weights for policy 0, policy_version 185718 (0.0102) [2024-06-15 13:46:40,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 48082.0, 300 sec: 47652.5). Total num frames: 380370944. Throughput: 0: 12060.8. Samples: 95151616. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:40,767][1648981] Avg episode reward: [(0, '326.440')] [2024-06-15 13:46:41,684][1651669] Updated weights for policy 0, policy_version 185760 (0.0016) [2024-06-15 13:46:43,521][1651669] Updated weights for policy 0, policy_version 185845 (0.0013) [2024-06-15 13:46:45,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48059.7, 300 sec: 47541.3). Total num frames: 380633088. Throughput: 0: 12253.8. Samples: 95226368. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:45,767][1648981] Avg episode reward: [(0, '333.250')] [2024-06-15 13:46:47,321][1651669] Updated weights for policy 0, policy_version 185907 (0.0120) [2024-06-15 13:46:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 380895232. Throughput: 0: 12094.6. Samples: 95297024. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:50,767][1648981] Avg episode reward: [(0, '342.420')] [2024-06-15 13:46:51,677][1651669] Updated weights for policy 0, policy_version 185986 (0.0012) [2024-06-15 13:46:53,107][1651669] Updated weights for policy 0, policy_version 186046 (0.0020) [2024-06-15 13:46:54,509][1651669] Updated weights for policy 0, policy_version 186100 (0.0030) [2024-06-15 13:46:55,770][1648981] Fps is (10 sec: 52412.1, 60 sec: 48604.3, 300 sec: 47540.8). Total num frames: 381157376. 
Throughput: 0: 12070.9. Samples: 95332352. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:46:55,770][1648981] Avg episode reward: [(0, '332.340')] [2024-06-15 13:46:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000186112_381157376.pth... [2024-06-15 13:46:55,815][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000180544_369754112.pth [2024-06-15 13:46:57,806][1651669] Updated weights for policy 0, policy_version 186131 (0.0017) [2024-06-15 13:46:58,873][1651669] Updated weights for policy 0, policy_version 186177 (0.0015) [2024-06-15 13:46:59,606][1651274] Signal inference workers to stop experience collection... (9800 times) [2024-06-15 13:46:59,667][1651669] InferenceWorker_p0-w0: stopping experience collection (9800 times) [2024-06-15 13:46:59,953][1651274] Signal inference workers to resume experience collection... (9800 times) [2024-06-15 13:46:59,955][1651669] InferenceWorker_p0-w0: resuming experience collection (9800 times) [2024-06-15 13:47:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 48096.7). Total num frames: 381419520. Throughput: 0: 12265.3. Samples: 95411712. Policy #0 lag: (min: 0.0, avg: 117.5, max: 256.0) [2024-06-15 13:47:00,767][1648981] Avg episode reward: [(0, '344.470')] [2024-06-15 13:47:02,106][1651669] Updated weights for policy 0, policy_version 186245 (0.0013) [2024-06-15 13:47:04,138][1651669] Updated weights for policy 0, policy_version 186322 (0.0012) [2024-06-15 13:47:05,766][1648981] Fps is (10 sec: 52446.6, 60 sec: 50244.3, 300 sec: 47542.6). Total num frames: 381681664. Throughput: 0: 12185.6. Samples: 95477760. 
Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:05,767][1648981] Avg episode reward: [(0, '337.730')] [2024-06-15 13:47:08,479][1651669] Updated weights for policy 0, policy_version 186371 (0.0012) [2024-06-15 13:47:10,562][1651669] Updated weights for policy 0, policy_version 186448 (0.0130) [2024-06-15 13:47:10,778][1648981] Fps is (10 sec: 42548.2, 60 sec: 47504.2, 300 sec: 47872.7). Total num frames: 381845504. Throughput: 0: 12409.9. Samples: 95527936. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:10,779][1648981] Avg episode reward: [(0, '331.320')] [2024-06-15 13:47:11,435][1651669] Updated weights for policy 0, policy_version 186487 (0.0013) [2024-06-15 13:47:13,683][1651669] Updated weights for policy 0, policy_version 186528 (0.0015) [2024-06-15 13:47:15,329][1651669] Updated weights for policy 0, policy_version 186592 (0.0013) [2024-06-15 13:47:15,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 50244.3, 300 sec: 47430.3). Total num frames: 382173184. Throughput: 0: 12117.4. Samples: 95586816. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:15,767][1648981] Avg episode reward: [(0, '335.370')] [2024-06-15 13:47:19,746][1651669] Updated weights for policy 0, policy_version 186642 (0.0015) [2024-06-15 13:47:20,766][1648981] Fps is (10 sec: 49210.3, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 382337024. Throughput: 0: 12026.3. Samples: 95666688. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:20,767][1648981] Avg episode reward: [(0, '336.170')] [2024-06-15 13:47:21,585][1651669] Updated weights for policy 0, policy_version 186720 (0.0019) [2024-06-15 13:47:24,374][1651669] Updated weights for policy 0, policy_version 186772 (0.0013) [2024-06-15 13:47:25,783][1648981] Fps is (10 sec: 42529.6, 60 sec: 49684.7, 300 sec: 47538.8). Total num frames: 382599168. Throughput: 0: 12147.1. Samples: 95698432. 
Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:25,783][1648981] Avg episode reward: [(0, '331.410')] [2024-06-15 13:47:26,234][1651669] Updated weights for policy 0, policy_version 186833 (0.0015) [2024-06-15 13:47:30,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 46967.7, 300 sec: 47541.4). Total num frames: 382730240. Throughput: 0: 11912.6. Samples: 95762432. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:30,767][1648981] Avg episode reward: [(0, '334.030')] [2024-06-15 13:47:31,041][1651669] Updated weights for policy 0, policy_version 186884 (0.0013) [2024-06-15 13:47:32,692][1651669] Updated weights for policy 0, policy_version 186960 (0.0012) [2024-06-15 13:47:35,667][1651669] Updated weights for policy 0, policy_version 187024 (0.0016) [2024-06-15 13:47:35,766][1648981] Fps is (10 sec: 42667.6, 60 sec: 48605.8, 300 sec: 47763.5). Total num frames: 383025152. Throughput: 0: 11992.2. Samples: 95836672. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:35,767][1648981] Avg episode reward: [(0, '341.740')] [2024-06-15 13:47:37,697][1651669] Updated weights for policy 0, policy_version 187075 (0.0014) [2024-06-15 13:47:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 383254528. Throughput: 0: 11731.4. Samples: 95860224. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:40,767][1648981] Avg episode reward: [(0, '343.740')] [2024-06-15 13:47:42,163][1651669] Updated weights for policy 0, policy_version 187140 (0.0018) [2024-06-15 13:47:42,685][1651274] Signal inference workers to stop experience collection... (9850 times) [2024-06-15 13:47:42,725][1651669] InferenceWorker_p0-w0: stopping experience collection (9850 times) [2024-06-15 13:47:42,922][1651274] Signal inference workers to resume experience collection... 
(9850 times) [2024-06-15 13:47:42,923][1651669] InferenceWorker_p0-w0: resuming experience collection (9850 times) [2024-06-15 13:47:43,382][1651669] Updated weights for policy 0, policy_version 187201 (0.0032) [2024-06-15 13:47:44,628][1651669] Updated weights for policy 0, policy_version 187258 (0.0013) [2024-06-15 13:47:45,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 383516672. Throughput: 0: 11628.1. Samples: 95934976. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:45,767][1648981] Avg episode reward: [(0, '348.950')] [2024-06-15 13:47:47,422][1651669] Updated weights for policy 0, policy_version 187312 (0.0013) [2024-06-15 13:47:48,995][1651669] Updated weights for policy 0, policy_version 187345 (0.0021) [2024-06-15 13:47:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47541.7). Total num frames: 383778816. Throughput: 0: 11741.9. Samples: 96006144. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:50,767][1648981] Avg episode reward: [(0, '360.230')] [2024-06-15 13:47:53,166][1651669] Updated weights for policy 0, policy_version 187395 (0.0015) [2024-06-15 13:47:54,312][1651669] Updated weights for policy 0, policy_version 187458 (0.0076) [2024-06-15 13:47:55,360][1651669] Updated weights for policy 0, policy_version 187515 (0.0024) [2024-06-15 13:47:55,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 48062.3, 300 sec: 48207.8). Total num frames: 384040960. Throughput: 0: 11551.5. Samples: 96047616. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:47:55,767][1648981] Avg episode reward: [(0, '356.710')] [2024-06-15 13:47:58,161][1651669] Updated weights for policy 0, policy_version 187554 (0.0028) [2024-06-15 13:47:59,453][1651669] Updated weights for policy 0, policy_version 187600 (0.0011) [2024-06-15 13:48:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 384303104. 
Throughput: 0: 11946.7. Samples: 96124416. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:48:00,767][1648981] Avg episode reward: [(0, '349.940')] [2024-06-15 13:48:03,242][1651669] Updated weights for policy 0, policy_version 187651 (0.0013) [2024-06-15 13:48:04,618][1651669] Updated weights for policy 0, policy_version 187728 (0.0082) [2024-06-15 13:48:05,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 48433.3). Total num frames: 384565248. Throughput: 0: 11696.3. Samples: 96193024. Policy #0 lag: (min: 44.0, avg: 155.5, max: 300.0) [2024-06-15 13:48:05,767][1648981] Avg episode reward: [(0, '353.330')] [2024-06-15 13:48:08,165][1651669] Updated weights for policy 0, policy_version 187778 (0.0015) [2024-06-15 13:48:09,415][1651669] Updated weights for policy 0, policy_version 187832 (0.0046) [2024-06-15 13:48:10,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48615.4, 300 sec: 48207.8). Total num frames: 384761856. Throughput: 0: 12064.8. Samples: 96241152. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:10,767][1648981] Avg episode reward: [(0, '365.310')] [2024-06-15 13:48:11,183][1651669] Updated weights for policy 0, policy_version 187897 (0.0122) [2024-06-15 13:48:14,468][1651669] Updated weights for policy 0, policy_version 187952 (0.0014) [2024-06-15 13:48:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 47513.7, 300 sec: 48207.8). Total num frames: 385024000. Throughput: 0: 12151.5. Samples: 96309248. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:15,767][1648981] Avg episode reward: [(0, '345.510')] [2024-06-15 13:48:15,812][1651669] Updated weights for policy 0, policy_version 188016 (0.0013) [2024-06-15 13:48:19,557][1651669] Updated weights for policy 0, policy_version 188064 (0.0011) [2024-06-15 13:48:20,775][1648981] Fps is (10 sec: 49112.1, 60 sec: 48599.2, 300 sec: 48317.6). Total num frames: 385253376. Throughput: 0: 12228.9. Samples: 96387072. 
Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:20,775][1648981] Avg episode reward: [(0, '362.380')] [2024-06-15 13:48:20,790][1651274] Signal inference workers to stop experience collection... (9900 times) [2024-06-15 13:48:20,828][1651669] InferenceWorker_p0-w0: stopping experience collection (9900 times) [2024-06-15 13:48:21,079][1651274] Signal inference workers to resume experience collection... (9900 times) [2024-06-15 13:48:21,080][1651669] InferenceWorker_p0-w0: resuming experience collection (9900 times) [2024-06-15 13:48:21,349][1651669] Updated weights for policy 0, policy_version 188144 (0.0233) [2024-06-15 13:48:24,556][1651669] Updated weights for policy 0, policy_version 188192 (0.0012) [2024-06-15 13:48:25,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 48072.6, 300 sec: 47985.7). Total num frames: 385482752. Throughput: 0: 12435.8. Samples: 96419840. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:25,768][1648981] Avg episode reward: [(0, '358.780')] [2024-06-15 13:48:26,462][1651669] Updated weights for policy 0, policy_version 188256 (0.0013) [2024-06-15 13:48:30,766][1648981] Fps is (10 sec: 42633.4, 60 sec: 49152.0, 300 sec: 48209.1). Total num frames: 385679360. Throughput: 0: 12492.8. Samples: 96497152. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:30,767][1648981] Avg episode reward: [(0, '358.570')] [2024-06-15 13:48:31,093][1651669] Updated weights for policy 0, policy_version 188336 (0.0013) [2024-06-15 13:48:32,323][1651669] Updated weights for policy 0, policy_version 188390 (0.0014) [2024-06-15 13:48:35,111][1651669] Updated weights for policy 0, policy_version 188434 (0.0013) [2024-06-15 13:48:35,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 385974272. Throughput: 0: 12424.5. Samples: 96565248. 
Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:35,767][1648981] Avg episode reward: [(0, '361.940')] [2024-06-15 13:48:35,896][1651669] Updated weights for policy 0, policy_version 188473 (0.0015) [2024-06-15 13:48:37,296][1651669] Updated weights for policy 0, policy_version 188516 (0.0012) [2024-06-15 13:48:40,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 386170880. Throughput: 0: 12344.9. Samples: 96603136. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:40,767][1648981] Avg episode reward: [(0, '355.240')] [2024-06-15 13:48:41,248][1651669] Updated weights for policy 0, policy_version 188577 (0.0015) [2024-06-15 13:48:43,064][1651669] Updated weights for policy 0, policy_version 188656 (0.0013) [2024-06-15 13:48:45,773][1648981] Fps is (10 sec: 42570.6, 60 sec: 48054.5, 300 sec: 47651.4). Total num frames: 386400256. Throughput: 0: 12229.3. Samples: 96674816. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:45,776][1648981] Avg episode reward: [(0, '353.580')] [2024-06-15 13:48:46,881][1651669] Updated weights for policy 0, policy_version 188720 (0.0017) [2024-06-15 13:48:48,285][1651669] Updated weights for policy 0, policy_version 188775 (0.0014) [2024-06-15 13:48:50,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48322.7). Total num frames: 386662400. Throughput: 0: 12401.8. Samples: 96751104. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:50,767][1648981] Avg episode reward: [(0, '358.520')] [2024-06-15 13:48:52,058][1651669] Updated weights for policy 0, policy_version 188832 (0.0134) [2024-06-15 13:48:53,367][1651669] Updated weights for policy 0, policy_version 188883 (0.0015) [2024-06-15 13:48:55,766][1648981] Fps is (10 sec: 52462.6, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 386924544. Throughput: 0: 12094.6. Samples: 96785408. 
Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:48:55,767][1648981] Avg episode reward: [(0, '355.020')] [2024-06-15 13:48:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000188928_386924544.pth... [2024-06-15 13:48:55,858][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000183296_375390208.pth [2024-06-15 13:48:57,984][1651669] Updated weights for policy 0, policy_version 188976 (0.0016) [2024-06-15 13:48:59,640][1651669] Updated weights for policy 0, policy_version 189044 (0.0132) [2024-06-15 13:49:00,771][1648981] Fps is (10 sec: 52407.3, 60 sec: 48056.4, 300 sec: 48429.3). Total num frames: 387186688. Throughput: 0: 12025.2. Samples: 96850432. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:49:00,772][1648981] Avg episode reward: [(0, '355.640')] [2024-06-15 13:49:02,554][1651274] Signal inference workers to stop experience collection... (9950 times) [2024-06-15 13:49:02,576][1651669] Updated weights for policy 0, policy_version 189076 (0.0022) [2024-06-15 13:49:02,611][1651669] InferenceWorker_p0-w0: stopping experience collection (9950 times) [2024-06-15 13:49:02,798][1651274] Signal inference workers to resume experience collection... (9950 times) [2024-06-15 13:49:02,799][1651669] InferenceWorker_p0-w0: resuming experience collection (9950 times) [2024-06-15 13:49:03,357][1651669] Updated weights for policy 0, policy_version 189118 (0.0012) [2024-06-15 13:49:04,649][1651669] Updated weights for policy 0, policy_version 189168 (0.0013) [2024-06-15 13:49:05,771][1648981] Fps is (10 sec: 52407.7, 60 sec: 48056.5, 300 sec: 48429.3). Total num frames: 387448832. Throughput: 0: 12061.5. Samples: 96929792. 
Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:49:05,771][1648981] Avg episode reward: [(0, '356.570')] [2024-06-15 13:49:08,818][1651669] Updated weights for policy 0, policy_version 189233 (0.0099) [2024-06-15 13:49:10,150][1651669] Updated weights for policy 0, policy_version 189296 (0.0148) [2024-06-15 13:49:10,766][1648981] Fps is (10 sec: 52450.1, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 387710976. Throughput: 0: 12117.4. Samples: 96965120. Policy #0 lag: (min: 48.0, avg: 138.1, max: 304.0) [2024-06-15 13:49:10,767][1648981] Avg episode reward: [(0, '374.230')] [2024-06-15 13:49:13,955][1651669] Updated weights for policy 0, policy_version 189360 (0.0012) [2024-06-15 13:49:15,766][1648981] Fps is (10 sec: 49172.4, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 387940352. Throughput: 0: 12037.7. Samples: 97038848. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:15,767][1648981] Avg episode reward: [(0, '378.820')] [2024-06-15 13:49:15,996][1651669] Updated weights for policy 0, policy_version 189434 (0.0013) [2024-06-15 13:49:20,186][1651669] Updated weights for policy 0, policy_version 189509 (0.0012) [2024-06-15 13:49:20,770][1648981] Fps is (10 sec: 45858.4, 60 sec: 48609.5, 300 sec: 48207.2). Total num frames: 388169728. Throughput: 0: 11957.1. Samples: 97103360. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:20,771][1648981] Avg episode reward: [(0, '370.320')] [2024-06-15 13:49:21,102][1651669] Updated weights for policy 0, policy_version 189560 (0.0012) [2024-06-15 13:49:25,742][1651669] Updated weights for policy 0, policy_version 189619 (0.0011) [2024-06-15 13:49:25,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 47513.8, 300 sec: 48318.9). Total num frames: 388333568. Throughput: 0: 12014.9. Samples: 97143808. 
Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:25,767][1648981] Avg episode reward: [(0, '394.940')] [2024-06-15 13:49:27,216][1651669] Updated weights for policy 0, policy_version 189680 (0.0012) [2024-06-15 13:49:30,434][1651669] Updated weights for policy 0, policy_version 189713 (0.0013) [2024-06-15 13:49:30,782][1648981] Fps is (10 sec: 39274.5, 60 sec: 48047.1, 300 sec: 48098.7). Total num frames: 388562944. Throughput: 0: 12023.9. Samples: 97216000. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:30,783][1648981] Avg episode reward: [(0, '392.290')] [2024-06-15 13:49:32,168][1651669] Updated weights for policy 0, policy_version 189797 (0.0015) [2024-06-15 13:49:35,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 47985.7). Total num frames: 388759552. Throughput: 0: 11923.9. Samples: 97287680. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:35,767][1648981] Avg episode reward: [(0, '398.070')] [2024-06-15 13:49:36,165][1651669] Updated weights for policy 0, policy_version 189843 (0.0025) [2024-06-15 13:49:37,676][1651669] Updated weights for policy 0, policy_version 189906 (0.0013) [2024-06-15 13:49:38,551][1651669] Updated weights for policy 0, policy_version 189950 (0.0013) [2024-06-15 13:49:40,766][1648981] Fps is (10 sec: 45947.3, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 389021696. Throughput: 0: 11798.8. Samples: 97316352. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:40,767][1648981] Avg episode reward: [(0, '390.160')] [2024-06-15 13:49:41,607][1651274] Signal inference workers to stop experience collection... (10000 times) [2024-06-15 13:49:41,673][1651669] InferenceWorker_p0-w0: stopping experience collection (10000 times) [2024-06-15 13:49:41,840][1651274] Signal inference workers to resume experience collection... 
(10000 times) [2024-06-15 13:49:41,851][1651669] InferenceWorker_p0-w0: resuming experience collection (10000 times) [2024-06-15 13:49:42,021][1651669] Updated weights for policy 0, policy_version 190020 (0.0012) [2024-06-15 13:49:45,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 48064.7, 300 sec: 47986.1). Total num frames: 389283840. Throughput: 0: 12050.1. Samples: 97392640. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:45,767][1648981] Avg episode reward: [(0, '383.260')] [2024-06-15 13:49:46,692][1651669] Updated weights for policy 0, policy_version 190085 (0.0033) [2024-06-15 13:49:48,112][1651669] Updated weights for policy 0, policy_version 190145 (0.0014) [2024-06-15 13:49:49,521][1651669] Updated weights for policy 0, policy_version 190202 (0.0013) [2024-06-15 13:49:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 389545984. Throughput: 0: 11868.1. Samples: 97463808. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:50,767][1648981] Avg episode reward: [(0, '372.200')] [2024-06-15 13:49:52,828][1651669] Updated weights for policy 0, policy_version 190256 (0.0134) [2024-06-15 13:49:53,984][1651669] Updated weights for policy 0, policy_version 190308 (0.0014) [2024-06-15 13:49:55,771][1648981] Fps is (10 sec: 52405.0, 60 sec: 48056.0, 300 sec: 48429.8). Total num frames: 389808128. Throughput: 0: 11786.1. Samples: 97495552. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:49:55,772][1648981] Avg episode reward: [(0, '370.940')] [2024-06-15 13:49:58,588][1651669] Updated weights for policy 0, policy_version 190372 (0.0013) [2024-06-15 13:49:59,804][1651669] Updated weights for policy 0, policy_version 190432 (0.0014) [2024-06-15 13:50:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48063.1, 300 sec: 48318.9). Total num frames: 390070272. Throughput: 0: 11787.4. Samples: 97569280. 
Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:50:00,767][1648981] Avg episode reward: [(0, '364.130')] [2024-06-15 13:50:03,536][1651669] Updated weights for policy 0, policy_version 190496 (0.0045) [2024-06-15 13:50:05,163][1651669] Updated weights for policy 0, policy_version 190561 (0.0029) [2024-06-15 13:50:05,767][1648981] Fps is (10 sec: 52452.3, 60 sec: 48062.8, 300 sec: 48429.9). Total num frames: 390332416. Throughput: 0: 11822.4. Samples: 97635328. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:50:05,767][1648981] Avg episode reward: [(0, '351.260')] [2024-06-15 13:50:09,384][1651669] Updated weights for policy 0, policy_version 190609 (0.0032) [2024-06-15 13:50:10,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 390496256. Throughput: 0: 11832.9. Samples: 97676288. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:50:10,767][1648981] Avg episode reward: [(0, '355.250')] [2024-06-15 13:50:11,286][1651669] Updated weights for policy 0, policy_version 190691 (0.0013) [2024-06-15 13:50:14,635][1651669] Updated weights for policy 0, policy_version 190736 (0.0013) [2024-06-15 13:50:15,774][1648981] Fps is (10 sec: 39292.4, 60 sec: 46415.3, 300 sec: 47986.0). Total num frames: 390725632. Throughput: 0: 11732.6. Samples: 97743872. Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:50:15,775][1648981] Avg episode reward: [(0, '350.130')] [2024-06-15 13:50:15,944][1651669] Updated weights for policy 0, policy_version 190790 (0.0012) [2024-06-15 13:50:16,909][1651669] Updated weights for policy 0, policy_version 190844 (0.0098) [2024-06-15 13:50:20,767][1648981] Fps is (10 sec: 39321.3, 60 sec: 45331.8, 300 sec: 47985.7). Total num frames: 390889472. Throughput: 0: 11832.9. Samples: 97820160. 
Policy #0 lag: (min: 76.0, avg: 181.8, max: 302.0) [2024-06-15 13:50:20,767][1648981] Avg episode reward: [(0, '349.410')] [2024-06-15 13:50:21,197][1651669] Updated weights for policy 0, policy_version 190886 (0.0014) [2024-06-15 13:50:21,869][1651274] Signal inference workers to stop experience collection... (10050 times) [2024-06-15 13:50:21,900][1651669] InferenceWorker_p0-w0: stopping experience collection (10050 times) [2024-06-15 13:50:22,083][1651274] Signal inference workers to resume experience collection... (10050 times) [2024-06-15 13:50:22,094][1651669] InferenceWorker_p0-w0: resuming experience collection (10050 times) [2024-06-15 13:50:22,593][1651669] Updated weights for policy 0, policy_version 190946 (0.0013) [2024-06-15 13:50:25,647][1651669] Updated weights for policy 0, policy_version 190995 (0.0013) [2024-06-15 13:50:25,768][1648981] Fps is (10 sec: 42624.8, 60 sec: 46966.2, 300 sec: 48096.5). Total num frames: 391151616. Throughput: 0: 11832.5. Samples: 97848832. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:50:25,768][1648981] Avg episode reward: [(0, '343.660')] [2024-06-15 13:50:26,993][1651669] Updated weights for policy 0, policy_version 191056 (0.0013) [2024-06-15 13:50:30,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46979.7, 300 sec: 47992.8). Total num frames: 391380992. Throughput: 0: 11696.4. Samples: 97918976. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:50:30,767][1648981] Avg episode reward: [(0, '342.950')] [2024-06-15 13:50:31,989][1651669] Updated weights for policy 0, policy_version 191136 (0.0023) [2024-06-15 13:50:33,167][1651669] Updated weights for policy 0, policy_version 191184 (0.0012) [2024-06-15 13:50:35,766][1648981] Fps is (10 sec: 49160.3, 60 sec: 48059.8, 300 sec: 47990.2). Total num frames: 391643136. Throughput: 0: 11878.4. Samples: 97998336. 
Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:50:35,767][1648981] Avg episode reward: [(0, '330.150')] [2024-06-15 13:50:36,028][1651669] Updated weights for policy 0, policy_version 191248 (0.0121) [2024-06-15 13:50:37,731][1651669] Updated weights for policy 0, policy_version 191328 (0.0013) [2024-06-15 13:50:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 391905280. Throughput: 0: 11777.2. Samples: 98025472. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:50:40,767][1648981] Avg episode reward: [(0, '328.300')] [2024-06-15 13:50:41,860][1651669] Updated weights for policy 0, policy_version 191361 (0.0014) [2024-06-15 13:50:42,874][1651669] Updated weights for policy 0, policy_version 191417 (0.0011) [2024-06-15 13:50:44,370][1651669] Updated weights for policy 0, policy_version 191472 (0.0017) [2024-06-15 13:50:45,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 392167424. Throughput: 0: 11878.4. Samples: 98103808. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:50:45,767][1648981] Avg episode reward: [(0, '332.940')] [2024-06-15 13:50:46,696][1651669] Updated weights for policy 0, policy_version 191507 (0.0011) [2024-06-15 13:50:48,093][1651669] Updated weights for policy 0, policy_version 191570 (0.0013) [2024-06-15 13:50:49,011][1651669] Updated weights for policy 0, policy_version 191616 (0.0020) [2024-06-15 13:50:50,773][1648981] Fps is (10 sec: 52394.8, 60 sec: 48054.5, 300 sec: 48095.9). Total num frames: 392429568. Throughput: 0: 12104.3. Samples: 98180096. 
Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:50:50,773][1648981] Avg episode reward: [(0, '346.850')] [2024-06-15 13:50:53,285][1651669] Updated weights for policy 0, policy_version 191677 (0.0015) [2024-06-15 13:50:55,370][1651669] Updated weights for policy 0, policy_version 191728 (0.0017) [2024-06-15 13:50:55,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48063.5, 300 sec: 47985.7). Total num frames: 392691712. Throughput: 0: 11867.0. Samples: 98210304. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:50:55,767][1648981] Avg episode reward: [(0, '348.850')] [2024-06-15 13:50:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000191744_392691712.pth... [2024-06-15 13:50:55,830][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000186112_381157376.pth [2024-06-15 13:50:55,834][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000191744_392691712.pth [2024-06-15 13:50:58,367][1651669] Updated weights for policy 0, policy_version 191792 (0.0012) [2024-06-15 13:50:59,594][1651669] Updated weights for policy 0, policy_version 191840 (0.0167) [2024-06-15 13:51:00,257][1651669] Updated weights for policy 0, policy_version 191872 (0.0012) [2024-06-15 13:51:00,767][1648981] Fps is (10 sec: 52462.6, 60 sec: 48059.6, 300 sec: 48430.0). Total num frames: 392953856. Throughput: 0: 12039.7. Samples: 98285568. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:51:00,767][1648981] Avg episode reward: [(0, '355.920')] [2024-06-15 13:51:03,323][1651274] Signal inference workers to stop experience collection... (10100 times) [2024-06-15 13:51:03,379][1651669] InferenceWorker_p0-w0: stopping experience collection (10100 times) [2024-06-15 13:51:03,647][1651274] Signal inference workers to resume experience collection... 
(10100 times) [2024-06-15 13:51:03,648][1651669] InferenceWorker_p0-w0: resuming experience collection (10100 times) [2024-06-15 13:51:04,209][1651669] Updated weights for policy 0, policy_version 191929 (0.0013) [2024-06-15 13:51:05,770][1648981] Fps is (10 sec: 42583.7, 60 sec: 46418.9, 300 sec: 47874.0). Total num frames: 393117696. Throughput: 0: 12036.8. Samples: 98361856. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:51:05,771][1648981] Avg episode reward: [(0, '372.450')] [2024-06-15 13:51:06,496][1651669] Updated weights for policy 0, policy_version 191984 (0.0012) [2024-06-15 13:51:08,340][1651669] Updated weights for policy 0, policy_version 192033 (0.0049) [2024-06-15 13:51:08,799][1651669] Updated weights for policy 0, policy_version 192064 (0.0012) [2024-06-15 13:51:10,257][1651669] Updated weights for policy 0, policy_version 192112 (0.0013) [2024-06-15 13:51:10,768][1648981] Fps is (10 sec: 52420.5, 60 sec: 49696.8, 300 sec: 48540.8). Total num frames: 393478144. Throughput: 0: 12196.9. Samples: 98397696. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:51:10,769][1648981] Avg episode reward: [(0, '367.150')] [2024-06-15 13:51:12,825][1651669] Updated weights for policy 0, policy_version 192148 (0.0022) [2024-06-15 13:51:13,860][1651669] Updated weights for policy 0, policy_version 192189 (0.0013) [2024-06-15 13:51:15,766][1648981] Fps is (10 sec: 49169.6, 60 sec: 48066.0, 300 sec: 47985.7). Total num frames: 393609216. Throughput: 0: 12492.8. Samples: 98481152. 
Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:51:15,767][1648981] Avg episode reward: [(0, '369.850')] [2024-06-15 13:51:17,085][1651669] Updated weights for policy 0, policy_version 192253 (0.0013) [2024-06-15 13:51:18,173][1651669] Updated weights for policy 0, policy_version 192290 (0.0022) [2024-06-15 13:51:20,212][1651669] Updated weights for policy 0, policy_version 192337 (0.0023) [2024-06-15 13:51:20,766][1648981] Fps is (10 sec: 49160.5, 60 sec: 51336.6, 300 sec: 48652.2). Total num frames: 393969664. Throughput: 0: 12310.7. Samples: 98552320. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:51:20,767][1648981] Avg episode reward: [(0, '365.220')] [2024-06-15 13:51:23,252][1651669] Updated weights for policy 0, policy_version 192402 (0.0013) [2024-06-15 13:51:25,818][1648981] Fps is (10 sec: 52158.1, 60 sec: 49656.5, 300 sec: 48199.4). Total num frames: 394133504. Throughput: 0: 12683.0. Samples: 98596864. Policy #0 lag: (min: 54.0, avg: 142.2, max: 295.0) [2024-06-15 13:51:25,819][1648981] Avg episode reward: [(0, '364.870')] [2024-06-15 13:51:26,494][1651669] Updated weights for policy 0, policy_version 192466 (0.0012) [2024-06-15 13:51:28,379][1651669] Updated weights for policy 0, policy_version 192545 (0.0013) [2024-06-15 13:51:30,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 394395648. Throughput: 0: 12401.8. Samples: 98661888. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:51:30,767][1648981] Avg episode reward: [(0, '364.260')] [2024-06-15 13:51:30,997][1651669] Updated weights for policy 0, policy_version 192597 (0.0045) [2024-06-15 13:51:34,136][1651669] Updated weights for policy 0, policy_version 192674 (0.0012) [2024-06-15 13:51:35,766][1648981] Fps is (10 sec: 52702.4, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 394657792. Throughput: 0: 12562.9. Samples: 98745344. 
Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:51:35,767][1648981] Avg episode reward: [(0, '366.600')] [2024-06-15 13:51:37,032][1651669] Updated weights for policy 0, policy_version 192706 (0.0022) [2024-06-15 13:51:38,700][1651669] Updated weights for policy 0, policy_version 192768 (0.0034) [2024-06-15 13:51:40,093][1651669] Updated weights for policy 0, policy_version 192820 (0.0014) [2024-06-15 13:51:40,778][1648981] Fps is (10 sec: 52367.3, 60 sec: 50234.4, 300 sec: 48428.1). Total num frames: 394919936. Throughput: 0: 12694.3. Samples: 98781696. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:51:40,779][1648981] Avg episode reward: [(0, '363.160')] [2024-06-15 13:51:41,906][1651669] Updated weights for policy 0, policy_version 192880 (0.0016) [2024-06-15 13:51:44,218][1651274] Signal inference workers to stop experience collection... (10150 times) [2024-06-15 13:51:44,255][1651669] InferenceWorker_p0-w0: stopping experience collection (10150 times) [2024-06-15 13:51:44,257][1651669] Updated weights for policy 0, policy_version 192914 (0.0034) [2024-06-15 13:51:44,472][1651274] Signal inference workers to resume experience collection... (10150 times) [2024-06-15 13:51:44,472][1651669] InferenceWorker_p0-w0: resuming experience collection (10150 times) [2024-06-15 13:51:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 395182080. Throughput: 0: 12618.0. Samples: 98853376. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:51:45,767][1648981] Avg episode reward: [(0, '365.090')] [2024-06-15 13:51:48,088][1651669] Updated weights for policy 0, policy_version 192976 (0.0014) [2024-06-15 13:51:49,861][1651669] Updated weights for policy 0, policy_version 193040 (0.0027) [2024-06-15 13:51:50,766][1648981] Fps is (10 sec: 49210.0, 60 sec: 49703.6, 300 sec: 48319.5). Total num frames: 395411456. Throughput: 0: 12414.1. Samples: 98920448. 
Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:51:50,767][1648981] Avg episode reward: [(0, '363.290')] [2024-06-15 13:51:50,870][1651669] Updated weights for policy 0, policy_version 193082 (0.0012) [2024-06-15 13:51:52,974][1651669] Updated weights for policy 0, policy_version 193145 (0.0016) [2024-06-15 13:51:55,703][1651669] Updated weights for policy 0, policy_version 193209 (0.0090) [2024-06-15 13:51:55,798][1648981] Fps is (10 sec: 52265.6, 60 sec: 50218.2, 300 sec: 48424.9). Total num frames: 395706368. Throughput: 0: 12382.3. Samples: 98955264. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:51:55,798][1648981] Avg episode reward: [(0, '371.510')] [2024-06-15 13:52:00,587][1651669] Updated weights for policy 0, policy_version 193281 (0.0012) [2024-06-15 13:52:00,778][1648981] Fps is (10 sec: 42548.5, 60 sec: 48050.4, 300 sec: 47983.8). Total num frames: 395837440. Throughput: 0: 12318.9. Samples: 99035648. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:52:00,779][1648981] Avg episode reward: [(0, '376.000')] [2024-06-15 13:52:01,856][1651669] Updated weights for policy 0, policy_version 193334 (0.0012) [2024-06-15 13:52:03,866][1651669] Updated weights for policy 0, policy_version 193379 (0.0015) [2024-06-15 13:52:05,766][1648981] Fps is (10 sec: 39445.1, 60 sec: 49701.1, 300 sec: 48320.9). Total num frames: 396099584. Throughput: 0: 12140.1. Samples: 99098624. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:52:05,767][1648981] Avg episode reward: [(0, '378.710')] [2024-06-15 13:52:06,381][1651669] Updated weights for policy 0, policy_version 193440 (0.0015) [2024-06-15 13:52:10,767][1648981] Fps is (10 sec: 42644.5, 60 sec: 46421.9, 300 sec: 47763.4). Total num frames: 396263424. Throughput: 0: 12062.7. Samples: 99139072. 
Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:52:10,768][1648981] Avg episode reward: [(0, '365.180')] [2024-06-15 13:52:11,037][1651669] Updated weights for policy 0, policy_version 193506 (0.0012) [2024-06-15 13:52:12,719][1651669] Updated weights for policy 0, policy_version 193572 (0.0013) [2024-06-15 13:52:14,381][1651669] Updated weights for policy 0, policy_version 193632 (0.0012) [2024-06-15 13:52:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 396623872. Throughput: 0: 12003.6. Samples: 99202048. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:52:15,767][1648981] Avg episode reward: [(0, '356.950')] [2024-06-15 13:52:18,135][1651669] Updated weights for policy 0, policy_version 193696 (0.0014) [2024-06-15 13:52:20,766][1648981] Fps is (10 sec: 49156.6, 60 sec: 46421.3, 300 sec: 47988.3). Total num frames: 396754944. Throughput: 0: 11832.9. Samples: 99277824. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:52:20,767][1648981] Avg episode reward: [(0, '357.050')] [2024-06-15 13:52:22,376][1651669] Updated weights for policy 0, policy_version 193749 (0.0024) [2024-06-15 13:52:23,678][1651669] Updated weights for policy 0, policy_version 193811 (0.0023) [2024-06-15 13:52:25,084][1651274] Signal inference workers to stop experience collection... (10200 times) [2024-06-15 13:52:25,158][1651669] InferenceWorker_p0-w0: stopping experience collection (10200 times) [2024-06-15 13:52:25,385][1651274] Signal inference workers to resume experience collection... (10200 times) [2024-06-15 13:52:25,387][1651669] InferenceWorker_p0-w0: resuming experience collection (10200 times) [2024-06-15 13:52:25,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49741.1, 300 sec: 48763.2). Total num frames: 397115392. Throughput: 0: 11904.3. Samples: 99317248. 
Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:52:25,767][1648981] Avg episode reward: [(0, '357.760')] [2024-06-15 13:52:25,882][1651669] Updated weights for policy 0, policy_version 193911 (0.0106) [2024-06-15 13:52:29,071][1651669] Updated weights for policy 0, policy_version 193953 (0.0014) [2024-06-15 13:52:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 397279232. Throughput: 0: 11650.9. Samples: 99377664. Policy #0 lag: (min: 79.0, avg: 180.6, max: 303.0) [2024-06-15 13:52:30,767][1648981] Avg episode reward: [(0, '353.330')] [2024-06-15 13:52:33,576][1651669] Updated weights for policy 0, policy_version 194006 (0.0013) [2024-06-15 13:52:35,100][1651669] Updated weights for policy 0, policy_version 194082 (0.0014) [2024-06-15 13:52:35,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 397541376. Throughput: 0: 11901.2. Samples: 99456000. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:52:35,767][1648981] Avg episode reward: [(0, '378.000')] [2024-06-15 13:52:36,350][1651669] Updated weights for policy 0, policy_version 194147 (0.0013) [2024-06-15 13:52:38,809][1651669] Updated weights for policy 0, policy_version 194180 (0.0015) [2024-06-15 13:52:40,382][1651669] Updated weights for policy 0, policy_version 194240 (0.0011) [2024-06-15 13:52:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48069.2, 300 sec: 48430.0). Total num frames: 397803520. Throughput: 0: 11966.4. Samples: 99493376. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:52:40,767][1648981] Avg episode reward: [(0, '379.820')] [2024-06-15 13:52:44,686][1651669] Updated weights for policy 0, policy_version 194294 (0.0013) [2024-06-15 13:52:45,485][1651669] Updated weights for policy 0, policy_version 194341 (0.0012) [2024-06-15 13:52:45,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 48318.9). Total num frames: 398032896. 
Throughput: 0: 11881.5. Samples: 99570176. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:52:45,767][1648981] Avg episode reward: [(0, '386.800')] [2024-06-15 13:52:46,572][1651669] Updated weights for policy 0, policy_version 194401 (0.0012) [2024-06-15 13:52:50,495][1651669] Updated weights for policy 0, policy_version 194480 (0.0080) [2024-06-15 13:52:50,769][1648981] Fps is (10 sec: 49141.2, 60 sec: 48058.0, 300 sec: 48318.6). Total num frames: 398295040. Throughput: 0: 12150.9. Samples: 99645440. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:52:50,769][1648981] Avg episode reward: [(0, '384.190')] [2024-06-15 13:52:54,427][1651669] Updated weights for policy 0, policy_version 194515 (0.0023) [2024-06-15 13:52:55,794][1648981] Fps is (10 sec: 45747.6, 60 sec: 46424.0, 300 sec: 48092.2). Total num frames: 398491648. Throughput: 0: 12178.3. Samples: 99687424. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:52:55,795][1648981] Avg episode reward: [(0, '387.030')] [2024-06-15 13:52:55,999][1651669] Updated weights for policy 0, policy_version 194594 (0.0012) [2024-06-15 13:52:56,243][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000194608_398557184.pth... [2024-06-15 13:52:56,401][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000188928_386924544.pth [2024-06-15 13:52:57,568][1651669] Updated weights for policy 0, policy_version 194683 (0.0140) [2024-06-15 13:53:00,770][1648981] Fps is (10 sec: 49144.0, 60 sec: 49158.5, 300 sec: 48207.2). Total num frames: 398786560. Throughput: 0: 12252.8. Samples: 99753472. 
Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:00,771][1648981] Avg episode reward: [(0, '381.690')] [2024-06-15 13:53:01,097][1651669] Updated weights for policy 0, policy_version 194743 (0.0013) [2024-06-15 13:53:04,953][1651669] Updated weights for policy 0, policy_version 194786 (0.0013) [2024-06-15 13:53:05,314][1651274] Signal inference workers to stop experience collection... (10250 times) [2024-06-15 13:53:05,388][1651669] InferenceWorker_p0-w0: stopping experience collection (10250 times) [2024-06-15 13:53:05,631][1651274] Signal inference workers to resume experience collection... (10250 times) [2024-06-15 13:53:05,632][1651669] InferenceWorker_p0-w0: resuming experience collection (10250 times) [2024-06-15 13:53:05,766][1648981] Fps is (10 sec: 49288.9, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 398983168. Throughput: 0: 12424.5. Samples: 99836928. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:05,767][1648981] Avg episode reward: [(0, '391.530')] [2024-06-15 13:53:06,229][1651669] Updated weights for policy 0, policy_version 194834 (0.0013) [2024-06-15 13:53:07,788][1651669] Updated weights for policy 0, policy_version 194912 (0.0014) [2024-06-15 13:53:10,766][1648981] Fps is (10 sec: 45892.6, 60 sec: 49698.9, 300 sec: 48207.8). Total num frames: 399245312. Throughput: 0: 12174.2. Samples: 99865088. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:10,767][1648981] Avg episode reward: [(0, '397.340')] [2024-06-15 13:53:11,126][1651669] Updated weights for policy 0, policy_version 194976 (0.0020) [2024-06-15 13:53:15,763][1651669] Updated weights for policy 0, policy_version 195040 (0.0014) [2024-06-15 13:53:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 48098.1). Total num frames: 399441920. Throughput: 0: 12731.7. Samples: 99950592. 
Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:15,767][1648981] Avg episode reward: [(0, '407.780')] [2024-06-15 13:53:16,115][1651274] Saving new best policy, reward=407.780! [2024-06-15 13:53:17,536][1651669] Updated weights for policy 0, policy_version 195120 (0.0076) [2024-06-15 13:53:18,985][1651669] Updated weights for policy 0, policy_version 195184 (0.0017) [2024-06-15 13:53:20,774][1648981] Fps is (10 sec: 52388.2, 60 sec: 50237.8, 300 sec: 48428.8). Total num frames: 399769600. Throughput: 0: 12376.9. Samples: 100013056. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:20,775][1648981] Avg episode reward: [(0, '398.690')] [2024-06-15 13:53:21,189][1651669] Updated weights for policy 0, policy_version 195221 (0.0014) [2024-06-15 13:53:22,012][1651669] Updated weights for policy 0, policy_version 195261 (0.0012) [2024-06-15 13:53:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46421.3, 300 sec: 48207.8). Total num frames: 399900672. Throughput: 0: 12333.5. Samples: 100048384. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:25,767][1648981] Avg episode reward: [(0, '402.400')] [2024-06-15 13:53:27,217][1651669] Updated weights for policy 0, policy_version 195331 (0.0013) [2024-06-15 13:53:28,364][1651669] Updated weights for policy 0, policy_version 195385 (0.0011) [2024-06-15 13:53:29,775][1651669] Updated weights for policy 0, policy_version 195440 (0.0012) [2024-06-15 13:53:30,794][1648981] Fps is (10 sec: 52324.0, 60 sec: 50221.0, 300 sec: 48536.5). Total num frames: 400293888. Throughput: 0: 12382.7. Samples: 100127744. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:30,795][1648981] Avg episode reward: [(0, '400.550')] [2024-06-15 13:53:31,395][1651669] Updated weights for policy 0, policy_version 195475 (0.0015) [2024-06-15 13:53:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 400424960. Throughput: 0: 12482.0. 
Samples: 100207104. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:35,767][1648981] Avg episode reward: [(0, '402.840')] [2024-06-15 13:53:36,015][1651669] Updated weights for policy 0, policy_version 195521 (0.0018) [2024-06-15 13:53:37,117][1651669] Updated weights for policy 0, policy_version 195583 (0.0014) [2024-06-15 13:53:39,460][1651669] Updated weights for policy 0, policy_version 195664 (0.0039) [2024-06-15 13:53:40,767][1648981] Fps is (10 sec: 52573.8, 60 sec: 50244.1, 300 sec: 48875.4). Total num frames: 400818176. Throughput: 0: 12215.9. Samples: 100236800. Policy #0 lag: (min: 111.0, avg: 182.4, max: 367.0) [2024-06-15 13:53:40,767][1648981] Avg episode reward: [(0, '399.510')] [2024-06-15 13:53:42,920][1651274] Signal inference workers to stop experience collection... (10300 times) [2024-06-15 13:53:42,968][1651669] InferenceWorker_p0-w0: stopping experience collection (10300 times) [2024-06-15 13:53:43,198][1651274] Signal inference workers to resume experience collection... (10300 times) [2024-06-15 13:53:43,199][1651669] InferenceWorker_p0-w0: resuming experience collection (10300 times) [2024-06-15 13:53:43,202][1651669] Updated weights for policy 0, policy_version 195744 (0.0014) [2024-06-15 13:53:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 400949248. Throughput: 0: 12015.9. Samples: 100294144. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:53:45,767][1648981] Avg episode reward: [(0, '395.670')] [2024-06-15 13:53:48,711][1651669] Updated weights for policy 0, policy_version 195808 (0.0014) [2024-06-15 13:53:49,861][1651669] Updated weights for policy 0, policy_version 195858 (0.0012) [2024-06-15 13:53:50,766][1648981] Fps is (10 sec: 36045.8, 60 sec: 48061.5, 300 sec: 48318.9). Total num frames: 401178624. Throughput: 0: 11810.2. Samples: 100368384. 
Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:53:50,767][1648981] Avg episode reward: [(0, '395.660')] [2024-06-15 13:53:51,798][1651669] Updated weights for policy 0, policy_version 195936 (0.0100) [2024-06-15 13:53:55,524][1651669] Updated weights for policy 0, policy_version 196004 (0.0040) [2024-06-15 13:53:55,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49174.8, 300 sec: 48319.6). Total num frames: 401440768. Throughput: 0: 11810.1. Samples: 100396544. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:53:55,767][1648981] Avg episode reward: [(0, '403.600')] [2024-06-15 13:53:59,720][1651669] Updated weights for policy 0, policy_version 196040 (0.0012) [2024-06-15 13:54:00,615][1651669] Updated weights for policy 0, policy_version 196088 (0.0015) [2024-06-15 13:54:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46970.5, 300 sec: 47986.4). Total num frames: 401604608. Throughput: 0: 11514.3. Samples: 100468736. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:00,767][1648981] Avg episode reward: [(0, '381.720')] [2024-06-15 13:54:02,477][1651669] Updated weights for policy 0, policy_version 196146 (0.0014) [2024-06-15 13:54:03,955][1651669] Updated weights for policy 0, policy_version 196217 (0.0012) [2024-06-15 13:54:05,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 401932288. Throughput: 0: 11687.0. Samples: 100538880. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:05,767][1648981] Avg episode reward: [(0, '384.720')] [2024-06-15 13:54:05,842][1651669] Updated weights for policy 0, policy_version 196272 (0.0011) [2024-06-15 13:54:10,289][1651669] Updated weights for policy 0, policy_version 196320 (0.0012) [2024-06-15 13:54:10,776][1648981] Fps is (10 sec: 49102.7, 60 sec: 47505.7, 300 sec: 47984.1). Total num frames: 402096128. Throughput: 0: 11853.0. Samples: 100581888. 
Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:10,777][1648981] Avg episode reward: [(0, '377.730')] [2024-06-15 13:54:12,889][1651669] Updated weights for policy 0, policy_version 196384 (0.0012) [2024-06-15 13:54:14,111][1651669] Updated weights for policy 0, policy_version 196448 (0.0014) [2024-06-15 13:54:15,029][1651669] Updated weights for policy 0, policy_version 196484 (0.0017) [2024-06-15 13:54:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.4, 300 sec: 48430.6). Total num frames: 402456576. Throughput: 0: 11703.6. Samples: 100654080. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:15,767][1648981] Avg episode reward: [(0, '376.570')] [2024-06-15 13:54:16,169][1651669] Updated weights for policy 0, policy_version 196537 (0.0016) [2024-06-15 13:54:20,766][1648981] Fps is (10 sec: 49201.3, 60 sec: 46973.6, 300 sec: 48318.9). Total num frames: 402587648. Throughput: 0: 11810.1. Samples: 100738560. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:20,767][1648981] Avg episode reward: [(0, '366.050')] [2024-06-15 13:54:20,964][1651669] Updated weights for policy 0, policy_version 196592 (0.0012) [2024-06-15 13:54:23,860][1651669] Updated weights for policy 0, policy_version 196656 (0.0014) [2024-06-15 13:54:24,722][1651274] Signal inference workers to stop experience collection... (10350 times) [2024-06-15 13:54:24,751][1651669] InferenceWorker_p0-w0: stopping experience collection (10350 times) [2024-06-15 13:54:24,936][1651274] Signal inference workers to resume experience collection... (10350 times) [2024-06-15 13:54:24,937][1651669] InferenceWorker_p0-w0: resuming experience collection (10350 times) [2024-06-15 13:54:25,644][1651669] Updated weights for policy 0, policy_version 196736 (0.0012) [2024-06-15 13:54:25,770][1648981] Fps is (10 sec: 45857.5, 60 sec: 50241.1, 300 sec: 48654.1). Total num frames: 402915328. Throughput: 0: 11866.1. Samples: 100770816. 
Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:25,771][1648981] Avg episode reward: [(0, '367.900')] [2024-06-15 13:54:26,767][1651669] Updated weights for policy 0, policy_version 196800 (0.0013) [2024-06-15 13:54:30,770][1648981] Fps is (10 sec: 49133.3, 60 sec: 46439.9, 300 sec: 48540.4). Total num frames: 403079168. Throughput: 0: 12264.2. Samples: 100846080. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:30,771][1648981] Avg episode reward: [(0, '364.420')] [2024-06-15 13:54:31,683][1651669] Updated weights for policy 0, policy_version 196864 (0.0020) [2024-06-15 13:54:34,755][1651669] Updated weights for policy 0, policy_version 196928 (0.0265) [2024-06-15 13:54:35,766][1648981] Fps is (10 sec: 49171.0, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 403406848. Throughput: 0: 12128.7. Samples: 100914176. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:35,767][1648981] Avg episode reward: [(0, '349.470')] [2024-06-15 13:54:36,271][1651669] Updated weights for policy 0, policy_version 197003 (0.0019) [2024-06-15 13:54:37,169][1651669] Updated weights for policy 0, policy_version 197052 (0.0021) [2024-06-15 13:54:40,766][1648981] Fps is (10 sec: 49170.6, 60 sec: 45875.4, 300 sec: 48430.0). Total num frames: 403570688. Throughput: 0: 12288.0. Samples: 100949504. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:40,767][1648981] Avg episode reward: [(0, '349.350')] [2024-06-15 13:54:42,525][1651669] Updated weights for policy 0, policy_version 197119 (0.0038) [2024-06-15 13:54:44,837][1651669] Updated weights for policy 0, policy_version 197168 (0.0014) [2024-06-15 13:54:45,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 403865600. Throughput: 0: 12526.9. Samples: 101032448. 
Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:45,767][1648981] Avg episode reward: [(0, '350.000')] [2024-06-15 13:54:46,673][1651669] Updated weights for policy 0, policy_version 197238 (0.0106) [2024-06-15 13:54:47,919][1651669] Updated weights for policy 0, policy_version 197308 (0.0013) [2024-06-15 13:54:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.8, 300 sec: 48430.8). Total num frames: 404094976. Throughput: 0: 12492.8. Samples: 101101056. Policy #0 lag: (min: 4.0, avg: 140.1, max: 260.0) [2024-06-15 13:54:50,767][1648981] Avg episode reward: [(0, '345.870')] [2024-06-15 13:54:53,223][1651669] Updated weights for policy 0, policy_version 197360 (0.0013) [2024-06-15 13:54:55,135][1651669] Updated weights for policy 0, policy_version 197408 (0.0012) [2024-06-15 13:54:55,774][1648981] Fps is (10 sec: 45839.3, 60 sec: 48053.4, 300 sec: 48317.6). Total num frames: 404324352. Throughput: 0: 12482.0. Samples: 101143552. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:54:55,775][1648981] Avg episode reward: [(0, '340.460')] [2024-06-15 13:54:56,277][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000197456_404389888.pth... [2024-06-15 13:54:56,381][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000191744_392691712.pth [2024-06-15 13:54:56,609][1651669] Updated weights for policy 0, policy_version 197472 (0.0014) [2024-06-15 13:54:57,996][1651669] Updated weights for policy 0, policy_version 197523 (0.0012) [2024-06-15 13:55:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 404619264. Throughput: 0: 12310.7. Samples: 101208064. 
Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:00,767][1648981] Avg episode reward: [(0, '352.290')] [2024-06-15 13:55:03,554][1651669] Updated weights for policy 0, policy_version 197584 (0.0013) [2024-06-15 13:55:03,984][1651274] Signal inference workers to stop experience collection... (10400 times) [2024-06-15 13:55:04,055][1651669] InferenceWorker_p0-w0: stopping experience collection (10400 times) [2024-06-15 13:55:04,159][1651274] Signal inference workers to resume experience collection... (10400 times) [2024-06-15 13:55:04,160][1651669] InferenceWorker_p0-w0: resuming experience collection (10400 times) [2024-06-15 13:55:04,414][1651669] Updated weights for policy 0, policy_version 197632 (0.0012) [2024-06-15 13:55:05,766][1648981] Fps is (10 sec: 45911.5, 60 sec: 47513.7, 300 sec: 48430.0). Total num frames: 404783104. Throughput: 0: 12299.4. Samples: 101292032. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:05,767][1648981] Avg episode reward: [(0, '329.880')] [2024-06-15 13:55:06,769][1651669] Updated weights for policy 0, policy_version 197696 (0.0012) [2024-06-15 13:55:08,281][1651669] Updated weights for policy 0, policy_version 197746 (0.0014) [2024-06-15 13:55:09,835][1651669] Updated weights for policy 0, policy_version 197818 (0.0012) [2024-06-15 13:55:10,770][1648981] Fps is (10 sec: 52409.2, 60 sec: 50795.7, 300 sec: 48875.0). Total num frames: 405143552. Throughput: 0: 12174.2. Samples: 101318656. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:10,771][1648981] Avg episode reward: [(0, '332.060')] [2024-06-15 13:55:14,657][1651669] Updated weights for policy 0, policy_version 197872 (0.0045) [2024-06-15 13:55:15,784][1648981] Fps is (10 sec: 49066.4, 60 sec: 46953.8, 300 sec: 48760.4). Total num frames: 405274624. Throughput: 0: 12204.7. Samples: 101395456. 
Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:15,784][1648981] Avg episode reward: [(0, '339.240')] [2024-06-15 13:55:16,718][1651669] Updated weights for policy 0, policy_version 197920 (0.0013) [2024-06-15 13:55:18,554][1651669] Updated weights for policy 0, policy_version 197985 (0.0017) [2024-06-15 13:55:20,151][1651669] Updated weights for policy 0, policy_version 198048 (0.0012) [2024-06-15 13:55:20,767][1648981] Fps is (10 sec: 49169.2, 60 sec: 50790.2, 300 sec: 49096.7). Total num frames: 405635072. Throughput: 0: 12196.9. Samples: 101463040. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:20,767][1648981] Avg episode reward: [(0, '361.140')] [2024-06-15 13:55:25,119][1651669] Updated weights for policy 0, policy_version 198103 (0.0022) [2024-06-15 13:55:25,805][1648981] Fps is (10 sec: 52316.2, 60 sec: 48031.7, 300 sec: 48867.9). Total num frames: 405798912. Throughput: 0: 12334.2. Samples: 101505024. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:25,806][1648981] Avg episode reward: [(0, '373.320')] [2024-06-15 13:55:27,694][1651669] Updated weights for policy 0, policy_version 198176 (0.0025) [2024-06-15 13:55:29,426][1651669] Updated weights for policy 0, policy_version 198245 (0.0147) [2024-06-15 13:55:30,766][1648981] Fps is (10 sec: 45877.0, 60 sec: 50247.5, 300 sec: 48985.4). Total num frames: 406093824. Throughput: 0: 12071.9. Samples: 101575680. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:30,767][1648981] Avg episode reward: [(0, '377.000')] [2024-06-15 13:55:31,605][1651669] Updated weights for policy 0, policy_version 198327 (0.0012) [2024-06-15 13:55:35,766][1648981] Fps is (10 sec: 39475.4, 60 sec: 46421.3, 300 sec: 48430.0). Total num frames: 406192128. Throughput: 0: 12128.7. Samples: 101646848. 
Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:35,767][1648981] Avg episode reward: [(0, '383.330')] [2024-06-15 13:55:36,524][1651669] Updated weights for policy 0, policy_version 198371 (0.0012) [2024-06-15 13:55:38,550][1651669] Updated weights for policy 0, policy_version 198411 (0.0012) [2024-06-15 13:55:39,994][1651669] Updated weights for policy 0, policy_version 198480 (0.0015) [2024-06-15 13:55:40,766][1648981] Fps is (10 sec: 45874.2, 60 sec: 49698.0, 300 sec: 48763.2). Total num frames: 406552576. Throughput: 0: 12085.3. Samples: 101687296. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:40,767][1648981] Avg episode reward: [(0, '378.660')] [2024-06-15 13:55:40,862][1651274] Signal inference workers to stop experience collection... (10450 times) [2024-06-15 13:55:40,882][1651669] InferenceWorker_p0-w0: stopping experience collection (10450 times) [2024-06-15 13:55:41,056][1651274] Signal inference workers to resume experience collection... (10450 times) [2024-06-15 13:55:41,057][1651669] InferenceWorker_p0-w0: resuming experience collection (10450 times) [2024-06-15 13:55:41,406][1651669] Updated weights for policy 0, policy_version 198544 (0.0124) [2024-06-15 13:55:45,540][1651669] Updated weights for policy 0, policy_version 198593 (0.0014) [2024-06-15 13:55:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47513.7, 300 sec: 48431.1). Total num frames: 406716416. Throughput: 0: 12219.8. Samples: 101757952. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:45,767][1648981] Avg episode reward: [(0, '389.590')] [2024-06-15 13:55:46,751][1651669] Updated weights for policy 0, policy_version 198642 (0.0020) [2024-06-15 13:55:49,231][1651669] Updated weights for policy 0, policy_version 198704 (0.0012) [2024-06-15 13:55:50,095][1651669] Updated weights for policy 0, policy_version 198738 (0.0035) [2024-06-15 13:55:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49698.1, 300 sec: 48763.2). 
Total num frames: 407076864. Throughput: 0: 12014.9. Samples: 101832704. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:50,767][1648981] Avg episode reward: [(0, '386.190')] [2024-06-15 13:55:50,963][1651669] Updated weights for policy 0, policy_version 198784 (0.0013) [2024-06-15 13:55:55,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 48612.1, 300 sec: 48430.0). Total num frames: 407240704. Throughput: 0: 12266.2. Samples: 101870592. Policy #0 lag: (min: 15.0, avg: 97.0, max: 271.0) [2024-06-15 13:55:55,767][1648981] Avg episode reward: [(0, '386.530')] [2024-06-15 13:55:56,268][1651669] Updated weights for policy 0, policy_version 198850 (0.0015) [2024-06-15 13:55:57,435][1651669] Updated weights for policy 0, policy_version 198901 (0.0034) [2024-06-15 13:55:58,408][1651669] Updated weights for policy 0, policy_version 198928 (0.0011) [2024-06-15 13:55:59,947][1651669] Updated weights for policy 0, policy_version 198992 (0.0106) [2024-06-15 13:56:00,770][1648981] Fps is (10 sec: 52409.0, 60 sec: 49695.0, 300 sec: 49096.4). Total num frames: 407601152. Throughput: 0: 12394.2. Samples: 101953024. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:00,771][1648981] Avg episode reward: [(0, '387.390')] [2024-06-15 13:56:01,848][1651669] Updated weights for policy 0, policy_version 199041 (0.0026) [2024-06-15 13:56:05,782][1648981] Fps is (10 sec: 52345.6, 60 sec: 49684.8, 300 sec: 48427.6). Total num frames: 407764992. Throughput: 0: 12442.9. Samples: 102023168. 
Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:05,786][1648981] Avg episode reward: [(0, '383.350')] [2024-06-15 13:56:06,511][1651669] Updated weights for policy 0, policy_version 199106 (0.0014) [2024-06-15 13:56:08,968][1651669] Updated weights for policy 0, policy_version 199172 (0.0022) [2024-06-15 13:56:10,113][1651669] Updated weights for policy 0, policy_version 199230 (0.0014) [2024-06-15 13:56:10,766][1648981] Fps is (10 sec: 42614.5, 60 sec: 48062.8, 300 sec: 48874.3). Total num frames: 408027136. Throughput: 0: 12253.1. Samples: 102055936. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:10,767][1648981] Avg episode reward: [(0, '382.370')] [2024-06-15 13:56:11,881][1651669] Updated weights for policy 0, policy_version 199284 (0.0013) [2024-06-15 13:56:13,736][1651669] Updated weights for policy 0, policy_version 199315 (0.0012) [2024-06-15 13:56:14,567][1651669] Updated weights for policy 0, policy_version 199356 (0.0011) [2024-06-15 13:56:15,766][1648981] Fps is (10 sec: 52513.1, 60 sec: 50258.9, 300 sec: 48541.1). Total num frames: 408289280. Throughput: 0: 12242.5. Samples: 102126592. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:15,767][1648981] Avg episode reward: [(0, '376.270')] [2024-06-15 13:56:18,736][1651669] Updated weights for policy 0, policy_version 199421 (0.0014) [2024-06-15 13:56:20,770][1648981] Fps is (10 sec: 52407.8, 60 sec: 48602.8, 300 sec: 48882.2). Total num frames: 408551424. Throughput: 0: 12241.4. Samples: 102197760. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:20,771][1648981] Avg episode reward: [(0, '390.040')] [2024-06-15 13:56:21,671][1651669] Updated weights for policy 0, policy_version 199490 (0.0013) [2024-06-15 13:56:22,496][1651274] Signal inference workers to stop experience collection... 
(10500 times) [2024-06-15 13:56:22,551][1651669] InferenceWorker_p0-w0: stopping experience collection (10500 times) [2024-06-15 13:56:22,744][1651274] Signal inference workers to resume experience collection... (10500 times) [2024-06-15 13:56:22,745][1651669] InferenceWorker_p0-w0: resuming experience collection (10500 times) [2024-06-15 13:56:23,076][1651669] Updated weights for policy 0, policy_version 199552 (0.0020) [2024-06-15 13:56:25,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 49730.3, 300 sec: 48763.2). Total num frames: 408780800. Throughput: 0: 12219.7. Samples: 102237184. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:25,767][1648981] Avg episode reward: [(0, '385.120')] [2024-06-15 13:56:25,832][1651669] Updated weights for policy 0, policy_version 199612 (0.0014) [2024-06-15 13:56:29,722][1651669] Updated weights for policy 0, policy_version 199673 (0.0013) [2024-06-15 13:56:30,766][1648981] Fps is (10 sec: 42615.6, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 408977408. Throughput: 0: 12276.6. Samples: 102310400. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:30,767][1648981] Avg episode reward: [(0, '382.790')] [2024-06-15 13:56:31,467][1651669] Updated weights for policy 0, policy_version 199740 (0.0014) [2024-06-15 13:56:33,685][1651669] Updated weights for policy 0, policy_version 199779 (0.0076) [2024-06-15 13:56:35,506][1651669] Updated weights for policy 0, policy_version 199812 (0.0013) [2024-06-15 13:56:35,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 50790.3, 300 sec: 48543.0). Total num frames: 409239552. Throughput: 0: 12242.5. Samples: 102383616. 
Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:35,767][1648981] Avg episode reward: [(0, '375.770')] [2024-06-15 13:56:36,750][1651669] Updated weights for policy 0, policy_version 199868 (0.0013) [2024-06-15 13:56:40,148][1651669] Updated weights for policy 0, policy_version 199920 (0.0014) [2024-06-15 13:56:40,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 409468928. Throughput: 0: 12174.3. Samples: 102418432. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:40,767][1648981] Avg episode reward: [(0, '369.380')] [2024-06-15 13:56:42,441][1651669] Updated weights for policy 0, policy_version 199998 (0.0013) [2024-06-15 13:56:44,922][1651669] Updated weights for policy 0, policy_version 200050 (0.0014) [2024-06-15 13:56:45,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.2, 300 sec: 48541.1). Total num frames: 409731072. Throughput: 0: 11833.9. Samples: 102485504. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:45,767][1648981] Avg episode reward: [(0, '370.820')] [2024-06-15 13:56:46,208][1651669] Updated weights for policy 0, policy_version 200066 (0.0012) [2024-06-15 13:56:47,502][1651669] Updated weights for policy 0, policy_version 200128 (0.0012) [2024-06-15 13:56:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48324.0). Total num frames: 409960448. Throughput: 0: 12053.4. Samples: 102565376. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:50,767][1648981] Avg episode reward: [(0, '367.890')] [2024-06-15 13:56:50,832][1651669] Updated weights for policy 0, policy_version 200182 (0.0014) [2024-06-15 13:56:52,249][1651669] Updated weights for policy 0, policy_version 200230 (0.0013) [2024-06-15 13:56:55,218][1651669] Updated weights for policy 0, policy_version 200304 (0.0013) [2024-06-15 13:56:55,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 50244.1, 300 sec: 48876.2). Total num frames: 410255360. 
Throughput: 0: 12196.9. Samples: 102604800. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:56:55,767][1648981] Avg episode reward: [(0, '373.630')] [2024-06-15 13:56:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000200320_410255360.pth... [2024-06-15 13:56:55,827][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000194608_398557184.pth [2024-06-15 13:56:57,772][1651669] Updated weights for policy 0, policy_version 200356 (0.0013) [2024-06-15 13:56:59,748][1651669] Updated weights for policy 0, policy_version 200400 (0.0013) [2024-06-15 13:57:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48062.7, 300 sec: 48763.2). Total num frames: 410484736. Throughput: 0: 12390.4. Samples: 102684160. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:57:00,767][1648981] Avg episode reward: [(0, '388.700')] [2024-06-15 13:57:01,878][1651669] Updated weights for policy 0, policy_version 200449 (0.0014) [2024-06-15 13:57:03,162][1651669] Updated weights for policy 0, policy_version 200508 (0.0040) [2024-06-15 13:57:05,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 48618.9, 300 sec: 48874.5). Total num frames: 410681344. Throughput: 0: 12311.9. Samples: 102751744. Policy #0 lag: (min: 49.0, avg: 141.5, max: 305.0) [2024-06-15 13:57:05,767][1648981] Avg episode reward: [(0, '383.440')] [2024-06-15 13:57:06,166][1651669] Updated weights for policy 0, policy_version 200564 (0.0014) [2024-06-15 13:57:08,635][1651274] Signal inference workers to stop experience collection... (10550 times) [2024-06-15 13:57:08,695][1651669] InferenceWorker_p0-w0: stopping experience collection (10550 times) [2024-06-15 13:57:08,872][1651274] Signal inference workers to resume experience collection... 
(10550 times) [2024-06-15 13:57:08,873][1651669] InferenceWorker_p0-w0: resuming experience collection (10550 times) [2024-06-15 13:57:09,473][1651669] Updated weights for policy 0, policy_version 200628 (0.0012) [2024-06-15 13:57:10,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 410910720. Throughput: 0: 12253.9. Samples: 102788608. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:10,767][1648981] Avg episode reward: [(0, '379.180')] [2024-06-15 13:57:11,933][1651669] Updated weights for policy 0, policy_version 200692 (0.0014) [2024-06-15 13:57:14,182][1651669] Updated weights for policy 0, policy_version 200752 (0.0131) [2024-06-15 13:57:15,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 411172864. Throughput: 0: 11935.3. Samples: 102847488. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:15,767][1648981] Avg episode reward: [(0, '381.570')] [2024-06-15 13:57:17,918][1651669] Updated weights for policy 0, policy_version 200816 (0.0029) [2024-06-15 13:57:20,102][1651669] Updated weights for policy 0, policy_version 200864 (0.0046) [2024-06-15 13:57:20,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 47516.8, 300 sec: 48430.0). Total num frames: 411402240. Throughput: 0: 11946.7. Samples: 102921216. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:20,767][1648981] Avg episode reward: [(0, '388.910')] [2024-06-15 13:57:22,949][1651669] Updated weights for policy 0, policy_version 200950 (0.0015) [2024-06-15 13:57:25,083][1651669] Updated weights for policy 0, policy_version 201008 (0.0082) [2024-06-15 13:57:25,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 48605.8, 300 sec: 48874.3). Total num frames: 411697152. Throughput: 0: 11946.6. Samples: 102956032. 
Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:25,767][1648981] Avg episode reward: [(0, '394.880')] [2024-06-15 13:57:29,220][1651669] Updated weights for policy 0, policy_version 201058 (0.0014) [2024-06-15 13:57:30,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 47513.5, 300 sec: 48430.0). Total num frames: 411828224. Throughput: 0: 12060.4. Samples: 103028224. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:30,767][1648981] Avg episode reward: [(0, '389.270')] [2024-06-15 13:57:31,848][1651669] Updated weights for policy 0, policy_version 201136 (0.0013) [2024-06-15 13:57:32,828][1651669] Updated weights for policy 0, policy_version 201168 (0.0013) [2024-06-15 13:57:35,268][1651669] Updated weights for policy 0, policy_version 201218 (0.0012) [2024-06-15 13:57:35,767][1648981] Fps is (10 sec: 42596.9, 60 sec: 48059.3, 300 sec: 48541.0). Total num frames: 412123136. Throughput: 0: 12026.2. Samples: 103106560. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:35,767][1648981] Avg episode reward: [(0, '389.890')] [2024-06-15 13:57:36,519][1651669] Updated weights for policy 0, policy_version 201270 (0.0013) [2024-06-15 13:57:39,623][1651669] Updated weights for policy 0, policy_version 201312 (0.0025) [2024-06-15 13:57:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 412352512. Throughput: 0: 11935.4. Samples: 103141888. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:40,767][1648981] Avg episode reward: [(0, '383.570')] [2024-06-15 13:57:41,503][1651669] Updated weights for policy 0, policy_version 201360 (0.0014) [2024-06-15 13:57:43,478][1651669] Updated weights for policy 0, policy_version 201441 (0.0121) [2024-06-15 13:57:45,767][1648981] Fps is (10 sec: 49154.0, 60 sec: 48059.6, 300 sec: 48541.4). Total num frames: 412614656. Throughput: 0: 11719.1. Samples: 103211520. 
Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:45,767][1648981] Avg episode reward: [(0, '384.530')] [2024-06-15 13:57:46,540][1651669] Updated weights for policy 0, policy_version 201504 (0.0014) [2024-06-15 13:57:47,255][1651669] Updated weights for policy 0, policy_version 201535 (0.0012) [2024-06-15 13:57:50,358][1651669] Updated weights for policy 0, policy_version 201589 (0.0013) [2024-06-15 13:57:50,778][1648981] Fps is (10 sec: 52367.2, 60 sec: 48596.3, 300 sec: 48765.9). Total num frames: 412876800. Throughput: 0: 12011.8. Samples: 103292416. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:50,779][1648981] Avg episode reward: [(0, '377.870')] [2024-06-15 13:57:52,163][1651274] Signal inference workers to stop experience collection... (10600 times) [2024-06-15 13:57:52,268][1651669] InferenceWorker_p0-w0: stopping experience collection (10600 times) [2024-06-15 13:57:52,364][1651274] Signal inference workers to resume experience collection... (10600 times) [2024-06-15 13:57:52,374][1651669] InferenceWorker_p0-w0: resuming experience collection (10600 times) [2024-06-15 13:57:52,411][1651669] Updated weights for policy 0, policy_version 201650 (0.0013) [2024-06-15 13:57:54,701][1651669] Updated weights for policy 0, policy_version 201721 (0.0013) [2024-06-15 13:57:55,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48060.0, 300 sec: 48652.8). Total num frames: 413138944. Throughput: 0: 12003.6. Samples: 103328768. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:57:55,767][1648981] Avg episode reward: [(0, '385.940')] [2024-06-15 13:57:57,795][1651669] Updated weights for policy 0, policy_version 201776 (0.0015) [2024-06-15 13:58:00,766][1648981] Fps is (10 sec: 42648.5, 60 sec: 46967.5, 300 sec: 48541.1). Total num frames: 413302784. Throughput: 0: 12253.9. Samples: 103398912. 
Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:58:00,767][1648981] Avg episode reward: [(0, '394.980')] [2024-06-15 13:58:01,109][1651669] Updated weights for policy 0, policy_version 201832 (0.0043) [2024-06-15 13:58:03,201][1651669] Updated weights for policy 0, policy_version 201896 (0.0019) [2024-06-15 13:58:04,271][1651669] Updated weights for policy 0, policy_version 201922 (0.0014) [2024-06-15 13:58:05,261][1651669] Updated weights for policy 0, policy_version 201972 (0.0014) [2024-06-15 13:58:05,782][1648981] Fps is (10 sec: 52346.0, 60 sec: 49685.0, 300 sec: 48871.7). Total num frames: 413663232. Throughput: 0: 12158.6. Samples: 103468544. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:58:05,783][1648981] Avg episode reward: [(0, '387.640')] [2024-06-15 13:58:08,354][1651669] Updated weights for policy 0, policy_version 202038 (0.0012) [2024-06-15 13:58:10,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 48059.6, 300 sec: 48652.1). Total num frames: 413794304. Throughput: 0: 12379.0. Samples: 103513088. Policy #0 lag: (min: 49.0, avg: 157.0, max: 305.0) [2024-06-15 13:58:10,768][1648981] Avg episode reward: [(0, '399.380')] [2024-06-15 13:58:11,819][1651669] Updated weights for policy 0, policy_version 202081 (0.0014) [2024-06-15 13:58:13,513][1651669] Updated weights for policy 0, policy_version 202146 (0.0013) [2024-06-15 13:58:14,847][1651669] Updated weights for policy 0, policy_version 202193 (0.0014) [2024-06-15 13:58:15,766][1648981] Fps is (10 sec: 49230.3, 60 sec: 49698.3, 300 sec: 48764.5). Total num frames: 414154752. Throughput: 0: 12344.9. Samples: 103583744. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:15,767][1648981] Avg episode reward: [(0, '394.760')] [2024-06-15 13:58:18,912][1651669] Updated weights for policy 0, policy_version 202241 (0.0013) [2024-06-15 13:58:20,770][1648981] Fps is (10 sec: 52409.3, 60 sec: 48602.6, 300 sec: 48873.7). Total num frames: 414318592. 
Throughput: 0: 12139.2. Samples: 103652864. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:20,771][1648981] Avg episode reward: [(0, '392.430')] [2024-06-15 13:58:22,582][1651669] Updated weights for policy 0, policy_version 202320 (0.0062) [2024-06-15 13:58:24,282][1651669] Updated weights for policy 0, policy_version 202386 (0.0031) [2024-06-15 13:58:25,229][1651669] Updated weights for policy 0, policy_version 202428 (0.0012) [2024-06-15 13:58:25,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 48059.9, 300 sec: 48434.6). Total num frames: 414580736. Throughput: 0: 12197.0. Samples: 103690752. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:25,767][1648981] Avg episode reward: [(0, '393.850')] [2024-06-15 13:58:26,901][1651669] Updated weights for policy 0, policy_version 202485 (0.0011) [2024-06-15 13:58:30,766][1648981] Fps is (10 sec: 42615.3, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 414744576. Throughput: 0: 12231.2. Samples: 103761920. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:30,767][1648981] Avg episode reward: [(0, '389.540')] [2024-06-15 13:58:31,083][1651669] Updated weights for policy 0, policy_version 202535 (0.0014) [2024-06-15 13:58:34,360][1651669] Updated weights for policy 0, policy_version 202611 (0.0015) [2024-06-15 13:58:35,156][1651274] Signal inference workers to stop experience collection... (10650 times) [2024-06-15 13:58:35,231][1651669] InferenceWorker_p0-w0: stopping experience collection (10650 times) [2024-06-15 13:58:35,390][1651274] Signal inference workers to resume experience collection... (10650 times) [2024-06-15 13:58:35,391][1651669] InferenceWorker_p0-w0: resuming experience collection (10650 times) [2024-06-15 13:58:35,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 48606.2, 300 sec: 48207.8). Total num frames: 415039488. Throughput: 0: 11915.6. Samples: 103828480. 
Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:35,767][1648981] Avg episode reward: [(0, '407.700')] [2024-06-15 13:58:36,128][1651669] Updated weights for policy 0, policy_version 202679 (0.0014) [2024-06-15 13:58:37,184][1651669] Updated weights for policy 0, policy_version 202720 (0.0013) [2024-06-15 13:58:37,929][1651669] Updated weights for policy 0, policy_version 202751 (0.0010) [2024-06-15 13:58:40,794][1648981] Fps is (10 sec: 49015.8, 60 sec: 48037.5, 300 sec: 48425.4). Total num frames: 415236096. Throughput: 0: 11791.5. Samples: 103859712. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:40,795][1648981] Avg episode reward: [(0, '409.900')] [2024-06-15 13:58:40,809][1651274] Saving new best policy, reward=409.900! [2024-06-15 13:58:42,265][1651669] Updated weights for policy 0, policy_version 202809 (0.0109) [2024-06-15 13:58:44,764][1651669] Updated weights for policy 0, policy_version 202851 (0.0127) [2024-06-15 13:58:45,782][1648981] Fps is (10 sec: 49075.2, 60 sec: 48593.2, 300 sec: 48649.5). Total num frames: 415531008. Throughput: 0: 12147.2. Samples: 103945728. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:45,783][1648981] Avg episode reward: [(0, '413.100')] [2024-06-15 13:58:46,119][1651669] Updated weights for policy 0, policy_version 202928 (0.0014) [2024-06-15 13:58:46,116][1651274] Saving new best policy, reward=413.100! [2024-06-15 13:58:47,478][1651669] Updated weights for policy 0, policy_version 202976 (0.0013) [2024-06-15 13:58:50,766][1648981] Fps is (10 sec: 52574.6, 60 sec: 48069.1, 300 sec: 48541.1). Total num frames: 415760384. Throughput: 0: 12155.7. Samples: 104015360. 
Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:50,767][1648981] Avg episode reward: [(0, '410.270')] [2024-06-15 13:58:52,610][1651669] Updated weights for policy 0, policy_version 203031 (0.0013) [2024-06-15 13:58:53,914][1651669] Updated weights for policy 0, policy_version 203076 (0.0015) [2024-06-15 13:58:55,453][1651669] Updated weights for policy 0, policy_version 203159 (0.0148) [2024-06-15 13:58:55,766][1648981] Fps is (10 sec: 55793.6, 60 sec: 49152.0, 300 sec: 49096.4). Total num frames: 416088064. Throughput: 0: 12197.0. Samples: 104061952. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:58:55,767][1648981] Avg episode reward: [(0, '404.570')] [2024-06-15 13:58:56,018][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000203200_416153600.pth... [2024-06-15 13:58:56,103][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000197456_404389888.pth [2024-06-15 13:58:57,144][1651669] Updated weights for policy 0, policy_version 203220 (0.0013) [2024-06-15 13:59:00,767][1648981] Fps is (10 sec: 52426.5, 60 sec: 49697.8, 300 sec: 48652.1). Total num frames: 416284672. Throughput: 0: 12208.2. Samples: 104133120. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:59:00,767][1648981] Avg episode reward: [(0, '402.920')] [2024-06-15 13:59:02,946][1651669] Updated weights for policy 0, policy_version 203280 (0.0016) [2024-06-15 13:59:04,124][1651669] Updated weights for policy 0, policy_version 203327 (0.0018) [2024-06-15 13:59:05,617][1651669] Updated weights for policy 0, policy_version 203385 (0.0012) [2024-06-15 13:59:05,767][1648981] Fps is (10 sec: 45874.8, 60 sec: 48072.3, 300 sec: 48987.0). Total num frames: 416546816. Throughput: 0: 12425.6. Samples: 104211968. 
Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:59:05,767][1648981] Avg episode reward: [(0, '401.790')] [2024-06-15 13:59:07,435][1651669] Updated weights for policy 0, policy_version 203472 (0.0017) [2024-06-15 13:59:08,326][1651669] Updated weights for policy 0, policy_version 203516 (0.0013) [2024-06-15 13:59:10,766][1648981] Fps is (10 sec: 52431.4, 60 sec: 50244.5, 300 sec: 48652.1). Total num frames: 416808960. Throughput: 0: 12265.3. Samples: 104242688. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:59:10,767][1648981] Avg episode reward: [(0, '398.310')] [2024-06-15 13:59:14,010][1651669] Updated weights for policy 0, policy_version 203554 (0.0012) [2024-06-15 13:59:15,618][1651669] Updated weights for policy 0, policy_version 203600 (0.0012) [2024-06-15 13:59:15,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 46967.5, 300 sec: 48763.2). Total num frames: 416972800. Throughput: 0: 12492.8. Samples: 104324096. Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:59:15,767][1648981] Avg episode reward: [(0, '405.090')] [2024-06-15 13:59:16,149][1651274] Signal inference workers to stop experience collection... (10700 times) [2024-06-15 13:59:16,211][1651669] InferenceWorker_p0-w0: stopping experience collection (10700 times) [2024-06-15 13:59:16,354][1651274] Signal inference workers to resume experience collection... (10700 times) [2024-06-15 13:59:16,356][1651669] InferenceWorker_p0-w0: resuming experience collection (10700 times) [2024-06-15 13:59:17,246][1651669] Updated weights for policy 0, policy_version 203665 (0.0012) [2024-06-15 13:59:19,359][1651669] Updated weights for policy 0, policy_version 203774 (0.0014) [2024-06-15 13:59:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50247.6, 300 sec: 48874.9). Total num frames: 417333248. Throughput: 0: 12458.7. Samples: 104389120. 
Policy #0 lag: (min: 35.0, avg: 136.0, max: 275.0) [2024-06-15 13:59:20,767][1648981] Avg episode reward: [(0, '398.230')] [2024-06-15 13:59:25,764][1651669] Updated weights for policy 0, policy_version 203829 (0.0012) [2024-06-15 13:59:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.7, 300 sec: 48652.8). Total num frames: 417431552. Throughput: 0: 12682.7. Samples: 104430080. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 13:59:25,767][1648981] Avg episode reward: [(0, '399.650')] [2024-06-15 13:59:26,840][1651669] Updated weights for policy 0, policy_version 203857 (0.0011) [2024-06-15 13:59:28,546][1651669] Updated weights for policy 0, policy_version 203936 (0.0028) [2024-06-15 13:59:29,874][1651669] Updated weights for policy 0, policy_version 203990 (0.0013) [2024-06-15 13:59:30,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 51882.4, 300 sec: 48985.3). Total num frames: 417857536. Throughput: 0: 12269.5. Samples: 104497664. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 13:59:30,767][1648981] Avg episode reward: [(0, '401.090')] [2024-06-15 13:59:34,884][1651669] Updated weights for policy 0, policy_version 204033 (0.0022) [2024-06-15 13:59:35,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48059.9, 300 sec: 48652.2). Total num frames: 417923072. Throughput: 0: 12595.2. Samples: 104582144. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 13:59:35,767][1648981] Avg episode reward: [(0, '424.890')] [2024-06-15 13:59:36,057][1651274] Saving new best policy, reward=424.890! 
[2024-06-15 13:59:36,268][1651669] Updated weights for policy 0, policy_version 204082 (0.0011) [2024-06-15 13:59:37,434][1651669] Updated weights for policy 0, policy_version 204128 (0.0012) [2024-06-15 13:59:38,864][1651669] Updated weights for policy 0, policy_version 204181 (0.0012) [2024-06-15 13:59:40,321][1651669] Updated weights for policy 0, policy_version 204241 (0.0012) [2024-06-15 13:59:40,766][1648981] Fps is (10 sec: 45876.7, 60 sec: 51360.3, 300 sec: 48985.4). Total num frames: 418316288. Throughput: 0: 12128.7. Samples: 104607744. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 13:59:40,767][1648981] Avg episode reward: [(0, '418.330')] [2024-06-15 13:59:45,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47526.1, 300 sec: 48430.0). Total num frames: 418381824. Throughput: 0: 12117.5. Samples: 104678400. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 13:59:45,767][1648981] Avg episode reward: [(0, '428.680')] [2024-06-15 13:59:45,769][1651274] Saving new best policy, reward=428.680! [2024-06-15 13:59:46,456][1651669] Updated weights for policy 0, policy_version 204289 (0.0015) [2024-06-15 13:59:47,631][1651669] Updated weights for policy 0, policy_version 204348 (0.0014) [2024-06-15 13:59:49,464][1651669] Updated weights for policy 0, policy_version 204407 (0.0013) [2024-06-15 13:59:50,151][1651669] Updated weights for policy 0, policy_version 204436 (0.0012) [2024-06-15 13:59:50,767][1648981] Fps is (10 sec: 39319.0, 60 sec: 49151.5, 300 sec: 48764.4). Total num frames: 418709504. Throughput: 0: 11878.3. Samples: 104746496. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 13:59:50,768][1648981] Avg episode reward: [(0, '419.170')] [2024-06-15 13:59:51,687][1651669] Updated weights for policy 0, policy_version 204496 (0.0081) [2024-06-15 13:59:52,070][1651274] Signal inference workers to stop experience collection... 
(10750 times) [2024-06-15 13:59:52,121][1651669] InferenceWorker_p0-w0: stopping experience collection (10750 times) [2024-06-15 13:59:52,254][1651274] Signal inference workers to resume experience collection... (10750 times) [2024-06-15 13:59:52,255][1651669] InferenceWorker_p0-w0: resuming experience collection (10750 times) [2024-06-15 13:59:55,798][1648981] Fps is (10 sec: 52262.6, 60 sec: 46942.6, 300 sec: 48424.8). Total num frames: 418906112. Throughput: 0: 11892.7. Samples: 104778240. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 13:59:55,799][1648981] Avg episode reward: [(0, '419.920')] [2024-06-15 13:59:58,035][1651669] Updated weights for policy 0, policy_version 204560 (0.0018) [2024-06-15 13:59:59,437][1651669] Updated weights for policy 0, policy_version 204624 (0.0013) [2024-06-15 14:00:00,354][1651669] Updated weights for policy 0, policy_version 204668 (0.0012) [2024-06-15 14:00:00,766][1648981] Fps is (10 sec: 45878.1, 60 sec: 48060.1, 300 sec: 48763.2). Total num frames: 419168256. Throughput: 0: 11889.8. Samples: 104859136. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 14:00:00,767][1648981] Avg episode reward: [(0, '433.670')] [2024-06-15 14:00:01,293][1651274] Saving new best policy, reward=433.670! [2024-06-15 14:00:01,469][1651669] Updated weights for policy 0, policy_version 204705 (0.0107) [2024-06-15 14:00:02,782][1651669] Updated weights for policy 0, policy_version 204768 (0.0012) [2024-06-15 14:00:05,807][1648981] Fps is (10 sec: 52383.7, 60 sec: 48027.5, 300 sec: 48424.0). Total num frames: 419430400. Throughput: 0: 11981.4. Samples: 104928768. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 14:00:05,807][1648981] Avg episode reward: [(0, '419.340')] [2024-06-15 14:00:08,722][1651669] Updated weights for policy 0, policy_version 204832 (0.0013) [2024-06-15 14:00:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46967.4, 300 sec: 48655.0). Total num frames: 419627008. Throughput: 0: 11969.4. 
Samples: 104968704. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 14:00:10,767][1648981] Avg episode reward: [(0, '436.430')] [2024-06-15 14:00:10,817][1651669] Updated weights for policy 0, policy_version 204898 (0.0014) [2024-06-15 14:00:10,957][1651274] Saving new best policy, reward=436.430! [2024-06-15 14:00:13,021][1651669] Updated weights for policy 0, policy_version 204992 (0.0015) [2024-06-15 14:00:14,497][1651669] Updated weights for policy 0, policy_version 205054 (0.0013) [2024-06-15 14:00:15,766][1648981] Fps is (10 sec: 52641.7, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 419954688. Throughput: 0: 11810.2. Samples: 105029120. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 14:00:15,767][1648981] Avg episode reward: [(0, '441.520')] [2024-06-15 14:00:15,768][1651274] Saving new best policy, reward=441.520! [2024-06-15 14:00:20,244][1651669] Updated weights for policy 0, policy_version 205120 (0.0014) [2024-06-15 14:00:20,768][1648981] Fps is (10 sec: 45867.9, 60 sec: 45873.9, 300 sec: 48436.1). Total num frames: 420085760. Throughput: 0: 11832.4. Samples: 105114624. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 14:00:20,768][1648981] Avg episode reward: [(0, '451.270')] [2024-06-15 14:00:21,211][1651274] Saving new best policy, reward=451.270! [2024-06-15 14:00:22,236][1651669] Updated weights for policy 0, policy_version 205200 (0.0135) [2024-06-15 14:00:23,892][1651669] Updated weights for policy 0, policy_version 205264 (0.0013) [2024-06-15 14:00:24,773][1651669] Updated weights for policy 0, policy_version 205306 (0.0013) [2024-06-15 14:00:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50790.3, 300 sec: 48763.2). Total num frames: 420478976. Throughput: 0: 11912.5. Samples: 105143808. 
Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 14:00:25,767][1648981] Avg episode reward: [(0, '446.420')] [2024-06-15 14:00:30,176][1651669] Updated weights for policy 0, policy_version 205360 (0.0026) [2024-06-15 14:00:30,766][1648981] Fps is (10 sec: 52437.5, 60 sec: 45875.5, 300 sec: 48874.3). Total num frames: 420610048. Throughput: 0: 12333.5. Samples: 105233408. Policy #0 lag: (min: 3.0, avg: 78.2, max: 259.0) [2024-06-15 14:00:30,767][1648981] Avg episode reward: [(0, '438.440')] [2024-06-15 14:00:31,952][1651274] Signal inference workers to stop experience collection... (10800 times) [2024-06-15 14:00:31,963][1651669] Updated weights for policy 0, policy_version 205440 (0.0133) [2024-06-15 14:00:31,979][1651669] InferenceWorker_p0-w0: stopping experience collection (10800 times) [2024-06-15 14:00:32,173][1651274] Signal inference workers to resume experience collection... (10800 times) [2024-06-15 14:00:32,173][1651669] InferenceWorker_p0-w0: resuming experience collection (10800 times) [2024-06-15 14:00:33,295][1651669] Updated weights for policy 0, policy_version 205503 (0.0013) [2024-06-15 14:00:34,989][1651669] Updated weights for policy 0, policy_version 205559 (0.0014) [2024-06-15 14:00:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 51336.5, 300 sec: 48985.4). Total num frames: 421003264. Throughput: 0: 12276.8. Samples: 105298944. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:00:35,767][1648981] Avg episode reward: [(0, '445.320')] [2024-06-15 14:00:40,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 45875.1, 300 sec: 48652.1). Total num frames: 421068800. Throughput: 0: 12604.1. Samples: 105345024. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:00:40,767][1648981] Avg episode reward: [(0, '460.360')] [2024-06-15 14:00:40,901][1651669] Updated weights for policy 0, policy_version 205616 (0.0015) [2024-06-15 14:00:41,193][1651274] Saving new best policy, reward=460.360! 
[2024-06-15 14:00:42,581][1651669] Updated weights for policy 0, policy_version 205670 (0.0014) [2024-06-15 14:00:44,007][1651669] Updated weights for policy 0, policy_version 205731 (0.0145) [2024-06-15 14:00:45,774][1648981] Fps is (10 sec: 49113.6, 60 sec: 51876.0, 300 sec: 48873.0). Total num frames: 421494784. Throughput: 0: 12126.6. Samples: 105404928. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:00:45,775][1648981] Avg episode reward: [(0, '460.490')] [2024-06-15 14:00:45,928][1651669] Updated weights for policy 0, policy_version 205821 (0.0105) [2024-06-15 14:00:45,959][1651274] Saving new best policy, reward=460.490! [2024-06-15 14:00:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46968.0, 300 sec: 48430.0). Total num frames: 421527552. Throughput: 0: 12583.7. Samples: 105494528. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:00:50,767][1648981] Avg episode reward: [(0, '458.630')] [2024-06-15 14:00:51,603][1651669] Updated weights for policy 0, policy_version 205867 (0.0046) [2024-06-15 14:00:52,492][1651669] Updated weights for policy 0, policy_version 205904 (0.0014) [2024-06-15 14:00:54,197][1651669] Updated weights for policy 0, policy_version 205984 (0.0012) [2024-06-15 14:00:55,419][1651669] Updated weights for policy 0, policy_version 206048 (0.0012) [2024-06-15 14:00:55,772][1648981] Fps is (10 sec: 49163.6, 60 sec: 51359.1, 300 sec: 48763.0). Total num frames: 421986304. Throughput: 0: 12332.0. Samples: 105523712. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:00:55,772][1648981] Avg episode reward: [(0, '453.360')] [2024-06-15 14:00:56,031][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000206080_422051840.pth... [2024-06-15 14:00:56,140][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000200320_410255360.pth [2024-06-15 14:01:00,773][1648981] Fps is (10 sec: 52396.4, 60 sec: 48054.8, 300 sec: 48431.6). 
Total num frames: 422051840. Throughput: 0: 12673.1. Samples: 105599488. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:00,773][1648981] Avg episode reward: [(0, '435.920')] [2024-06-15 14:01:00,881][1651669] Updated weights for policy 0, policy_version 206081 (0.0013) [2024-06-15 14:01:02,049][1651669] Updated weights for policy 0, policy_version 206131 (0.0012) [2024-06-15 14:01:03,396][1651669] Updated weights for policy 0, policy_version 206180 (0.0013) [2024-06-15 14:01:05,213][1651669] Updated weights for policy 0, policy_version 206225 (0.0016) [2024-06-15 14:01:05,766][1648981] Fps is (10 sec: 39343.3, 60 sec: 49185.2, 300 sec: 48652.2). Total num frames: 422379520. Throughput: 0: 12334.0. Samples: 105669632. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:05,767][1648981] Avg episode reward: [(0, '444.820')] [2024-06-15 14:01:07,423][1651669] Updated weights for policy 0, policy_version 206334 (0.0014) [2024-06-15 14:01:10,766][1648981] Fps is (10 sec: 52461.7, 60 sec: 49152.1, 300 sec: 48430.0). Total num frames: 422576128. Throughput: 0: 12390.4. Samples: 105701376. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:10,767][1648981] Avg episode reward: [(0, '452.030')] [2024-06-15 14:01:12,622][1651669] Updated weights for policy 0, policy_version 206384 (0.0013) [2024-06-15 14:01:13,580][1651274] Signal inference workers to stop experience collection... (10850 times) [2024-06-15 14:01:13,661][1651669] InferenceWorker_p0-w0: stopping experience collection (10850 times) [2024-06-15 14:01:13,880][1651274] Signal inference workers to resume experience collection... (10850 times) [2024-06-15 14:01:13,881][1651669] InferenceWorker_p0-w0: resuming experience collection (10850 times) [2024-06-15 14:01:13,981][1651669] Updated weights for policy 0, policy_version 206432 (0.0112) [2024-06-15 14:01:15,779][1648981] Fps is (10 sec: 45818.2, 60 sec: 48049.8, 300 sec: 48428.6). Total num frames: 422838272. 
Throughput: 0: 12068.5. Samples: 105776640. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:15,780][1648981] Avg episode reward: [(0, '414.550')] [2024-06-15 14:01:16,603][1651669] Updated weights for policy 0, policy_version 206496 (0.0016) [2024-06-15 14:01:18,487][1651669] Updated weights for policy 0, policy_version 206576 (0.0019) [2024-06-15 14:01:20,778][1648981] Fps is (10 sec: 52370.1, 60 sec: 50236.3, 300 sec: 48539.3). Total num frames: 423100416. Throughput: 0: 12137.1. Samples: 105845248. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:20,778][1648981] Avg episode reward: [(0, '421.410')] [2024-06-15 14:01:23,684][1651669] Updated weights for policy 0, policy_version 206627 (0.0022) [2024-06-15 14:01:25,510][1651669] Updated weights for policy 0, policy_version 206714 (0.0024) [2024-06-15 14:01:25,766][1648981] Fps is (10 sec: 52493.5, 60 sec: 48059.7, 300 sec: 48763.2). Total num frames: 423362560. Throughput: 0: 11969.4. Samples: 105883648. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:25,767][1648981] Avg episode reward: [(0, '400.080')] [2024-06-15 14:01:28,151][1651669] Updated weights for policy 0, policy_version 206770 (0.0044) [2024-06-15 14:01:29,822][1651669] Updated weights for policy 0, policy_version 206842 (0.0015) [2024-06-15 14:01:30,766][1648981] Fps is (10 sec: 52487.4, 60 sec: 50244.3, 300 sec: 48763.2). Total num frames: 423624704. Throughput: 0: 11948.7. Samples: 105942528. Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:30,767][1648981] Avg episode reward: [(0, '402.130')] [2024-06-15 14:01:35,289][1651669] Updated weights for policy 0, policy_version 206904 (0.0028) [2024-06-15 14:01:35,767][1648981] Fps is (10 sec: 39321.1, 60 sec: 45875.0, 300 sec: 48430.0). Total num frames: 423755776. Throughput: 0: 11810.1. Samples: 106025984. 
Policy #0 lag: (min: 31.0, avg: 161.0, max: 319.0) [2024-06-15 14:01:35,767][1648981] Avg episode reward: [(0, '409.880')] [2024-06-15 14:01:36,499][1651669] Updated weights for policy 0, policy_version 206960 (0.0032) [2024-06-15 14:01:38,184][1651669] Updated weights for policy 0, policy_version 206997 (0.0012) [2024-06-15 14:01:39,777][1651669] Updated weights for policy 0, policy_version 207058 (0.0012) [2024-06-15 14:01:40,699][1651669] Updated weights for policy 0, policy_version 207099 (0.0023) [2024-06-15 14:01:40,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 51336.7, 300 sec: 48874.3). Total num frames: 424148992. Throughput: 0: 11970.9. Samples: 106062336. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:01:40,767][1648981] Avg episode reward: [(0, '404.490')] [2024-06-15 14:01:45,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 45881.1, 300 sec: 48430.0). Total num frames: 424247296. Throughput: 0: 11982.5. Samples: 106138624. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:01:45,767][1648981] Avg episode reward: [(0, '425.190')] [2024-06-15 14:01:45,841][1651669] Updated weights for policy 0, policy_version 207161 (0.0012) [2024-06-15 14:01:47,842][1651669] Updated weights for policy 0, policy_version 207220 (0.0037) [2024-06-15 14:01:49,686][1651669] Updated weights for policy 0, policy_version 207264 (0.0011) [2024-06-15 14:01:50,766][1648981] Fps is (10 sec: 39320.9, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 424542208. Throughput: 0: 11787.3. Samples: 106200064. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:01:50,767][1648981] Avg episode reward: [(0, '428.250')] [2024-06-15 14:01:51,494][1651669] Updated weights for policy 0, policy_version 207344 (0.0014) [2024-06-15 14:01:55,779][1648981] Fps is (10 sec: 42543.5, 60 sec: 44777.3, 300 sec: 48094.7). Total num frames: 424673280. Throughput: 0: 11840.9. Samples: 106234368. 
Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:01:55,780][1648981] Avg episode reward: [(0, '429.220')] [2024-06-15 14:01:56,428][1651274] Signal inference workers to stop experience collection... (10900 times) [2024-06-15 14:01:56,503][1651669] InferenceWorker_p0-w0: stopping experience collection (10900 times) [2024-06-15 14:01:56,625][1651274] Signal inference workers to resume experience collection... (10900 times) [2024-06-15 14:01:56,626][1651669] InferenceWorker_p0-w0: resuming experience collection (10900 times) [2024-06-15 14:01:56,819][1651669] Updated weights for policy 0, policy_version 207419 (0.0090) [2024-06-15 14:01:58,798][1651669] Updated weights for policy 0, policy_version 207472 (0.0012) [2024-06-15 14:02:00,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48610.9, 300 sec: 48430.0). Total num frames: 424968192. Throughput: 0: 11790.6. Samples: 106307072. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:00,767][1648981] Avg episode reward: [(0, '429.430')] [2024-06-15 14:02:00,937][1651669] Updated weights for policy 0, policy_version 207520 (0.0011) [2024-06-15 14:02:03,016][1651669] Updated weights for policy 0, policy_version 207600 (0.0037) [2024-06-15 14:02:05,766][1648981] Fps is (10 sec: 52496.6, 60 sec: 46967.4, 300 sec: 48430.0). Total num frames: 425197568. Throughput: 0: 11722.0. Samples: 106372608. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:05,767][1648981] Avg episode reward: [(0, '390.360')] [2024-06-15 14:02:07,574][1651669] Updated weights for policy 0, policy_version 207635 (0.0013) [2024-06-15 14:02:09,097][1651669] Updated weights for policy 0, policy_version 207682 (0.0013) [2024-06-15 14:02:10,611][1651669] Updated weights for policy 0, policy_version 207743 (0.0012) [2024-06-15 14:02:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 425459712. Throughput: 0: 11753.2. Samples: 106412544. 
Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:10,767][1648981] Avg episode reward: [(0, '391.390')] [2024-06-15 14:02:12,983][1651669] Updated weights for policy 0, policy_version 207792 (0.0028) [2024-06-15 14:02:14,044][1651669] Updated weights for policy 0, policy_version 207842 (0.0011) [2024-06-15 14:02:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48069.6, 300 sec: 48541.1). Total num frames: 425721856. Throughput: 0: 11912.5. Samples: 106478592. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:15,767][1648981] Avg episode reward: [(0, '389.900')] [2024-06-15 14:02:17,757][1651669] Updated weights for policy 0, policy_version 207876 (0.0014) [2024-06-15 14:02:18,778][1651669] Updated weights for policy 0, policy_version 207936 (0.0018) [2024-06-15 14:02:20,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 46976.1, 300 sec: 48207.8). Total num frames: 425918464. Throughput: 0: 11855.6. Samples: 106559488. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:20,767][1648981] Avg episode reward: [(0, '383.000')] [2024-06-15 14:02:21,397][1651669] Updated weights for policy 0, policy_version 207996 (0.0014) [2024-06-15 14:02:23,404][1651669] Updated weights for policy 0, policy_version 208040 (0.0012) [2024-06-15 14:02:25,425][1651669] Updated weights for policy 0, policy_version 208119 (0.0013) [2024-06-15 14:02:25,788][1648981] Fps is (10 sec: 52318.1, 60 sec: 48042.8, 300 sec: 48870.8). Total num frames: 426246144. Throughput: 0: 11759.0. Samples: 106591744. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:25,788][1648981] Avg episode reward: [(0, '377.640')] [2024-06-15 14:02:29,516][1651669] Updated weights for policy 0, policy_version 208160 (0.0013) [2024-06-15 14:02:30,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 45875.2, 300 sec: 48319.0). Total num frames: 426377216. Throughput: 0: 11719.1. Samples: 106665984. 
Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:30,767][1648981] Avg episode reward: [(0, '381.630')] [2024-06-15 14:02:32,070][1651669] Updated weights for policy 0, policy_version 208225 (0.0054) [2024-06-15 14:02:33,907][1651669] Updated weights for policy 0, policy_version 208273 (0.0049) [2024-06-15 14:02:35,373][1651669] Updated weights for policy 0, policy_version 208336 (0.0017) [2024-06-15 14:02:35,766][1648981] Fps is (10 sec: 42689.4, 60 sec: 48606.1, 300 sec: 48541.1). Total num frames: 426672128. Throughput: 0: 11719.2. Samples: 106727424. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:35,767][1648981] Avg episode reward: [(0, '386.510')] [2024-06-15 14:02:36,558][1651669] Updated weights for policy 0, policy_version 208384 (0.0012) [2024-06-15 14:02:39,880][1651274] Signal inference workers to stop experience collection... (10950 times) [2024-06-15 14:02:39,924][1651669] InferenceWorker_p0-w0: stopping experience collection (10950 times) [2024-06-15 14:02:40,065][1651274] Signal inference workers to resume experience collection... (10950 times) [2024-06-15 14:02:40,065][1651669] InferenceWorker_p0-w0: resuming experience collection (10950 times) [2024-06-15 14:02:40,757][1651669] Updated weights for policy 0, policy_version 208433 (0.0014) [2024-06-15 14:02:40,768][1648981] Fps is (10 sec: 49143.6, 60 sec: 45327.7, 300 sec: 48318.7). Total num frames: 426868736. Throughput: 0: 11892.7. Samples: 106769408. Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:40,768][1648981] Avg episode reward: [(0, '403.910')] [2024-06-15 14:02:42,832][1651669] Updated weights for policy 0, policy_version 208480 (0.0022) [2024-06-15 14:02:44,095][1651669] Updated weights for policy 0, policy_version 208515 (0.0030) [2024-06-15 14:02:45,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 48320.8). Total num frames: 427130880. Throughput: 0: 12026.3. Samples: 106848256. 
Policy #0 lag: (min: 50.0, avg: 173.1, max: 279.0) [2024-06-15 14:02:45,767][1648981] Avg episode reward: [(0, '415.230')] [2024-06-15 14:02:46,221][1651669] Updated weights for policy 0, policy_version 208592 (0.0018) [2024-06-15 14:02:47,230][1651669] Updated weights for policy 0, policy_version 208637 (0.0012) [2024-06-15 14:02:50,766][1648981] Fps is (10 sec: 45883.0, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 427327488. Throughput: 0: 12128.7. Samples: 106918400. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:02:50,767][1648981] Avg episode reward: [(0, '417.600')] [2024-06-15 14:02:51,466][1651669] Updated weights for policy 0, policy_version 208693 (0.0011) [2024-06-15 14:02:53,658][1651669] Updated weights for policy 0, policy_version 208736 (0.0016) [2024-06-15 14:02:55,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48070.1, 300 sec: 48318.9). Total num frames: 427556864. Throughput: 0: 12083.2. Samples: 106956288. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:02:55,767][1648981] Avg episode reward: [(0, '414.980')] [2024-06-15 14:02:56,011][1651669] Updated weights for policy 0, policy_version 208784 (0.0026) [2024-06-15 14:02:56,334][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000208800_427622400.pth... [2024-06-15 14:02:56,441][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000203200_416153600.pth [2024-06-15 14:02:58,285][1651669] Updated weights for policy 0, policy_version 208865 (0.0019) [2024-06-15 14:03:00,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 47513.7, 300 sec: 47988.3). Total num frames: 427819008. Throughput: 0: 11935.3. Samples: 107015680. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:00,767][1648981] Avg episode reward: [(0, '397.270')] [2024-06-15 14:03:02,141][1651669] Updated weights for policy 0, policy_version 208930 (0.0014) [2024-06-15 14:03:05,138][1651669] Updated weights for policy 0, policy_version 208995 (0.0015) [2024-06-15 14:03:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 428081152. Throughput: 0: 11867.1. Samples: 107093504. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:05,767][1648981] Avg episode reward: [(0, '395.550')] [2024-06-15 14:03:07,233][1651669] Updated weights for policy 0, policy_version 209043 (0.0012) [2024-06-15 14:03:08,476][1651669] Updated weights for policy 0, policy_version 209093 (0.0011) [2024-06-15 14:03:09,716][1651669] Updated weights for policy 0, policy_version 209145 (0.0012) [2024-06-15 14:03:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 48096.7). Total num frames: 428343296. Throughput: 0: 11884.0. Samples: 107126272. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:10,767][1648981] Avg episode reward: [(0, '412.650')] [2024-06-15 14:03:13,461][1651669] Updated weights for policy 0, policy_version 209200 (0.0012) [2024-06-15 14:03:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 48208.5). Total num frames: 428539904. Throughput: 0: 12003.6. Samples: 107206144. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:15,767][1648981] Avg episode reward: [(0, '436.210')] [2024-06-15 14:03:15,977][1651669] Updated weights for policy 0, policy_version 209250 (0.0015) [2024-06-15 14:03:17,396][1651669] Updated weights for policy 0, policy_version 209285 (0.0020) [2024-06-15 14:03:18,542][1651669] Updated weights for policy 0, policy_version 209339 (0.0014) [2024-06-15 14:03:19,283][1651274] Signal inference workers to stop experience collection... 
(11000 times) [2024-06-15 14:03:19,354][1651669] InferenceWorker_p0-w0: stopping experience collection (11000 times) [2024-06-15 14:03:19,508][1651274] Signal inference workers to resume experience collection... (11000 times) [2024-06-15 14:03:19,509][1651669] InferenceWorker_p0-w0: resuming experience collection (11000 times) [2024-06-15 14:03:19,698][1651669] Updated weights for policy 0, policy_version 209380 (0.0011) [2024-06-15 14:03:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 48430.0). Total num frames: 428867584. Throughput: 0: 12060.4. Samples: 107270144. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:20,767][1648981] Avg episode reward: [(0, '439.210')] [2024-06-15 14:03:23,108][1651669] Updated weights for policy 0, policy_version 209425 (0.0014) [2024-06-15 14:03:24,113][1651669] Updated weights for policy 0, policy_version 209472 (0.0026) [2024-06-15 14:03:25,771][1648981] Fps is (10 sec: 45851.5, 60 sec: 45887.5, 300 sec: 48318.1). Total num frames: 428998656. Throughput: 0: 11945.8. Samples: 107307008. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:25,772][1648981] Avg episode reward: [(0, '427.150')] [2024-06-15 14:03:29,526][1651669] Updated weights for policy 0, policy_version 209568 (0.0023) [2024-06-15 14:03:30,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49152.1, 300 sec: 48430.0). Total num frames: 429326336. Throughput: 0: 11764.6. Samples: 107377664. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:30,767][1648981] Avg episode reward: [(0, '421.610')] [2024-06-15 14:03:31,169][1651669] Updated weights for policy 0, policy_version 209648 (0.0027) [2024-06-15 14:03:34,035][1651669] Updated weights for policy 0, policy_version 209665 (0.0012) [2024-06-15 14:03:35,295][1651669] Updated weights for policy 0, policy_version 209728 (0.0014) [2024-06-15 14:03:35,767][1648981] Fps is (10 sec: 52450.8, 60 sec: 47512.8, 300 sec: 48434.4). 
Total num frames: 429522944. Throughput: 0: 11730.2. Samples: 107446272. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:35,768][1648981] Avg episode reward: [(0, '428.650')] [2024-06-15 14:03:40,730][1651669] Updated weights for policy 0, policy_version 209824 (0.0015) [2024-06-15 14:03:40,770][1648981] Fps is (10 sec: 39305.9, 60 sec: 47511.8, 300 sec: 48098.7). Total num frames: 429719552. Throughput: 0: 11900.1. Samples: 107491840. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:40,771][1648981] Avg episode reward: [(0, '430.550')] [2024-06-15 14:03:41,863][1651669] Updated weights for policy 0, policy_version 209876 (0.0013) [2024-06-15 14:03:42,828][1651669] Updated weights for policy 0, policy_version 209920 (0.0013) [2024-06-15 14:03:45,767][1648981] Fps is (10 sec: 42599.7, 60 sec: 46967.0, 300 sec: 48096.7). Total num frames: 429948928. Throughput: 0: 11844.1. Samples: 107548672. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:45,768][1648981] Avg episode reward: [(0, '422.440')] [2024-06-15 14:03:49,859][1651669] Updated weights for policy 0, policy_version 210003 (0.0035) [2024-06-15 14:03:50,766][1648981] Fps is (10 sec: 42615.1, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 430145536. Throughput: 0: 11810.1. Samples: 107624960. Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:50,767][1648981] Avg episode reward: [(0, '409.700')] [2024-06-15 14:03:51,237][1651669] Updated weights for policy 0, policy_version 210064 (0.0012) [2024-06-15 14:03:52,710][1651669] Updated weights for policy 0, policy_version 210115 (0.0012) [2024-06-15 14:03:54,002][1651669] Updated weights for policy 0, policy_version 210174 (0.0021) [2024-06-15 14:03:55,780][1648981] Fps is (10 sec: 49086.7, 60 sec: 48048.6, 300 sec: 47983.5). Total num frames: 430440448. Throughput: 0: 11692.7. Samples: 107652608. 
Policy #0 lag: (min: 111.0, avg: 223.3, max: 326.0) [2024-06-15 14:03:55,781][1648981] Avg episode reward: [(0, '433.170')] [2024-06-15 14:03:57,623][1651669] Updated weights for policy 0, policy_version 210224 (0.0012) [2024-06-15 14:04:00,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46421.3, 300 sec: 47652.5). Total num frames: 430604288. Throughput: 0: 11696.4. Samples: 107732480. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:00,767][1648981] Avg episode reward: [(0, '409.760')] [2024-06-15 14:04:01,494][1651669] Updated weights for policy 0, policy_version 210298 (0.0013) [2024-06-15 14:04:01,555][1651274] Signal inference workers to stop experience collection... (11050 times) [2024-06-15 14:04:01,597][1651669] InferenceWorker_p0-w0: stopping experience collection (11050 times) [2024-06-15 14:04:01,648][1651274] Signal inference workers to resume experience collection... (11050 times) [2024-06-15 14:04:01,648][1651669] InferenceWorker_p0-w0: resuming experience collection (11050 times) [2024-06-15 14:04:02,793][1651669] Updated weights for policy 0, policy_version 210357 (0.0014) [2024-06-15 14:04:04,617][1651669] Updated weights for policy 0, policy_version 210424 (0.0012) [2024-06-15 14:04:05,771][1648981] Fps is (10 sec: 52479.2, 60 sec: 48056.3, 300 sec: 47985.0). Total num frames: 430964736. Throughput: 0: 11763.5. Samples: 107799552. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:05,772][1648981] Avg episode reward: [(0, '409.690')] [2024-06-15 14:04:07,672][1651669] Updated weights for policy 0, policy_version 210466 (0.0025) [2024-06-15 14:04:10,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 45875.1, 300 sec: 47874.6). Total num frames: 431095808. Throughput: 0: 11720.4. Samples: 107834368. 
Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:10,767][1648981] Avg episode reward: [(0, '407.270')] [2024-06-15 14:04:12,361][1651669] Updated weights for policy 0, policy_version 210533 (0.0021) [2024-06-15 14:04:13,256][1651669] Updated weights for policy 0, policy_version 210563 (0.0016) [2024-06-15 14:04:15,369][1651669] Updated weights for policy 0, policy_version 210644 (0.0029) [2024-06-15 14:04:15,766][1648981] Fps is (10 sec: 45895.1, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 431423488. Throughput: 0: 11787.4. Samples: 107908096. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:15,767][1648981] Avg episode reward: [(0, '411.300')] [2024-06-15 14:04:16,059][1651669] Updated weights for policy 0, policy_version 210681 (0.0012) [2024-06-15 14:04:20,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 45875.2, 300 sec: 48096.7). Total num frames: 431620096. Throughput: 0: 11764.9. Samples: 107975680. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:20,767][1648981] Avg episode reward: [(0, '403.230')] [2024-06-15 14:04:22,543][1651669] Updated weights for policy 0, policy_version 210753 (0.0012) [2024-06-15 14:04:23,552][1651669] Updated weights for policy 0, policy_version 210802 (0.0011) [2024-06-15 14:04:24,167][1651669] Updated weights for policy 0, policy_version 210821 (0.0010) [2024-06-15 14:04:25,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48610.0, 300 sec: 47652.5). Total num frames: 431915008. Throughput: 0: 11629.1. Samples: 108015104. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:25,767][1648981] Avg episode reward: [(0, '400.210')] [2024-06-15 14:04:25,828][1651669] Updated weights for policy 0, policy_version 210912 (0.0013) [2024-06-15 14:04:29,327][1651669] Updated weights for policy 0, policy_version 210945 (0.0032) [2024-06-15 14:04:30,800][1648981] Fps is (10 sec: 52254.2, 60 sec: 46941.3, 300 sec: 48202.4). Total num frames: 432144384. 
Throughput: 0: 12006.2. Samples: 108089344. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:30,801][1648981] Avg episode reward: [(0, '388.690')] [2024-06-15 14:04:33,390][1651669] Updated weights for policy 0, policy_version 211013 (0.0013) [2024-06-15 14:04:34,610][1651669] Updated weights for policy 0, policy_version 211070 (0.0012) [2024-06-15 14:04:35,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 46422.1, 300 sec: 47430.3). Total num frames: 432308224. Throughput: 0: 11946.7. Samples: 108162560. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:35,767][1648981] Avg episode reward: [(0, '380.360')] [2024-06-15 14:04:36,351][1651669] Updated weights for policy 0, policy_version 211136 (0.0016) [2024-06-15 14:04:37,650][1651669] Updated weights for policy 0, policy_version 211196 (0.0011) [2024-06-15 14:04:40,766][1648981] Fps is (10 sec: 42741.2, 60 sec: 47516.7, 300 sec: 48096.8). Total num frames: 432570368. Throughput: 0: 12075.6. Samples: 108195840. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:40,767][1648981] Avg episode reward: [(0, '390.910')] [2024-06-15 14:04:40,998][1651669] Updated weights for policy 0, policy_version 211234 (0.0012) [2024-06-15 14:04:44,746][1651274] Signal inference workers to stop experience collection... (11100 times) [2024-06-15 14:04:44,797][1651669] InferenceWorker_p0-w0: stopping experience collection (11100 times) [2024-06-15 14:04:44,820][1651669] Updated weights for policy 0, policy_version 211288 (0.0014) [2024-06-15 14:04:44,893][1651274] Signal inference workers to resume experience collection... (11100 times) [2024-06-15 14:04:44,894][1651669] InferenceWorker_p0-w0: resuming experience collection (11100 times) [2024-06-15 14:04:45,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 47514.1, 300 sec: 47763.6). Total num frames: 432799744. Throughput: 0: 11935.3. Samples: 108269568. 
Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:45,767][1648981] Avg episode reward: [(0, '396.420')] [2024-06-15 14:04:46,562][1651669] Updated weights for policy 0, policy_version 211344 (0.0092) [2024-06-15 14:04:47,841][1651669] Updated weights for policy 0, policy_version 211396 (0.0021) [2024-06-15 14:04:49,051][1651669] Updated weights for policy 0, policy_version 211455 (0.0012) [2024-06-15 14:04:50,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 47990.9). Total num frames: 433061888. Throughput: 0: 12027.5. Samples: 108340736. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:50,767][1648981] Avg episode reward: [(0, '402.850')] [2024-06-15 14:04:52,148][1651669] Updated weights for policy 0, policy_version 211512 (0.0015) [2024-06-15 14:04:55,208][1651669] Updated weights for policy 0, policy_version 211552 (0.0011) [2024-06-15 14:04:55,790][1648981] Fps is (10 sec: 52304.2, 60 sec: 48051.8, 300 sec: 47981.8). Total num frames: 433324032. Throughput: 0: 12054.1. Samples: 108377088. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:04:55,791][1648981] Avg episode reward: [(0, '408.960')] [2024-06-15 14:04:55,795][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000211584_433324032.pth... [2024-06-15 14:04:55,841][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000206080_422051840.pth [2024-06-15 14:04:58,025][1651669] Updated weights for policy 0, policy_version 211601 (0.0032) [2024-06-15 14:04:59,402][1651669] Updated weights for policy 0, policy_version 211667 (0.0012) [2024-06-15 14:05:00,254][1651669] Updated weights for policy 0, policy_version 211711 (0.0011) [2024-06-15 14:05:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 47992.3). Total num frames: 433586176. Throughput: 0: 11958.1. Samples: 108446208. 
Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:05:00,767][1648981] Avg episode reward: [(0, '401.710')] [2024-06-15 14:05:03,330][1651669] Updated weights for policy 0, policy_version 211765 (0.0013) [2024-06-15 14:05:05,806][1648981] Fps is (10 sec: 39258.7, 60 sec: 45848.1, 300 sec: 47757.1). Total num frames: 433717248. Throughput: 0: 12208.9. Samples: 108525568. Policy #0 lag: (min: 13.0, avg: 125.1, max: 269.0) [2024-06-15 14:05:05,807][1648981] Avg episode reward: [(0, '402.210')] [2024-06-15 14:05:06,138][1651669] Updated weights for policy 0, policy_version 211796 (0.0011) [2024-06-15 14:05:08,718][1651669] Updated weights for policy 0, policy_version 211872 (0.0013) [2024-06-15 14:05:10,774][1648981] Fps is (10 sec: 45839.6, 60 sec: 49145.7, 300 sec: 47762.3). Total num frames: 434044928. Throughput: 0: 12024.2. Samples: 108556288. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:10,775][1648981] Avg episode reward: [(0, '388.830')] [2024-06-15 14:05:10,987][1651669] Updated weights for policy 0, policy_version 211968 (0.0012) [2024-06-15 14:05:15,767][1648981] Fps is (10 sec: 52636.7, 60 sec: 46967.2, 300 sec: 47985.9). Total num frames: 434241536. Throughput: 0: 11921.3. Samples: 108625408. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:15,767][1648981] Avg episode reward: [(0, '390.600')] [2024-06-15 14:05:16,871][1651669] Updated weights for policy 0, policy_version 212048 (0.0013) [2024-06-15 14:05:17,905][1651669] Updated weights for policy 0, policy_version 212093 (0.0012) [2024-06-15 14:05:20,766][1648981] Fps is (10 sec: 45911.0, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 434503680. Throughput: 0: 11878.4. Samples: 108697088. 
Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:20,767][1648981] Avg episode reward: [(0, '378.940')] [2024-06-15 14:05:21,523][1651669] Updated weights for policy 0, policy_version 212192 (0.0014) [2024-06-15 14:05:25,055][1651669] Updated weights for policy 0, policy_version 212240 (0.0013) [2024-06-15 14:05:25,626][1651274] Signal inference workers to stop experience collection... (11150 times) [2024-06-15 14:05:25,713][1651669] InferenceWorker_p0-w0: stopping experience collection (11150 times) [2024-06-15 14:05:25,766][1648981] Fps is (10 sec: 45876.8, 60 sec: 46421.4, 300 sec: 47763.5). Total num frames: 434700288. Throughput: 0: 11764.6. Samples: 108725248. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:25,767][1648981] Avg episode reward: [(0, '395.240')] [2024-06-15 14:05:25,969][1651274] Signal inference workers to resume experience collection... (11150 times) [2024-06-15 14:05:25,969][1651669] InferenceWorker_p0-w0: resuming experience collection (11150 times) [2024-06-15 14:05:28,825][1651669] Updated weights for policy 0, policy_version 212290 (0.0086) [2024-06-15 14:05:30,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 45900.7, 300 sec: 47097.0). Total num frames: 434896896. Throughput: 0: 11673.6. Samples: 108794880. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:30,767][1648981] Avg episode reward: [(0, '385.100')] [2024-06-15 14:05:31,456][1651669] Updated weights for policy 0, policy_version 212357 (0.0053) [2024-06-15 14:05:32,698][1651669] Updated weights for policy 0, policy_version 212416 (0.0089) [2024-06-15 14:05:34,166][1651669] Updated weights for policy 0, policy_version 212470 (0.0013) [2024-06-15 14:05:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 435159040. Throughput: 0: 11662.2. Samples: 108865536. 
Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:35,767][1648981] Avg episode reward: [(0, '378.690')] [2024-06-15 14:05:36,432][1651669] Updated weights for policy 0, policy_version 212514 (0.0123) [2024-06-15 14:05:40,548][1651669] Updated weights for policy 0, policy_version 212576 (0.0094) [2024-06-15 14:05:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46421.3, 300 sec: 46987.2). Total num frames: 435355648. Throughput: 0: 11679.8. Samples: 108902400. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:40,767][1648981] Avg episode reward: [(0, '383.570')] [2024-06-15 14:05:42,381][1651669] Updated weights for policy 0, policy_version 212624 (0.0012) [2024-06-15 14:05:43,959][1651669] Updated weights for policy 0, policy_version 212692 (0.0013) [2024-06-15 14:05:44,867][1651669] Updated weights for policy 0, policy_version 212730 (0.0013) [2024-06-15 14:05:45,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 48059.5, 300 sec: 47985.6). Total num frames: 435683328. Throughput: 0: 11628.0. Samples: 108969472. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:45,767][1648981] Avg episode reward: [(0, '383.830')] [2024-06-15 14:05:48,337][1651669] Updated weights for policy 0, policy_version 212792 (0.0014) [2024-06-15 14:05:50,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 46421.2, 300 sec: 46986.8). Total num frames: 435847168. Throughput: 0: 11649.7. Samples: 109049344. 
Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:50,767][1648981] Avg episode reward: [(0, '375.200')] [2024-06-15 14:05:51,001][1651669] Updated weights for policy 0, policy_version 212832 (0.0015) [2024-06-15 14:05:53,107][1651669] Updated weights for policy 0, policy_version 212884 (0.0013) [2024-06-15 14:05:54,781][1651669] Updated weights for policy 0, policy_version 212946 (0.0012) [2024-06-15 14:05:55,747][1651669] Updated weights for policy 0, policy_version 212992 (0.0012) [2024-06-15 14:05:55,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 48078.7, 300 sec: 47986.7). Total num frames: 436207616. Throughput: 0: 11687.0. Samples: 109082112. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:05:55,767][1648981] Avg episode reward: [(0, '391.280')] [2024-06-15 14:05:59,950][1651669] Updated weights for policy 0, policy_version 213056 (0.0016) [2024-06-15 14:06:00,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 436338688. Throughput: 0: 11707.8. Samples: 109152256. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:06:00,767][1648981] Avg episode reward: [(0, '381.470')] [2024-06-15 14:06:02,973][1651669] Updated weights for policy 0, policy_version 213106 (0.0016) [2024-06-15 14:06:03,937][1651669] Updated weights for policy 0, policy_version 213140 (0.0012) [2024-06-15 14:06:05,158][1651669] Updated weights for policy 0, policy_version 213187 (0.0013) [2024-06-15 14:06:05,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 48638.2, 300 sec: 47652.4). Total num frames: 436633600. Throughput: 0: 11616.7. Samples: 109219840. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:06:05,767][1648981] Avg episode reward: [(0, '362.510')] [2024-06-15 14:06:06,636][1651669] Updated weights for policy 0, policy_version 213248 (0.0012) [2024-06-15 14:06:10,535][1651274] Signal inference workers to stop experience collection... 
(11200 times) [2024-06-15 14:06:10,615][1651669] InferenceWorker_p0-w0: stopping experience collection (11200 times) [2024-06-15 14:06:10,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 44788.7, 300 sec: 47099.0). Total num frames: 436731904. Throughput: 0: 11821.5. Samples: 109257216. Policy #0 lag: (min: 15.0, avg: 120.6, max: 271.0) [2024-06-15 14:06:10,767][1648981] Avg episode reward: [(0, '366.900')] [2024-06-15 14:06:10,805][1651274] Signal inference workers to resume experience collection... (11200 times) [2024-06-15 14:06:10,806][1651669] InferenceWorker_p0-w0: resuming experience collection (11200 times) [2024-06-15 14:06:11,860][1651669] Updated weights for policy 0, policy_version 213306 (0.0127) [2024-06-15 14:06:14,159][1651669] Updated weights for policy 0, policy_version 213360 (0.0109) [2024-06-15 14:06:15,250][1651669] Updated weights for policy 0, policy_version 213396 (0.0033) [2024-06-15 14:06:15,768][1648981] Fps is (10 sec: 42598.2, 60 sec: 46967.7, 300 sec: 47321.0). Total num frames: 437059584. Throughput: 0: 11730.5. Samples: 109322752. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:15,770][1648981] Avg episode reward: [(0, '367.230')] [2024-06-15 14:06:16,685][1651669] Updated weights for policy 0, policy_version 213460 (0.0013) [2024-06-15 14:06:20,798][1648981] Fps is (10 sec: 52263.9, 60 sec: 45851.0, 300 sec: 47092.0). Total num frames: 437256192. Throughput: 0: 11767.7. Samples: 109395456. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:20,799][1648981] Avg episode reward: [(0, '357.510')] [2024-06-15 14:06:22,595][1651669] Updated weights for policy 0, policy_version 213522 (0.0013) [2024-06-15 14:06:23,460][1651669] Updated weights for policy 0, policy_version 213568 (0.0013) [2024-06-15 14:06:25,344][1651669] Updated weights for policy 0, policy_version 213630 (0.0013) [2024-06-15 14:06:25,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 47097.1). 
Total num frames: 437518336. Throughput: 0: 11821.5. Samples: 109434368. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:25,767][1648981] Avg episode reward: [(0, '350.300')] [2024-06-15 14:06:26,899][1651669] Updated weights for policy 0, policy_version 213666 (0.0012) [2024-06-15 14:06:28,629][1651669] Updated weights for policy 0, policy_version 213729 (0.0016) [2024-06-15 14:06:30,766][1648981] Fps is (10 sec: 52594.8, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 437780480. Throughput: 0: 11685.0. Samples: 109495296. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:30,767][1648981] Avg episode reward: [(0, '349.270')] [2024-06-15 14:06:33,475][1651669] Updated weights for policy 0, policy_version 213776 (0.0013) [2024-06-15 14:06:35,223][1651669] Updated weights for policy 0, policy_version 213825 (0.0014) [2024-06-15 14:06:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 437944320. Throughput: 0: 11605.4. Samples: 109571584. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:35,767][1648981] Avg episode reward: [(0, '359.650')] [2024-06-15 14:06:36,344][1651669] Updated weights for policy 0, policy_version 213886 (0.0016) [2024-06-15 14:06:38,751][1651669] Updated weights for policy 0, policy_version 213936 (0.0011) [2024-06-15 14:06:40,764][1651669] Updated weights for policy 0, policy_version 214000 (0.0042) [2024-06-15 14:06:40,776][1648981] Fps is (10 sec: 49103.4, 60 sec: 48597.8, 300 sec: 47539.8). Total num frames: 438272000. Throughput: 0: 11636.9. Samples: 109605888. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:40,777][1648981] Avg episode reward: [(0, '360.020')] [2024-06-15 14:06:45,446][1651669] Updated weights for policy 0, policy_version 214048 (0.0090) [2024-06-15 14:06:45,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 45329.3, 300 sec: 46986.0). Total num frames: 438403072. Throughput: 0: 11662.2. 
Samples: 109677056. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:45,767][1648981] Avg episode reward: [(0, '360.800')] [2024-06-15 14:06:46,624][1651669] Updated weights for policy 0, policy_version 214082 (0.0012) [2024-06-15 14:06:48,760][1651669] Updated weights for policy 0, policy_version 214148 (0.0015) [2024-06-15 14:06:50,580][1651274] Signal inference workers to stop experience collection... (11250 times) [2024-06-15 14:06:50,644][1651669] InferenceWorker_p0-w0: stopping experience collection (11250 times) [2024-06-15 14:06:50,655][1651669] Updated weights for policy 0, policy_version 214228 (0.0014) [2024-06-15 14:06:50,766][1648981] Fps is (10 sec: 45920.5, 60 sec: 48059.8, 300 sec: 47654.5). Total num frames: 438730752. Throughput: 0: 11571.2. Samples: 109740544. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:50,767][1648981] Avg episode reward: [(0, '359.460')] [2024-06-15 14:06:50,878][1651274] Signal inference workers to resume experience collection... (11250 times) [2024-06-15 14:06:50,879][1651669] InferenceWorker_p0-w0: resuming experience collection (11250 times) [2024-06-15 14:06:51,662][1651669] Updated weights for policy 0, policy_version 214270 (0.0012) [2024-06-15 14:06:55,767][1648981] Fps is (10 sec: 42597.0, 60 sec: 43690.5, 300 sec: 46985.9). Total num frames: 438829056. Throughput: 0: 11673.5. Samples: 109782528. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:06:55,767][1648981] Avg episode reward: [(0, '367.220')] [2024-06-15 14:06:56,165][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000214304_438894592.pth... 
[2024-06-15 14:06:56,295][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000208800_427622400.pth [2024-06-15 14:06:56,671][1651669] Updated weights for policy 0, policy_version 214334 (0.0097) [2024-06-15 14:06:58,639][1651669] Updated weights for policy 0, policy_version 214375 (0.0013) [2024-06-15 14:07:00,452][1651669] Updated weights for policy 0, policy_version 214449 (0.0075) [2024-06-15 14:07:00,767][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 439222272. Throughput: 0: 11821.5. Samples: 109854720. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:07:00,769][1648981] Avg episode reward: [(0, '371.230')] [2024-06-15 14:07:02,039][1651669] Updated weights for policy 0, policy_version 214525 (0.0099) [2024-06-15 14:07:05,787][1648981] Fps is (10 sec: 52323.8, 60 sec: 45313.6, 300 sec: 47093.8). Total num frames: 439353344. Throughput: 0: 11892.7. Samples: 109930496. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:07:05,787][1648981] Avg episode reward: [(0, '360.660')] [2024-06-15 14:07:06,758][1651669] Updated weights for policy 0, policy_version 214560 (0.0015) [2024-06-15 14:07:07,341][1651669] Updated weights for policy 0, policy_version 214592 (0.0042) [2024-06-15 14:07:09,934][1651669] Updated weights for policy 0, policy_version 214656 (0.0013) [2024-06-15 14:07:10,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 439681024. Throughput: 0: 11935.3. Samples: 109971456. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:07:10,767][1648981] Avg episode reward: [(0, '365.830')] [2024-06-15 14:07:11,232][1651669] Updated weights for policy 0, policy_version 214709 (0.0014) [2024-06-15 14:07:12,919][1651669] Updated weights for policy 0, policy_version 214776 (0.0012) [2024-06-15 14:07:15,767][1648981] Fps is (10 sec: 52535.1, 60 sec: 46967.3, 300 sec: 47319.2). Total num frames: 439877632. 
Throughput: 0: 11980.8. Samples: 110034432. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:07:15,768][1648981] Avg episode reward: [(0, '365.400')] [2024-06-15 14:07:18,014][1651669] Updated weights for policy 0, policy_version 214841 (0.0015) [2024-06-15 14:07:20,770][1648981] Fps is (10 sec: 39306.9, 60 sec: 46989.3, 300 sec: 46877.7). Total num frames: 440074240. Throughput: 0: 12002.6. Samples: 110111744. Policy #0 lag: (min: 31.0, avg: 123.3, max: 287.0) [2024-06-15 14:07:20,771][1648981] Avg episode reward: [(0, '365.650')] [2024-06-15 14:07:21,446][1651669] Updated weights for policy 0, policy_version 214915 (0.0020) [2024-06-15 14:07:22,884][1651669] Updated weights for policy 0, policy_version 214982 (0.0012) [2024-06-15 14:07:24,227][1651669] Updated weights for policy 0, policy_version 215038 (0.0014) [2024-06-15 14:07:25,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 440401920. Throughput: 0: 11869.6. Samples: 110139904. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:07:25,767][1648981] Avg episode reward: [(0, '364.610')] [2024-06-15 14:07:28,997][1651669] Updated weights for policy 0, policy_version 215099 (0.0013) [2024-06-15 14:07:30,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 440532992. Throughput: 0: 12026.3. Samples: 110218240. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:07:30,767][1648981] Avg episode reward: [(0, '364.610')] [2024-06-15 14:07:31,589][1651274] Signal inference workers to stop experience collection... (11300 times) [2024-06-15 14:07:31,607][1651669] InferenceWorker_p0-w0: stopping experience collection (11300 times) [2024-06-15 14:07:31,610][1651669] Updated weights for policy 0, policy_version 215153 (0.0013) [2024-06-15 14:07:31,791][1651274] Signal inference workers to resume experience collection... 
(11300 times) [2024-06-15 14:07:31,792][1651669] InferenceWorker_p0-w0: resuming experience collection (11300 times) [2024-06-15 14:07:33,240][1651669] Updated weights for policy 0, policy_version 215232 (0.0013) [2024-06-15 14:07:34,340][1651669] Updated weights for policy 0, policy_version 215280 (0.0013) [2024-06-15 14:07:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.1, 300 sec: 47652.7). Total num frames: 440926208. Throughput: 0: 12197.0. Samples: 110289408. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:07:35,767][1648981] Avg episode reward: [(0, '374.190')] [2024-06-15 14:07:39,239][1651669] Updated weights for policy 0, policy_version 215344 (0.0018) [2024-06-15 14:07:40,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 46428.9, 300 sec: 47208.1). Total num frames: 441057280. Throughput: 0: 12265.3. Samples: 110334464. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:07:40,767][1648981] Avg episode reward: [(0, '368.750')] [2024-06-15 14:07:42,238][1651669] Updated weights for policy 0, policy_version 215408 (0.0013) [2024-06-15 14:07:43,919][1651669] Updated weights for policy 0, policy_version 215488 (0.0014) [2024-06-15 14:07:45,111][1651669] Updated weights for policy 0, policy_version 215543 (0.0015) [2024-06-15 14:07:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50790.4, 300 sec: 47874.6). Total num frames: 441450496. Throughput: 0: 12049.1. Samples: 110396928. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:07:45,767][1648981] Avg episode reward: [(0, '374.570')] [2024-06-15 14:07:50,192][1651669] Updated weights for policy 0, policy_version 215603 (0.0060) [2024-06-15 14:07:50,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 47513.7, 300 sec: 47541.4). Total num frames: 441581568. Throughput: 0: 12100.1. Samples: 110474752. 
Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:07:50,767][1648981] Avg episode reward: [(0, '384.840')] [2024-06-15 14:07:52,460][1651669] Updated weights for policy 0, policy_version 215637 (0.0021) [2024-06-15 14:07:53,748][1651669] Updated weights for policy 0, policy_version 215700 (0.0026) [2024-06-15 14:07:55,205][1651669] Updated weights for policy 0, policy_version 215764 (0.0014) [2024-06-15 14:07:55,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 51336.8, 300 sec: 47763.5). Total num frames: 441909248. Throughput: 0: 12014.9. Samples: 110512128. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:07:55,767][1648981] Avg episode reward: [(0, '383.140')] [2024-06-15 14:07:59,864][1651669] Updated weights for policy 0, policy_version 215826 (0.0063) [2024-06-15 14:08:00,802][1648981] Fps is (10 sec: 52250.4, 60 sec: 48032.5, 300 sec: 47535.9). Total num frames: 442105856. Throughput: 0: 12267.4. Samples: 110586880. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:08:00,806][1648981] Avg episode reward: [(0, '374.570')] [2024-06-15 14:08:02,316][1651669] Updated weights for policy 0, policy_version 215873 (0.0013) [2024-06-15 14:08:03,480][1651669] Updated weights for policy 0, policy_version 215930 (0.0017) [2024-06-15 14:08:05,143][1651669] Updated weights for policy 0, policy_version 215970 (0.0013) [2024-06-15 14:08:05,769][1648981] Fps is (10 sec: 45864.9, 60 sec: 50259.5, 300 sec: 47541.0). Total num frames: 442368000. Throughput: 0: 12015.3. Samples: 110652416. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:08:05,769][1648981] Avg episode reward: [(0, '391.710')] [2024-06-15 14:08:06,805][1651669] Updated weights for policy 0, policy_version 216033 (0.0012) [2024-06-15 14:08:10,743][1651669] Updated weights for policy 0, policy_version 216082 (0.0011) [2024-06-15 14:08:10,770][1648981] Fps is (10 sec: 42727.3, 60 sec: 47510.5, 300 sec: 47429.7). Total num frames: 442531840. 
Throughput: 0: 12287.0. Samples: 110692864. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:08:10,771][1648981] Avg episode reward: [(0, '380.740')] [2024-06-15 14:08:12,666][1651274] Signal inference workers to stop experience collection... (11350 times) [2024-06-15 14:08:12,687][1651669] Updated weights for policy 0, policy_version 216129 (0.0014) [2024-06-15 14:08:12,701][1651669] InferenceWorker_p0-w0: stopping experience collection (11350 times) [2024-06-15 14:08:12,936][1651274] Signal inference workers to resume experience collection... (11350 times) [2024-06-15 14:08:12,937][1651669] InferenceWorker_p0-w0: resuming experience collection (11350 times) [2024-06-15 14:08:14,163][1651669] Updated weights for policy 0, policy_version 216192 (0.0011) [2024-06-15 14:08:15,766][1648981] Fps is (10 sec: 45885.7, 60 sec: 49152.2, 300 sec: 47319.2). Total num frames: 442826752. Throughput: 0: 12265.3. Samples: 110770176. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:08:15,767][1648981] Avg episode reward: [(0, '396.860')] [2024-06-15 14:08:16,489][1651669] Updated weights for policy 0, policy_version 216257 (0.0015) [2024-06-15 14:08:17,689][1651669] Updated weights for policy 0, policy_version 216313 (0.0013) [2024-06-15 14:08:20,766][1648981] Fps is (10 sec: 49171.4, 60 sec: 49155.1, 300 sec: 47542.2). Total num frames: 443023360. Throughput: 0: 12413.2. Samples: 110848000. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:08:20,767][1648981] Avg episode reward: [(0, '401.800')] [2024-06-15 14:08:22,046][1651669] Updated weights for policy 0, policy_version 216383 (0.0014) [2024-06-15 14:08:25,275][1651669] Updated weights for policy 0, policy_version 216451 (0.0012) [2024-06-15 14:08:25,766][1648981] Fps is (10 sec: 49151.3, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 443318272. Throughput: 0: 12128.7. Samples: 110880256. 
Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:08:25,767][1648981] Avg episode reward: [(0, '408.640')] [2024-06-15 14:08:26,627][1651669] Updated weights for policy 0, policy_version 216503 (0.0011) [2024-06-15 14:08:27,890][1651669] Updated weights for policy 0, policy_version 216544 (0.0013) [2024-06-15 14:08:30,773][1648981] Fps is (10 sec: 52400.9, 60 sec: 50239.8, 300 sec: 47540.7). Total num frames: 443547648. Throughput: 0: 12172.8. Samples: 110944768. Policy #0 lag: (min: 94.0, avg: 163.0, max: 318.0) [2024-06-15 14:08:30,776][1648981] Avg episode reward: [(0, '418.340')] [2024-06-15 14:08:32,231][1651669] Updated weights for policy 0, policy_version 216599 (0.0013) [2024-06-15 14:08:33,815][1651669] Updated weights for policy 0, policy_version 216642 (0.0012) [2024-06-15 14:08:34,813][1651669] Updated weights for policy 0, policy_version 216697 (0.0015) [2024-06-15 14:08:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 47764.2). Total num frames: 443809792. Throughput: 0: 12367.6. Samples: 111031296. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:08:35,767][1648981] Avg episode reward: [(0, '407.970')] [2024-06-15 14:08:36,861][1651669] Updated weights for policy 0, policy_version 216758 (0.0015) [2024-06-15 14:08:38,802][1651669] Updated weights for policy 0, policy_version 216816 (0.0013) [2024-06-15 14:08:40,770][1648981] Fps is (10 sec: 52436.4, 60 sec: 50241.2, 300 sec: 47874.1). Total num frames: 444071936. Throughput: 0: 12104.9. Samples: 111056896. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:08:40,771][1648981] Avg episode reward: [(0, '401.370')] [2024-06-15 14:08:43,186][1651669] Updated weights for policy 0, policy_version 216864 (0.0014) [2024-06-15 14:08:44,551][1651669] Updated weights for policy 0, policy_version 216915 (0.0015) [2024-06-15 14:08:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 444334080. 
Throughput: 0: 12331.5. Samples: 111141376. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:08:45,767][1648981] Avg episode reward: [(0, '405.460')] [2024-06-15 14:08:46,889][1651669] Updated weights for policy 0, policy_version 217008 (0.0015) [2024-06-15 14:08:48,612][1651669] Updated weights for policy 0, policy_version 217045 (0.0012) [2024-06-15 14:08:50,768][1648981] Fps is (10 sec: 52448.8, 60 sec: 50244.2, 300 sec: 47987.9). Total num frames: 444596224. Throughput: 0: 12391.0. Samples: 111209984. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:08:50,770][1648981] Avg episode reward: [(0, '403.570')] [2024-06-15 14:08:53,444][1651669] Updated weights for policy 0, policy_version 217104 (0.0066) [2024-06-15 14:08:54,833][1651669] Updated weights for policy 0, policy_version 217154 (0.0012) [2024-06-15 14:08:55,432][1651274] Signal inference workers to stop experience collection... (11400 times) [2024-06-15 14:08:55,489][1651669] InferenceWorker_p0-w0: stopping experience collection (11400 times) [2024-06-15 14:08:55,661][1651274] Signal inference workers to resume experience collection... (11400 times) [2024-06-15 14:08:55,662][1651669] InferenceWorker_p0-w0: resuming experience collection (11400 times) [2024-06-15 14:08:55,773][1648981] Fps is (10 sec: 49119.0, 60 sec: 48600.4, 300 sec: 48206.7). Total num frames: 444825600. Throughput: 0: 12526.1. Samples: 111256576. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:08:55,774][1648981] Avg episode reward: [(0, '386.990')] [2024-06-15 14:08:55,897][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000217216_444858368.pth... 
[2024-06-15 14:08:56,011][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000211584_433324032.pth [2024-06-15 14:08:56,709][1651669] Updated weights for policy 0, policy_version 217221 (0.0013) [2024-06-15 14:08:59,069][1651669] Updated weights for policy 0, policy_version 217296 (0.0013) [2024-06-15 14:09:00,269][1651669] Updated weights for policy 0, policy_version 217343 (0.0013) [2024-06-15 14:09:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50272.8, 300 sec: 47986.4). Total num frames: 445120512. Throughput: 0: 12242.5. Samples: 111321088. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:00,767][1648981] Avg episode reward: [(0, '385.710')] [2024-06-15 14:09:05,224][1651669] Updated weights for policy 0, policy_version 217404 (0.0027) [2024-06-15 14:09:05,766][1648981] Fps is (10 sec: 42627.1, 60 sec: 48061.5, 300 sec: 47985.7). Total num frames: 445251584. Throughput: 0: 12265.2. Samples: 111399936. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:05,767][1648981] Avg episode reward: [(0, '374.800')] [2024-06-15 14:09:07,435][1651669] Updated weights for policy 0, policy_version 217472 (0.0015) [2024-06-15 14:09:08,675][1651669] Updated weights for policy 0, policy_version 217531 (0.0013) [2024-06-15 14:09:10,720][1651669] Updated weights for policy 0, policy_version 217584 (0.0017) [2024-06-15 14:09:10,767][1648981] Fps is (10 sec: 49151.7, 60 sec: 51339.8, 300 sec: 48096.7). Total num frames: 445612032. Throughput: 0: 12185.6. Samples: 111428608. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:10,767][1648981] Avg episode reward: [(0, '379.530')] [2024-06-15 14:09:15,413][1651669] Updated weights for policy 0, policy_version 217619 (0.0030) [2024-06-15 14:09:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 445710336. Throughput: 0: 12585.3. Samples: 111511040. 
Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:15,767][1648981] Avg episode reward: [(0, '374.880')] [2024-06-15 14:09:17,135][1651669] Updated weights for policy 0, policy_version 217665 (0.0012) [2024-06-15 14:09:18,830][1651669] Updated weights for policy 0, policy_version 217730 (0.0018) [2024-06-15 14:09:20,493][1651669] Updated weights for policy 0, policy_version 217808 (0.0083) [2024-06-15 14:09:20,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 50790.1, 300 sec: 47985.6). Total num frames: 446070784. Throughput: 0: 12083.1. Samples: 111575040. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:20,767][1648981] Avg episode reward: [(0, '403.750')] [2024-06-15 14:09:21,514][1651669] Updated weights for policy 0, policy_version 217856 (0.0021) [2024-06-15 14:09:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 47546.7). Total num frames: 446169088. Throughput: 0: 12391.4. Samples: 111614464. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:25,767][1648981] Avg episode reward: [(0, '397.430')] [2024-06-15 14:09:27,359][1651669] Updated weights for policy 0, policy_version 217921 (0.0012) [2024-06-15 14:09:28,774][1651669] Updated weights for policy 0, policy_version 217972 (0.0015) [2024-06-15 14:09:30,227][1651669] Updated weights for policy 0, policy_version 218044 (0.0081) [2024-06-15 14:09:30,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 50248.7, 300 sec: 48318.9). Total num frames: 446562304. Throughput: 0: 12094.6. Samples: 111685632. Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:30,767][1648981] Avg episode reward: [(0, '394.220')] [2024-06-15 14:09:32,530][1651669] Updated weights for policy 0, policy_version 218110 (0.0014) [2024-06-15 14:09:35,784][1648981] Fps is (10 sec: 52337.7, 60 sec: 48045.8, 300 sec: 47871.8). Total num frames: 446693376. Throughput: 0: 12362.9. Samples: 111766528. 
Policy #0 lag: (min: 15.0, avg: 94.3, max: 271.0) [2024-06-15 14:09:35,785][1648981] Avg episode reward: [(0, '382.600')] [2024-06-15 14:09:37,003][1651669] Updated weights for policy 0, policy_version 218144 (0.0013) [2024-06-15 14:09:37,161][1651274] Signal inference workers to stop experience collection... (11450 times) [2024-06-15 14:09:37,235][1651669] InferenceWorker_p0-w0: stopping experience collection (11450 times) [2024-06-15 14:09:37,468][1651274] Signal inference workers to resume experience collection... (11450 times) [2024-06-15 14:09:37,469][1651669] InferenceWorker_p0-w0: resuming experience collection (11450 times) [2024-06-15 14:09:38,950][1651669] Updated weights for policy 0, policy_version 218224 (0.0101) [2024-06-15 14:09:40,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48609.0, 300 sec: 48096.8). Total num frames: 446988288. Throughput: 0: 12050.9. Samples: 111798784. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:09:40,767][1648981] Avg episode reward: [(0, '383.660')] [2024-06-15 14:09:40,965][1651669] Updated weights for policy 0, policy_version 218272 (0.0016) [2024-06-15 14:09:43,966][1651669] Updated weights for policy 0, policy_version 218352 (0.0015) [2024-06-15 14:09:45,776][1648981] Fps is (10 sec: 52469.5, 60 sec: 48052.0, 300 sec: 47984.1). Total num frames: 447217664. Throughput: 0: 12023.7. Samples: 111862272. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:09:45,777][1648981] Avg episode reward: [(0, '375.190')] [2024-06-15 14:09:47,469][1651669] Updated weights for policy 0, policy_version 218403 (0.0013) [2024-06-15 14:09:49,087][1651669] Updated weights for policy 0, policy_version 218464 (0.0015) [2024-06-15 14:09:50,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 47989.6). Total num frames: 447479808. Throughput: 0: 12037.7. Samples: 111941632. 
Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:09:50,767][1648981] Avg episode reward: [(0, '363.560')] [2024-06-15 14:09:51,440][1651669] Updated weights for policy 0, policy_version 218513 (0.0013) [2024-06-15 14:09:52,286][1651669] Updated weights for policy 0, policy_version 218555 (0.0013) [2024-06-15 14:09:54,792][1651669] Updated weights for policy 0, policy_version 218614 (0.0012) [2024-06-15 14:09:55,767][1648981] Fps is (10 sec: 52478.8, 60 sec: 48611.2, 300 sec: 47985.7). Total num frames: 447741952. Throughput: 0: 12265.2. Samples: 111980544. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:09:55,767][1648981] Avg episode reward: [(0, '363.160')] [2024-06-15 14:09:58,353][1651669] Updated weights for policy 0, policy_version 218672 (0.0011) [2024-06-15 14:09:59,625][1651669] Updated weights for policy 0, policy_version 218691 (0.0011) [2024-06-15 14:10:00,774][1648981] Fps is (10 sec: 45838.6, 60 sec: 46961.3, 300 sec: 48213.0). Total num frames: 447938560. Throughput: 0: 12081.1. Samples: 112054784. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:00,775][1648981] Avg episode reward: [(0, '363.130')] [2024-06-15 14:10:01,898][1651669] Updated weights for policy 0, policy_version 218772 (0.0013) [2024-06-15 14:10:02,606][1651669] Updated weights for policy 0, policy_version 218814 (0.0014) [2024-06-15 14:10:05,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 48605.9, 300 sec: 47875.9). Total num frames: 448167936. Throughput: 0: 12128.8. Samples: 112120832. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:05,767][1648981] Avg episode reward: [(0, '379.180')] [2024-06-15 14:10:06,538][1651669] Updated weights for policy 0, policy_version 218880 (0.0104) [2024-06-15 14:10:10,778][1648981] Fps is (10 sec: 45857.4, 60 sec: 46412.2, 300 sec: 47983.8). Total num frames: 448397312. Throughput: 0: 12148.3. Samples: 112161280. 
Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:10,779][1648981] Avg episode reward: [(0, '384.160')] [2024-06-15 14:10:10,955][1651669] Updated weights for policy 0, policy_version 218945 (0.0149) [2024-06-15 14:10:12,315][1651669] Updated weights for policy 0, policy_version 219010 (0.0015) [2024-06-15 14:10:13,401][1651669] Updated weights for policy 0, policy_version 219058 (0.0019) [2024-06-15 14:10:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 448659456. Throughput: 0: 12117.4. Samples: 112230912. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:15,767][1648981] Avg episode reward: [(0, '382.980')] [2024-06-15 14:10:17,157][1651669] Updated weights for policy 0, policy_version 219104 (0.0010) [2024-06-15 14:10:19,997][1651274] Signal inference workers to stop experience collection... (11500 times) [2024-06-15 14:10:20,077][1651669] InferenceWorker_p0-w0: stopping experience collection (11500 times) [2024-06-15 14:10:20,079][1651669] Updated weights for policy 0, policy_version 219173 (0.0014) [2024-06-15 14:10:20,219][1651274] Signal inference workers to resume experience collection... (11500 times) [2024-06-15 14:10:20,220][1651669] InferenceWorker_p0-w0: resuming experience collection (11500 times) [2024-06-15 14:10:20,768][1648981] Fps is (10 sec: 52482.0, 60 sec: 47512.4, 300 sec: 48207.6). Total num frames: 448921600. Throughput: 0: 11882.5. Samples: 112301056. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:20,769][1648981] Avg episode reward: [(0, '380.580')] [2024-06-15 14:10:22,465][1651669] Updated weights for policy 0, policy_version 219233 (0.0012) [2024-06-15 14:10:23,990][1651669] Updated weights for policy 0, policy_version 219297 (0.0013) [2024-06-15 14:10:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 449183744. Throughput: 0: 12003.5. Samples: 112338944. 
Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:25,767][1648981] Avg episode reward: [(0, '391.700')] [2024-06-15 14:10:27,540][1651669] Updated weights for policy 0, policy_version 219329 (0.0014) [2024-06-15 14:10:28,644][1651669] Updated weights for policy 0, policy_version 219392 (0.0015) [2024-06-15 14:10:30,596][1651669] Updated weights for policy 0, policy_version 219440 (0.0026) [2024-06-15 14:10:30,767][1648981] Fps is (10 sec: 49156.9, 60 sec: 47513.1, 300 sec: 48318.8). Total num frames: 449413120. Throughput: 0: 12324.6. Samples: 112416768. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:30,768][1648981] Avg episode reward: [(0, '393.900')] [2024-06-15 14:10:32,610][1651669] Updated weights for policy 0, policy_version 219475 (0.0012) [2024-06-15 14:10:33,901][1651669] Updated weights for policy 0, policy_version 219538 (0.0013) [2024-06-15 14:10:35,770][1648981] Fps is (10 sec: 52409.1, 60 sec: 50255.7, 300 sec: 48651.5). Total num frames: 449708032. Throughput: 0: 12059.4. Samples: 112484352. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:35,771][1648981] Avg episode reward: [(0, '392.550')] [2024-06-15 14:10:38,500][1651669] Updated weights for policy 0, policy_version 219585 (0.0015) [2024-06-15 14:10:40,043][1651669] Updated weights for policy 0, policy_version 219652 (0.0018) [2024-06-15 14:10:40,766][1648981] Fps is (10 sec: 49155.8, 60 sec: 48605.9, 300 sec: 48207.9). Total num frames: 449904640. Throughput: 0: 12265.3. Samples: 112532480. 
Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:40,767][1648981] Avg episode reward: [(0, '392.340')] [2024-06-15 14:10:41,186][1651669] Updated weights for policy 0, policy_version 219712 (0.0011) [2024-06-15 14:10:43,849][1651669] Updated weights for policy 0, policy_version 219765 (0.0114) [2024-06-15 14:10:45,343][1651669] Updated weights for policy 0, policy_version 219835 (0.0013) [2024-06-15 14:10:45,766][1648981] Fps is (10 sec: 52448.4, 60 sec: 50252.3, 300 sec: 48763.2). Total num frames: 450232320. Throughput: 0: 12028.4. Samples: 112595968. Policy #0 lag: (min: 5.0, avg: 141.3, max: 293.0) [2024-06-15 14:10:45,767][1648981] Avg episode reward: [(0, '383.830')] [2024-06-15 14:10:50,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 450330624. Throughput: 0: 12310.7. Samples: 112674816. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:10:50,767][1648981] Avg episode reward: [(0, '385.180')] [2024-06-15 14:10:51,067][1651669] Updated weights for policy 0, policy_version 219904 (0.0108) [2024-06-15 14:10:53,725][1651669] Updated weights for policy 0, policy_version 219985 (0.0015) [2024-06-15 14:10:55,043][1651669] Updated weights for policy 0, policy_version 220036 (0.0117) [2024-06-15 14:10:55,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49152.1, 300 sec: 48652.1). Total num frames: 450691072. Throughput: 0: 12120.5. Samples: 112706560. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:10:55,767][1648981] Avg episode reward: [(0, '394.730')] [2024-06-15 14:10:56,033][1651669] Updated weights for policy 0, policy_version 220086 (0.0014) [2024-06-15 14:10:56,189][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000220096_450756608.pth... 
[2024-06-15 14:10:56,231][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000214304_438894592.pth [2024-06-15 14:10:56,236][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000220096_450756608.pth [2024-06-15 14:11:00,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 47519.9, 300 sec: 47985.7). Total num frames: 450789376. Throughput: 0: 12413.1. Samples: 112789504. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:00,767][1648981] Avg episode reward: [(0, '401.680')] [2024-06-15 14:11:01,200][1651274] Signal inference workers to stop experience collection... (11550 times) [2024-06-15 14:11:01,251][1651669] InferenceWorker_p0-w0: stopping experience collection (11550 times) [2024-06-15 14:11:01,435][1651274] Signal inference workers to resume experience collection... (11550 times) [2024-06-15 14:11:01,436][1651669] InferenceWorker_p0-w0: resuming experience collection (11550 times) [2024-06-15 14:11:01,612][1651669] Updated weights for policy 0, policy_version 220145 (0.0012) [2024-06-15 14:11:02,933][1651669] Updated weights for policy 0, policy_version 220214 (0.0014) [2024-06-15 14:11:04,717][1651669] Updated weights for policy 0, policy_version 220258 (0.0015) [2024-06-15 14:11:05,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 50790.2, 300 sec: 49096.4). Total num frames: 451215360. Throughput: 0: 12288.4. Samples: 112854016. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:05,767][1648981] Avg episode reward: [(0, '387.000')] [2024-06-15 14:11:06,084][1651669] Updated weights for policy 0, policy_version 220336 (0.0014) [2024-06-15 14:11:10,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 48615.4, 300 sec: 48318.9). Total num frames: 451313664. Throughput: 0: 12458.7. Samples: 112899584. 
Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:10,767][1648981] Avg episode reward: [(0, '385.460')] [2024-06-15 14:11:10,820][1651669] Updated weights for policy 0, policy_version 220384 (0.0014) [2024-06-15 14:11:12,592][1651669] Updated weights for policy 0, policy_version 220464 (0.0014) [2024-06-15 14:11:14,951][1651669] Updated weights for policy 0, policy_version 220512 (0.0014) [2024-06-15 14:11:15,793][1648981] Fps is (10 sec: 45756.2, 60 sec: 50222.3, 300 sec: 48875.2). Total num frames: 451674112. Throughput: 0: 12337.9. Samples: 112972288. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:15,793][1648981] Avg episode reward: [(0, '371.500')] [2024-06-15 14:11:16,517][1651669] Updated weights for policy 0, policy_version 220579 (0.0015) [2024-06-15 14:11:20,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48061.1, 300 sec: 48430.0). Total num frames: 451805184. Throughput: 0: 12573.5. Samples: 113050112. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:20,767][1648981] Avg episode reward: [(0, '380.670')] [2024-06-15 14:11:21,131][1651669] Updated weights for policy 0, policy_version 220630 (0.0017) [2024-06-15 14:11:22,352][1651669] Updated weights for policy 0, policy_version 220688 (0.0041) [2024-06-15 14:11:23,332][1651669] Updated weights for policy 0, policy_version 220736 (0.0013) [2024-06-15 14:11:25,767][1648981] Fps is (10 sec: 42709.6, 60 sec: 48605.8, 300 sec: 48541.0). Total num frames: 452100096. Throughput: 0: 12242.4. Samples: 113083392. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:25,767][1648981] Avg episode reward: [(0, '387.610')] [2024-06-15 14:11:27,170][1651669] Updated weights for policy 0, policy_version 220817 (0.0212) [2024-06-15 14:11:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48606.5, 300 sec: 48763.2). Total num frames: 452329472. Throughput: 0: 12344.9. Samples: 113151488. 
Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:30,767][1648981] Avg episode reward: [(0, '381.900')] [2024-06-15 14:11:31,783][1651669] Updated weights for policy 0, policy_version 220880 (0.0015) [2024-06-15 14:11:33,844][1651669] Updated weights for policy 0, policy_version 220929 (0.0012) [2024-06-15 14:11:34,911][1651669] Updated weights for policy 0, policy_version 220985 (0.0012) [2024-06-15 14:11:35,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 48062.9, 300 sec: 48542.7). Total num frames: 452591616. Throughput: 0: 12310.8. Samples: 113228800. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:35,767][1648981] Avg episode reward: [(0, '381.930')] [2024-06-15 14:11:37,131][1651669] Updated weights for policy 0, policy_version 221040 (0.0102) [2024-06-15 14:11:37,895][1651274] Signal inference workers to stop experience collection... (11600 times) [2024-06-15 14:11:37,974][1651669] InferenceWorker_p0-w0: stopping experience collection (11600 times) [2024-06-15 14:11:38,254][1651274] Signal inference workers to resume experience collection... (11600 times) [2024-06-15 14:11:38,254][1651669] InferenceWorker_p0-w0: resuming experience collection (11600 times) [2024-06-15 14:11:38,767][1651669] Updated weights for policy 0, policy_version 221115 (0.0079) [2024-06-15 14:11:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49151.9, 300 sec: 48985.4). Total num frames: 452853760. Throughput: 0: 12140.1. Samples: 113252864. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:40,767][1648981] Avg episode reward: [(0, '378.070')] [2024-06-15 14:11:43,403][1651669] Updated weights for policy 0, policy_version 221154 (0.0017) [2024-06-15 14:11:45,766][1648981] Fps is (10 sec: 45874.3, 60 sec: 46967.5, 300 sec: 48541.1). Total num frames: 453050368. Throughput: 0: 12185.6. Samples: 113337856. 
Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:45,767][1648981] Avg episode reward: [(0, '377.610')] [2024-06-15 14:11:45,808][1651669] Updated weights for policy 0, policy_version 221219 (0.0013) [2024-06-15 14:11:47,432][1651669] Updated weights for policy 0, policy_version 221281 (0.0013) [2024-06-15 14:11:49,627][1651669] Updated weights for policy 0, policy_version 221367 (0.0116) [2024-06-15 14:11:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50790.4, 300 sec: 49318.7). Total num frames: 453378048. Throughput: 0: 12049.1. Samples: 113396224. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:50,767][1648981] Avg episode reward: [(0, '385.260')] [2024-06-15 14:11:54,344][1651669] Updated weights for policy 0, policy_version 221411 (0.0011) [2024-06-15 14:11:55,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 48430.0). Total num frames: 453509120. Throughput: 0: 12185.6. Samples: 113447936. Policy #0 lag: (min: 12.0, avg: 87.0, max: 268.0) [2024-06-15 14:11:55,767][1648981] Avg episode reward: [(0, '388.030')] [2024-06-15 14:11:56,337][1651669] Updated weights for policy 0, policy_version 221457 (0.0026) [2024-06-15 14:11:57,731][1651669] Updated weights for policy 0, policy_version 221510 (0.0017) [2024-06-15 14:11:59,087][1651669] Updated weights for policy 0, policy_version 221568 (0.0012) [2024-06-15 14:12:00,508][1651669] Updated weights for policy 0, policy_version 221632 (0.0013) [2024-06-15 14:12:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 51882.7, 300 sec: 49322.0). Total num frames: 453902336. Throughput: 0: 11885.3. Samples: 113506816. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:00,767][1648981] Avg episode reward: [(0, '391.360')] [2024-06-15 14:12:05,505][1651669] Updated weights for policy 0, policy_version 221696 (0.0014) [2024-06-15 14:12:05,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 46967.6, 300 sec: 48652.1). Total num frames: 454033408. 
Throughput: 0: 12060.4. Samples: 113592832. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:05,767][1648981] Avg episode reward: [(0, '393.450')] [2024-06-15 14:12:09,167][1651669] Updated weights for policy 0, policy_version 221808 (0.0111) [2024-06-15 14:12:10,586][1651669] Updated weights for policy 0, policy_version 221856 (0.0099) [2024-06-15 14:12:10,769][1648981] Fps is (10 sec: 45863.6, 60 sec: 50788.3, 300 sec: 49096.1). Total num frames: 454361088. Throughput: 0: 11934.7. Samples: 113620480. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:10,770][1648981] Avg episode reward: [(0, '389.980')] [2024-06-15 14:12:15,176][1651669] Updated weights for policy 0, policy_version 221893 (0.0012) [2024-06-15 14:12:15,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46987.9, 300 sec: 48874.9). Total num frames: 454492160. Throughput: 0: 12265.2. Samples: 113703424. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:15,767][1648981] Avg episode reward: [(0, '409.970')] [2024-06-15 14:12:16,439][1651669] Updated weights for policy 0, policy_version 221947 (0.0013) [2024-06-15 14:12:18,645][1651669] Updated weights for policy 0, policy_version 221987 (0.0013) [2024-06-15 14:12:19,411][1651274] Signal inference workers to stop experience collection... (11650 times) [2024-06-15 14:12:19,482][1651669] InferenceWorker_p0-w0: stopping experience collection (11650 times) [2024-06-15 14:12:19,682][1651274] Signal inference workers to resume experience collection... (11650 times) [2024-06-15 14:12:19,683][1651669] InferenceWorker_p0-w0: resuming experience collection (11650 times) [2024-06-15 14:12:20,230][1651669] Updated weights for policy 0, policy_version 222051 (0.0016) [2024-06-15 14:12:20,766][1648981] Fps is (10 sec: 45887.1, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 454819840. Throughput: 0: 11901.1. Samples: 113764352. 
Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:20,767][1648981] Avg episode reward: [(0, '420.590')] [2024-06-15 14:12:21,620][1651669] Updated weights for policy 0, policy_version 222116 (0.0014) [2024-06-15 14:12:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.7, 300 sec: 48874.3). Total num frames: 454950912. Throughput: 0: 12128.7. Samples: 113798656. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:25,767][1648981] Avg episode reward: [(0, '435.570')] [2024-06-15 14:12:26,064][1651669] Updated weights for policy 0, policy_version 222160 (0.0026) [2024-06-15 14:12:26,975][1651669] Updated weights for policy 0, policy_version 222208 (0.0011) [2024-06-15 14:12:29,573][1651669] Updated weights for policy 0, policy_version 222272 (0.0012) [2024-06-15 14:12:30,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 455311360. Throughput: 0: 12140.1. Samples: 113884160. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:30,767][1648981] Avg episode reward: [(0, '425.270')] [2024-06-15 14:12:31,696][1651669] Updated weights for policy 0, policy_version 222352 (0.0014) [2024-06-15 14:12:35,777][1648981] Fps is (10 sec: 52371.1, 60 sec: 48050.7, 300 sec: 48872.5). Total num frames: 455475200. Throughput: 0: 12228.1. Samples: 113946624. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:35,833][1648981] Avg episode reward: [(0, '408.980')] [2024-06-15 14:12:36,714][1651669] Updated weights for policy 0, policy_version 222401 (0.0012) [2024-06-15 14:12:37,752][1651669] Updated weights for policy 0, policy_version 222464 (0.0013) [2024-06-15 14:12:40,724][1651669] Updated weights for policy 0, policy_version 222533 (0.0013) [2024-06-15 14:12:40,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 455737344. Throughput: 0: 12094.6. Samples: 113992192. 
Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:40,767][1648981] Avg episode reward: [(0, '406.070')] [2024-06-15 14:12:42,513][1651669] Updated weights for policy 0, policy_version 222609 (0.0016) [2024-06-15 14:12:43,597][1651669] Updated weights for policy 0, policy_version 222654 (0.0012) [2024-06-15 14:12:45,774][1648981] Fps is (10 sec: 52446.1, 60 sec: 49145.7, 300 sec: 48873.0). Total num frames: 455999488. Throughput: 0: 12160.7. Samples: 114054144. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:45,775][1648981] Avg episode reward: [(0, '428.650')] [2024-06-15 14:12:48,341][1651669] Updated weights for policy 0, policy_version 222697 (0.0013) [2024-06-15 14:12:50,736][1651669] Updated weights for policy 0, policy_version 222746 (0.0048) [2024-06-15 14:12:50,767][1648981] Fps is (10 sec: 42597.9, 60 sec: 46421.3, 300 sec: 48318.9). Total num frames: 456163328. Throughput: 0: 12162.8. Samples: 114140160. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:50,767][1648981] Avg episode reward: [(0, '426.790')] [2024-06-15 14:12:52,246][1651669] Updated weights for policy 0, policy_version 222805 (0.0073) [2024-06-15 14:12:54,612][1651669] Updated weights for policy 0, policy_version 222896 (0.0119) [2024-06-15 14:12:55,768][1648981] Fps is (10 sec: 52461.4, 60 sec: 50243.1, 300 sec: 48879.7). Total num frames: 456523776. Throughput: 0: 12140.3. Samples: 114166784. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:12:55,768][1648981] Avg episode reward: [(0, '426.190')] [2024-06-15 14:12:55,784][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000222912_456523776.pth... [2024-06-15 14:12:55,870][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000217216_444858368.pth [2024-06-15 14:12:58,934][1651274] Signal inference workers to stop experience collection... 
(11700 times) [2024-06-15 14:12:59,002][1651669] InferenceWorker_p0-w0: stopping experience collection (11700 times) [2024-06-15 14:12:59,213][1651274] Signal inference workers to resume experience collection... (11700 times) [2024-06-15 14:12:59,214][1651669] InferenceWorker_p0-w0: resuming experience collection (11700 times) [2024-06-15 14:12:59,680][1651669] Updated weights for policy 0, policy_version 222970 (0.0021) [2024-06-15 14:13:00,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 45875.2, 300 sec: 48430.4). Total num frames: 456654848. Throughput: 0: 11958.1. Samples: 114241536. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:13:00,767][1648981] Avg episode reward: [(0, '425.380')] [2024-06-15 14:13:02,071][1651669] Updated weights for policy 0, policy_version 223024 (0.0012) [2024-06-15 14:13:03,896][1651669] Updated weights for policy 0, policy_version 223095 (0.0010) [2024-06-15 14:13:05,388][1651669] Updated weights for policy 0, policy_version 223152 (0.0021) [2024-06-15 14:13:05,766][1648981] Fps is (10 sec: 52436.8, 60 sec: 50244.2, 300 sec: 49208.2). Total num frames: 457048064. Throughput: 0: 11980.8. Samples: 114303488. Policy #0 lag: (min: 60.0, avg: 159.0, max: 265.0) [2024-06-15 14:13:05,767][1648981] Avg episode reward: [(0, '431.800')] [2024-06-15 14:13:10,444][1651669] Updated weights for policy 0, policy_version 223216 (0.0042) [2024-06-15 14:13:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46969.4, 300 sec: 48652.1). Total num frames: 457179136. Throughput: 0: 12128.7. Samples: 114344448. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:10,767][1648981] Avg episode reward: [(0, '450.190')] [2024-06-15 14:13:12,988][1651669] Updated weights for policy 0, policy_version 223266 (0.0011) [2024-06-15 14:13:15,225][1651669] Updated weights for policy 0, policy_version 223354 (0.0013) [2024-06-15 14:13:15,768][1648981] Fps is (10 sec: 42594.0, 60 sec: 49697.3, 300 sec: 48985.2). 
Total num frames: 457474048. Throughput: 0: 11753.0. Samples: 114413056. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:15,768][1648981] Avg episode reward: [(0, '446.430')] [2024-06-15 14:13:16,376][1651669] Updated weights for policy 0, policy_version 223395 (0.0042) [2024-06-15 14:13:20,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45875.1, 300 sec: 48318.9). Total num frames: 457572352. Throughput: 0: 11995.1. Samples: 114486272. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:20,767][1648981] Avg episode reward: [(0, '446.350')] [2024-06-15 14:13:21,210][1651669] Updated weights for policy 0, policy_version 223441 (0.0025) [2024-06-15 14:13:24,775][1651669] Updated weights for policy 0, policy_version 223520 (0.0016) [2024-06-15 14:13:25,766][1648981] Fps is (10 sec: 36048.7, 60 sec: 48059.8, 300 sec: 48430.9). Total num frames: 457834496. Throughput: 0: 11821.5. Samples: 114524160. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:25,767][1648981] Avg episode reward: [(0, '437.450')] [2024-06-15 14:13:26,445][1651669] Updated weights for policy 0, policy_version 223584 (0.0014) [2024-06-15 14:13:28,686][1651669] Updated weights for policy 0, policy_version 223664 (0.0013) [2024-06-15 14:13:30,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46421.3, 300 sec: 48430.0). Total num frames: 458096640. Throughput: 0: 11504.9. Samples: 114571776. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:30,767][1648981] Avg episode reward: [(0, '433.080')] [2024-06-15 14:13:33,020][1651669] Updated weights for policy 0, policy_version 223701 (0.0011) [2024-06-15 14:13:35,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 45883.7, 300 sec: 47986.3). Total num frames: 458227712. Throughput: 0: 11582.6. Samples: 114661376. 
Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:35,767][1648981] Avg episode reward: [(0, '434.510')] [2024-06-15 14:13:35,777][1651669] Updated weights for policy 0, policy_version 223760 (0.0013) [2024-06-15 14:13:37,503][1651669] Updated weights for policy 0, policy_version 223824 (0.0107) [2024-06-15 14:13:37,954][1651274] Signal inference workers to stop experience collection... (11750 times) [2024-06-15 14:13:37,986][1651669] InferenceWorker_p0-w0: stopping experience collection (11750 times) [2024-06-15 14:13:38,300][1651274] Signal inference workers to resume experience collection... (11750 times) [2024-06-15 14:13:38,301][1651669] InferenceWorker_p0-w0: resuming experience collection (11750 times) [2024-06-15 14:13:39,533][1651669] Updated weights for policy 0, policy_version 223892 (0.0084) [2024-06-15 14:13:40,576][1651669] Updated weights for policy 0, policy_version 223933 (0.0025) [2024-06-15 14:13:40,774][1648981] Fps is (10 sec: 52388.1, 60 sec: 48053.4, 300 sec: 48428.7). Total num frames: 458620928. Throughput: 0: 11490.0. Samples: 114683904. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:40,775][1648981] Avg episode reward: [(0, '439.400')] [2024-06-15 14:13:44,562][1651669] Updated weights for policy 0, policy_version 223984 (0.0013) [2024-06-15 14:13:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45881.1, 300 sec: 47985.7). Total num frames: 458752000. Throughput: 0: 11446.0. Samples: 114756608. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:45,767][1648981] Avg episode reward: [(0, '442.920')] [2024-06-15 14:13:47,494][1651669] Updated weights for policy 0, policy_version 224018 (0.0013) [2024-06-15 14:13:49,361][1651669] Updated weights for policy 0, policy_version 224096 (0.0012) [2024-06-15 14:13:50,769][1648981] Fps is (10 sec: 42622.5, 60 sec: 48058.1, 300 sec: 48208.6). Total num frames: 459046912. Throughput: 0: 11593.4. Samples: 114825216. 
Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:50,769][1648981] Avg episode reward: [(0, '451.910')] [2024-06-15 14:13:51,271][1651669] Updated weights for policy 0, policy_version 224176 (0.0016) [2024-06-15 14:13:55,466][1651669] Updated weights for policy 0, policy_version 224256 (0.0013) [2024-06-15 14:13:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45876.4, 300 sec: 47985.7). Total num frames: 459276288. Throughput: 0: 11446.1. Samples: 114859520. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:13:55,767][1648981] Avg episode reward: [(0, '473.990')] [2024-06-15 14:13:55,772][1651274] Saving new best policy, reward=473.990! [2024-06-15 14:14:00,666][1651669] Updated weights for policy 0, policy_version 224346 (0.0028) [2024-06-15 14:14:00,769][1648981] Fps is (10 sec: 42596.6, 60 sec: 46965.4, 300 sec: 48207.4). Total num frames: 459472896. Throughput: 0: 11661.8. Samples: 114937856. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:14:00,770][1648981] Avg episode reward: [(0, '477.920')] [2024-06-15 14:14:01,186][1651274] Saving new best policy, reward=477.920! [2024-06-15 14:14:02,426][1651669] Updated weights for policy 0, policy_version 224416 (0.0013) [2024-06-15 14:14:05,220][1651669] Updated weights for policy 0, policy_version 224449 (0.0018) [2024-06-15 14:14:05,768][1648981] Fps is (10 sec: 45865.9, 60 sec: 44781.5, 300 sec: 47874.3). Total num frames: 459735040. Throughput: 0: 11684.5. Samples: 115012096. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:14:05,769][1648981] Avg episode reward: [(0, '463.790')] [2024-06-15 14:14:08,937][1651669] Updated weights for policy 0, policy_version 224517 (0.0123) [2024-06-15 14:14:10,719][1651669] Updated weights for policy 0, policy_version 224580 (0.0013) [2024-06-15 14:14:10,766][1648981] Fps is (10 sec: 45887.1, 60 sec: 45875.2, 300 sec: 48207.8). Total num frames: 459931648. Throughput: 0: 11753.2. Samples: 115053056. 
Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:14:10,767][1648981] Avg episode reward: [(0, '467.080')] [2024-06-15 14:14:12,310][1651669] Updated weights for policy 0, policy_version 224643 (0.0012) [2024-06-15 14:14:13,562][1651669] Updated weights for policy 0, policy_version 224704 (0.0012) [2024-06-15 14:14:15,766][1648981] Fps is (10 sec: 45884.5, 60 sec: 45329.9, 300 sec: 47874.6). Total num frames: 460193792. Throughput: 0: 12026.3. Samples: 115112960. Policy #0 lag: (min: 15.0, avg: 110.5, max: 271.0) [2024-06-15 14:14:15,767][1648981] Avg episode reward: [(0, '465.220')] [2024-06-15 14:14:17,215][1651669] Updated weights for policy 0, policy_version 224752 (0.0021) [2024-06-15 14:14:19,721][1651274] Signal inference workers to stop experience collection... (11800 times) [2024-06-15 14:14:19,783][1651669] InferenceWorker_p0-w0: stopping experience collection (11800 times) [2024-06-15 14:14:19,947][1651274] Signal inference workers to resume experience collection... (11800 times) [2024-06-15 14:14:19,948][1651669] InferenceWorker_p0-w0: resuming experience collection (11800 times) [2024-06-15 14:14:20,489][1651669] Updated weights for policy 0, policy_version 224802 (0.0020) [2024-06-15 14:14:20,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 460423168. Throughput: 0: 11832.9. Samples: 115193856. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:20,767][1648981] Avg episode reward: [(0, '462.390')] [2024-06-15 14:14:22,494][1651669] Updated weights for policy 0, policy_version 224884 (0.0013) [2024-06-15 14:14:24,034][1651669] Updated weights for policy 0, policy_version 224958 (0.0014) [2024-06-15 14:14:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 460718080. Throughput: 0: 11869.1. Samples: 115217920. 
Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:25,767][1648981] Avg episode reward: [(0, '453.530')] [2024-06-15 14:14:28,730][1651669] Updated weights for policy 0, policy_version 225011 (0.0013) [2024-06-15 14:14:30,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 45875.3, 300 sec: 47988.5). Total num frames: 460849152. Throughput: 0: 12151.5. Samples: 115303424. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:30,767][1648981] Avg episode reward: [(0, '450.910')] [2024-06-15 14:14:30,981][1651669] Updated weights for policy 0, policy_version 225040 (0.0014) [2024-06-15 14:14:32,712][1651669] Updated weights for policy 0, policy_version 225110 (0.0183) [2024-06-15 14:14:34,373][1651669] Updated weights for policy 0, policy_version 225184 (0.0037) [2024-06-15 14:14:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.2, 300 sec: 48318.9). Total num frames: 461242368. Throughput: 0: 11970.0. Samples: 115363840. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:35,767][1648981] Avg episode reward: [(0, '450.710')] [2024-06-15 14:14:39,082][1651669] Updated weights for policy 0, policy_version 225217 (0.0012) [2024-06-15 14:14:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45881.2, 300 sec: 47987.3). Total num frames: 461373440. Throughput: 0: 12231.1. Samples: 115409920. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:40,767][1648981] Avg episode reward: [(0, '461.930')] [2024-06-15 14:14:41,660][1651669] Updated weights for policy 0, policy_version 225284 (0.0014) [2024-06-15 14:14:43,066][1651669] Updated weights for policy 0, policy_version 225346 (0.0116) [2024-06-15 14:14:44,309][1651669] Updated weights for policy 0, policy_version 225412 (0.0100) [2024-06-15 14:14:45,304][1651669] Updated weights for policy 0, policy_version 225463 (0.0013) [2024-06-15 14:14:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 461766656. 
Throughput: 0: 11844.9. Samples: 115470848. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:45,767][1648981] Avg episode reward: [(0, '471.660')] [2024-06-15 14:14:50,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46423.0, 300 sec: 47763.5). Total num frames: 461832192. Throughput: 0: 12038.2. Samples: 115553792. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:50,767][1648981] Avg episode reward: [(0, '463.600')] [2024-06-15 14:14:51,039][1651669] Updated weights for policy 0, policy_version 225528 (0.0033) [2024-06-15 14:14:53,906][1651669] Updated weights for policy 0, policy_version 225588 (0.0013) [2024-06-15 14:14:55,258][1651669] Updated weights for policy 0, policy_version 225649 (0.0015) [2024-06-15 14:14:55,631][1651274] Signal inference workers to stop experience collection... (11850 times) [2024-06-15 14:14:55,709][1651669] InferenceWorker_p0-w0: stopping experience collection (11850 times) [2024-06-15 14:14:55,776][1648981] Fps is (10 sec: 39285.6, 60 sec: 48052.4, 300 sec: 48207.6). Total num frames: 462159872. Throughput: 0: 11830.5. Samples: 115585536. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:14:55,776][1648981] Avg episode reward: [(0, '459.940')] [2024-06-15 14:14:55,857][1651274] Signal inference workers to resume experience collection... (11850 times) [2024-06-15 14:14:55,860][1651669] InferenceWorker_p0-w0: resuming experience collection (11850 times) [2024-06-15 14:14:56,251][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000225696_462225408.pth... [2024-06-15 14:14:56,399][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000220096_450756608.pth [2024-06-15 14:14:56,445][1651669] Updated weights for policy 0, policy_version 225697 (0.0053) [2024-06-15 14:15:00,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 46969.4, 300 sec: 47874.6). Total num frames: 462290944. Throughput: 0: 12162.8. Samples: 115660288. 
Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:15:00,767][1648981] Avg episode reward: [(0, '455.440')] [2024-06-15 14:15:01,706][1651669] Updated weights for policy 0, policy_version 225747 (0.0016) [2024-06-15 14:15:03,688][1651669] Updated weights for policy 0, policy_version 225795 (0.0012) [2024-06-15 14:15:05,045][1651669] Updated weights for policy 0, policy_version 225861 (0.0012) [2024-06-15 14:15:05,774][1648981] Fps is (10 sec: 45881.6, 60 sec: 48055.1, 300 sec: 48208.5). Total num frames: 462618624. Throughput: 0: 11910.5. Samples: 115729920. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:15:05,775][1648981] Avg episode reward: [(0, '460.190')] [2024-06-15 14:15:06,413][1651669] Updated weights for policy 0, policy_version 225922 (0.0013) [2024-06-15 14:15:10,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 462815232. Throughput: 0: 12037.7. Samples: 115759616. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:15:10,767][1648981] Avg episode reward: [(0, '463.780')] [2024-06-15 14:15:12,251][1651669] Updated weights for policy 0, policy_version 226001 (0.0138) [2024-06-15 14:15:13,367][1651669] Updated weights for policy 0, policy_version 226048 (0.0013) [2024-06-15 14:15:15,766][1648981] Fps is (10 sec: 42631.8, 60 sec: 47513.6, 300 sec: 47874.9). Total num frames: 463044608. Throughput: 0: 12003.6. Samples: 115843584. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:15:15,767][1648981] Avg episode reward: [(0, '444.750')] [2024-06-15 14:15:16,160][1651669] Updated weights for policy 0, policy_version 226121 (0.0014) [2024-06-15 14:15:17,364][1651669] Updated weights for policy 0, policy_version 226179 (0.0013) [2024-06-15 14:15:18,414][1651669] Updated weights for policy 0, policy_version 226233 (0.0076) [2024-06-15 14:15:20,773][1648981] Fps is (10 sec: 52395.7, 60 sec: 48600.8, 300 sec: 47984.7). Total num frames: 463339520. 
Throughput: 0: 12286.3. Samples: 115916800. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:15:20,773][1648981] Avg episode reward: [(0, '442.390')] [2024-06-15 14:15:23,467][1651669] Updated weights for policy 0, policy_version 226302 (0.0017) [2024-06-15 14:15:25,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 47513.4, 300 sec: 47985.8). Total num frames: 463568896. Throughput: 0: 12128.6. Samples: 115955712. Policy #0 lag: (min: 13.0, avg: 90.5, max: 269.0) [2024-06-15 14:15:25,767][1648981] Avg episode reward: [(0, '454.080')] [2024-06-15 14:15:26,292][1651669] Updated weights for policy 0, policy_version 226384 (0.0014) [2024-06-15 14:15:27,577][1651669] Updated weights for policy 0, policy_version 226436 (0.0024) [2024-06-15 14:15:28,708][1651669] Updated weights for policy 0, policy_version 226487 (0.0012) [2024-06-15 14:15:30,770][1648981] Fps is (10 sec: 52444.5, 60 sec: 50241.5, 300 sec: 47985.8). Total num frames: 463863808. Throughput: 0: 12230.2. Samples: 116021248. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:15:30,770][1648981] Avg episode reward: [(0, '456.560')] [2024-06-15 14:15:33,550][1651669] Updated weights for policy 0, policy_version 226516 (0.0013) [2024-06-15 14:15:35,243][1651669] Updated weights for policy 0, policy_version 226561 (0.0012) [2024-06-15 14:15:35,766][1648981] Fps is (10 sec: 45876.7, 60 sec: 46421.3, 300 sec: 47874.6). Total num frames: 464027648. Throughput: 0: 12276.6. Samples: 116106240. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:15:35,767][1648981] Avg episode reward: [(0, '456.310')] [2024-06-15 14:15:36,066][1651274] Signal inference workers to stop experience collection... (11900 times) [2024-06-15 14:15:36,122][1651669] InferenceWorker_p0-w0: stopping experience collection (11900 times) [2024-06-15 14:15:36,306][1651274] Signal inference workers to resume experience collection... 
(11900 times) [2024-06-15 14:15:36,307][1651669] InferenceWorker_p0-w0: resuming experience collection (11900 times) [2024-06-15 14:15:36,679][1651669] Updated weights for policy 0, policy_version 226624 (0.0082) [2024-06-15 14:15:37,883][1651669] Updated weights for policy 0, policy_version 226673 (0.0014) [2024-06-15 14:15:39,326][1651669] Updated weights for policy 0, policy_version 226742 (0.0112) [2024-06-15 14:15:40,803][1648981] Fps is (10 sec: 52255.8, 60 sec: 50213.8, 300 sec: 47979.8). Total num frames: 464388096. Throughput: 0: 12155.5. Samples: 116132864. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:15:40,803][1648981] Avg episode reward: [(0, '456.780')] [2024-06-15 14:15:44,681][1651669] Updated weights for policy 0, policy_version 226784 (0.0012) [2024-06-15 14:15:45,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 48096.8). Total num frames: 464519168. Throughput: 0: 12242.5. Samples: 116211200. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:15:45,767][1648981] Avg episode reward: [(0, '472.880')] [2024-06-15 14:15:46,308][1651669] Updated weights for policy 0, policy_version 226819 (0.0014) [2024-06-15 14:15:47,736][1651669] Updated weights for policy 0, policy_version 226882 (0.0012) [2024-06-15 14:15:49,370][1651669] Updated weights for policy 0, policy_version 226946 (0.0155) [2024-06-15 14:15:50,607][1651669] Updated weights for policy 0, policy_version 226998 (0.0011) [2024-06-15 14:15:50,766][1648981] Fps is (10 sec: 49331.2, 60 sec: 50790.4, 300 sec: 48096.8). Total num frames: 464879616. Throughput: 0: 12164.9. Samples: 116277248. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:15:50,767][1648981] Avg episode reward: [(0, '470.620')] [2024-06-15 14:15:55,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46428.4, 300 sec: 47985.7). Total num frames: 464945152. Throughput: 0: 12413.1. Samples: 116318208. 
Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:15:55,767][1648981] Avg episode reward: [(0, '470.510')] [2024-06-15 14:15:56,548][1651669] Updated weights for policy 0, policy_version 227063 (0.0011) [2024-06-15 14:15:58,580][1651669] Updated weights for policy 0, policy_version 227136 (0.0013) [2024-06-15 14:16:00,410][1651669] Updated weights for policy 0, policy_version 227203 (0.0208) [2024-06-15 14:16:00,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 50790.6, 300 sec: 47874.6). Total num frames: 465338368. Throughput: 0: 12003.6. Samples: 116383744. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:00,767][1648981] Avg episode reward: [(0, '463.280')] [2024-06-15 14:16:05,769][1648981] Fps is (10 sec: 49137.6, 60 sec: 46971.3, 300 sec: 47874.1). Total num frames: 465436672. Throughput: 0: 11799.6. Samples: 116447744. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:05,770][1648981] Avg episode reward: [(0, '456.040')] [2024-06-15 14:16:08,273][1651669] Updated weights for policy 0, policy_version 227315 (0.0181) [2024-06-15 14:16:09,357][1651669] Updated weights for policy 0, policy_version 227347 (0.0014) [2024-06-15 14:16:10,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48605.8, 300 sec: 47656.7). Total num frames: 465731584. Throughput: 0: 11776.1. Samples: 116485632. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:10,767][1648981] Avg episode reward: [(0, '459.910')] [2024-06-15 14:16:10,889][1651669] Updated weights for policy 0, policy_version 227424 (0.0014) [2024-06-15 14:16:12,609][1651274] Signal inference workers to stop experience collection... (11950 times) [2024-06-15 14:16:12,672][1651669] InferenceWorker_p0-w0: stopping experience collection (11950 times) [2024-06-15 14:16:12,674][1651669] Updated weights for policy 0, policy_version 227493 (0.0019) [2024-06-15 14:16:12,803][1651274] Signal inference workers to resume experience collection... 
(11950 times) [2024-06-15 14:16:12,804][1651669] InferenceWorker_p0-w0: resuming experience collection (11950 times) [2024-06-15 14:16:15,766][1648981] Fps is (10 sec: 52444.2, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 465960960. Throughput: 0: 11833.7. Samples: 116553728. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:15,767][1648981] Avg episode reward: [(0, '450.390')] [2024-06-15 14:16:17,906][1651669] Updated weights for policy 0, policy_version 227536 (0.0012) [2024-06-15 14:16:19,265][1651669] Updated weights for policy 0, policy_version 227585 (0.0014) [2024-06-15 14:16:20,547][1651669] Updated weights for policy 0, policy_version 227648 (0.0014) [2024-06-15 14:16:20,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 48064.6, 300 sec: 47874.6). Total num frames: 466223104. Throughput: 0: 11821.5. Samples: 116638208. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:20,768][1648981] Avg episode reward: [(0, '467.350')] [2024-06-15 14:16:22,409][1651669] Updated weights for policy 0, policy_version 227719 (0.0013) [2024-06-15 14:16:23,444][1651669] Updated weights for policy 0, policy_version 227772 (0.0013) [2024-06-15 14:16:25,770][1648981] Fps is (10 sec: 52408.7, 60 sec: 48603.0, 300 sec: 47985.1). Total num frames: 466485248. Throughput: 0: 11830.1. Samples: 116664832. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:25,771][1648981] Avg episode reward: [(0, '461.770')] [2024-06-15 14:16:29,472][1651669] Updated weights for policy 0, policy_version 227832 (0.0014) [2024-06-15 14:16:30,766][1648981] Fps is (10 sec: 45876.4, 60 sec: 46970.1, 300 sec: 47763.5). Total num frames: 466681856. Throughput: 0: 11889.8. Samples: 116746240. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:30,767][1648981] Avg episode reward: [(0, '480.760')] [2024-06-15 14:16:31,331][1651274] Saving new best policy, reward=480.760! 
[2024-06-15 14:16:31,909][1651669] Updated weights for policy 0, policy_version 227921 (0.0019) [2024-06-15 14:16:33,359][1651669] Updated weights for policy 0, policy_version 227984 (0.0012) [2024-06-15 14:16:35,767][1648981] Fps is (10 sec: 52447.6, 60 sec: 49697.9, 300 sec: 47985.6). Total num frames: 467009536. Throughput: 0: 11855.6. Samples: 116810752. Policy #0 lag: (min: 190.0, avg: 274.6, max: 447.0) [2024-06-15 14:16:35,767][1648981] Avg episode reward: [(0, '454.800')] [2024-06-15 14:16:40,051][1651669] Updated weights for policy 0, policy_version 228048 (0.0019) [2024-06-15 14:16:40,767][1648981] Fps is (10 sec: 39321.1, 60 sec: 44810.0, 300 sec: 47541.4). Total num frames: 467075072. Throughput: 0: 11855.6. Samples: 116851712. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:16:40,767][1648981] Avg episode reward: [(0, '461.480')] [2024-06-15 14:16:41,900][1651669] Updated weights for policy 0, policy_version 228128 (0.0145) [2024-06-15 14:16:43,635][1651669] Updated weights for policy 0, policy_version 228208 (0.0013) [2024-06-15 14:16:44,885][1651669] Updated weights for policy 0, policy_version 228272 (0.0144) [2024-06-15 14:16:45,786][1648981] Fps is (10 sec: 52326.6, 60 sec: 50227.7, 300 sec: 47982.5). Total num frames: 467533824. Throughput: 0: 11668.5. Samples: 116909056. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:16:45,787][1648981] Avg episode reward: [(0, '456.020')] [2024-06-15 14:16:50,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 44236.8, 300 sec: 47541.4). Total num frames: 467533824. Throughput: 0: 12243.3. Samples: 116998656. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:16:50,767][1648981] Avg episode reward: [(0, '463.040')] [2024-06-15 14:16:52,149][1651669] Updated weights for policy 0, policy_version 228353 (0.0017) [2024-06-15 14:16:52,726][1651274] Signal inference workers to stop experience collection... 
(12000 times) [2024-06-15 14:16:52,787][1651669] InferenceWorker_p0-w0: stopping experience collection (12000 times) [2024-06-15 14:16:52,978][1651274] Signal inference workers to resume experience collection... (12000 times) [2024-06-15 14:16:52,979][1651669] InferenceWorker_p0-w0: resuming experience collection (12000 times) [2024-06-15 14:16:53,112][1651669] Updated weights for policy 0, policy_version 228404 (0.0017) [2024-06-15 14:16:54,718][1651669] Updated weights for policy 0, policy_version 228468 (0.0141) [2024-06-15 14:16:55,767][1648981] Fps is (10 sec: 45964.8, 60 sec: 50790.2, 300 sec: 47763.5). Total num frames: 467992576. Throughput: 0: 12003.5. Samples: 117025792. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:16:55,767][1648981] Avg episode reward: [(0, '453.440')] [2024-06-15 14:16:56,000][1651669] Updated weights for policy 0, policy_version 228528 (0.0102) [2024-06-15 14:16:56,232][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000228544_468058112.pth... [2024-06-15 14:16:56,308][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000222912_456523776.pth [2024-06-15 14:17:00,767][1648981] Fps is (10 sec: 52426.4, 60 sec: 45328.7, 300 sec: 47541.3). Total num frames: 468058112. Throughput: 0: 12117.2. Samples: 117099008. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:00,768][1648981] Avg episode reward: [(0, '436.720')] [2024-06-15 14:17:02,316][1651669] Updated weights for policy 0, policy_version 228576 (0.0013) [2024-06-15 14:17:03,567][1651669] Updated weights for policy 0, policy_version 228624 (0.0012) [2024-06-15 14:17:04,643][1651669] Updated weights for policy 0, policy_version 228669 (0.0012) [2024-06-15 14:17:05,790][1648981] Fps is (10 sec: 42498.9, 60 sec: 49680.9, 300 sec: 47649.0). Total num frames: 468418560. Throughput: 0: 11826.7. Samples: 117170688. 
Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:05,791][1648981] Avg episode reward: [(0, '423.630')] [2024-06-15 14:17:06,052][1651669] Updated weights for policy 0, policy_version 228736 (0.0016) [2024-06-15 14:17:10,778][1648981] Fps is (10 sec: 52369.7, 60 sec: 47504.4, 300 sec: 47761.6). Total num frames: 468582400. Throughput: 0: 11933.2. Samples: 117201920. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:10,779][1648981] Avg episode reward: [(0, '420.300')] [2024-06-15 14:17:12,619][1651669] Updated weights for policy 0, policy_version 228802 (0.0013) [2024-06-15 14:17:13,936][1651669] Updated weights for policy 0, policy_version 228865 (0.0033) [2024-06-15 14:17:15,219][1651669] Updated weights for policy 0, policy_version 228915 (0.0011) [2024-06-15 14:17:15,766][1648981] Fps is (10 sec: 45984.7, 60 sec: 48606.0, 300 sec: 47652.5). Total num frames: 468877312. Throughput: 0: 11867.0. Samples: 117280256. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:15,767][1648981] Avg episode reward: [(0, '405.910')] [2024-06-15 14:17:16,160][1651669] Updated weights for policy 0, policy_version 228966 (0.0012) [2024-06-15 14:17:17,737][1651669] Updated weights for policy 0, policy_version 229032 (0.0013) [2024-06-15 14:17:20,777][1648981] Fps is (10 sec: 52433.9, 60 sec: 48051.3, 300 sec: 47983.9). Total num frames: 469106688. Throughput: 0: 11966.6. Samples: 117349376. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:20,778][1648981] Avg episode reward: [(0, '411.330')] [2024-06-15 14:17:23,645][1651669] Updated weights for policy 0, policy_version 229088 (0.0014) [2024-06-15 14:17:24,270][1651669] Updated weights for policy 0, policy_version 229120 (0.0014) [2024-06-15 14:17:25,713][1651669] Updated weights for policy 0, policy_version 229176 (0.0093) [2024-06-15 14:17:25,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 47516.6, 300 sec: 47541.4). Total num frames: 469336064. 
Throughput: 0: 12003.6. Samples: 117391872. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:25,767][1648981] Avg episode reward: [(0, '409.030')] [2024-06-15 14:17:27,080][1651669] Updated weights for policy 0, policy_version 229232 (0.0013) [2024-06-15 14:17:27,619][1651274] Signal inference workers to stop experience collection... (12050 times) [2024-06-15 14:17:27,653][1651669] InferenceWorker_p0-w0: stopping experience collection (12050 times) [2024-06-15 14:17:27,870][1651274] Signal inference workers to resume experience collection... (12050 times) [2024-06-15 14:17:27,872][1651669] InferenceWorker_p0-w0: resuming experience collection (12050 times) [2024-06-15 14:17:28,439][1651669] Updated weights for policy 0, policy_version 229296 (0.0074) [2024-06-15 14:17:30,766][1648981] Fps is (10 sec: 52485.5, 60 sec: 49152.0, 300 sec: 47987.5). Total num frames: 469630976. Throughput: 0: 12225.1. Samples: 117458944. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:30,767][1648981] Avg episode reward: [(0, '408.600')] [2024-06-15 14:17:34,325][1651669] Updated weights for policy 0, policy_version 229329 (0.0014) [2024-06-15 14:17:35,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46421.6, 300 sec: 47652.5). Total num frames: 469794816. Throughput: 0: 11969.4. Samples: 117537280. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:35,767][1648981] Avg episode reward: [(0, '410.520')] [2024-06-15 14:17:35,767][1651669] Updated weights for policy 0, policy_version 229394 (0.0035) [2024-06-15 14:17:36,985][1651669] Updated weights for policy 0, policy_version 229441 (0.0017) [2024-06-15 14:17:38,326][1651669] Updated weights for policy 0, policy_version 229497 (0.0025) [2024-06-15 14:17:40,815][1648981] Fps is (10 sec: 52174.6, 60 sec: 51295.0, 300 sec: 47979.0). Total num frames: 470155264. Throughput: 0: 11979.3. Samples: 117565440. 
Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:40,816][1648981] Avg episode reward: [(0, '410.660')] [2024-06-15 14:17:45,456][1651669] Updated weights for policy 0, policy_version 229569 (0.0012) [2024-06-15 14:17:45,778][1648981] Fps is (10 sec: 39275.1, 60 sec: 44242.7, 300 sec: 47539.5). Total num frames: 470188032. Throughput: 0: 12114.3. Samples: 117644288. Policy #0 lag: (min: 15.0, avg: 84.8, max: 271.0) [2024-06-15 14:17:45,779][1648981] Avg episode reward: [(0, '406.530')] [2024-06-15 14:17:46,949][1651669] Updated weights for policy 0, policy_version 229632 (0.0012) [2024-06-15 14:17:48,861][1651669] Updated weights for policy 0, policy_version 229712 (0.0012) [2024-06-15 14:17:50,774][1648981] Fps is (10 sec: 42773.0, 60 sec: 50783.7, 300 sec: 47651.4). Total num frames: 470581248. Throughput: 0: 11734.6. Samples: 117698560. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:17:50,775][1648981] Avg episode reward: [(0, '403.620')] [2024-06-15 14:17:50,810][1651669] Updated weights for policy 0, policy_version 229792 (0.0075) [2024-06-15 14:17:51,427][1651669] Updated weights for policy 0, policy_version 229823 (0.0014) [2024-06-15 14:17:55,766][1648981] Fps is (10 sec: 49210.1, 60 sec: 44783.2, 300 sec: 47541.4). Total num frames: 470679552. Throughput: 0: 11995.3. Samples: 117741568. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:17:55,767][1648981] Avg episode reward: [(0, '407.160')] [2024-06-15 14:17:59,238][1651669] Updated weights for policy 0, policy_version 229904 (0.0014) [2024-06-15 14:18:00,778][1648981] Fps is (10 sec: 39306.2, 60 sec: 48596.7, 300 sec: 47206.3). Total num frames: 470974464. Throughput: 0: 11829.8. Samples: 117812736. 
Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:00,779][1648981] Avg episode reward: [(0, '402.010')] [2024-06-15 14:18:01,026][1651669] Updated weights for policy 0, policy_version 229984 (0.0015) [2024-06-15 14:18:02,131][1651669] Updated weights for policy 0, policy_version 230032 (0.0021) [2024-06-15 14:18:05,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46439.7, 300 sec: 47541.4). Total num frames: 471203840. Throughput: 0: 11767.4. Samples: 117878784. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:05,767][1648981] Avg episode reward: [(0, '393.040')] [2024-06-15 14:18:09,446][1651669] Updated weights for policy 0, policy_version 230096 (0.0012) [2024-06-15 14:18:09,890][1651274] Signal inference workers to stop experience collection... (12100 times) [2024-06-15 14:18:10,000][1651669] InferenceWorker_p0-w0: stopping experience collection (12100 times) [2024-06-15 14:18:10,126][1651274] Signal inference workers to resume experience collection... (12100 times) [2024-06-15 14:18:10,142][1651669] InferenceWorker_p0-w0: resuming experience collection (12100 times) [2024-06-15 14:18:10,766][1648981] Fps is (10 sec: 36087.5, 60 sec: 45884.2, 300 sec: 46986.2). Total num frames: 471334912. Throughput: 0: 11719.1. Samples: 117919232. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:10,767][1648981] Avg episode reward: [(0, '389.120')] [2024-06-15 14:18:10,983][1651669] Updated weights for policy 0, policy_version 230162 (0.0093) [2024-06-15 14:18:12,924][1651669] Updated weights for policy 0, policy_version 230242 (0.0066) [2024-06-15 14:18:14,610][1651669] Updated weights for policy 0, policy_version 230320 (0.0018) [2024-06-15 14:18:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 471728128. Throughput: 0: 11434.6. Samples: 117973504. 
Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:15,767][1648981] Avg episode reward: [(0, '390.580')] [2024-06-15 14:18:20,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 43698.5, 300 sec: 47097.1). Total num frames: 471728128. Throughput: 0: 11400.5. Samples: 118050304. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:20,767][1648981] Avg episode reward: [(0, '389.270')] [2024-06-15 14:18:21,593][1651669] Updated weights for policy 0, policy_version 230368 (0.0017) [2024-06-15 14:18:23,087][1651669] Updated weights for policy 0, policy_version 230432 (0.0013) [2024-06-15 14:18:24,410][1651669] Updated weights for policy 0, policy_version 230483 (0.0012) [2024-06-15 14:18:25,626][1651669] Updated weights for policy 0, policy_version 230544 (0.0012) [2024-06-15 14:18:25,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 472154112. Throughput: 0: 11549.6. Samples: 118084608. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:25,767][1648981] Avg episode reward: [(0, '385.830')] [2024-06-15 14:18:26,516][1651669] Updated weights for policy 0, policy_version 230592 (0.0014) [2024-06-15 14:18:30,785][1648981] Fps is (10 sec: 52333.1, 60 sec: 43677.3, 300 sec: 47538.4). Total num frames: 472252416. Throughput: 0: 11364.8. Samples: 118155776. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:30,785][1648981] Avg episode reward: [(0, '392.430')] [2024-06-15 14:18:32,610][1651669] Updated weights for policy 0, policy_version 230656 (0.0013) [2024-06-15 14:18:34,208][1651669] Updated weights for policy 0, policy_version 230720 (0.0011) [2024-06-15 14:18:35,756][1651669] Updated weights for policy 0, policy_version 230785 (0.0012) [2024-06-15 14:18:35,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 47513.5, 300 sec: 47542.6). Total num frames: 472645632. Throughput: 0: 11596.0. Samples: 118220288. 
Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:35,767][1648981] Avg episode reward: [(0, '390.780')] [2024-06-15 14:18:36,844][1651669] Updated weights for policy 0, policy_version 230841 (0.0027) [2024-06-15 14:18:40,766][1648981] Fps is (10 sec: 52524.5, 60 sec: 43726.1, 300 sec: 47541.4). Total num frames: 472776704. Throughput: 0: 11468.8. Samples: 118257664. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:40,767][1648981] Avg episode reward: [(0, '384.650')] [2024-06-15 14:18:43,375][1651669] Updated weights for policy 0, policy_version 230910 (0.0099) [2024-06-15 14:18:44,871][1651669] Updated weights for policy 0, policy_version 230964 (0.0013) [2024-06-15 14:18:45,149][1651274] Signal inference workers to stop experience collection... (12150 times) [2024-06-15 14:18:45,201][1651669] InferenceWorker_p0-w0: stopping experience collection (12150 times) [2024-06-15 14:18:45,384][1651274] Signal inference workers to resume experience collection... (12150 times) [2024-06-15 14:18:45,385][1651669] InferenceWorker_p0-w0: resuming experience collection (12150 times) [2024-06-15 14:18:45,767][1648981] Fps is (10 sec: 42597.9, 60 sec: 48069.0, 300 sec: 47541.7). Total num frames: 473071616. Throughput: 0: 11585.6. Samples: 118333952. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:45,767][1648981] Avg episode reward: [(0, '402.830')] [2024-06-15 14:18:46,246][1651669] Updated weights for policy 0, policy_version 231024 (0.0011) [2024-06-15 14:18:47,900][1651669] Updated weights for policy 0, policy_version 231095 (0.0123) [2024-06-15 14:18:50,778][1648981] Fps is (10 sec: 52367.6, 60 sec: 45326.1, 300 sec: 47539.5). Total num frames: 473300992. Throughput: 0: 11488.6. Samples: 118395904. 
Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:50,779][1648981] Avg episode reward: [(0, '432.100')] [2024-06-15 14:18:54,323][1651669] Updated weights for policy 0, policy_version 231121 (0.0020) [2024-06-15 14:18:55,767][1648981] Fps is (10 sec: 36044.5, 60 sec: 45875.0, 300 sec: 47319.6). Total num frames: 473432064. Throughput: 0: 11582.5. Samples: 118440448. Policy #0 lag: (min: 89.0, avg: 174.8, max: 313.0) [2024-06-15 14:18:55,767][1648981] Avg episode reward: [(0, '441.850')] [2024-06-15 14:18:56,039][1651669] Updated weights for policy 0, policy_version 231191 (0.0013) [2024-06-15 14:18:56,200][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000231200_473497600.pth... [2024-06-15 14:18:56,359][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000225696_462225408.pth [2024-06-15 14:18:57,944][1651669] Updated weights for policy 0, policy_version 231264 (0.0027) [2024-06-15 14:18:59,642][1651669] Updated weights for policy 0, policy_version 231344 (0.0012) [2024-06-15 14:19:00,802][1648981] Fps is (10 sec: 52303.6, 60 sec: 47494.7, 300 sec: 47758.1). Total num frames: 473825280. Throughput: 0: 11527.9. Samples: 118492672. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:00,803][1648981] Avg episode reward: [(0, '449.650')] [2024-06-15 14:19:05,766][1648981] Fps is (10 sec: 39322.5, 60 sec: 43690.6, 300 sec: 47097.0). Total num frames: 473825280. Throughput: 0: 11719.1. Samples: 118577664. 
Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:05,767][1648981] Avg episode reward: [(0, '428.200')] [2024-06-15 14:19:06,732][1651669] Updated weights for policy 0, policy_version 231395 (0.0013) [2024-06-15 14:19:08,306][1651669] Updated weights for policy 0, policy_version 231457 (0.0048) [2024-06-15 14:19:10,228][1651669] Updated weights for policy 0, policy_version 231536 (0.0013) [2024-06-15 14:19:10,766][1648981] Fps is (10 sec: 39462.5, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 474218496. Throughput: 0: 11571.2. Samples: 118605312. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:10,767][1648981] Avg episode reward: [(0, '428.870')] [2024-06-15 14:19:15,792][1648981] Fps is (10 sec: 52294.5, 60 sec: 43671.9, 300 sec: 47204.0). Total num frames: 474349568. Throughput: 0: 11319.0. Samples: 118665216. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:15,793][1648981] Avg episode reward: [(0, '417.610')] [2024-06-15 14:19:17,292][1651669] Updated weights for policy 0, policy_version 231632 (0.0016) [2024-06-15 14:19:19,304][1651669] Updated weights for policy 0, policy_version 231712 (0.0012) [2024-06-15 14:19:20,778][1648981] Fps is (10 sec: 42549.3, 60 sec: 48596.5, 300 sec: 47206.3). Total num frames: 474644480. Throughput: 0: 11374.9. Samples: 118732288. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:20,778][1648981] Avg episode reward: [(0, '427.290')] [2024-06-15 14:19:21,571][1651669] Updated weights for policy 0, policy_version 231808 (0.0136) [2024-06-15 14:19:22,021][1651274] Signal inference workers to stop experience collection... (12200 times) [2024-06-15 14:19:22,053][1651669] InferenceWorker_p0-w0: stopping experience collection (12200 times) [2024-06-15 14:19:22,183][1651274] Signal inference workers to resume experience collection... 
(12200 times) [2024-06-15 14:19:22,184][1651669] InferenceWorker_p0-w0: resuming experience collection (12200 times) [2024-06-15 14:19:22,736][1651669] Updated weights for policy 0, policy_version 231860 (0.0012) [2024-06-15 14:19:25,770][1648981] Fps is (10 sec: 52544.2, 60 sec: 45326.2, 300 sec: 47540.8). Total num frames: 474873856. Throughput: 0: 11183.4. Samples: 118760960. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:25,771][1648981] Avg episode reward: [(0, '432.890')] [2024-06-15 14:19:29,171][1651669] Updated weights for policy 0, policy_version 231929 (0.0014) [2024-06-15 14:19:30,767][1648981] Fps is (10 sec: 39366.3, 60 sec: 46435.4, 300 sec: 46763.8). Total num frames: 475037696. Throughput: 0: 11298.1. Samples: 118842368. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:30,767][1648981] Avg episode reward: [(0, '439.840')] [2024-06-15 14:19:30,823][1651669] Updated weights for policy 0, policy_version 231968 (0.0013) [2024-06-15 14:19:32,710][1651669] Updated weights for policy 0, policy_version 232051 (0.0019) [2024-06-15 14:19:33,907][1651669] Updated weights for policy 0, policy_version 232121 (0.0013) [2024-06-15 14:19:35,767][1648981] Fps is (10 sec: 52446.3, 60 sec: 45874.9, 300 sec: 47541.3). Total num frames: 475398144. Throughput: 0: 11426.2. Samples: 118909952. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:35,768][1648981] Avg episode reward: [(0, '450.070')] [2024-06-15 14:19:39,933][1651669] Updated weights for policy 0, policy_version 232183 (0.0014) [2024-06-15 14:19:40,767][1648981] Fps is (10 sec: 49151.9, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 475529216. Throughput: 0: 11411.9. Samples: 118953984. 
Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:40,767][1648981] Avg episode reward: [(0, '455.830')] [2024-06-15 14:19:42,760][1651669] Updated weights for policy 0, policy_version 232256 (0.0015) [2024-06-15 14:19:43,756][1651669] Updated weights for policy 0, policy_version 232304 (0.0023) [2024-06-15 14:19:45,069][1651669] Updated weights for policy 0, policy_version 232381 (0.0012) [2024-06-15 14:19:45,766][1648981] Fps is (10 sec: 52431.1, 60 sec: 47513.8, 300 sec: 47763.5). Total num frames: 475922432. Throughput: 0: 11614.5. Samples: 119014912. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:45,767][1648981] Avg episode reward: [(0, '454.800')] [2024-06-15 14:19:50,568][1651669] Updated weights for policy 0, policy_version 232433 (0.0123) [2024-06-15 14:19:50,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 45337.9, 300 sec: 46987.4). Total num frames: 476020736. Throughput: 0: 11514.3. Samples: 119095808. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:50,767][1648981] Avg episode reward: [(0, '461.270')] [2024-06-15 14:19:52,131][1651669] Updated weights for policy 0, policy_version 232480 (0.0013) [2024-06-15 14:19:53,539][1651669] Updated weights for policy 0, policy_version 232515 (0.0012) [2024-06-15 14:19:55,081][1651669] Updated weights for policy 0, policy_version 232592 (0.0014) [2024-06-15 14:19:55,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.4, 300 sec: 47874.6). Total num frames: 476413952. Throughput: 0: 11639.5. Samples: 119129088. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:19:55,767][1648981] Avg episode reward: [(0, '471.640')] [2024-06-15 14:19:55,901][1651669] Updated weights for policy 0, policy_version 232640 (0.0015) [2024-06-15 14:20:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 45356.1, 300 sec: 47209.4). Total num frames: 476545024. Throughput: 0: 12033.2. Samples: 119206400. 
Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:20:00,767][1648981] Avg episode reward: [(0, '486.590')] [2024-06-15 14:20:00,888][1651669] Updated weights for policy 0, policy_version 232692 (0.0013) [2024-06-15 14:20:01,035][1651274] Saving new best policy, reward=486.590! [2024-06-15 14:20:03,352][1651669] Updated weights for policy 0, policy_version 232742 (0.0011) [2024-06-15 14:20:04,702][1651274] Signal inference workers to stop experience collection... (12250 times) [2024-06-15 14:20:04,746][1651669] InferenceWorker_p0-w0: stopping experience collection (12250 times) [2024-06-15 14:20:05,112][1651274] Signal inference workers to resume experience collection... (12250 times) [2024-06-15 14:20:05,113][1651669] InferenceWorker_p0-w0: resuming experience collection (12250 times) [2024-06-15 14:20:05,115][1651669] Updated weights for policy 0, policy_version 232800 (0.0012) [2024-06-15 14:20:05,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 476839936. Throughput: 0: 11927.0. Samples: 119268864. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:20:05,767][1648981] Avg episode reward: [(0, '494.300')] [2024-06-15 14:20:06,009][1651274] Saving new best policy, reward=494.300! [2024-06-15 14:20:06,576][1651669] Updated weights for policy 0, policy_version 232868 (0.0012) [2024-06-15 14:20:10,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 476971008. Throughput: 0: 12186.6. Samples: 119309312. Policy #0 lag: (min: 136.0, avg: 201.9, max: 367.0) [2024-06-15 14:20:10,767][1648981] Avg episode reward: [(0, '511.970')] [2024-06-15 14:20:10,776][1651274] Saving new best policy, reward=511.970! 
[2024-06-15 14:20:11,504][1651669] Updated weights for policy 0, policy_version 232914 (0.0012) [2024-06-15 14:20:13,410][1651669] Updated weights for policy 0, policy_version 232961 (0.0017) [2024-06-15 14:20:14,798][1651669] Updated weights for policy 0, policy_version 233022 (0.0016) [2024-06-15 14:20:15,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 49173.2, 300 sec: 47320.2). Total num frames: 477298688. Throughput: 0: 12003.6. Samples: 119382528. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:15,767][1648981] Avg episode reward: [(0, '501.190')] [2024-06-15 14:20:16,549][1651669] Updated weights for policy 0, policy_version 233088 (0.0015) [2024-06-15 14:20:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47522.7, 300 sec: 47208.2). Total num frames: 477495296. Throughput: 0: 12037.8. Samples: 119451648. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:20,767][1648981] Avg episode reward: [(0, '500.220')] [2024-06-15 14:20:23,062][1651669] Updated weights for policy 0, policy_version 233168 (0.0096) [2024-06-15 14:20:24,027][1651669] Updated weights for policy 0, policy_version 233209 (0.0011) [2024-06-15 14:20:25,687][1651669] Updated weights for policy 0, policy_version 233253 (0.0027) [2024-06-15 14:20:25,798][1648981] Fps is (10 sec: 39196.6, 60 sec: 46945.6, 300 sec: 46870.4). Total num frames: 477691904. Throughput: 0: 11949.7. Samples: 119492096. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:25,799][1648981] Avg episode reward: [(0, '505.930')] [2024-06-15 14:20:27,140][1651669] Updated weights for policy 0, policy_version 233313 (0.0086) [2024-06-15 14:20:28,249][1651669] Updated weights for policy 0, policy_version 233363 (0.0017) [2024-06-15 14:20:30,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49698.2, 300 sec: 47430.3). Total num frames: 478019584. Throughput: 0: 11867.0. Samples: 119548928. 
Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:30,767][1648981] Avg episode reward: [(0, '505.910')] [2024-06-15 14:20:35,019][1651669] Updated weights for policy 0, policy_version 233443 (0.0014) [2024-06-15 14:20:35,766][1648981] Fps is (10 sec: 46021.1, 60 sec: 45875.5, 300 sec: 46658.5). Total num frames: 478150656. Throughput: 0: 11810.1. Samples: 119627264. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:35,767][1648981] Avg episode reward: [(0, '484.870')] [2024-06-15 14:20:36,051][1651669] Updated weights for policy 0, policy_version 233473 (0.0018) [2024-06-15 14:20:37,309][1651669] Updated weights for policy 0, policy_version 233533 (0.0011) [2024-06-15 14:20:38,714][1651669] Updated weights for policy 0, policy_version 233584 (0.0012) [2024-06-15 14:20:40,297][1651669] Updated weights for policy 0, policy_version 233653 (0.0012) [2024-06-15 14:20:40,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.4, 300 sec: 47541.4). Total num frames: 478543872. Throughput: 0: 11798.8. Samples: 119660032. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:40,767][1648981] Avg episode reward: [(0, '487.610')] [2024-06-15 14:20:45,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 46319.5). Total num frames: 478543872. Throughput: 0: 11798.7. Samples: 119737344. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:45,767][1648981] Avg episode reward: [(0, '464.260')] [2024-06-15 14:20:46,296][1651669] Updated weights for policy 0, policy_version 233696 (0.0016) [2024-06-15 14:20:46,428][1651274] Signal inference workers to stop experience collection... (12300 times) [2024-06-15 14:20:46,519][1651669] InferenceWorker_p0-w0: stopping experience collection (12300 times) [2024-06-15 14:20:46,807][1651274] Signal inference workers to resume experience collection... 
(12300 times) [2024-06-15 14:20:46,808][1651669] InferenceWorker_p0-w0: resuming experience collection (12300 times) [2024-06-15 14:20:48,532][1651669] Updated weights for policy 0, policy_version 233776 (0.0120) [2024-06-15 14:20:50,494][1651669] Updated weights for policy 0, policy_version 233841 (0.0012) [2024-06-15 14:20:50,770][1648981] Fps is (10 sec: 39306.9, 60 sec: 48602.9, 300 sec: 47429.7). Total num frames: 478937088. Throughput: 0: 11672.6. Samples: 119794176. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:50,771][1648981] Avg episode reward: [(0, '474.040')] [2024-06-15 14:20:51,967][1651669] Updated weights for policy 0, policy_version 233910 (0.0026) [2024-06-15 14:20:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 44236.7, 300 sec: 46541.7). Total num frames: 479068160. Throughput: 0: 11491.5. Samples: 119826432. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:20:55,767][1648981] Avg episode reward: [(0, '484.260')] [2024-06-15 14:20:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000233920_479068160.pth... [2024-06-15 14:20:55,826][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000228544_468058112.pth [2024-06-15 14:20:57,640][1651669] Updated weights for policy 0, policy_version 233952 (0.0013) [2024-06-15 14:20:59,036][1651669] Updated weights for policy 0, policy_version 233990 (0.0013) [2024-06-15 14:21:00,767][1648981] Fps is (10 sec: 39334.2, 60 sec: 46420.9, 300 sec: 47097.4). Total num frames: 479330304. Throughput: 0: 11650.7. Samples: 119906816. 
Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:21:00,768][1648981] Avg episode reward: [(0, '500.410')] [2024-06-15 14:21:01,381][1651669] Updated weights for policy 0, policy_version 234080 (0.0012) [2024-06-15 14:21:02,355][1651669] Updated weights for policy 0, policy_version 234128 (0.0012) [2024-06-15 14:21:03,377][1651669] Updated weights for policy 0, policy_version 234166 (0.0026) [2024-06-15 14:21:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 479592448. Throughput: 0: 11468.8. Samples: 119967744. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:21:05,767][1648981] Avg episode reward: [(0, '492.630')] [2024-06-15 14:21:09,471][1651669] Updated weights for policy 0, policy_version 234196 (0.0011) [2024-06-15 14:21:10,766][1648981] Fps is (10 sec: 39323.4, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 479723520. Throughput: 0: 11545.2. Samples: 120011264. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:21:10,767][1648981] Avg episode reward: [(0, '498.210')] [2024-06-15 14:21:11,476][1651669] Updated weights for policy 0, policy_version 234272 (0.0159) [2024-06-15 14:21:13,148][1651669] Updated weights for policy 0, policy_version 234336 (0.0089) [2024-06-15 14:21:14,868][1651669] Updated weights for policy 0, policy_version 234401 (0.0012) [2024-06-15 14:21:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46967.4, 300 sec: 47097.1). Total num frames: 480116736. Throughput: 0: 11502.9. Samples: 120066560. Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:21:15,767][1648981] Avg episode reward: [(0, '492.450')] [2024-06-15 14:21:20,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 43690.7, 300 sec: 46209.0). Total num frames: 480116736. Throughput: 0: 11480.2. Samples: 120143872. 
Policy #0 lag: (min: 11.0, avg: 100.2, max: 267.0) [2024-06-15 14:21:20,767][1648981] Avg episode reward: [(0, '481.670')] [2024-06-15 14:21:21,849][1651669] Updated weights for policy 0, policy_version 234480 (0.0012) [2024-06-15 14:21:23,619][1651669] Updated weights for policy 0, policy_version 234546 (0.0012) [2024-06-15 14:21:23,995][1651274] Signal inference workers to stop experience collection... (12350 times) [2024-06-15 14:21:24,023][1651669] InferenceWorker_p0-w0: stopping experience collection (12350 times) [2024-06-15 14:21:24,244][1651274] Signal inference workers to resume experience collection... (12350 times) [2024-06-15 14:21:24,245][1651669] InferenceWorker_p0-w0: resuming experience collection (12350 times) [2024-06-15 14:21:25,297][1651669] Updated weights for policy 0, policy_version 234611 (0.0014) [2024-06-15 14:21:25,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 46992.4, 300 sec: 46874.9). Total num frames: 480509952. Throughput: 0: 11366.4. Samples: 120171520. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:21:25,767][1648981] Avg episode reward: [(0, '486.920')] [2024-06-15 14:21:26,732][1651669] Updated weights for policy 0, policy_version 234681 (0.0012) [2024-06-15 14:21:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 43690.7, 300 sec: 46208.5). Total num frames: 480641024. Throughput: 0: 11218.5. Samples: 120242176. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:21:30,767][1648981] Avg episode reward: [(0, '482.110')] [2024-06-15 14:21:32,905][1651669] Updated weights for policy 0, policy_version 234737 (0.0017) [2024-06-15 14:21:34,156][1651669] Updated weights for policy 0, policy_version 234800 (0.0033) [2024-06-15 14:21:35,686][1651669] Updated weights for policy 0, policy_version 234852 (0.0016) [2024-06-15 14:21:35,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 480968704. Throughput: 0: 11458.4. Samples: 120309760. 
Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:21:35,767][1648981] Avg episode reward: [(0, '477.120')] [2024-06-15 14:21:36,748][1651669] Updated weights for policy 0, policy_version 234897 (0.0014) [2024-06-15 14:21:40,774][1648981] Fps is (10 sec: 52387.6, 60 sec: 43685.0, 300 sec: 46210.3). Total num frames: 481165312. Throughput: 0: 11546.5. Samples: 120346112. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:21:40,775][1648981] Avg episode reward: [(0, '477.180')] [2024-06-15 14:21:42,432][1651669] Updated weights for policy 0, policy_version 234945 (0.0013) [2024-06-15 14:21:44,176][1651669] Updated weights for policy 0, policy_version 235032 (0.0108) [2024-06-15 14:21:45,592][1651669] Updated weights for policy 0, policy_version 235088 (0.0012) [2024-06-15 14:21:45,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 481460224. Throughput: 0: 11457.6. Samples: 120422400. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:21:45,767][1648981] Avg episode reward: [(0, '478.540')] [2024-06-15 14:21:47,557][1651669] Updated weights for policy 0, policy_version 235168 (0.0019) [2024-06-15 14:21:48,314][1651669] Updated weights for policy 0, policy_version 235200 (0.0022) [2024-06-15 14:21:50,766][1648981] Fps is (10 sec: 52469.9, 60 sec: 45878.1, 300 sec: 46430.6). Total num frames: 481689600. Throughput: 0: 11889.8. Samples: 120502784. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:21:50,767][1648981] Avg episode reward: [(0, '472.710')] [2024-06-15 14:21:53,493][1651669] Updated weights for policy 0, policy_version 235257 (0.0012) [2024-06-15 14:21:54,800][1651669] Updated weights for policy 0, policy_version 235312 (0.0015) [2024-06-15 14:21:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 481951744. Throughput: 0: 11719.1. Samples: 120538624. 
Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:21:55,767][1648981] Avg episode reward: [(0, '465.370')] [2024-06-15 14:21:57,157][1651669] Updated weights for policy 0, policy_version 235389 (0.0015) [2024-06-15 14:21:58,802][1651669] Updated weights for policy 0, policy_version 235440 (0.0013) [2024-06-15 14:22:00,773][1648981] Fps is (10 sec: 52393.7, 60 sec: 48054.8, 300 sec: 46766.5). Total num frames: 482213888. Throughput: 0: 12047.3. Samples: 120608768. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:22:00,774][1648981] Avg episode reward: [(0, '466.370')] [2024-06-15 14:22:02,920][1651669] Updated weights for policy 0, policy_version 235492 (0.0013) [2024-06-15 14:22:04,431][1651274] Signal inference workers to stop experience collection... (12400 times) [2024-06-15 14:22:04,515][1651669] InferenceWorker_p0-w0: stopping experience collection (12400 times) [2024-06-15 14:22:04,796][1651274] Signal inference workers to resume experience collection... (12400 times) [2024-06-15 14:22:04,797][1651669] InferenceWorker_p0-w0: resuming experience collection (12400 times) [2024-06-15 14:22:04,799][1651669] Updated weights for policy 0, policy_version 235536 (0.0011) [2024-06-15 14:22:05,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 47513.4, 300 sec: 46987.8). Total num frames: 482443264. Throughput: 0: 12162.8. Samples: 120691200. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:22:05,767][1648981] Avg episode reward: [(0, '458.630')] [2024-06-15 14:22:06,332][1651669] Updated weights for policy 0, policy_version 235600 (0.0012) [2024-06-15 14:22:07,292][1651669] Updated weights for policy 0, policy_version 235643 (0.0012) [2024-06-15 14:22:09,624][1651669] Updated weights for policy 0, policy_version 235698 (0.0013) [2024-06-15 14:22:10,766][1648981] Fps is (10 sec: 52463.6, 60 sec: 50244.3, 300 sec: 46986.0). Total num frames: 482738176. Throughput: 0: 12242.4. Samples: 120722432. 
Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:22:10,767][1648981] Avg episode reward: [(0, '470.660')] [2024-06-15 14:22:12,704][1651669] Updated weights for policy 0, policy_version 235729 (0.0014) [2024-06-15 14:22:13,640][1651669] Updated weights for policy 0, policy_version 235772 (0.0012) [2024-06-15 14:22:15,385][1651669] Updated weights for policy 0, policy_version 235812 (0.0013) [2024-06-15 14:22:15,798][1648981] Fps is (10 sec: 52266.3, 60 sec: 47488.8, 300 sec: 46982.7). Total num frames: 482967552. Throughput: 0: 12654.6. Samples: 120812032. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:22:15,798][1648981] Avg episode reward: [(0, '467.030')] [2024-06-15 14:22:16,874][1651669] Updated weights for policy 0, policy_version 235860 (0.0012) [2024-06-15 14:22:19,732][1651669] Updated weights for policy 0, policy_version 235921 (0.0012) [2024-06-15 14:22:20,773][1648981] Fps is (10 sec: 52395.5, 60 sec: 52423.2, 300 sec: 47207.1). Total num frames: 483262464. Throughput: 0: 12479.7. Samples: 120871424. Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:22:20,773][1648981] Avg episode reward: [(0, '455.880')] [2024-06-15 14:22:22,714][1651669] Updated weights for policy 0, policy_version 235969 (0.0012) [2024-06-15 14:22:24,033][1651669] Updated weights for policy 0, policy_version 236026 (0.0013) [2024-06-15 14:22:25,771][1648981] Fps is (10 sec: 45998.3, 60 sec: 48602.1, 300 sec: 46763.1). Total num frames: 483426304. Throughput: 0: 12653.0. Samples: 120915456. 
Policy #0 lag: (min: 15.0, avg: 77.5, max: 271.0) [2024-06-15 14:22:25,771][1648981] Avg episode reward: [(0, '448.800')] [2024-06-15 14:22:26,497][1651669] Updated weights for policy 0, policy_version 236086 (0.0011) [2024-06-15 14:22:27,959][1651669] Updated weights for policy 0, policy_version 236146 (0.0014) [2024-06-15 14:22:30,148][1651669] Updated weights for policy 0, policy_version 236179 (0.0012) [2024-06-15 14:22:30,769][1648981] Fps is (10 sec: 49171.2, 60 sec: 51880.5, 300 sec: 47318.8). Total num frames: 483753984. Throughput: 0: 12696.9. Samples: 120993792. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:22:30,770][1648981] Avg episode reward: [(0, '442.980')] [2024-06-15 14:22:32,992][1651669] Updated weights for policy 0, policy_version 236242 (0.0016) [2024-06-15 14:22:35,766][1648981] Fps is (10 sec: 49174.4, 60 sec: 49152.1, 300 sec: 46660.4). Total num frames: 483917824. Throughput: 0: 12492.8. Samples: 121064960. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:22:35,767][1648981] Avg episode reward: [(0, '437.630')] [2024-06-15 14:22:36,637][1651669] Updated weights for policy 0, policy_version 236304 (0.0012) [2024-06-15 14:22:39,691][1651669] Updated weights for policy 0, policy_version 236403 (0.0016) [2024-06-15 14:22:40,766][1648981] Fps is (10 sec: 42609.2, 60 sec: 50250.8, 300 sec: 47432.2). Total num frames: 484179968. Throughput: 0: 12390.4. Samples: 121096192. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:22:40,767][1648981] Avg episode reward: [(0, '434.850')] [2024-06-15 14:22:41,818][1651669] Updated weights for policy 0, policy_version 236433 (0.0012) [2024-06-15 14:22:44,587][1651274] Signal inference workers to stop experience collection... (12450 times) [2024-06-15 14:22:44,622][1651669] InferenceWorker_p0-w0: stopping experience collection (12450 times) [2024-06-15 14:22:44,863][1651274] Signal inference workers to resume experience collection... 
(12450 times) [2024-06-15 14:22:44,863][1651669] InferenceWorker_p0-w0: resuming experience collection (12450 times) [2024-06-15 14:22:44,998][1651669] Updated weights for policy 0, policy_version 236514 (0.0016) [2024-06-15 14:22:45,770][1648981] Fps is (10 sec: 52408.6, 60 sec: 49694.9, 300 sec: 46986.6). Total num frames: 484442112. Throughput: 0: 12322.9. Samples: 121163264. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:22:45,771][1648981] Avg episode reward: [(0, '426.930')] [2024-06-15 14:22:48,201][1651669] Updated weights for policy 0, policy_version 236562 (0.0014) [2024-06-15 14:22:50,066][1651669] Updated weights for policy 0, policy_version 236640 (0.0012) [2024-06-15 14:22:50,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 49698.0, 300 sec: 47430.3). Total num frames: 484671488. Throughput: 0: 12015.0. Samples: 121231872. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:22:50,767][1648981] Avg episode reward: [(0, '419.670')] [2024-06-15 14:22:52,847][1651669] Updated weights for policy 0, policy_version 236696 (0.0014) [2024-06-15 14:22:54,499][1651669] Updated weights for policy 0, policy_version 236738 (0.0013) [2024-06-15 14:22:55,706][1651669] Updated weights for policy 0, policy_version 236794 (0.0014) [2024-06-15 14:22:55,767][1648981] Fps is (10 sec: 49169.0, 60 sec: 49697.8, 300 sec: 47321.0). Total num frames: 484933632. Throughput: 0: 12253.8. Samples: 121273856. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:22:55,767][1648981] Avg episode reward: [(0, '426.590')] [2024-06-15 14:22:55,794][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000236800_484966400.pth... 
[2024-06-15 14:22:55,878][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000231200_473497600.pth [2024-06-15 14:22:59,372][1651669] Updated weights for policy 0, policy_version 236837 (0.0012) [2024-06-15 14:23:00,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 49157.5, 300 sec: 47319.2). Total num frames: 485163008. Throughput: 0: 11989.2. Samples: 121351168. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:00,767][1648981] Avg episode reward: [(0, '414.590')] [2024-06-15 14:23:01,032][1651669] Updated weights for policy 0, policy_version 236912 (0.0033) [2024-06-15 14:23:03,746][1651669] Updated weights for policy 0, policy_version 236946 (0.0014) [2024-06-15 14:23:05,766][1648981] Fps is (10 sec: 45877.3, 60 sec: 49152.2, 300 sec: 47652.5). Total num frames: 485392384. Throughput: 0: 12107.7. Samples: 121416192. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:05,767][1648981] Avg episode reward: [(0, '426.680')] [2024-06-15 14:23:06,533][1651669] Updated weights for policy 0, policy_version 237046 (0.0015) [2024-06-15 14:23:10,468][1651669] Updated weights for policy 0, policy_version 237092 (0.0016) [2024-06-15 14:23:10,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 46986.0). Total num frames: 485588992. Throughput: 0: 11936.5. Samples: 121452544. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:10,767][1648981] Avg episode reward: [(0, '423.320')] [2024-06-15 14:23:12,610][1651669] Updated weights for policy 0, policy_version 237178 (0.0015) [2024-06-15 14:23:15,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 47538.4, 300 sec: 47763.5). Total num frames: 485818368. Throughput: 0: 11799.4. Samples: 121524736. 
Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:15,767][1648981] Avg episode reward: [(0, '420.560')] [2024-06-15 14:23:16,152][1651669] Updated weights for policy 0, policy_version 237236 (0.0013) [2024-06-15 14:23:18,088][1651669] Updated weights for policy 0, policy_version 237306 (0.0059) [2024-06-15 14:23:20,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 45880.1, 300 sec: 46986.0). Total num frames: 486014976. Throughput: 0: 11878.4. Samples: 121599488. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:20,767][1648981] Avg episode reward: [(0, '409.350')] [2024-06-15 14:23:21,758][1651669] Updated weights for policy 0, policy_version 237366 (0.0014) [2024-06-15 14:23:22,527][1651669] Updated weights for policy 0, policy_version 237397 (0.0062) [2024-06-15 14:23:23,353][1651669] Updated weights for policy 0, policy_version 237439 (0.0014) [2024-06-15 14:23:25,778][1648981] Fps is (10 sec: 49094.1, 60 sec: 48053.9, 300 sec: 47653.5). Total num frames: 486309888. Throughput: 0: 11818.4. Samples: 121628160. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:25,779][1648981] Avg episode reward: [(0, '415.670')] [2024-06-15 14:23:25,822][1651274] Signal inference workers to stop experience collection... (12500 times) [2024-06-15 14:23:25,873][1651669] InferenceWorker_p0-w0: stopping experience collection (12500 times) [2024-06-15 14:23:26,082][1651274] Signal inference workers to resume experience collection... (12500 times) [2024-06-15 14:23:26,083][1651669] InferenceWorker_p0-w0: resuming experience collection (12500 times) [2024-06-15 14:23:26,429][1651669] Updated weights for policy 0, policy_version 237488 (0.0019) [2024-06-15 14:23:28,350][1651669] Updated weights for policy 0, policy_version 237560 (0.0139) [2024-06-15 14:23:30,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 46423.1, 300 sec: 47097.0). Total num frames: 486539264. Throughput: 0: 12015.9. Samples: 121703936. 
Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:30,768][1648981] Avg episode reward: [(0, '415.670')] [2024-06-15 14:23:32,841][1651669] Updated weights for policy 0, policy_version 237618 (0.0014) [2024-06-15 14:23:34,038][1651669] Updated weights for policy 0, policy_version 237680 (0.0060) [2024-06-15 14:23:35,766][1648981] Fps is (10 sec: 49210.0, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 486801408. Throughput: 0: 12128.7. Samples: 121777664. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:35,767][1648981] Avg episode reward: [(0, '424.500')] [2024-06-15 14:23:36,707][1651669] Updated weights for policy 0, policy_version 237713 (0.0015) [2024-06-15 14:23:38,885][1651669] Updated weights for policy 0, policy_version 237798 (0.0013) [2024-06-15 14:23:40,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 487063552. Throughput: 0: 11912.6. Samples: 121809920. Policy #0 lag: (min: 47.0, avg: 177.0, max: 303.0) [2024-06-15 14:23:40,767][1648981] Avg episode reward: [(0, '432.960')] [2024-06-15 14:23:42,260][1651669] Updated weights for policy 0, policy_version 237832 (0.0012) [2024-06-15 14:23:43,968][1651669] Updated weights for policy 0, policy_version 237904 (0.0015) [2024-06-15 14:23:45,778][1648981] Fps is (10 sec: 52366.8, 60 sec: 48053.3, 300 sec: 47541.4). Total num frames: 487325696. Throughput: 0: 11784.3. Samples: 121881600. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:23:45,779][1648981] Avg episode reward: [(0, '441.830')] [2024-06-15 14:23:48,576][1651669] Updated weights for policy 0, policy_version 237984 (0.0014) [2024-06-15 14:23:50,592][1651669] Updated weights for policy 0, policy_version 238049 (0.0014) [2024-06-15 14:23:50,777][1648981] Fps is (10 sec: 45826.8, 60 sec: 47505.3, 300 sec: 47761.9). Total num frames: 487522304. Throughput: 0: 11773.2. Samples: 121946112. 
Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:23:50,778][1648981] Avg episode reward: [(0, '445.290')] [2024-06-15 14:23:53,976][1651669] Updated weights for policy 0, policy_version 238083 (0.0014) [2024-06-15 14:23:55,509][1651669] Updated weights for policy 0, policy_version 238160 (0.0016) [2024-06-15 14:23:55,766][1648981] Fps is (10 sec: 42649.2, 60 sec: 46967.8, 300 sec: 47213.9). Total num frames: 487751680. Throughput: 0: 12015.0. Samples: 121993216. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:23:55,767][1648981] Avg episode reward: [(0, '432.910')] [2024-06-15 14:23:58,780][1651669] Updated weights for policy 0, policy_version 238214 (0.0014) [2024-06-15 14:24:00,200][1651669] Updated weights for policy 0, policy_version 238272 (0.0013) [2024-06-15 14:24:00,766][1648981] Fps is (10 sec: 49203.8, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 488013824. Throughput: 0: 12117.3. Samples: 122070016. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:00,767][1648981] Avg episode reward: [(0, '447.100')] [2024-06-15 14:24:01,874][1651669] Updated weights for policy 0, policy_version 238333 (0.0013) [2024-06-15 14:24:05,353][1651274] Signal inference workers to stop experience collection... (12550 times) [2024-06-15 14:24:05,450][1651669] InferenceWorker_p0-w0: stopping experience collection (12550 times) [2024-06-15 14:24:05,589][1651274] Signal inference workers to resume experience collection... (12550 times) [2024-06-15 14:24:05,590][1651669] InferenceWorker_p0-w0: resuming experience collection (12550 times) [2024-06-15 14:24:05,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 46967.2, 300 sec: 47430.3). Total num frames: 488210432. Throughput: 0: 11935.2. Samples: 122136576. 
Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:05,767][1648981] Avg episode reward: [(0, '450.170')] [2024-06-15 14:24:05,984][1651669] Updated weights for policy 0, policy_version 238400 (0.0041) [2024-06-15 14:24:07,266][1651669] Updated weights for policy 0, policy_version 238464 (0.0014) [2024-06-15 14:24:10,767][1648981] Fps is (10 sec: 42598.0, 60 sec: 47513.6, 300 sec: 47767.7). Total num frames: 488439808. Throughput: 0: 12018.1. Samples: 122168832. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:10,767][1648981] Avg episode reward: [(0, '456.260')] [2024-06-15 14:24:11,331][1651669] Updated weights for policy 0, policy_version 238514 (0.0017) [2024-06-15 14:24:13,277][1651669] Updated weights for policy 0, policy_version 238588 (0.0014) [2024-06-15 14:24:15,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 46967.4, 300 sec: 47432.1). Total num frames: 488636416. Throughput: 0: 11719.2. Samples: 122231296. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:15,768][1648981] Avg episode reward: [(0, '446.430')] [2024-06-15 14:24:16,600][1651669] Updated weights for policy 0, policy_version 238640 (0.0021) [2024-06-15 14:24:18,051][1651669] Updated weights for policy 0, policy_version 238704 (0.0014) [2024-06-15 14:24:20,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48059.7, 300 sec: 47542.0). Total num frames: 488898560. Throughput: 0: 11901.1. Samples: 122313216. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:20,767][1648981] Avg episode reward: [(0, '451.220')] [2024-06-15 14:24:21,910][1651669] Updated weights for policy 0, policy_version 238768 (0.0015) [2024-06-15 14:24:23,544][1651669] Updated weights for policy 0, policy_version 238832 (0.0012) [2024-06-15 14:24:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 47522.9, 300 sec: 47874.6). Total num frames: 489160704. Throughput: 0: 11832.9. Samples: 122342400. 
Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:25,767][1648981] Avg episode reward: [(0, '447.820')] [2024-06-15 14:24:26,540][1651669] Updated weights for policy 0, policy_version 238880 (0.0012) [2024-06-15 14:24:28,027][1651669] Updated weights for policy 0, policy_version 238917 (0.0010) [2024-06-15 14:24:29,240][1651669] Updated weights for policy 0, policy_version 238976 (0.0029) [2024-06-15 14:24:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 489422848. Throughput: 0: 11892.9. Samples: 122416640. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:30,767][1648981] Avg episode reward: [(0, '450.850')] [2024-06-15 14:24:34,078][1651669] Updated weights for policy 0, policy_version 239043 (0.0013) [2024-06-15 14:24:35,453][1651669] Updated weights for policy 0, policy_version 239101 (0.0014) [2024-06-15 14:24:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 489684992. Throughput: 0: 11824.3. Samples: 122478080. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:35,767][1648981] Avg episode reward: [(0, '442.550')] [2024-06-15 14:24:38,446][1651669] Updated weights for policy 0, policy_version 239141 (0.0013) [2024-06-15 14:24:40,598][1651669] Updated weights for policy 0, policy_version 239211 (0.0014) [2024-06-15 14:24:40,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 489914368. Throughput: 0: 11639.4. Samples: 122516992. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:40,767][1648981] Avg episode reward: [(0, '450.480')] [2024-06-15 14:24:44,130][1651669] Updated weights for policy 0, policy_version 239251 (0.0050) [2024-06-15 14:24:45,648][1651274] Signal inference workers to stop experience collection... 
(12600 times) [2024-06-15 14:24:45,701][1651669] InferenceWorker_p0-w0: stopping experience collection (12600 times) [2024-06-15 14:24:45,706][1651669] Updated weights for policy 0, policy_version 239314 (0.0013) [2024-06-15 14:24:45,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 46430.4, 300 sec: 47763.5). Total num frames: 490110976. Throughput: 0: 11650.8. Samples: 122594304. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:45,767][1648981] Avg episode reward: [(0, '456.310')] [2024-06-15 14:24:45,946][1651274] Signal inference workers to resume experience collection... (12600 times) [2024-06-15 14:24:45,962][1651669] InferenceWorker_p0-w0: resuming experience collection (12600 times) [2024-06-15 14:24:49,018][1651669] Updated weights for policy 0, policy_version 239376 (0.0014) [2024-06-15 14:24:50,255][1651669] Updated weights for policy 0, policy_version 239423 (0.0012) [2024-06-15 14:24:50,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 46975.8, 300 sec: 47208.1). Total num frames: 490340352. Throughput: 0: 11616.8. Samples: 122659328. Policy #0 lag: (min: 9.0, avg: 102.1, max: 265.0) [2024-06-15 14:24:50,767][1648981] Avg episode reward: [(0, '441.470')] [2024-06-15 14:24:52,070][1651669] Updated weights for policy 0, policy_version 239479 (0.0013) [2024-06-15 14:24:55,775][1648981] Fps is (10 sec: 42564.1, 60 sec: 46415.0, 300 sec: 47429.0). Total num frames: 490536960. Throughput: 0: 11739.8. Samples: 122697216. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:24:55,775][1648981] Avg episode reward: [(0, '449.160')] [2024-06-15 14:24:55,980][1651669] Updated weights for policy 0, policy_version 239536 (0.0113) [2024-06-15 14:24:56,324][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000239552_490602496.pth... 
[2024-06-15 14:24:56,423][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000233920_479068160.pth [2024-06-15 14:24:57,570][1651669] Updated weights for policy 0, policy_version 239601 (0.0015) [2024-06-15 14:24:59,994][1651669] Updated weights for policy 0, policy_version 239648 (0.0013) [2024-06-15 14:25:00,725][1651669] Updated weights for policy 0, policy_version 239680 (0.0021) [2024-06-15 14:25:00,778][1648981] Fps is (10 sec: 52366.9, 60 sec: 47504.3, 300 sec: 47539.5). Total num frames: 490864640. Throughput: 0: 11943.6. Samples: 122768896. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:00,779][1648981] Avg episode reward: [(0, '444.010')] [2024-06-15 14:25:05,785][1648981] Fps is (10 sec: 45827.5, 60 sec: 46407.2, 300 sec: 47538.4). Total num frames: 490995712. Throughput: 0: 11714.3. Samples: 122840576. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:05,786][1648981] Avg episode reward: [(0, '434.790')] [2024-06-15 14:25:06,235][1651669] Updated weights for policy 0, policy_version 239749 (0.0012) [2024-06-15 14:25:07,829][1651669] Updated weights for policy 0, policy_version 239824 (0.0015) [2024-06-15 14:25:09,071][1651669] Updated weights for policy 0, policy_version 239871 (0.0016) [2024-06-15 14:25:10,774][1648981] Fps is (10 sec: 39337.4, 60 sec: 46961.5, 300 sec: 47317.9). Total num frames: 491257856. Throughput: 0: 11717.1. Samples: 122869760. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:10,775][1648981] Avg episode reward: [(0, '440.970')] [2024-06-15 14:25:11,907][1651669] Updated weights for policy 0, policy_version 239929 (0.0013) [2024-06-15 14:25:14,415][1651669] Updated weights for policy 0, policy_version 239984 (0.0111) [2024-06-15 14:25:15,766][1648981] Fps is (10 sec: 52526.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 491520000. Throughput: 0: 11548.5. Samples: 122936320. 
Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:15,767][1648981] Avg episode reward: [(0, '455.700')] [2024-06-15 14:25:18,180][1651669] Updated weights for policy 0, policy_version 240032 (0.0015) [2024-06-15 14:25:20,607][1651669] Updated weights for policy 0, policy_version 240121 (0.0014) [2024-06-15 14:25:20,769][1648981] Fps is (10 sec: 49179.1, 60 sec: 47511.8, 300 sec: 47657.2). Total num frames: 491749376. Throughput: 0: 11673.0. Samples: 123003392. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:20,769][1648981] Avg episode reward: [(0, '461.370')] [2024-06-15 14:25:23,556][1651669] Updated weights for policy 0, policy_version 240180 (0.0013) [2024-06-15 14:25:25,192][1651669] Updated weights for policy 0, policy_version 240224 (0.0011) [2024-06-15 14:25:25,772][1648981] Fps is (10 sec: 52399.5, 60 sec: 48055.3, 300 sec: 47540.5). Total num frames: 492044288. Throughput: 0: 11717.7. Samples: 123044352. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:25,772][1648981] Avg episode reward: [(0, '458.750')] [2024-06-15 14:25:28,448][1651669] Updated weights for policy 0, policy_version 240272 (0.0110) [2024-06-15 14:25:29,174][1651274] Signal inference workers to stop experience collection... (12650 times) [2024-06-15 14:25:29,228][1651669] InferenceWorker_p0-w0: stopping experience collection (12650 times) [2024-06-15 14:25:29,530][1651274] Signal inference workers to resume experience collection... (12650 times) [2024-06-15 14:25:29,531][1651669] InferenceWorker_p0-w0: resuming experience collection (12650 times) [2024-06-15 14:25:30,129][1651669] Updated weights for policy 0, policy_version 240337 (0.0013) [2024-06-15 14:25:30,766][1648981] Fps is (10 sec: 49163.0, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 492240896. Throughput: 0: 11685.0. Samples: 123120128. 
Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:30,767][1648981] Avg episode reward: [(0, '471.860')] [2024-06-15 14:25:32,551][1651669] Updated weights for policy 0, policy_version 240386 (0.0013) [2024-06-15 14:25:35,743][1651669] Updated weights for policy 0, policy_version 240451 (0.0013) [2024-06-15 14:25:35,766][1648981] Fps is (10 sec: 39343.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 492437504. Throughput: 0: 11889.8. Samples: 123194368. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:35,767][1648981] Avg episode reward: [(0, '467.080')] [2024-06-15 14:25:36,810][1651669] Updated weights for policy 0, policy_version 240507 (0.0016) [2024-06-15 14:25:40,539][1651669] Updated weights for policy 0, policy_version 240576 (0.0013) [2024-06-15 14:25:40,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46421.5, 300 sec: 47985.7). Total num frames: 492699648. Throughput: 0: 11937.5. Samples: 123234304. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:40,767][1648981] Avg episode reward: [(0, '467.760')] [2024-06-15 14:25:42,071][1651669] Updated weights for policy 0, policy_version 240640 (0.0055) [2024-06-15 14:25:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 47513.7, 300 sec: 47542.0). Total num frames: 492961792. Throughput: 0: 11642.5. Samples: 123292672. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:45,767][1648981] Avg episode reward: [(0, '479.580')] [2024-06-15 14:25:46,822][1651669] Updated weights for policy 0, policy_version 240708 (0.0014) [2024-06-15 14:25:50,124][1651669] Updated weights for policy 0, policy_version 240772 (0.0013) [2024-06-15 14:25:50,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 46421.0, 300 sec: 47652.4). Total num frames: 493125632. Throughput: 0: 11849.1. Samples: 123373568. 
Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:50,768][1648981] Avg episode reward: [(0, '481.940')] [2024-06-15 14:25:51,699][1651669] Updated weights for policy 0, policy_version 240833 (0.0013) [2024-06-15 14:25:53,004][1651669] Updated weights for policy 0, policy_version 240893 (0.0014) [2024-06-15 14:25:55,767][1648981] Fps is (10 sec: 49150.5, 60 sec: 48612.2, 300 sec: 47874.6). Total num frames: 493453312. Throughput: 0: 11834.8. Samples: 123402240. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:25:55,767][1648981] Avg episode reward: [(0, '468.460')] [2024-06-15 14:25:55,865][1651669] Updated weights for policy 0, policy_version 240952 (0.0041) [2024-06-15 14:25:58,126][1651669] Updated weights for policy 0, policy_version 240995 (0.0013) [2024-06-15 14:26:00,774][1648981] Fps is (10 sec: 52388.4, 60 sec: 46424.2, 300 sec: 47651.1). Total num frames: 493649920. Throughput: 0: 12172.0. Samples: 123484160. Policy #0 lag: (min: 47.0, avg: 174.7, max: 303.0) [2024-06-15 14:26:00,775][1648981] Avg episode reward: [(0, '477.090')] [2024-06-15 14:26:00,859][1651669] Updated weights for policy 0, policy_version 241050 (0.0012) [2024-06-15 14:26:02,645][1651669] Updated weights for policy 0, policy_version 241120 (0.0084) [2024-06-15 14:26:05,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 48074.6, 300 sec: 47985.7). Total num frames: 493879296. Throughput: 0: 12231.7. Samples: 123553792. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:05,767][1648981] Avg episode reward: [(0, '475.140')] [2024-06-15 14:26:06,075][1651669] Updated weights for policy 0, policy_version 241168 (0.0013) [2024-06-15 14:26:07,764][1651669] Updated weights for policy 0, policy_version 241236 (0.0108) [2024-06-15 14:26:10,766][1648981] Fps is (10 sec: 49191.7, 60 sec: 48065.9, 300 sec: 47541.4). Total num frames: 494141440. Throughput: 0: 11925.4. Samples: 123580928. 
Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:10,767][1648981] Avg episode reward: [(0, '483.040')] [2024-06-15 14:26:11,540][1651274] Signal inference workers to stop experience collection... (12700 times) [2024-06-15 14:26:11,590][1651669] InferenceWorker_p0-w0: stopping experience collection (12700 times) [2024-06-15 14:26:11,592][1651669] Updated weights for policy 0, policy_version 241282 (0.0013) [2024-06-15 14:26:11,853][1651274] Signal inference workers to resume experience collection... (12700 times) [2024-06-15 14:26:11,854][1651669] InferenceWorker_p0-w0: resuming experience collection (12700 times) [2024-06-15 14:26:12,936][1651669] Updated weights for policy 0, policy_version 241337 (0.0110) [2024-06-15 14:26:14,945][1651669] Updated weights for policy 0, policy_version 241401 (0.0123) [2024-06-15 14:26:15,798][1648981] Fps is (10 sec: 52264.1, 60 sec: 48034.5, 300 sec: 48424.8). Total num frames: 494403584. Throughput: 0: 11915.6. Samples: 123656704. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:15,798][1648981] Avg episode reward: [(0, '487.270')] [2024-06-15 14:26:18,419][1651669] Updated weights for policy 0, policy_version 241459 (0.0012) [2024-06-15 14:26:19,754][1651669] Updated weights for policy 0, policy_version 241525 (0.0117) [2024-06-15 14:26:20,768][1648981] Fps is (10 sec: 52418.2, 60 sec: 48606.0, 300 sec: 47985.3). Total num frames: 494665728. Throughput: 0: 11877.9. Samples: 123728896. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:20,769][1648981] Avg episode reward: [(0, '497.080')] [2024-06-15 14:26:22,644][1651669] Updated weights for policy 0, policy_version 241554 (0.0012) [2024-06-15 14:26:24,448][1651669] Updated weights for policy 0, policy_version 241601 (0.0030) [2024-06-15 14:26:25,794][1648981] Fps is (10 sec: 49170.2, 60 sec: 47496.0, 300 sec: 48314.3). Total num frames: 494895104. Throughput: 0: 11916.5. Samples: 123770880. 
Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:25,795][1648981] Avg episode reward: [(0, '512.630')] [2024-06-15 14:26:25,866][1651669] Updated weights for policy 0, policy_version 241656 (0.0012) [2024-06-15 14:26:26,013][1651274] Saving new best policy, reward=512.630! [2024-06-15 14:26:28,832][1651669] Updated weights for policy 0, policy_version 241712 (0.0012) [2024-06-15 14:26:30,766][1648981] Fps is (10 sec: 52439.5, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 495190016. Throughput: 0: 12094.6. Samples: 123836928. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:30,767][1648981] Avg episode reward: [(0, '507.330')] [2024-06-15 14:26:32,607][1651669] Updated weights for policy 0, policy_version 241795 (0.0014) [2024-06-15 14:26:33,603][1651669] Updated weights for policy 0, policy_version 241853 (0.0012) [2024-06-15 14:26:35,770][1648981] Fps is (10 sec: 42700.5, 60 sec: 48056.5, 300 sec: 47986.3). Total num frames: 495321088. Throughput: 0: 12150.5. Samples: 123920384. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:35,771][1648981] Avg episode reward: [(0, '510.000')] [2024-06-15 14:26:37,232][1651669] Updated weights for policy 0, policy_version 241914 (0.0013) [2024-06-15 14:26:40,317][1651669] Updated weights for policy 0, policy_version 241972 (0.0012) [2024-06-15 14:26:40,778][1648981] Fps is (10 sec: 39275.0, 60 sec: 48050.2, 300 sec: 47872.7). Total num frames: 495583232. Throughput: 0: 12182.5. Samples: 123950592. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:40,779][1648981] Avg episode reward: [(0, '491.460')] [2024-06-15 14:26:41,775][1651669] Updated weights for policy 0, policy_version 242039 (0.0011) [2024-06-15 14:26:43,836][1651669] Updated weights for policy 0, policy_version 242096 (0.0068) [2024-06-15 14:26:45,767][1648981] Fps is (10 sec: 52448.2, 60 sec: 48059.6, 300 sec: 47985.6). Total num frames: 495845376. Throughput: 0: 11926.0. 
Samples: 124020736. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:45,768][1648981] Avg episode reward: [(0, '502.130')] [2024-06-15 14:26:47,714][1651669] Updated weights for policy 0, policy_version 242146 (0.0012) [2024-06-15 14:26:50,363][1651669] Updated weights for policy 0, policy_version 242212 (0.0013) [2024-06-15 14:26:50,766][1648981] Fps is (10 sec: 49210.1, 60 sec: 49152.3, 300 sec: 47874.6). Total num frames: 496074752. Throughput: 0: 12162.8. Samples: 124101120. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:50,767][1648981] Avg episode reward: [(0, '483.880')] [2024-06-15 14:26:51,433][1651274] Signal inference workers to stop experience collection... (12750 times) [2024-06-15 14:26:51,483][1651669] InferenceWorker_p0-w0: stopping experience collection (12750 times) [2024-06-15 14:26:51,624][1651274] Signal inference workers to resume experience collection... (12750 times) [2024-06-15 14:26:51,639][1651669] InferenceWorker_p0-w0: resuming experience collection (12750 times) [2024-06-15 14:26:51,943][1651669] Updated weights for policy 0, policy_version 242288 (0.0088) [2024-06-15 14:26:53,604][1651669] Updated weights for policy 0, policy_version 242338 (0.0014) [2024-06-15 14:26:55,774][1648981] Fps is (10 sec: 52389.0, 60 sec: 48599.8, 300 sec: 47985.5). Total num frames: 496369664. Throughput: 0: 12229.0. Samples: 124131328. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:26:55,775][1648981] Avg episode reward: [(0, '480.110')] [2024-06-15 14:26:55,826][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000242368_496369664.pth... 
[2024-06-15 14:26:55,879][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000236800_484966400.pth [2024-06-15 14:26:58,794][1651669] Updated weights for policy 0, policy_version 242400 (0.0012) [2024-06-15 14:27:00,298][1651669] Updated weights for policy 0, policy_version 242448 (0.0015) [2024-06-15 14:27:00,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48612.4, 300 sec: 47874.6). Total num frames: 496566272. Throughput: 0: 12296.6. Samples: 124209664. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:27:00,767][1648981] Avg episode reward: [(0, '468.460')] [2024-06-15 14:27:01,740][1651669] Updated weights for policy 0, policy_version 242499 (0.0014) [2024-06-15 14:27:02,956][1651669] Updated weights for policy 0, policy_version 242558 (0.0013) [2024-06-15 14:27:05,282][1651669] Updated weights for policy 0, policy_version 242620 (0.0013) [2024-06-15 14:27:05,767][1648981] Fps is (10 sec: 52468.6, 60 sec: 50244.1, 300 sec: 47985.7). Total num frames: 496893952. Throughput: 0: 12186.1. Samples: 124277248. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:27:05,768][1648981] Avg episode reward: [(0, '467.710')] [2024-06-15 14:27:09,730][1651669] Updated weights for policy 0, policy_version 242684 (0.0015) [2024-06-15 14:27:10,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.6, 300 sec: 47657.5). Total num frames: 497025024. Throughput: 0: 12284.2. Samples: 124323328. Policy #0 lag: (min: 63.0, avg: 158.1, max: 319.0) [2024-06-15 14:27:10,767][1648981] Avg episode reward: [(0, '445.910')] [2024-06-15 14:27:12,336][1651669] Updated weights for policy 0, policy_version 242743 (0.0083) [2024-06-15 14:27:13,841][1651669] Updated weights for policy 0, policy_version 242805 (0.0014) [2024-06-15 14:27:15,251][1651669] Updated weights for policy 0, policy_version 242848 (0.0022) [2024-06-15 14:27:15,780][1648981] Fps is (10 sec: 49087.0, 60 sec: 49713.1, 300 sec: 47873.5). Total num frames: 497385472. 
Throughput: 0: 12193.3. Samples: 124385792. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:15,780][1648981] Avg episode reward: [(0, '448.220')] [2024-06-15 14:27:19,932][1651669] Updated weights for policy 0, policy_version 242899 (0.0026) [2024-06-15 14:27:20,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 47515.2, 300 sec: 47764.3). Total num frames: 497516544. Throughput: 0: 12152.5. Samples: 124467200. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:20,767][1648981] Avg episode reward: [(0, '458.050')] [2024-06-15 14:27:21,660][1651669] Updated weights for policy 0, policy_version 242948 (0.0015) [2024-06-15 14:27:24,043][1651669] Updated weights for policy 0, policy_version 243040 (0.0013) [2024-06-15 14:27:25,777][1648981] Fps is (10 sec: 42612.7, 60 sec: 48620.2, 300 sec: 47651.2). Total num frames: 497811456. Throughput: 0: 12083.7. Samples: 124494336. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:25,777][1648981] Avg episode reward: [(0, '462.520')] [2024-06-15 14:27:26,394][1651669] Updated weights for policy 0, policy_version 243098 (0.0012) [2024-06-15 14:27:30,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 497942528. Throughput: 0: 12299.5. Samples: 124574208. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:30,767][1648981] Avg episode reward: [(0, '446.700')] [2024-06-15 14:27:30,906][1651669] Updated weights for policy 0, policy_version 243152 (0.0014) [2024-06-15 14:27:31,879][1651669] Updated weights for policy 0, policy_version 243198 (0.0014) [2024-06-15 14:27:33,915][1651669] Updated weights for policy 0, policy_version 243251 (0.0042) [2024-06-15 14:27:34,327][1651274] Signal inference workers to stop experience collection... 
(12800 times) [2024-06-15 14:27:34,385][1651669] InferenceWorker_p0-w0: stopping experience collection (12800 times) [2024-06-15 14:27:34,619][1651274] Signal inference workers to resume experience collection... (12800 times) [2024-06-15 14:27:34,620][1651669] InferenceWorker_p0-w0: resuming experience collection (12800 times) [2024-06-15 14:27:35,647][1651669] Updated weights for policy 0, policy_version 243323 (0.0094) [2024-06-15 14:27:35,766][1648981] Fps is (10 sec: 52482.1, 60 sec: 50247.6, 300 sec: 47985.7). Total num frames: 498335744. Throughput: 0: 11923.9. Samples: 124637696. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:35,767][1648981] Avg episode reward: [(0, '418.780')] [2024-06-15 14:27:37,827][1651669] Updated weights for policy 0, policy_version 243387 (0.0017) [2024-06-15 14:27:40,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 48069.2, 300 sec: 47542.0). Total num frames: 498466816. Throughput: 0: 12017.0. Samples: 124672000. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:40,767][1648981] Avg episode reward: [(0, '421.610')] [2024-06-15 14:27:42,855][1651669] Updated weights for policy 0, policy_version 243440 (0.0013) [2024-06-15 14:27:44,197][1651669] Updated weights for policy 0, policy_version 243490 (0.0011) [2024-06-15 14:27:45,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49152.2, 300 sec: 47874.6). Total num frames: 498794496. Throughput: 0: 12037.7. Samples: 124751360. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:45,767][1648981] Avg episode reward: [(0, '444.140')] [2024-06-15 14:27:46,107][1651669] Updated weights for policy 0, policy_version 243582 (0.0012) [2024-06-15 14:27:48,362][1651669] Updated weights for policy 0, policy_version 243648 (0.0015) [2024-06-15 14:27:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 498991104. Throughput: 0: 12197.0. Samples: 124826112. 
Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:50,767][1648981] Avg episode reward: [(0, '442.130')] [2024-06-15 14:27:53,625][1651669] Updated weights for policy 0, policy_version 243706 (0.0015) [2024-06-15 14:27:54,951][1651669] Updated weights for policy 0, policy_version 243765 (0.0013) [2024-06-15 14:27:55,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 48612.0, 300 sec: 47874.6). Total num frames: 499286016. Throughput: 0: 12094.5. Samples: 124867584. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:27:55,768][1648981] Avg episode reward: [(0, '436.900')] [2024-06-15 14:27:56,142][1651669] Updated weights for policy 0, policy_version 243813 (0.0012) [2024-06-15 14:27:57,004][1651669] Updated weights for policy 0, policy_version 243856 (0.0104) [2024-06-15 14:27:58,127][1651669] Updated weights for policy 0, policy_version 243904 (0.0016) [2024-06-15 14:28:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49152.1, 300 sec: 47874.6). Total num frames: 499515392. Throughput: 0: 12234.8. Samples: 124936192. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:28:00,767][1648981] Avg episode reward: [(0, '448.720')] [2024-06-15 14:28:05,185][1651669] Updated weights for policy 0, policy_version 243971 (0.0118) [2024-06-15 14:28:05,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 46967.7, 300 sec: 47874.6). Total num frames: 499712000. Throughput: 0: 12128.7. Samples: 125012992. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:28:05,767][1648981] Avg episode reward: [(0, '440.980')] [2024-06-15 14:28:06,154][1651669] Updated weights for policy 0, policy_version 244030 (0.0013) [2024-06-15 14:28:07,245][1651669] Updated weights for policy 0, policy_version 244080 (0.0014) [2024-06-15 14:28:08,519][1651669] Updated weights for policy 0, policy_version 244144 (0.0012) [2024-06-15 14:28:10,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 50244.2, 300 sec: 48207.8). Total num frames: 500039680. 
Throughput: 0: 12222.4. Samples: 125044224. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:28:10,767][1648981] Avg episode reward: [(0, '457.000')] [2024-06-15 14:28:13,354][1651669] Updated weights for policy 0, policy_version 244177 (0.0013) [2024-06-15 14:28:14,653][1651274] Signal inference workers to stop experience collection... (12850 times) [2024-06-15 14:28:14,685][1651669] InferenceWorker_p0-w0: stopping experience collection (12850 times) [2024-06-15 14:28:14,930][1651274] Signal inference workers to resume experience collection... (12850 times) [2024-06-15 14:28:14,931][1651669] InferenceWorker_p0-w0: resuming experience collection (12850 times) [2024-06-15 14:28:15,399][1651669] Updated weights for policy 0, policy_version 244256 (0.0013) [2024-06-15 14:28:15,770][1648981] Fps is (10 sec: 52408.3, 60 sec: 47521.2, 300 sec: 48207.2). Total num frames: 500236288. Throughput: 0: 12332.4. Samples: 125129216. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:28:15,771][1648981] Avg episode reward: [(0, '476.160')] [2024-06-15 14:28:18,539][1651669] Updated weights for policy 0, policy_version 244322 (0.0014) [2024-06-15 14:28:19,751][1651669] Updated weights for policy 0, policy_version 244372 (0.0012) [2024-06-15 14:28:20,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 50790.5, 300 sec: 48320.9). Total num frames: 500563968. Throughput: 0: 12174.2. Samples: 125185536. Policy #0 lag: (min: 31.0, avg: 131.8, max: 287.0) [2024-06-15 14:28:20,767][1648981] Avg episode reward: [(0, '474.020')] [2024-06-15 14:28:25,766][1648981] Fps is (10 sec: 39336.7, 60 sec: 46975.4, 300 sec: 47763.6). Total num frames: 500629504. Throughput: 0: 12401.8. Samples: 125230080. 
Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:28:25,767][1648981] Avg episode reward: [(0, '467.250')] [2024-06-15 14:28:25,835][1651669] Updated weights for policy 0, policy_version 244451 (0.0012) [2024-06-15 14:28:27,449][1651669] Updated weights for policy 0, policy_version 244516 (0.0109) [2024-06-15 14:28:29,637][1651669] Updated weights for policy 0, policy_version 244578 (0.0017) [2024-06-15 14:28:30,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 50790.4, 300 sec: 48096.8). Total num frames: 500989952. Throughput: 0: 12140.1. Samples: 125297664. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:28:30,767][1648981] Avg episode reward: [(0, '469.040')] [2024-06-15 14:28:31,351][1651669] Updated weights for policy 0, policy_version 244666 (0.0105) [2024-06-15 14:28:35,778][1648981] Fps is (10 sec: 45822.3, 60 sec: 45866.3, 300 sec: 47539.5). Total num frames: 501088256. Throughput: 0: 12125.6. Samples: 125371904. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:28:35,778][1648981] Avg episode reward: [(0, '487.270')] [2024-06-15 14:28:37,569][1651669] Updated weights for policy 0, policy_version 244725 (0.0014) [2024-06-15 14:28:38,893][1651669] Updated weights for policy 0, policy_version 244784 (0.0012) [2024-06-15 14:28:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49698.2, 300 sec: 47876.5). Total num frames: 501448704. Throughput: 0: 11924.0. Samples: 125404160. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:28:40,767][1648981] Avg episode reward: [(0, '481.380')] [2024-06-15 14:28:40,923][1651669] Updated weights for policy 0, policy_version 244854 (0.0079) [2024-06-15 14:28:42,241][1651669] Updated weights for policy 0, policy_version 244912 (0.0013) [2024-06-15 14:28:45,791][1648981] Fps is (10 sec: 52363.0, 60 sec: 46948.6, 300 sec: 47761.3). Total num frames: 501612544. Throughput: 0: 11883.4. Samples: 125471232. 
Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:28:45,791][1648981] Avg episode reward: [(0, '479.970')] [2024-06-15 14:28:48,290][1651669] Updated weights for policy 0, policy_version 244966 (0.0054) [2024-06-15 14:28:50,197][1651669] Updated weights for policy 0, policy_version 245044 (0.0185) [2024-06-15 14:28:50,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 501874688. Throughput: 0: 11662.2. Samples: 125537792. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:28:50,767][1648981] Avg episode reward: [(0, '465.220')] [2024-06-15 14:28:52,297][1651669] Updated weights for policy 0, policy_version 245088 (0.0015) [2024-06-15 14:28:53,582][1651274] Signal inference workers to stop experience collection... (12900 times) [2024-06-15 14:28:53,630][1651669] InferenceWorker_p0-w0: stopping experience collection (12900 times) [2024-06-15 14:28:53,827][1651274] Signal inference workers to resume experience collection... (12900 times) [2024-06-15 14:28:53,828][1651669] InferenceWorker_p0-w0: resuming experience collection (12900 times) [2024-06-15 14:28:54,248][1651669] Updated weights for policy 0, policy_version 245170 (0.0125) [2024-06-15 14:28:55,786][1648981] Fps is (10 sec: 52451.4, 60 sec: 47498.1, 300 sec: 47871.4). Total num frames: 502136832. Throughput: 0: 11770.9. Samples: 125574144. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:28:55,787][1648981] Avg episode reward: [(0, '448.530')] [2024-06-15 14:28:55,798][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000245184_502136832.pth... 
[2024-06-15 14:28:55,840][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000239552_490602496.pth [2024-06-15 14:28:58,689][1651669] Updated weights for policy 0, policy_version 245205 (0.0015) [2024-06-15 14:28:59,485][1651669] Updated weights for policy 0, policy_version 245245 (0.0084) [2024-06-15 14:29:00,787][1648981] Fps is (10 sec: 42509.5, 60 sec: 46405.1, 300 sec: 47760.2). Total num frames: 502300672. Throughput: 0: 11600.9. Samples: 125651456. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:29:00,788][1648981] Avg episode reward: [(0, '443.250')] [2024-06-15 14:29:01,356][1651669] Updated weights for policy 0, policy_version 245306 (0.0015) [2024-06-15 14:29:02,974][1651669] Updated weights for policy 0, policy_version 245367 (0.0181) [2024-06-15 14:29:04,501][1651669] Updated weights for policy 0, policy_version 245409 (0.0011) [2024-06-15 14:29:05,766][1648981] Fps is (10 sec: 52532.8, 60 sec: 49151.9, 300 sec: 48207.8). Total num frames: 502661120. Throughput: 0: 11730.5. Samples: 125713408. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:29:05,767][1648981] Avg episode reward: [(0, '448.950')] [2024-06-15 14:29:09,873][1651669] Updated weights for policy 0, policy_version 245472 (0.0013) [2024-06-15 14:29:10,767][1648981] Fps is (10 sec: 49254.0, 60 sec: 45875.1, 300 sec: 47985.6). Total num frames: 502792192. Throughput: 0: 11684.9. Samples: 125755904. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:29:10,767][1648981] Avg episode reward: [(0, '453.090')] [2024-06-15 14:29:13,077][1651669] Updated weights for policy 0, policy_version 245571 (0.0013) [2024-06-15 14:29:14,379][1651669] Updated weights for policy 0, policy_version 245629 (0.0038) [2024-06-15 14:29:15,734][1651669] Updated weights for policy 0, policy_version 245667 (0.0043) [2024-06-15 14:29:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48062.9, 300 sec: 48207.8). Total num frames: 503119872. 
Throughput: 0: 11594.0. Samples: 125819392. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:29:15,767][1648981] Avg episode reward: [(0, '466.140')] [2024-06-15 14:29:20,745][1651669] Updated weights for policy 0, policy_version 245728 (0.0012) [2024-06-15 14:29:20,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 44782.9, 300 sec: 47763.5). Total num frames: 503250944. Throughput: 0: 11756.3. Samples: 125900800. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:29:20,767][1648981] Avg episode reward: [(0, '468.990')] [2024-06-15 14:29:23,118][1651669] Updated weights for policy 0, policy_version 245793 (0.0015) [2024-06-15 14:29:24,153][1651669] Updated weights for policy 0, policy_version 245826 (0.0012) [2024-06-15 14:29:25,668][1651669] Updated weights for policy 0, policy_version 245888 (0.0013) [2024-06-15 14:29:25,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 503578624. Throughput: 0: 11798.8. Samples: 125935104. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:29:25,767][1648981] Avg episode reward: [(0, '462.410')] [2024-06-15 14:29:27,457][1651669] Updated weights for policy 0, policy_version 245950 (0.0013) [2024-06-15 14:29:30,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 45329.1, 300 sec: 47541.4). Total num frames: 503709696. Throughput: 0: 11839.3. Samples: 126003712. Policy #0 lag: (min: 15.0, avg: 95.8, max: 271.0) [2024-06-15 14:29:30,767][1648981] Avg episode reward: [(0, '445.680')] [2024-06-15 14:29:32,139][1651669] Updated weights for policy 0, policy_version 246010 (0.0013) [2024-06-15 14:29:33,839][1651669] Updated weights for policy 0, policy_version 246064 (0.0017) [2024-06-15 14:29:35,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48615.2, 300 sec: 47763.6). Total num frames: 504004608. Throughput: 0: 12151.5. Samples: 126084608. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:29:35,767][1648981] Avg episode reward: [(0, '442.680')] [2024-06-15 14:29:36,267][1651669] Updated weights for policy 0, policy_version 246114 (0.0019) [2024-06-15 14:29:37,026][1651274] Signal inference workers to stop experience collection... (12950 times) [2024-06-15 14:29:37,061][1651669] InferenceWorker_p0-w0: stopping experience collection (12950 times) [2024-06-15 14:29:37,295][1651274] Signal inference workers to resume experience collection... (12950 times) [2024-06-15 14:29:37,295][1651669] InferenceWorker_p0-w0: resuming experience collection (12950 times) [2024-06-15 14:29:38,060][1651669] Updated weights for policy 0, policy_version 246192 (0.0016) [2024-06-15 14:29:40,767][1648981] Fps is (10 sec: 52423.6, 60 sec: 46420.6, 300 sec: 47874.5). Total num frames: 504233984. Throughput: 0: 11883.4. Samples: 126108672. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:29:40,768][1648981] Avg episode reward: [(0, '447.220')] [2024-06-15 14:29:42,508][1651669] Updated weights for policy 0, policy_version 246240 (0.0013) [2024-06-15 14:29:43,710][1651669] Updated weights for policy 0, policy_version 246292 (0.0015) [2024-06-15 14:29:45,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48079.0, 300 sec: 47985.7). Total num frames: 504496128. Throughput: 0: 11895.3. Samples: 126186496. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:29:45,767][1648981] Avg episode reward: [(0, '438.640')] [2024-06-15 14:29:46,162][1651669] Updated weights for policy 0, policy_version 246339 (0.0012) [2024-06-15 14:29:47,483][1651669] Updated weights for policy 0, policy_version 246386 (0.0025) [2024-06-15 14:29:50,770][1648981] Fps is (10 sec: 52413.9, 60 sec: 48056.7, 300 sec: 48208.5). Total num frames: 504758272. Throughput: 0: 12105.0. Samples: 126258176. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:29:50,771][1648981] Avg episode reward: [(0, '438.670')] [2024-06-15 14:29:53,711][1651669] Updated weights for policy 0, policy_version 246481 (0.0014) [2024-06-15 14:29:54,510][1651669] Updated weights for policy 0, policy_version 246528 (0.0015) [2024-06-15 14:29:55,572][1651669] Updated weights for policy 0, policy_version 246587 (0.0111) [2024-06-15 14:29:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48075.5, 300 sec: 47987.6). Total num frames: 505020416. Throughput: 0: 12060.5. Samples: 126298624. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:29:55,767][1648981] Avg episode reward: [(0, '431.580')] [2024-06-15 14:29:57,059][1651669] Updated weights for policy 0, policy_version 246640 (0.0013) [2024-06-15 14:29:58,148][1651669] Updated weights for policy 0, policy_version 246675 (0.0013) [2024-06-15 14:30:00,766][1648981] Fps is (10 sec: 52448.6, 60 sec: 49715.5, 300 sec: 48433.0). Total num frames: 505282560. Throughput: 0: 12253.9. Samples: 126370816. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:00,767][1648981] Avg episode reward: [(0, '440.370')] [2024-06-15 14:30:03,544][1651669] Updated weights for policy 0, policy_version 246723 (0.0118) [2024-06-15 14:30:04,913][1651669] Updated weights for policy 0, policy_version 246780 (0.0086) [2024-06-15 14:30:05,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 48209.1). Total num frames: 505479168. Throughput: 0: 12197.0. Samples: 126449664. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:05,767][1648981] Avg episode reward: [(0, '429.200')] [2024-06-15 14:30:06,666][1651669] Updated weights for policy 0, policy_version 246850 (0.0100) [2024-06-15 14:30:07,636][1651669] Updated weights for policy 0, policy_version 246907 (0.0049) [2024-06-15 14:30:09,247][1651669] Updated weights for policy 0, policy_version 246974 (0.0012) [2024-06-15 14:30:10,778][1648981] Fps is (10 sec: 52366.5, 60 sec: 50234.5, 300 sec: 48428.0). Total num frames: 505806848. Throughput: 0: 12239.2. Samples: 126486016. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:10,779][1648981] Avg episode reward: [(0, '428.550')] [2024-06-15 14:30:14,798][1651669] Updated weights for policy 0, policy_version 247012 (0.0012) [2024-06-15 14:30:15,784][1648981] Fps is (10 sec: 45794.4, 60 sec: 46953.6, 300 sec: 48094.2). Total num frames: 505937920. Throughput: 0: 12465.1. Samples: 126564864. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:15,785][1648981] Avg episode reward: [(0, '431.680')] [2024-06-15 14:30:16,216][1651669] Updated weights for policy 0, policy_version 247056 (0.0033) [2024-06-15 14:30:17,320][1651274] Signal inference workers to stop experience collection... (13000 times) [2024-06-15 14:30:17,380][1651669] InferenceWorker_p0-w0: stopping experience collection (13000 times) [2024-06-15 14:30:17,524][1651274] Signal inference workers to resume experience collection... (13000 times) [2024-06-15 14:30:17,524][1651669] InferenceWorker_p0-w0: resuming experience collection (13000 times) [2024-06-15 14:30:17,658][1651669] Updated weights for policy 0, policy_version 247122 (0.0012) [2024-06-15 14:30:19,003][1651669] Updated weights for policy 0, policy_version 247184 (0.0013) [2024-06-15 14:30:20,766][1648981] Fps is (10 sec: 52491.2, 60 sec: 51336.5, 300 sec: 48430.9). Total num frames: 506331136. Throughput: 0: 12083.2. Samples: 126628352. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:20,767][1648981] Avg episode reward: [(0, '448.290')] [2024-06-15 14:30:25,272][1651669] Updated weights for policy 0, policy_version 247248 (0.0039) [2024-06-15 14:30:25,766][1648981] Fps is (10 sec: 45956.6, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 506396672. Throughput: 0: 12538.6. Samples: 126672896. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:25,767][1648981] Avg episode reward: [(0, '443.500')] [2024-06-15 14:30:26,886][1651669] Updated weights for policy 0, policy_version 247297 (0.0091) [2024-06-15 14:30:28,343][1651669] Updated weights for policy 0, policy_version 247371 (0.0014) [2024-06-15 14:30:29,466][1651669] Updated weights for policy 0, policy_version 247423 (0.0013) [2024-06-15 14:30:30,725][1651669] Updated weights for policy 0, policy_version 247485 (0.0012) [2024-06-15 14:30:30,778][1648981] Fps is (10 sec: 52366.1, 60 sec: 52418.3, 300 sec: 48872.3). Total num frames: 506855424. Throughput: 0: 12216.5. Samples: 126736384. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:30,779][1648981] Avg episode reward: [(0, '449.440')] [2024-06-15 14:30:35,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 506855424. Throughput: 0: 12607.6. Samples: 126825472. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:35,767][1648981] Avg episode reward: [(0, '443.490')] [2024-06-15 14:30:37,264][1651669] Updated weights for policy 0, policy_version 247553 (0.0013) [2024-06-15 14:30:39,426][1651669] Updated weights for policy 0, policy_version 247648 (0.0014) [2024-06-15 14:30:40,766][1648981] Fps is (10 sec: 39368.3, 60 sec: 50245.0, 300 sec: 48430.0). Total num frames: 507248640. Throughput: 0: 12288.0. Samples: 126851584. 
Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:40,767][1648981] Avg episode reward: [(0, '435.400')] [2024-06-15 14:30:41,516][1651669] Updated weights for policy 0, policy_version 247712 (0.0016) [2024-06-15 14:30:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 48319.0). Total num frames: 507379712. Throughput: 0: 12390.4. Samples: 126928384. Policy #0 lag: (min: 15.0, avg: 128.9, max: 271.0) [2024-06-15 14:30:45,767][1648981] Avg episode reward: [(0, '444.190')] [2024-06-15 14:30:47,672][1651669] Updated weights for policy 0, policy_version 247776 (0.0013) [2024-06-15 14:30:50,476][1651669] Updated weights for policy 0, policy_version 247888 (0.0099) [2024-06-15 14:30:50,773][1648981] Fps is (10 sec: 42571.0, 60 sec: 48603.7, 300 sec: 48206.8). Total num frames: 507674624. Throughput: 0: 11933.6. Samples: 126986752. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:30:50,778][1648981] Avg episode reward: [(0, '442.490')] [2024-06-15 14:30:52,182][1651669] Updated weights for policy 0, policy_version 247940 (0.0135) [2024-06-15 14:30:53,316][1651669] Updated weights for policy 0, policy_version 247999 (0.0021) [2024-06-15 14:30:55,767][1648981] Fps is (10 sec: 52424.7, 60 sec: 48059.2, 300 sec: 48320.1). Total num frames: 507904000. Throughput: 0: 11904.1. Samples: 127021568. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:30:55,768][1648981] Avg episode reward: [(0, '445.070')] [2024-06-15 14:30:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000248000_507904000.pth... [2024-06-15 14:30:55,817][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000242368_496369664.pth [2024-06-15 14:30:55,840][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000248000_507904000.pth [2024-06-15 14:30:58,797][1651274] Signal inference workers to stop experience collection... 
(13050 times) [2024-06-15 14:30:59,011][1651669] InferenceWorker_p0-w0: stopping experience collection (13050 times) [2024-06-15 14:30:59,014][1651669] Updated weights for policy 0, policy_version 248043 (0.0055) [2024-06-15 14:30:59,083][1651274] Signal inference workers to resume experience collection... (13050 times) [2024-06-15 14:30:59,084][1651669] InferenceWorker_p0-w0: resuming experience collection (13050 times) [2024-06-15 14:31:00,348][1651669] Updated weights for policy 0, policy_version 248099 (0.0087) [2024-06-15 14:31:00,776][1648981] Fps is (10 sec: 45858.8, 60 sec: 47505.6, 300 sec: 48317.3). Total num frames: 508133376. Throughput: 0: 12085.2. Samples: 127108608. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:00,777][1648981] Avg episode reward: [(0, '441.300')] [2024-06-15 14:31:02,485][1651669] Updated weights for policy 0, policy_version 248182 (0.0044) [2024-06-15 14:31:03,614][1651669] Updated weights for policy 0, policy_version 248224 (0.0013) [2024-06-15 14:31:05,766][1648981] Fps is (10 sec: 52433.0, 60 sec: 49152.1, 300 sec: 48430.0). Total num frames: 508428288. Throughput: 0: 12094.6. Samples: 127172608. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:05,767][1648981] Avg episode reward: [(0, '436.990')] [2024-06-15 14:31:08,790][1651669] Updated weights for policy 0, policy_version 248272 (0.0012) [2024-06-15 14:31:10,163][1651669] Updated weights for policy 0, policy_version 248336 (0.0012) [2024-06-15 14:31:10,766][1648981] Fps is (10 sec: 49201.5, 60 sec: 46976.7, 300 sec: 48213.0). Total num frames: 508624896. Throughput: 0: 12128.7. Samples: 127218688. 
Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:10,767][1648981] Avg episode reward: [(0, '437.660')] [2024-06-15 14:31:12,096][1651669] Updated weights for policy 0, policy_version 248406 (0.0013) [2024-06-15 14:31:12,961][1651669] Updated weights for policy 0, policy_version 248442 (0.0023) [2024-06-15 14:31:14,987][1651669] Updated weights for policy 0, policy_version 248489 (0.0012) [2024-06-15 14:31:15,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 50258.9, 300 sec: 48430.3). Total num frames: 508952576. Throughput: 0: 12063.6. Samples: 127279104. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:15,768][1648981] Avg episode reward: [(0, '453.990')] [2024-06-15 14:31:20,685][1651669] Updated weights for policy 0, policy_version 248549 (0.0016) [2024-06-15 14:31:20,767][1648981] Fps is (10 sec: 39321.3, 60 sec: 44782.8, 300 sec: 47879.1). Total num frames: 509018112. Throughput: 0: 11912.5. Samples: 127361536. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:20,767][1648981] Avg episode reward: [(0, '453.760')] [2024-06-15 14:31:23,223][1651669] Updated weights for policy 0, policy_version 248656 (0.0014) [2024-06-15 14:31:25,653][1651669] Updated weights for policy 0, policy_version 248707 (0.0014) [2024-06-15 14:31:25,774][1648981] Fps is (10 sec: 39291.6, 60 sec: 49145.6, 300 sec: 47984.4). Total num frames: 509345792. Throughput: 0: 11739.9. Samples: 127379968. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:25,775][1648981] Avg episode reward: [(0, '485.950')] [2024-06-15 14:31:26,866][1651669] Updated weights for policy 0, policy_version 248764 (0.0013) [2024-06-15 14:31:30,767][1648981] Fps is (10 sec: 45875.4, 60 sec: 43699.3, 300 sec: 47986.3). Total num frames: 509476864. Throughput: 0: 11605.3. Samples: 127450624. 
Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:30,767][1648981] Avg episode reward: [(0, '469.780')] [2024-06-15 14:31:32,973][1651669] Updated weights for policy 0, policy_version 248817 (0.0011) [2024-06-15 14:31:34,762][1651669] Updated weights for policy 0, policy_version 248883 (0.0027) [2024-06-15 14:31:35,770][1648981] Fps is (10 sec: 42615.4, 60 sec: 48602.8, 300 sec: 48098.1). Total num frames: 509771776. Throughput: 0: 11913.2. Samples: 127522816. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:35,771][1648981] Avg episode reward: [(0, '457.640')] [2024-06-15 14:31:36,094][1651274] Signal inference workers to stop experience collection... (13100 times) [2024-06-15 14:31:36,201][1651669] InferenceWorker_p0-w0: stopping experience collection (13100 times) [2024-06-15 14:31:36,370][1651274] Signal inference workers to resume experience collection... (13100 times) [2024-06-15 14:31:36,371][1651669] InferenceWorker_p0-w0: resuming experience collection (13100 times) [2024-06-15 14:31:36,517][1651669] Updated weights for policy 0, policy_version 248948 (0.0018) [2024-06-15 14:31:37,956][1651669] Updated weights for policy 0, policy_version 249010 (0.0013) [2024-06-15 14:31:40,768][1648981] Fps is (10 sec: 52423.1, 60 sec: 45874.4, 300 sec: 47985.5). Total num frames: 510001152. Throughput: 0: 11889.7. Samples: 127556608. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:40,770][1648981] Avg episode reward: [(0, '459.450')] [2024-06-15 14:31:42,482][1651669] Updated weights for policy 0, policy_version 249027 (0.0011) [2024-06-15 14:31:44,108][1651669] Updated weights for policy 0, policy_version 249104 (0.0013) [2024-06-15 14:31:45,766][1648981] Fps is (10 sec: 49171.0, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 510263296. Throughput: 0: 11733.1. Samples: 127636480. 
Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:45,767][1648981] Avg episode reward: [(0, '478.550')] [2024-06-15 14:31:45,842][1651669] Updated weights for policy 0, policy_version 249168 (0.0011) [2024-06-15 14:31:47,947][1651669] Updated weights for policy 0, policy_version 249232 (0.0013) [2024-06-15 14:31:50,789][1648981] Fps is (10 sec: 52319.0, 60 sec: 47501.2, 300 sec: 47983.4). Total num frames: 510525440. Throughput: 0: 11633.7. Samples: 127696384. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:50,789][1648981] Avg episode reward: [(0, '466.830')] [2024-06-15 14:31:53,651][1651669] Updated weights for policy 0, policy_version 249284 (0.0013) [2024-06-15 14:31:55,480][1651669] Updated weights for policy 0, policy_version 249360 (0.0013) [2024-06-15 14:31:55,790][1648981] Fps is (10 sec: 42497.0, 60 sec: 46403.5, 300 sec: 47870.7). Total num frames: 510689280. Throughput: 0: 11610.6. Samples: 127741440. Policy #0 lag: (min: 7.0, avg: 55.5, max: 231.0) [2024-06-15 14:31:55,791][1648981] Avg episode reward: [(0, '482.000')] [2024-06-15 14:31:57,384][1651669] Updated weights for policy 0, policy_version 249426 (0.0012) [2024-06-15 14:32:00,099][1651669] Updated weights for policy 0, policy_version 249493 (0.0014) [2024-06-15 14:32:00,766][1648981] Fps is (10 sec: 49261.7, 60 sec: 48067.9, 300 sec: 47874.7). Total num frames: 511016960. Throughput: 0: 11503.0. Samples: 127796736. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:00,767][1648981] Avg episode reward: [(0, '484.340')] [2024-06-15 14:32:00,981][1651669] Updated weights for policy 0, policy_version 249531 (0.0012) [2024-06-15 14:32:05,766][1648981] Fps is (10 sec: 42700.1, 60 sec: 44782.9, 300 sec: 47763.5). Total num frames: 511115264. Throughput: 0: 11389.2. Samples: 127874048. 
Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:05,767][1648981] Avg episode reward: [(0, '473.760')] [2024-06-15 14:32:07,173][1651669] Updated weights for policy 0, policy_version 249617 (0.0014) [2024-06-15 14:32:08,051][1651669] Updated weights for policy 0, policy_version 249661 (0.0013) [2024-06-15 14:32:09,690][1651669] Updated weights for policy 0, policy_version 249720 (0.0014) [2024-06-15 14:32:10,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 46967.5, 300 sec: 47654.6). Total num frames: 511442944. Throughput: 0: 11584.6. Samples: 127901184. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:10,767][1648981] Avg episode reward: [(0, '468.130')] [2024-06-15 14:32:12,796][1651669] Updated weights for policy 0, policy_version 249776 (0.0032) [2024-06-15 14:32:15,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 43690.8, 300 sec: 47652.4). Total num frames: 511574016. Throughput: 0: 11468.8. Samples: 127966720. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:15,767][1648981] Avg episode reward: [(0, '456.510')] [2024-06-15 14:32:17,009][1651669] Updated weights for policy 0, policy_version 249808 (0.0015) [2024-06-15 14:32:18,211][1651669] Updated weights for policy 0, policy_version 249858 (0.0013) [2024-06-15 14:32:18,611][1651274] Signal inference workers to stop experience collection... (13150 times) [2024-06-15 14:32:18,669][1651669] InferenceWorker_p0-w0: stopping experience collection (13150 times) [2024-06-15 14:32:18,868][1651274] Signal inference workers to resume experience collection... (13150 times) [2024-06-15 14:32:18,869][1651669] InferenceWorker_p0-w0: resuming experience collection (13150 times) [2024-06-15 14:32:19,540][1651669] Updated weights for policy 0, policy_version 249913 (0.0013) [2024-06-15 14:32:20,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 47513.7, 300 sec: 47654.1). Total num frames: 511868928. Throughput: 0: 11526.7. Samples: 128041472. 
Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:20,767][1648981] Avg episode reward: [(0, '458.300')] [2024-06-15 14:32:21,490][1651669] Updated weights for policy 0, policy_version 249982 (0.0013) [2024-06-15 14:32:23,753][1651669] Updated weights for policy 0, policy_version 250039 (0.0015) [2024-06-15 14:32:25,779][1648981] Fps is (10 sec: 52361.3, 60 sec: 45871.3, 300 sec: 47983.6). Total num frames: 512098304. Throughput: 0: 11454.4. Samples: 128072192. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:25,780][1648981] Avg episode reward: [(0, '455.850')] [2024-06-15 14:32:27,900][1651669] Updated weights for policy 0, policy_version 250082 (0.0014) [2024-06-15 14:32:29,703][1651669] Updated weights for policy 0, policy_version 250167 (0.0107) [2024-06-15 14:32:30,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 512360448. Throughput: 0: 11298.1. Samples: 128144896. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:30,767][1648981] Avg episode reward: [(0, '456.880')] [2024-06-15 14:32:32,260][1651669] Updated weights for policy 0, policy_version 250208 (0.0012) [2024-06-15 14:32:33,979][1651669] Updated weights for policy 0, policy_version 250241 (0.0015) [2024-06-15 14:32:35,334][1651669] Updated weights for policy 0, policy_version 250296 (0.0015) [2024-06-15 14:32:35,766][1648981] Fps is (10 sec: 52496.5, 60 sec: 47516.6, 300 sec: 47985.7). Total num frames: 512622592. Throughput: 0: 11520.0. Samples: 128214528. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:35,767][1648981] Avg episode reward: [(0, '458.870')] [2024-06-15 14:32:39,235][1651669] Updated weights for policy 0, policy_version 250336 (0.0024) [2024-06-15 14:32:40,772][1648981] Fps is (10 sec: 45851.4, 60 sec: 46964.3, 300 sec: 47540.5). Total num frames: 512819200. Throughput: 0: 11473.6. Samples: 128257536. 
Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:40,772][1648981] Avg episode reward: [(0, '454.720')] [2024-06-15 14:32:40,777][1651669] Updated weights for policy 0, policy_version 250404 (0.0125) [2024-06-15 14:32:43,384][1651669] Updated weights for policy 0, policy_version 250466 (0.0014) [2024-06-15 14:32:45,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 46421.4, 300 sec: 47652.5). Total num frames: 513048576. Throughput: 0: 11616.7. Samples: 128319488. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:45,767][1648981] Avg episode reward: [(0, '449.660')] [2024-06-15 14:32:45,985][1651669] Updated weights for policy 0, policy_version 250528 (0.0013) [2024-06-15 14:32:50,217][1651669] Updated weights for policy 0, policy_version 250576 (0.0012) [2024-06-15 14:32:50,766][1648981] Fps is (10 sec: 39342.0, 60 sec: 44799.5, 300 sec: 47208.2). Total num frames: 513212416. Throughput: 0: 11616.7. Samples: 128396800. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:50,767][1648981] Avg episode reward: [(0, '447.240')] [2024-06-15 14:32:51,872][1651669] Updated weights for policy 0, policy_version 250641 (0.0013) [2024-06-15 14:32:52,774][1651669] Updated weights for policy 0, policy_version 250688 (0.0012) [2024-06-15 14:32:55,044][1651669] Updated weights for policy 0, policy_version 250752 (0.0128) [2024-06-15 14:32:55,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 47532.4, 300 sec: 47541.4). Total num frames: 513540096. Throughput: 0: 11673.6. Samples: 128426496. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:32:55,767][1648981] Avg episode reward: [(0, '459.660')] [2024-06-15 14:32:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000250752_513540096.pth... 
[2024-06-15 14:32:55,826][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000245184_502136832.pth [2024-06-15 14:32:57,766][1651669] Updated weights for policy 0, policy_version 250807 (0.0010) [2024-06-15 14:33:00,797][1648981] Fps is (10 sec: 45733.7, 60 sec: 44213.9, 300 sec: 47314.2). Total num frames: 513671168. Throughput: 0: 11824.8. Samples: 128499200. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:33:00,802][1648981] Avg episode reward: [(0, '456.770')] [2024-06-15 14:33:01,671][1651669] Updated weights for policy 0, policy_version 250852 (0.0082) [2024-06-15 14:33:02,843][1651274] Signal inference workers to stop experience collection... (13200 times) [2024-06-15 14:33:02,960][1651669] InferenceWorker_p0-w0: stopping experience collection (13200 times) [2024-06-15 14:33:03,101][1651274] Signal inference workers to resume experience collection... (13200 times) [2024-06-15 14:33:03,103][1651669] InferenceWorker_p0-w0: resuming experience collection (13200 times) [2024-06-15 14:33:03,519][1651669] Updated weights for policy 0, policy_version 250912 (0.0014) [2024-06-15 14:33:05,041][1651669] Updated weights for policy 0, policy_version 250960 (0.0016) [2024-06-15 14:33:05,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 514031616. Throughput: 0: 11707.7. Samples: 128568320. Policy #0 lag: (min: 32.0, avg: 173.8, max: 330.0) [2024-06-15 14:33:05,767][1648981] Avg episode reward: [(0, '447.300')] [2024-06-15 14:33:06,780][1651669] Updated weights for policy 0, policy_version 251013 (0.0021) [2024-06-15 14:33:07,984][1651669] Updated weights for policy 0, policy_version 251065 (0.0012) [2024-06-15 14:33:10,766][1648981] Fps is (10 sec: 52591.3, 60 sec: 45875.2, 300 sec: 47319.8). Total num frames: 514195456. Throughput: 0: 11836.3. Samples: 128604672. 
Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:10,767][1648981] Avg episode reward: [(0, '422.790')] [2024-06-15 14:33:11,435][1651669] Updated weights for policy 0, policy_version 251104 (0.0113) [2024-06-15 14:33:12,184][1651669] Updated weights for policy 0, policy_version 251135 (0.0010) [2024-06-15 14:33:15,453][1651669] Updated weights for policy 0, policy_version 251202 (0.0014) [2024-06-15 14:33:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 514490368. Throughput: 0: 11992.2. Samples: 128684544. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:15,767][1648981] Avg episode reward: [(0, '423.760')] [2024-06-15 14:33:18,578][1651669] Updated weights for policy 0, policy_version 251296 (0.0019) [2024-06-15 14:33:20,782][1648981] Fps is (10 sec: 52345.9, 60 sec: 47501.1, 300 sec: 47761.0). Total num frames: 514719744. Throughput: 0: 11851.5. Samples: 128748032. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:20,783][1648981] Avg episode reward: [(0, '431.820')] [2024-06-15 14:33:22,302][1651669] Updated weights for policy 0, policy_version 251344 (0.0021) [2024-06-15 14:33:23,706][1651669] Updated weights for policy 0, policy_version 251392 (0.0013) [2024-06-15 14:33:25,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46977.6, 300 sec: 47208.1). Total num frames: 514916352. Throughput: 0: 11697.7. Samples: 128783872. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:25,767][1648981] Avg episode reward: [(0, '442.710')] [2024-06-15 14:33:26,045][1651669] Updated weights for policy 0, policy_version 251455 (0.0035) [2024-06-15 14:33:27,628][1651669] Updated weights for policy 0, policy_version 251510 (0.0013) [2024-06-15 14:33:30,291][1651669] Updated weights for policy 0, policy_version 251581 (0.0026) [2024-06-15 14:33:30,766][1648981] Fps is (10 sec: 52512.1, 60 sec: 48059.7, 300 sec: 47987.6). Total num frames: 515244032. 
Throughput: 0: 11901.1. Samples: 128855040. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:30,767][1648981] Avg episode reward: [(0, '459.190')] [2024-06-15 14:33:34,983][1651669] Updated weights for policy 0, policy_version 251647 (0.0012) [2024-06-15 14:33:35,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 515375104. Throughput: 0: 11855.6. Samples: 128930304. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:35,767][1648981] Avg episode reward: [(0, '468.120')] [2024-06-15 14:33:36,849][1651669] Updated weights for policy 0, policy_version 251705 (0.0011) [2024-06-15 14:33:38,689][1651669] Updated weights for policy 0, policy_version 251761 (0.0012) [2024-06-15 14:33:40,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 47517.4, 300 sec: 47656.3). Total num frames: 515670016. Throughput: 0: 11866.9. Samples: 128960512. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:40,768][1648981] Avg episode reward: [(0, '475.280')] [2024-06-15 14:33:41,037][1651669] Updated weights for policy 0, policy_version 251810 (0.0039) [2024-06-15 14:33:44,328][1651669] Updated weights for policy 0, policy_version 251843 (0.0013) [2024-06-15 14:33:45,669][1651669] Updated weights for policy 0, policy_version 251903 (0.0024) [2024-06-15 14:33:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 515899392. Throughput: 0: 12102.9. Samples: 129043456. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:45,767][1648981] Avg episode reward: [(0, '484.070')] [2024-06-15 14:33:46,425][1651274] Signal inference workers to stop experience collection... (13250 times) [2024-06-15 14:33:46,482][1651669] InferenceWorker_p0-w0: stopping experience collection (13250 times) [2024-06-15 14:33:46,716][1651274] Signal inference workers to resume experience collection... 
(13250 times) [2024-06-15 14:33:46,718][1651669] InferenceWorker_p0-w0: resuming experience collection (13250 times) [2024-06-15 14:33:47,184][1651669] Updated weights for policy 0, policy_version 251959 (0.0026) [2024-06-15 14:33:49,109][1651669] Updated weights for policy 0, policy_version 252021 (0.0014) [2024-06-15 14:33:50,766][1648981] Fps is (10 sec: 49153.9, 60 sec: 49152.0, 300 sec: 47544.6). Total num frames: 516161536. Throughput: 0: 11992.2. Samples: 129107968. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:50,767][1648981] Avg episode reward: [(0, '496.940')] [2024-06-15 14:33:51,938][1651669] Updated weights for policy 0, policy_version 252055 (0.0029) [2024-06-15 14:33:55,540][1651669] Updated weights for policy 0, policy_version 252112 (0.0012) [2024-06-15 14:33:55,767][1648981] Fps is (10 sec: 42596.6, 60 sec: 46421.0, 300 sec: 47544.7). Total num frames: 516325376. Throughput: 0: 12071.7. Samples: 129147904. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:33:55,767][1648981] Avg episode reward: [(0, '484.280')] [2024-06-15 14:33:56,938][1651669] Updated weights for policy 0, policy_version 252162 (0.0013) [2024-06-15 14:33:58,315][1651669] Updated weights for policy 0, policy_version 252216 (0.0013) [2024-06-15 14:33:59,806][1651669] Updated weights for policy 0, policy_version 252259 (0.0012) [2024-06-15 14:34:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50270.2, 300 sec: 47541.4). Total num frames: 516685824. Throughput: 0: 11798.8. Samples: 129215488. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:34:00,767][1648981] Avg episode reward: [(0, '486.310')] [2024-06-15 14:34:02,974][1651669] Updated weights for policy 0, policy_version 252306 (0.0017) [2024-06-15 14:34:05,781][1648981] Fps is (10 sec: 49084.0, 60 sec: 46410.3, 300 sec: 47539.1). Total num frames: 516816896. Throughput: 0: 12220.1. Samples: 129297920. 
Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:34:05,782][1648981] Avg episode reward: [(0, '468.400')] [2024-06-15 14:34:05,812][1651669] Updated weights for policy 0, policy_version 252354 (0.0012) [2024-06-15 14:34:06,853][1651669] Updated weights for policy 0, policy_version 252414 (0.0013) [2024-06-15 14:34:08,432][1651669] Updated weights for policy 0, policy_version 252475 (0.0092) [2024-06-15 14:34:10,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 47652.4). Total num frames: 517177344. Throughput: 0: 12174.2. Samples: 129331712. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:34:10,767][1648981] Avg episode reward: [(0, '469.100')] [2024-06-15 14:34:10,872][1651669] Updated weights for policy 0, policy_version 252540 (0.0013) [2024-06-15 14:34:13,950][1651669] Updated weights for policy 0, policy_version 252607 (0.0039) [2024-06-15 14:34:15,767][1648981] Fps is (10 sec: 52503.1, 60 sec: 47513.5, 300 sec: 47763.5). Total num frames: 517341184. Throughput: 0: 12185.6. Samples: 129403392. Policy #0 lag: (min: 54.0, avg: 199.3, max: 310.0) [2024-06-15 14:34:15,767][1648981] Avg episode reward: [(0, '474.540')] [2024-06-15 14:34:17,450][1651669] Updated weights for policy 0, policy_version 252672 (0.0015) [2024-06-15 14:34:19,687][1651669] Updated weights for policy 0, policy_version 252727 (0.0023) [2024-06-15 14:34:20,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48072.4, 300 sec: 47541.4). Total num frames: 517603328. Throughput: 0: 12083.2. Samples: 129474048. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:20,767][1648981] Avg episode reward: [(0, '473.630')] [2024-06-15 14:34:21,507][1651669] Updated weights for policy 0, policy_version 252768 (0.0015) [2024-06-15 14:34:24,561][1651669] Updated weights for policy 0, policy_version 252832 (0.0014) [2024-06-15 14:34:25,774][1648981] Fps is (10 sec: 52388.8, 60 sec: 49145.7, 300 sec: 47984.4). Total num frames: 517865472. 
Throughput: 0: 12263.2. Samples: 129512448. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:25,775][1648981] Avg episode reward: [(0, '492.660')] [2024-06-15 14:34:27,663][1651669] Updated weights for policy 0, policy_version 252880 (0.0011) [2024-06-15 14:34:29,366][1651669] Updated weights for policy 0, policy_version 252931 (0.0014) [2024-06-15 14:34:30,567][1651669] Updated weights for policy 0, policy_version 252984 (0.0018) [2024-06-15 14:34:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 518127616. Throughput: 0: 12049.1. Samples: 129585664. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:30,767][1648981] Avg episode reward: [(0, '477.990')] [2024-06-15 14:34:31,810][1651274] Signal inference workers to stop experience collection... (13300 times) [2024-06-15 14:34:31,880][1651669] InferenceWorker_p0-w0: stopping experience collection (13300 times) [2024-06-15 14:34:32,006][1651274] Signal inference workers to resume experience collection... (13300 times) [2024-06-15 14:34:32,013][1651669] InferenceWorker_p0-w0: resuming experience collection (13300 times) [2024-06-15 14:34:32,447][1651669] Updated weights for policy 0, policy_version 253028 (0.0013) [2024-06-15 14:34:34,753][1651669] Updated weights for policy 0, policy_version 253062 (0.0047) [2024-06-15 14:34:35,766][1648981] Fps is (10 sec: 45911.0, 60 sec: 49152.1, 300 sec: 47763.7). Total num frames: 518324224. Throughput: 0: 12242.5. Samples: 129658880. 
Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:35,767][1648981] Avg episode reward: [(0, '478.160')] [2024-06-15 14:34:36,075][1651669] Updated weights for policy 0, policy_version 253115 (0.0013) [2024-06-15 14:34:38,170][1651669] Updated weights for policy 0, policy_version 253168 (0.0011) [2024-06-15 14:34:40,580][1651669] Updated weights for policy 0, policy_version 253232 (0.0018) [2024-06-15 14:34:40,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49152.3, 300 sec: 47874.6). Total num frames: 518619136. Throughput: 0: 12208.5. Samples: 129697280. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:40,767][1648981] Avg episode reward: [(0, '464.600')] [2024-06-15 14:34:42,588][1651669] Updated weights for policy 0, policy_version 253265 (0.0012) [2024-06-15 14:34:45,576][1651669] Updated weights for policy 0, policy_version 253318 (0.0013) [2024-06-15 14:34:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 47653.1). Total num frames: 518815744. Throughput: 0: 12367.6. Samples: 129772032. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:45,767][1648981] Avg episode reward: [(0, '465.400')] [2024-06-15 14:34:46,932][1651669] Updated weights for policy 0, policy_version 253376 (0.0013) [2024-06-15 14:34:49,318][1651669] Updated weights for policy 0, policy_version 253428 (0.0012) [2024-06-15 14:34:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 519077888. Throughput: 0: 12155.3. Samples: 129844736. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:50,767][1648981] Avg episode reward: [(0, '461.600')] [2024-06-15 14:34:51,168][1651669] Updated weights for policy 0, policy_version 253488 (0.0088) [2024-06-15 14:34:53,967][1651669] Updated weights for policy 0, policy_version 253561 (0.0015) [2024-06-15 14:34:55,767][1648981] Fps is (10 sec: 49148.9, 60 sec: 49698.0, 300 sec: 47541.3). Total num frames: 519307264. 
Throughput: 0: 12174.1. Samples: 129879552. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:34:55,768][1648981] Avg episode reward: [(0, '461.770')] [2024-06-15 14:34:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000253568_519307264.pth... [2024-06-15 14:34:55,822][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000248000_507904000.pth [2024-06-15 14:34:57,073][1651669] Updated weights for policy 0, policy_version 253605 (0.0017) [2024-06-15 14:34:58,963][1651669] Updated weights for policy 0, policy_version 253649 (0.0012) [2024-06-15 14:35:00,060][1651669] Updated weights for policy 0, policy_version 253696 (0.0014) [2024-06-15 14:35:00,767][1648981] Fps is (10 sec: 49150.8, 60 sec: 48059.5, 300 sec: 47763.5). Total num frames: 519569408. Throughput: 0: 12367.6. Samples: 129959936. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:35:00,767][1648981] Avg episode reward: [(0, '457.090')] [2024-06-15 14:35:02,225][1651669] Updated weights for policy 0, policy_version 253755 (0.0026) [2024-06-15 14:35:04,270][1651669] Updated weights for policy 0, policy_version 253808 (0.0013) [2024-06-15 14:35:05,774][1648981] Fps is (10 sec: 52391.1, 60 sec: 50249.7, 300 sec: 47542.0). Total num frames: 519831552. Throughput: 0: 12331.4. Samples: 130029056. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:35:05,775][1648981] Avg episode reward: [(0, '457.620')] [2024-06-15 14:35:07,343][1651669] Updated weights for policy 0, policy_version 253856 (0.0015) [2024-06-15 14:35:09,963][1651669] Updated weights for policy 0, policy_version 253891 (0.0014) [2024-06-15 14:35:10,766][1648981] Fps is (10 sec: 45876.5, 60 sec: 47513.7, 300 sec: 47766.4). Total num frames: 520028160. Throughput: 0: 12244.6. Samples: 130063360. 
Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:35:10,767][1648981] Avg episode reward: [(0, '452.660')] [2024-06-15 14:35:11,154][1651669] Updated weights for policy 0, policy_version 253942 (0.0015) [2024-06-15 14:35:13,183][1651669] Updated weights for policy 0, policy_version 253985 (0.0014) [2024-06-15 14:35:14,640][1651669] Updated weights for policy 0, policy_version 254038 (0.0015) [2024-06-15 14:35:14,870][1651274] Signal inference workers to stop experience collection... (13350 times) [2024-06-15 14:35:14,914][1651669] InferenceWorker_p0-w0: stopping experience collection (13350 times) [2024-06-15 14:35:15,139][1651274] Signal inference workers to resume experience collection... (13350 times) [2024-06-15 14:35:15,139][1651669] InferenceWorker_p0-w0: resuming experience collection (13350 times) [2024-06-15 14:35:15,775][1648981] Fps is (10 sec: 52425.4, 60 sec: 50237.3, 300 sec: 47540.0). Total num frames: 520355840. Throughput: 0: 12149.2. Samples: 130132480. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:35:15,775][1648981] Avg episode reward: [(0, '458.200')] [2024-06-15 14:35:17,359][1651669] Updated weights for policy 0, policy_version 254096 (0.0016) [2024-06-15 14:35:18,542][1651669] Updated weights for policy 0, policy_version 254144 (0.0014) [2024-06-15 14:35:20,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 520486912. Throughput: 0: 12333.5. Samples: 130213888. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:35:20,767][1648981] Avg episode reward: [(0, '456.580')] [2024-06-15 14:35:22,188][1651669] Updated weights for policy 0, policy_version 254200 (0.0013) [2024-06-15 14:35:23,892][1651669] Updated weights for policy 0, policy_version 254256 (0.0015) [2024-06-15 14:35:25,195][1651669] Updated weights for policy 0, policy_version 254289 (0.0014) [2024-06-15 14:35:25,766][1648981] Fps is (10 sec: 45913.7, 60 sec: 49158.3, 300 sec: 47321.1). 
Total num frames: 520814592. Throughput: 0: 12299.4. Samples: 130250752. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:35:25,767][1648981] Avg episode reward: [(0, '461.030')] [2024-06-15 14:35:26,211][1651669] Updated weights for policy 0, policy_version 254335 (0.0046) [2024-06-15 14:35:28,645][1651669] Updated weights for policy 0, policy_version 254399 (0.0013) [2024-06-15 14:35:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 521011200. Throughput: 0: 12117.3. Samples: 130317312. Policy #0 lag: (min: 7.0, avg: 125.5, max: 263.0) [2024-06-15 14:35:30,767][1648981] Avg episode reward: [(0, '476.960')] [2024-06-15 14:35:32,682][1651669] Updated weights for policy 0, policy_version 254464 (0.0012) [2024-06-15 14:35:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 521273344. Throughput: 0: 12083.2. Samples: 130388480. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:35:35,767][1648981] Avg episode reward: [(0, '452.440')] [2024-06-15 14:35:36,441][1651669] Updated weights for policy 0, policy_version 254530 (0.0014) [2024-06-15 14:35:37,729][1651669] Updated weights for policy 0, policy_version 254587 (0.0013) [2024-06-15 14:35:39,781][1651669] Updated weights for policy 0, policy_version 254626 (0.0012) [2024-06-15 14:35:40,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 521535488. Throughput: 0: 12072.0. Samples: 130422784. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:35:40,767][1648981] Avg episode reward: [(0, '460.520')] [2024-06-15 14:35:42,849][1651669] Updated weights for policy 0, policy_version 254688 (0.0018) [2024-06-15 14:35:45,334][1651669] Updated weights for policy 0, policy_version 254753 (0.0013) [2024-06-15 14:35:45,768][1648981] Fps is (10 sec: 49144.0, 60 sec: 49150.6, 300 sec: 47764.3). Total num frames: 521764864. Throughput: 0: 12082.8. 
Samples: 130503680. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:35:45,769][1648981] Avg episode reward: [(0, '448.970')] [2024-06-15 14:35:45,914][1651669] Updated weights for policy 0, policy_version 254784 (0.0011) [2024-06-15 14:35:48,784][1651669] Updated weights for policy 0, policy_version 254847 (0.0014) [2024-06-15 14:35:50,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 48605.9, 300 sec: 47763.7). Total num frames: 521994240. Throughput: 0: 12005.6. Samples: 130569216. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:35:50,767][1648981] Avg episode reward: [(0, '462.740')] [2024-06-15 14:35:51,485][1651669] Updated weights for policy 0, policy_version 254907 (0.0013) [2024-06-15 14:35:55,773][1648981] Fps is (10 sec: 42575.8, 60 sec: 48054.7, 300 sec: 47653.0). Total num frames: 522190848. Throughput: 0: 11978.9. Samples: 130602496. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:35:55,774][1648981] Avg episode reward: [(0, '457.090')] [2024-06-15 14:35:55,954][1651669] Updated weights for policy 0, policy_version 254994 (0.0025) [2024-06-15 14:35:58,946][1651669] Updated weights for policy 0, policy_version 255041 (0.0037) [2024-06-15 14:36:00,180][1651669] Updated weights for policy 0, policy_version 255099 (0.0087) [2024-06-15 14:36:00,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 522452992. Throughput: 0: 12119.6. Samples: 130677760. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:00,767][1648981] Avg episode reward: [(0, '448.930')] [2024-06-15 14:36:01,409][1651274] Signal inference workers to stop experience collection... (13400 times) [2024-06-15 14:36:01,550][1651669] InferenceWorker_p0-w0: stopping experience collection (13400 times) [2024-06-15 14:36:01,767][1651274] Signal inference workers to resume experience collection... 
(13400 times) [2024-06-15 14:36:01,767][1651669] InferenceWorker_p0-w0: resuming experience collection (13400 times) [2024-06-15 14:36:02,554][1651669] Updated weights for policy 0, policy_version 255161 (0.0098) [2024-06-15 14:36:04,063][1651669] Updated weights for policy 0, policy_version 255200 (0.0016) [2024-06-15 14:36:05,783][1648981] Fps is (10 sec: 52380.8, 60 sec: 48053.1, 300 sec: 47760.9). Total num frames: 522715136. Throughput: 0: 11783.2. Samples: 130744320. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:05,783][1648981] Avg episode reward: [(0, '456.280')] [2024-06-15 14:36:06,577][1651669] Updated weights for policy 0, policy_version 255237 (0.0012) [2024-06-15 14:36:09,894][1651669] Updated weights for policy 0, policy_version 255297 (0.0014) [2024-06-15 14:36:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 522911744. Throughput: 0: 11855.7. Samples: 130784256. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:10,767][1648981] Avg episode reward: [(0, '438.770')] [2024-06-15 14:36:11,114][1651669] Updated weights for policy 0, policy_version 255360 (0.0014) [2024-06-15 14:36:13,259][1651669] Updated weights for policy 0, policy_version 255420 (0.0100) [2024-06-15 14:36:15,766][1648981] Fps is (10 sec: 45949.4, 60 sec: 46974.1, 300 sec: 47985.7). Total num frames: 523173888. Throughput: 0: 11923.9. Samples: 130853888. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:15,767][1648981] Avg episode reward: [(0, '465.790')] [2024-06-15 14:36:16,019][1651669] Updated weights for policy 0, policy_version 255472 (0.0016) [2024-06-15 14:36:18,489][1651669] Updated weights for policy 0, policy_version 255504 (0.0013) [2024-06-15 14:36:19,436][1651669] Updated weights for policy 0, policy_version 255552 (0.0013) [2024-06-15 14:36:20,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 47542.6). Total num frames: 523370496. 
Throughput: 0: 11935.3. Samples: 130925568. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:20,767][1648981] Avg episode reward: [(0, '471.920')] [2024-06-15 14:36:22,005][1651669] Updated weights for policy 0, policy_version 255603 (0.0013) [2024-06-15 14:36:24,028][1651669] Updated weights for policy 0, policy_version 255664 (0.0012) [2024-06-15 14:36:25,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.6, 300 sec: 47985.7). Total num frames: 523632640. Throughput: 0: 11992.2. Samples: 130962432. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:25,767][1648981] Avg episode reward: [(0, '471.010')] [2024-06-15 14:36:25,796][1651669] Updated weights for policy 0, policy_version 255696 (0.0052) [2024-06-15 14:36:27,097][1651669] Updated weights for policy 0, policy_version 255744 (0.0012) [2024-06-15 14:36:30,245][1651669] Updated weights for policy 0, policy_version 255806 (0.0014) [2024-06-15 14:36:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 47875.2). Total num frames: 523894784. Throughput: 0: 11753.7. Samples: 131032576. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:30,767][1648981] Avg episode reward: [(0, '465.180')] [2024-06-15 14:36:33,141][1651669] Updated weights for policy 0, policy_version 255859 (0.0134) [2024-06-15 14:36:34,564][1651669] Updated weights for policy 0, policy_version 255904 (0.0014) [2024-06-15 14:36:35,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47985.9). Total num frames: 524156928. Throughput: 0: 11912.5. Samples: 131105280. 
Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:35,767][1648981] Avg episode reward: [(0, '454.910')] [2024-06-15 14:36:35,787][1651669] Updated weights for policy 0, policy_version 255937 (0.0013) [2024-06-15 14:36:37,138][1651669] Updated weights for policy 0, policy_version 255993 (0.0026) [2024-06-15 14:36:40,029][1651669] Updated weights for policy 0, policy_version 256033 (0.0106) [2024-06-15 14:36:40,770][1648981] Fps is (10 sec: 52409.0, 60 sec: 48056.8, 300 sec: 47985.1). Total num frames: 524419072. Throughput: 0: 12015.8. Samples: 131143168. Policy #0 lag: (min: 12.0, avg: 103.6, max: 268.0) [2024-06-15 14:36:40,771][1648981] Avg episode reward: [(0, '445.990')] [2024-06-15 14:36:43,028][1651669] Updated weights for policy 0, policy_version 256081 (0.0030) [2024-06-15 14:36:44,620][1651669] Updated weights for policy 0, policy_version 256132 (0.0014) [2024-06-15 14:36:45,772][1648981] Fps is (10 sec: 52401.5, 60 sec: 48603.0, 300 sec: 47988.4). Total num frames: 524681216. Throughput: 0: 12024.9. Samples: 131218944. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:36:45,773][1648981] Avg episode reward: [(0, '442.810')] [2024-06-15 14:36:46,250][1651274] Signal inference workers to stop experience collection... (13450 times) [2024-06-15 14:36:46,295][1651669] InferenceWorker_p0-w0: stopping experience collection (13450 times) [2024-06-15 14:36:46,313][1651669] Updated weights for policy 0, policy_version 256195 (0.0012) [2024-06-15 14:36:46,495][1651274] Signal inference workers to resume experience collection... (13450 times) [2024-06-15 14:36:46,496][1651669] InferenceWorker_p0-w0: resuming experience collection (13450 times) [2024-06-15 14:36:47,584][1651669] Updated weights for policy 0, policy_version 256248 (0.0013) [2024-06-15 14:36:50,767][1648981] Fps is (10 sec: 39335.3, 60 sec: 46967.2, 300 sec: 47878.4). Total num frames: 524812288. Throughput: 0: 12212.7. Samples: 131293696. 
Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:36:50,768][1648981] Avg episode reward: [(0, '439.800')] [2024-06-15 14:36:51,776][1651669] Updated weights for policy 0, policy_version 256304 (0.0013) [2024-06-15 14:36:54,365][1651669] Updated weights for policy 0, policy_version 256353 (0.0014) [2024-06-15 14:36:55,766][1648981] Fps is (10 sec: 39341.8, 60 sec: 48065.2, 300 sec: 47652.4). Total num frames: 525074432. Throughput: 0: 12071.8. Samples: 131327488. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:36:55,767][1648981] Avg episode reward: [(0, '427.430')] [2024-06-15 14:36:56,073][1651669] Updated weights for policy 0, policy_version 256403 (0.0013) [2024-06-15 14:36:56,288][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000256416_525139968.pth... [2024-06-15 14:36:56,384][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000250752_513540096.pth [2024-06-15 14:36:57,272][1651669] Updated weights for policy 0, policy_version 256449 (0.0011) [2024-06-15 14:36:58,380][1651669] Updated weights for policy 0, policy_version 256508 (0.0012) [2024-06-15 14:37:00,783][1648981] Fps is (10 sec: 52344.3, 60 sec: 48046.6, 300 sec: 48205.1). Total num frames: 525336576. Throughput: 0: 12112.9. Samples: 131399168. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:00,784][1648981] Avg episode reward: [(0, '442.400')] [2024-06-15 14:37:02,835][1651669] Updated weights for policy 0, policy_version 256564 (0.0014) [2024-06-15 14:37:04,684][1651669] Updated weights for policy 0, policy_version 256608 (0.0013) [2024-06-15 14:37:05,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48072.7, 300 sec: 47985.7). Total num frames: 525598720. Throughput: 0: 12197.0. Samples: 131474432. 
Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:05,767][1648981] Avg episode reward: [(0, '465.510')] [2024-06-15 14:37:06,386][1651669] Updated weights for policy 0, policy_version 256658 (0.0060) [2024-06-15 14:37:07,831][1651669] Updated weights for policy 0, policy_version 256721 (0.0018) [2024-06-15 14:37:10,766][1648981] Fps is (10 sec: 52515.1, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 525860864. Throughput: 0: 12003.5. Samples: 131502592. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:10,767][1648981] Avg episode reward: [(0, '497.060')] [2024-06-15 14:37:11,929][1651669] Updated weights for policy 0, policy_version 256784 (0.0130) [2024-06-15 14:37:14,668][1651669] Updated weights for policy 0, policy_version 256840 (0.0013) [2024-06-15 14:37:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 48207.8). Total num frames: 526090240. Throughput: 0: 12470.0. Samples: 131593728. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:15,767][1648981] Avg episode reward: [(0, '462.250')] [2024-06-15 14:37:16,836][1651669] Updated weights for policy 0, policy_version 256898 (0.0013) [2024-06-15 14:37:18,341][1651669] Updated weights for policy 0, policy_version 256961 (0.0018) [2024-06-15 14:37:20,767][1648981] Fps is (10 sec: 52424.2, 60 sec: 50243.6, 300 sec: 48432.0). Total num frames: 526385152. Throughput: 0: 12208.1. Samples: 131654656. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:20,768][1648981] Avg episode reward: [(0, '465.020')] [2024-06-15 14:37:23,142][1651669] Updated weights for policy 0, policy_version 257056 (0.0091) [2024-06-15 14:37:24,827][1651669] Updated weights for policy 0, policy_version 257094 (0.0011) [2024-06-15 14:37:25,782][1648981] Fps is (10 sec: 49074.6, 60 sec: 49139.1, 300 sec: 48205.3). Total num frames: 526581760. Throughput: 0: 12375.7. Samples: 131700224. 
Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:25,783][1648981] Avg episode reward: [(0, '467.730')] [2024-06-15 14:37:27,109][1651669] Updated weights for policy 0, policy_version 257155 (0.0012) [2024-06-15 14:37:28,440][1651669] Updated weights for policy 0, policy_version 257208 (0.0037) [2024-06-15 14:37:28,641][1651274] Signal inference workers to stop experience collection... (13500 times) [2024-06-15 14:37:28,704][1651669] InferenceWorker_p0-w0: stopping experience collection (13500 times) [2024-06-15 14:37:28,759][1651274] Signal inference workers to resume experience collection... (13500 times) [2024-06-15 14:37:28,759][1651669] InferenceWorker_p0-w0: resuming experience collection (13500 times) [2024-06-15 14:37:29,932][1651669] Updated weights for policy 0, policy_version 257264 (0.0114) [2024-06-15 14:37:30,766][1648981] Fps is (10 sec: 52433.2, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 526909440. Throughput: 0: 12243.9. Samples: 131769856. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:30,767][1648981] Avg episode reward: [(0, '477.510')] [2024-06-15 14:37:33,209][1651669] Updated weights for policy 0, policy_version 257296 (0.0025) [2024-06-15 14:37:34,136][1651669] Updated weights for policy 0, policy_version 257344 (0.0015) [2024-06-15 14:37:35,766][1648981] Fps is (10 sec: 49229.6, 60 sec: 48605.9, 300 sec: 48319.8). Total num frames: 527073280. Throughput: 0: 12538.4. Samples: 131857920. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:35,767][1648981] Avg episode reward: [(0, '494.440')] [2024-06-15 14:37:37,221][1651669] Updated weights for policy 0, policy_version 257409 (0.0012) [2024-06-15 14:37:38,517][1651669] Updated weights for policy 0, policy_version 257465 (0.0012) [2024-06-15 14:37:40,087][1651669] Updated weights for policy 0, policy_version 257507 (0.0020) [2024-06-15 14:37:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50247.4, 300 sec: 48763.2). 
Total num frames: 527433728. Throughput: 0: 12367.7. Samples: 131884032. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:40,767][1648981] Avg episode reward: [(0, '499.140')] [2024-06-15 14:37:44,080][1651669] Updated weights for policy 0, policy_version 257552 (0.0013) [2024-06-15 14:37:45,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48063.9, 300 sec: 48652.1). Total num frames: 527564800. Throughput: 0: 12520.1. Samples: 131962368. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:45,767][1648981] Avg episode reward: [(0, '505.740')] [2024-06-15 14:37:46,536][1651669] Updated weights for policy 0, policy_version 257632 (0.0012) [2024-06-15 14:37:48,463][1651669] Updated weights for policy 0, policy_version 257681 (0.0013) [2024-06-15 14:37:49,517][1651669] Updated weights for policy 0, policy_version 257723 (0.0013) [2024-06-15 14:37:50,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 51336.8, 300 sec: 48652.2). Total num frames: 527892480. Throughput: 0: 12322.1. Samples: 132028928. Policy #0 lag: (min: 15.0, avg: 108.0, max: 271.0) [2024-06-15 14:37:50,767][1648981] Avg episode reward: [(0, '515.080')] [2024-06-15 14:37:51,027][1651669] Updated weights for policy 0, policy_version 257763 (0.0013) [2024-06-15 14:37:51,230][1651274] Saving new best policy, reward=515.080! [2024-06-15 14:37:54,805][1651669] Updated weights for policy 0, policy_version 257808 (0.0013) [2024-06-15 14:37:55,549][1651669] Updated weights for policy 0, policy_version 257853 (0.0012) [2024-06-15 14:37:55,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.3, 300 sec: 48879.4). Total num frames: 528089088. Throughput: 0: 12526.9. Samples: 132066304. 
Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:37:55,767][1648981] Avg episode reward: [(0, '495.440')] [2024-06-15 14:37:58,128][1651669] Updated weights for policy 0, policy_version 257910 (0.0015) [2024-06-15 14:37:59,800][1651669] Updated weights for policy 0, policy_version 257957 (0.0014) [2024-06-15 14:38:00,779][1648981] Fps is (10 sec: 49087.6, 60 sec: 50793.3, 300 sec: 48650.0). Total num frames: 528384000. Throughput: 0: 12284.4. Samples: 132146688. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:00,780][1648981] Avg episode reward: [(0, '499.960')] [2024-06-15 14:38:01,855][1651669] Updated weights for policy 0, policy_version 258036 (0.0014) [2024-06-15 14:38:05,689][1651669] Updated weights for policy 0, policy_version 258064 (0.0052) [2024-06-15 14:38:05,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 528515072. Throughput: 0: 12470.3. Samples: 132215808. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:05,767][1648981] Avg episode reward: [(0, '480.750')] [2024-06-15 14:38:07,723][1651669] Updated weights for policy 0, policy_version 258114 (0.0013) [2024-06-15 14:38:09,829][1651669] Updated weights for policy 0, policy_version 258180 (0.0014) [2024-06-15 14:38:10,766][1648981] Fps is (10 sec: 45935.1, 60 sec: 49698.1, 300 sec: 48652.1). Total num frames: 528842752. Throughput: 0: 12337.8. Samples: 132255232. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:10,767][1648981] Avg episode reward: [(0, '475.500')] [2024-06-15 14:38:10,818][1651274] Signal inference workers to stop experience collection... (13550 times) [2024-06-15 14:38:10,876][1651669] InferenceWorker_p0-w0: stopping experience collection (13550 times) [2024-06-15 14:38:11,166][1651274] Signal inference workers to resume experience collection... 
(13550 times) [2024-06-15 14:38:11,167][1651669] InferenceWorker_p0-w0: resuming experience collection (13550 times) [2024-06-15 14:38:12,038][1651669] Updated weights for policy 0, policy_version 258272 (0.0133) [2024-06-15 14:38:15,767][1648981] Fps is (10 sec: 49149.2, 60 sec: 48605.4, 300 sec: 48432.5). Total num frames: 529006592. Throughput: 0: 12231.0. Samples: 132320256. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:15,768][1648981] Avg episode reward: [(0, '439.460')] [2024-06-15 14:38:16,171][1651669] Updated weights for policy 0, policy_version 258320 (0.0013) [2024-06-15 14:38:17,195][1651669] Updated weights for policy 0, policy_version 258368 (0.0023) [2024-06-15 14:38:20,334][1651669] Updated weights for policy 0, policy_version 258431 (0.0012) [2024-06-15 14:38:20,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48060.5, 300 sec: 48652.2). Total num frames: 529268736. Throughput: 0: 11935.3. Samples: 132395008. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:20,767][1648981] Avg episode reward: [(0, '430.830')] [2024-06-15 14:38:22,411][1651669] Updated weights for policy 0, policy_version 258496 (0.0011) [2024-06-15 14:38:23,798][1651669] Updated weights for policy 0, policy_version 258556 (0.0013) [2024-06-15 14:38:25,775][1648981] Fps is (10 sec: 52385.5, 60 sec: 49157.7, 300 sec: 48428.5). Total num frames: 529530880. Throughput: 0: 12035.3. Samples: 132425728. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:25,776][1648981] Avg episode reward: [(0, '447.140')] [2024-06-15 14:38:27,653][1651669] Updated weights for policy 0, policy_version 258615 (0.0029) [2024-06-15 14:38:30,200][1651669] Updated weights for policy 0, policy_version 258656 (0.0015) [2024-06-15 14:38:30,767][1648981] Fps is (10 sec: 49150.4, 60 sec: 47513.4, 300 sec: 48763.2). Total num frames: 529760256. Throughput: 0: 12105.9. Samples: 132507136. 
Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:30,767][1648981] Avg episode reward: [(0, '465.810')] [2024-06-15 14:38:32,418][1651669] Updated weights for policy 0, policy_version 258704 (0.0064) [2024-06-15 14:38:34,876][1651669] Updated weights for policy 0, policy_version 258800 (0.0029) [2024-06-15 14:38:35,766][1648981] Fps is (10 sec: 52475.4, 60 sec: 49698.2, 300 sec: 48763.3). Total num frames: 530055168. Throughput: 0: 12003.5. Samples: 132569088. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:35,767][1648981] Avg episode reward: [(0, '454.700')] [2024-06-15 14:38:38,091][1651669] Updated weights for policy 0, policy_version 258842 (0.0013) [2024-06-15 14:38:40,770][1648981] Fps is (10 sec: 42583.7, 60 sec: 45872.4, 300 sec: 48429.4). Total num frames: 530186240. Throughput: 0: 12014.0. Samples: 132606976. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:40,771][1648981] Avg episode reward: [(0, '433.060')] [2024-06-15 14:38:41,863][1651669] Updated weights for policy 0, policy_version 258916 (0.0015) [2024-06-15 14:38:43,923][1651669] Updated weights for policy 0, policy_version 258976 (0.0019) [2024-06-15 14:38:45,445][1651669] Updated weights for policy 0, policy_version 259040 (0.0014) [2024-06-15 14:38:45,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 530513920. Throughput: 0: 11938.7. Samples: 132683776. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:45,767][1648981] Avg episode reward: [(0, '440.060')] [2024-06-15 14:38:48,533][1651669] Updated weights for policy 0, policy_version 259088 (0.0117) [2024-06-15 14:38:49,440][1651669] Updated weights for policy 0, policy_version 259136 (0.0017) [2024-06-15 14:38:50,766][1648981] Fps is (10 sec: 52449.1, 60 sec: 46967.5, 300 sec: 48763.3). Total num frames: 530710528. Throughput: 0: 11935.3. Samples: 132752896. 
Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:50,767][1648981] Avg episode reward: [(0, '434.210')] [2024-06-15 14:38:53,454][1651669] Updated weights for policy 0, policy_version 259187 (0.0014) [2024-06-15 14:38:54,564][1651274] Signal inference workers to stop experience collection... (13600 times) [2024-06-15 14:38:54,627][1651669] InferenceWorker_p0-w0: stopping experience collection (13600 times) [2024-06-15 14:38:54,641][1651669] Updated weights for policy 0, policy_version 259220 (0.0012) [2024-06-15 14:38:54,790][1651274] Signal inference workers to resume experience collection... (13600 times) [2024-06-15 14:38:54,790][1651669] InferenceWorker_p0-w0: resuming experience collection (13600 times) [2024-06-15 14:38:55,768][1648981] Fps is (10 sec: 45869.2, 60 sec: 48058.7, 300 sec: 48429.8). Total num frames: 530972672. Throughput: 0: 11809.8. Samples: 132786688. Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:38:55,768][1648981] Avg episode reward: [(0, '417.430')] [2024-06-15 14:38:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000259264_530972672.pth... [2024-06-15 14:38:55,847][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000253568_519307264.pth [2024-06-15 14:38:56,165][1651669] Updated weights for policy 0, policy_version 259270 (0.0012) [2024-06-15 14:38:57,380][1651669] Updated weights for policy 0, policy_version 259324 (0.0013) [2024-06-15 14:38:59,605][1651669] Updated weights for policy 0, policy_version 259361 (0.0014) [2024-06-15 14:39:00,112][1651669] Updated weights for policy 0, policy_version 259391 (0.0012) [2024-06-15 14:39:00,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 47524.0, 300 sec: 48876.7). Total num frames: 531234816. Throughput: 0: 11981.0. Samples: 132859392. 
Policy #0 lag: (min: 10.0, avg: 103.7, max: 266.0) [2024-06-15 14:39:00,767][1648981] Avg episode reward: [(0, '401.550')] [2024-06-15 14:39:04,686][1651669] Updated weights for policy 0, policy_version 259441 (0.0011) [2024-06-15 14:39:05,717][1651669] Updated weights for policy 0, policy_version 259490 (0.0014) [2024-06-15 14:39:05,766][1648981] Fps is (10 sec: 45881.0, 60 sec: 48605.8, 300 sec: 48318.9). Total num frames: 531431424. Throughput: 0: 11889.7. Samples: 132930048. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:05,767][1648981] Avg episode reward: [(0, '399.950')] [2024-06-15 14:39:07,220][1651669] Updated weights for policy 0, policy_version 259536 (0.0010) [2024-06-15 14:39:09,530][1651669] Updated weights for policy 0, policy_version 259587 (0.0018) [2024-06-15 14:39:10,355][1651669] Updated weights for policy 0, policy_version 259638 (0.0015) [2024-06-15 14:39:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48605.8, 300 sec: 48874.3). Total num frames: 531759104. Throughput: 0: 12085.6. Samples: 132969472. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:10,767][1648981] Avg episode reward: [(0, '417.710')] [2024-06-15 14:39:14,447][1651669] Updated weights for policy 0, policy_version 259696 (0.0022) [2024-06-15 14:39:15,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 49152.4, 300 sec: 48652.1). Total num frames: 531955712. Throughput: 0: 12026.3. Samples: 133048320. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:15,770][1648981] Avg episode reward: [(0, '428.210')] [2024-06-15 14:39:15,784][1651669] Updated weights for policy 0, policy_version 259752 (0.0106) [2024-06-15 14:39:17,953][1651669] Updated weights for policy 0, policy_version 259808 (0.0022) [2024-06-15 14:39:20,599][1651669] Updated weights for policy 0, policy_version 259872 (0.0013) [2024-06-15 14:39:20,767][1648981] Fps is (10 sec: 45873.5, 60 sec: 49151.6, 300 sec: 48653.4). Total num frames: 532217856. 
Throughput: 0: 12276.5. Samples: 133121536. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:20,767][1648981] Avg episode reward: [(0, '423.960')] [2024-06-15 14:39:21,424][1651669] Updated weights for policy 0, policy_version 259904 (0.0013) [2024-06-15 14:39:25,449][1651669] Updated weights for policy 0, policy_version 259961 (0.0012) [2024-06-15 14:39:25,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 48066.8, 300 sec: 48430.0). Total num frames: 532414464. Throughput: 0: 12323.2. Samples: 133161472. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:25,767][1648981] Avg episode reward: [(0, '436.350')] [2024-06-15 14:39:26,691][1651669] Updated weights for policy 0, policy_version 260016 (0.0014) [2024-06-15 14:39:28,520][1651669] Updated weights for policy 0, policy_version 260051 (0.0012) [2024-06-15 14:39:30,564][1651669] Updated weights for policy 0, policy_version 260098 (0.0012) [2024-06-15 14:39:30,766][1648981] Fps is (10 sec: 49154.1, 60 sec: 49152.2, 300 sec: 48763.2). Total num frames: 532709376. Throughput: 0: 12128.7. Samples: 133229568. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:30,767][1648981] Avg episode reward: [(0, '447.850')] [2024-06-15 14:39:34,469][1651669] Updated weights for policy 0, policy_version 260172 (0.0019) [2024-06-15 14:39:35,294][1651669] Updated weights for policy 0, policy_version 260221 (0.0014) [2024-06-15 14:39:35,716][1651274] Signal inference workers to stop experience collection... (13650 times) [2024-06-15 14:39:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 532938752. Throughput: 0: 12481.4. Samples: 133314560. 
Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:35,767][1648981] Avg episode reward: [(0, '454.740')] [2024-06-15 14:39:35,812][1651669] InferenceWorker_p0-w0: stopping experience collection (13650 times) [2024-06-15 14:39:36,065][1651274] Signal inference workers to resume experience collection... (13650 times) [2024-06-15 14:39:36,066][1651669] InferenceWorker_p0-w0: resuming experience collection (13650 times) [2024-06-15 14:39:37,203][1651669] Updated weights for policy 0, policy_version 260280 (0.0013) [2024-06-15 14:39:39,602][1651669] Updated weights for policy 0, policy_version 260336 (0.0019) [2024-06-15 14:39:40,770][1648981] Fps is (10 sec: 49137.7, 60 sec: 50244.9, 300 sec: 48762.7). Total num frames: 533200896. Throughput: 0: 12458.2. Samples: 133347328. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:40,772][1648981] Avg episode reward: [(0, '457.280')] [2024-06-15 14:39:42,269][1651669] Updated weights for policy 0, policy_version 260384 (0.0012) [2024-06-15 14:39:45,250][1651669] Updated weights for policy 0, policy_version 260448 (0.0041) [2024-06-15 14:39:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 533430272. Throughput: 0: 12435.9. Samples: 133419008. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:45,767][1648981] Avg episode reward: [(0, '490.460')] [2024-06-15 14:39:47,814][1651669] Updated weights for policy 0, policy_version 260528 (0.0013) [2024-06-15 14:39:49,553][1651669] Updated weights for policy 0, policy_version 260569 (0.0113) [2024-06-15 14:39:50,766][1648981] Fps is (10 sec: 52444.0, 60 sec: 50244.1, 300 sec: 48874.4). Total num frames: 533725184. Throughput: 0: 12561.1. Samples: 133495296. 
Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:50,767][1648981] Avg episode reward: [(0, '488.930')] [2024-06-15 14:39:52,261][1651669] Updated weights for policy 0, policy_version 260610 (0.0012) [2024-06-15 14:39:54,897][1651669] Updated weights for policy 0, policy_version 260674 (0.0124) [2024-06-15 14:39:55,776][1648981] Fps is (10 sec: 52380.4, 60 sec: 49691.6, 300 sec: 48761.7). Total num frames: 533954560. Throughput: 0: 12535.8. Samples: 133533696. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:39:55,776][1648981] Avg episode reward: [(0, '475.770')] [2024-06-15 14:39:55,912][1651669] Updated weights for policy 0, policy_version 260736 (0.0014) [2024-06-15 14:39:57,992][1651669] Updated weights for policy 0, policy_version 260789 (0.0016) [2024-06-15 14:40:00,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 49151.9, 300 sec: 48653.4). Total num frames: 534183936. Throughput: 0: 12344.9. Samples: 133603840. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:40:00,767][1648981] Avg episode reward: [(0, '503.660')] [2024-06-15 14:40:01,283][1651669] Updated weights for policy 0, policy_version 260850 (0.0013) [2024-06-15 14:40:03,436][1651669] Updated weights for policy 0, policy_version 260868 (0.0028) [2024-06-15 14:40:05,766][1648981] Fps is (10 sec: 42637.7, 60 sec: 49152.1, 300 sec: 48652.1). Total num frames: 534380544. Throughput: 0: 12288.1. Samples: 133674496. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:40:05,767][1648981] Avg episode reward: [(0, '495.090')] [2024-06-15 14:40:06,054][1651669] Updated weights for policy 0, policy_version 260939 (0.0015) [2024-06-15 14:40:07,040][1651669] Updated weights for policy 0, policy_version 260989 (0.0053) [2024-06-15 14:40:08,727][1651669] Updated weights for policy 0, policy_version 261055 (0.0013) [2024-06-15 14:40:10,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 48059.8, 300 sec: 48431.4). Total num frames: 534642688. 
Throughput: 0: 12094.6. Samples: 133705728. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:40:10,767][1648981] Avg episode reward: [(0, '486.870')] [2024-06-15 14:40:13,127][1651669] Updated weights for policy 0, policy_version 261114 (0.0014) [2024-06-15 14:40:14,613][1651669] Updated weights for policy 0, policy_version 261152 (0.0017) [2024-06-15 14:40:15,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 534904832. Throughput: 0: 12401.7. Samples: 133787648. Policy #0 lag: (min: 59.0, avg: 147.7, max: 287.0) [2024-06-15 14:40:15,768][1648981] Avg episode reward: [(0, '494.250')] [2024-06-15 14:40:17,148][1651669] Updated weights for policy 0, policy_version 261216 (0.0012) [2024-06-15 14:40:19,194][1651669] Updated weights for policy 0, policy_version 261266 (0.0048) [2024-06-15 14:40:19,531][1651274] Signal inference workers to stop experience collection... (13700 times) [2024-06-15 14:40:19,578][1651669] InferenceWorker_p0-w0: stopping experience collection (13700 times) [2024-06-15 14:40:19,714][1651274] Signal inference workers to resume experience collection... (13700 times) [2024-06-15 14:40:19,715][1651669] InferenceWorker_p0-w0: resuming experience collection (13700 times) [2024-06-15 14:40:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.3, 300 sec: 48652.2). Total num frames: 535166976. Throughput: 0: 11878.4. Samples: 133849088. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:20,767][1648981] Avg episode reward: [(0, '507.560')] [2024-06-15 14:40:23,303][1651669] Updated weights for policy 0, policy_version 261331 (0.0015) [2024-06-15 14:40:24,283][1651669] Updated weights for policy 0, policy_version 261375 (0.0014) [2024-06-15 14:40:25,768][1648981] Fps is (10 sec: 39317.6, 60 sec: 48058.8, 300 sec: 48429.8). Total num frames: 535298048. Throughput: 0: 12060.9. Samples: 133890048. 
Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:25,768][1648981] Avg episode reward: [(0, '476.840')] [2024-06-15 14:40:26,884][1651669] Updated weights for policy 0, policy_version 261434 (0.0012) [2024-06-15 14:40:28,812][1651669] Updated weights for policy 0, policy_version 261488 (0.0016) [2024-06-15 14:40:30,782][1648981] Fps is (10 sec: 45803.0, 60 sec: 48593.1, 300 sec: 48649.6). Total num frames: 535625728. Throughput: 0: 11965.2. Samples: 133957632. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:30,783][1648981] Avg episode reward: [(0, '449.350')] [2024-06-15 14:40:31,058][1651669] Updated weights for policy 0, policy_version 261568 (0.0013) [2024-06-15 14:40:34,930][1651669] Updated weights for policy 0, policy_version 261626 (0.0014) [2024-06-15 14:40:35,766][1648981] Fps is (10 sec: 52434.5, 60 sec: 48059.6, 300 sec: 48430.0). Total num frames: 535822336. Throughput: 0: 11878.4. Samples: 134029824. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:35,767][1648981] Avg episode reward: [(0, '447.940')] [2024-06-15 14:40:38,055][1651669] Updated weights for policy 0, policy_version 261685 (0.0073) [2024-06-15 14:40:39,186][1651669] Updated weights for policy 0, policy_version 261717 (0.0010) [2024-06-15 14:40:40,043][1651669] Updated weights for policy 0, policy_version 261760 (0.0012) [2024-06-15 14:40:40,766][1648981] Fps is (10 sec: 45947.5, 60 sec: 48062.1, 300 sec: 48541.3). Total num frames: 536084480. Throughput: 0: 11880.8. Samples: 134068224. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:40,767][1648981] Avg episode reward: [(0, '431.720')] [2024-06-15 14:40:42,138][1651669] Updated weights for policy 0, policy_version 261824 (0.0016) [2024-06-15 14:40:45,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 536313856. Throughput: 0: 11923.9. Samples: 134140416. 
Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:45,767][1648981] Avg episode reward: [(0, '425.450')] [2024-06-15 14:40:47,576][1651669] Updated weights for policy 0, policy_version 261891 (0.0034) [2024-06-15 14:40:48,862][1651669] Updated weights for policy 0, policy_version 261950 (0.0013) [2024-06-15 14:40:50,389][1651669] Updated weights for policy 0, policy_version 262000 (0.0014) [2024-06-15 14:40:50,770][1648981] Fps is (10 sec: 49133.8, 60 sec: 47510.7, 300 sec: 48763.8). Total num frames: 536576000. Throughput: 0: 11945.7. Samples: 134212096. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:50,771][1648981] Avg episode reward: [(0, '421.920')] [2024-06-15 14:40:51,540][1651669] Updated weights for policy 0, policy_version 262048 (0.0113) [2024-06-15 14:40:55,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 46428.4, 300 sec: 48430.0). Total num frames: 536739840. Throughput: 0: 12037.7. Samples: 134247424. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:40:55,767][1648981] Avg episode reward: [(0, '419.780')] [2024-06-15 14:40:56,045][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000262096_536772608.pth... [2024-06-15 14:40:56,046][1651669] Updated weights for policy 0, policy_version 262096 (0.0013) [2024-06-15 14:40:56,188][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000256416_525139968.pth [2024-06-15 14:40:56,965][1651669] Updated weights for policy 0, policy_version 262138 (0.0011) [2024-06-15 14:40:58,795][1651669] Updated weights for policy 0, policy_version 262177 (0.0109) [2024-06-15 14:41:00,567][1651669] Updated weights for policy 0, policy_version 262226 (0.0024) [2024-06-15 14:41:00,766][1648981] Fps is (10 sec: 45892.6, 60 sec: 47513.7, 300 sec: 48543.7). Total num frames: 537034752. Throughput: 0: 11946.7. Samples: 134325248. 
Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:41:00,767][1648981] Avg episode reward: [(0, '421.560')] [2024-06-15 14:41:02,124][1651669] Updated weights for policy 0, policy_version 262289 (0.0014) [2024-06-15 14:41:02,441][1651274] Signal inference workers to stop experience collection... (13750 times) [2024-06-15 14:41:02,554][1651669] InferenceWorker_p0-w0: stopping experience collection (13750 times) [2024-06-15 14:41:02,747][1651274] Signal inference workers to resume experience collection... (13750 times) [2024-06-15 14:41:02,748][1651669] InferenceWorker_p0-w0: resuming experience collection (13750 times) [2024-06-15 14:41:05,767][1648981] Fps is (10 sec: 52424.0, 60 sec: 48058.9, 300 sec: 48652.0). Total num frames: 537264128. Throughput: 0: 12037.4. Samples: 134390784. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:41:05,768][1648981] Avg episode reward: [(0, '436.300')] [2024-06-15 14:41:07,619][1651669] Updated weights for policy 0, policy_version 262368 (0.0107) [2024-06-15 14:41:09,457][1651669] Updated weights for policy 0, policy_version 262402 (0.0017) [2024-06-15 14:41:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 537493504. Throughput: 0: 11924.2. Samples: 134426624. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:41:10,767][1648981] Avg episode reward: [(0, '450.270')] [2024-06-15 14:41:10,815][1651669] Updated weights for policy 0, policy_version 262464 (0.0021) [2024-06-15 14:41:13,047][1651669] Updated weights for policy 0, policy_version 262532 (0.0025) [2024-06-15 14:41:14,281][1651669] Updated weights for policy 0, policy_version 262587 (0.0013) [2024-06-15 14:41:15,767][1648981] Fps is (10 sec: 52433.3, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 537788416. Throughput: 0: 11939.4. Samples: 134494720. 
Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:41:15,768][1648981] Avg episode reward: [(0, '448.550')] [2024-06-15 14:41:19,548][1651669] Updated weights for policy 0, policy_version 262650 (0.0013) [2024-06-15 14:41:20,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 537919488. Throughput: 0: 12015.0. Samples: 134570496. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:41:20,767][1648981] Avg episode reward: [(0, '444.540')] [2024-06-15 14:41:21,433][1651669] Updated weights for policy 0, policy_version 262689 (0.0018) [2024-06-15 14:41:22,824][1651669] Updated weights for policy 0, policy_version 262736 (0.0014) [2024-06-15 14:41:24,206][1651669] Updated weights for policy 0, policy_version 262786 (0.0027) [2024-06-15 14:41:25,513][1651669] Updated weights for policy 0, policy_version 262846 (0.0016) [2024-06-15 14:41:25,767][1648981] Fps is (10 sec: 52429.0, 60 sec: 50245.1, 300 sec: 48874.3). Total num frames: 538312704. Throughput: 0: 11889.8. Samples: 134603264. Policy #0 lag: (min: 31.0, avg: 147.1, max: 287.0) [2024-06-15 14:41:25,767][1648981] Avg episode reward: [(0, '472.740')] [2024-06-15 14:41:30,277][1651669] Updated weights for policy 0, policy_version 262906 (0.0032) [2024-06-15 14:41:30,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 46979.8, 300 sec: 48430.0). Total num frames: 538443776. Throughput: 0: 11980.8. Samples: 134679552. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:41:30,767][1648981] Avg episode reward: [(0, '485.510')] [2024-06-15 14:41:32,991][1651669] Updated weights for policy 0, policy_version 262948 (0.0014) [2024-06-15 14:41:34,611][1651669] Updated weights for policy 0, policy_version 262994 (0.0012) [2024-06-15 14:41:35,766][1648981] Fps is (10 sec: 39322.3, 60 sec: 48059.8, 300 sec: 48430.6). Total num frames: 538705920. Throughput: 0: 11913.5. Samples: 134748160. 
Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:41:35,767][1648981] Avg episode reward: [(0, '473.870')] [2024-06-15 14:41:35,940][1651669] Updated weights for policy 0, policy_version 263056 (0.0010) [2024-06-15 14:41:40,369][1651669] Updated weights for policy 0, policy_version 263121 (0.0013) [2024-06-15 14:41:40,788][1648981] Fps is (10 sec: 45781.3, 60 sec: 46951.5, 300 sec: 48205.3). Total num frames: 538902528. Throughput: 0: 11998.1. Samples: 134787584. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:41:40,795][1648981] Avg episode reward: [(0, '475.860')] [2024-06-15 14:41:42,528][1651669] Updated weights for policy 0, policy_version 263184 (0.0014) [2024-06-15 14:41:44,952][1651669] Updated weights for policy 0, policy_version 263249 (0.0016) [2024-06-15 14:41:45,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.8, 300 sec: 48763.3). Total num frames: 539197440. Throughput: 0: 11821.5. Samples: 134857216. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:41:45,767][1648981] Avg episode reward: [(0, '449.840')] [2024-06-15 14:41:46,151][1651274] Signal inference workers to stop experience collection... (13800 times) [2024-06-15 14:41:46,186][1651669] InferenceWorker_p0-w0: stopping experience collection (13800 times) [2024-06-15 14:41:46,189][1651669] Updated weights for policy 0, policy_version 263299 (0.0090) [2024-06-15 14:41:46,345][1651274] Signal inference workers to resume experience collection... (13800 times) [2024-06-15 14:41:46,358][1651669] InferenceWorker_p0-w0: resuming experience collection (13800 times) [2024-06-15 14:41:47,479][1651669] Updated weights for policy 0, policy_version 263356 (0.0012) [2024-06-15 14:41:50,769][1648981] Fps is (10 sec: 45959.1, 60 sec: 46422.5, 300 sec: 48429.6). Total num frames: 539361280. Throughput: 0: 12105.6. Samples: 134935552. 
Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:41:50,769][1648981] Avg episode reward: [(0, '461.890')] [2024-06-15 14:41:51,870][1651669] Updated weights for policy 0, policy_version 263417 (0.0015) [2024-06-15 14:41:53,892][1651669] Updated weights for policy 0, policy_version 263488 (0.0014) [2024-06-15 14:41:55,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48605.9, 300 sec: 48543.8). Total num frames: 539656192. Throughput: 0: 12049.0. Samples: 134968832. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:41:55,767][1648981] Avg episode reward: [(0, '466.810')] [2024-06-15 14:41:57,283][1651669] Updated weights for policy 0, policy_version 263570 (0.0146) [2024-06-15 14:41:58,237][1651669] Updated weights for policy 0, policy_version 263616 (0.0029) [2024-06-15 14:42:00,770][1648981] Fps is (10 sec: 52419.5, 60 sec: 47510.4, 300 sec: 48429.3). Total num frames: 539885568. Throughput: 0: 12116.3. Samples: 135040000. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:00,771][1648981] Avg episode reward: [(0, '474.650')] [2024-06-15 14:42:03,627][1651669] Updated weights for policy 0, policy_version 263683 (0.0016) [2024-06-15 14:42:04,981][1651669] Updated weights for policy 0, policy_version 263741 (0.0013) [2024-06-15 14:42:05,773][1648981] Fps is (10 sec: 49118.8, 60 sec: 48055.1, 300 sec: 48428.9). Total num frames: 540147712. Throughput: 0: 12001.7. Samples: 135110656. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:05,774][1648981] Avg episode reward: [(0, '476.320')] [2024-06-15 14:42:07,717][1651669] Updated weights for policy 0, policy_version 263794 (0.0013) [2024-06-15 14:42:09,308][1651669] Updated weights for policy 0, policy_version 263863 (0.0011) [2024-06-15 14:42:10,767][1648981] Fps is (10 sec: 52449.0, 60 sec: 48605.7, 300 sec: 48541.0). Total num frames: 540409856. Throughput: 0: 11935.3. Samples: 135140352. 
Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:10,767][1648981] Avg episode reward: [(0, '472.760')] [2024-06-15 14:42:13,791][1651669] Updated weights for policy 0, policy_version 263907 (0.0014) [2024-06-15 14:42:15,782][1648981] Fps is (10 sec: 42560.5, 60 sec: 46409.3, 300 sec: 48094.3). Total num frames: 540573696. Throughput: 0: 11931.1. Samples: 135216640. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:15,783][1648981] Avg episode reward: [(0, '461.290')] [2024-06-15 14:42:15,852][1651669] Updated weights for policy 0, policy_version 263968 (0.0016) [2024-06-15 14:42:17,640][1651669] Updated weights for policy 0, policy_version 264002 (0.0018) [2024-06-15 14:42:19,033][1651669] Updated weights for policy 0, policy_version 264064 (0.0013) [2024-06-15 14:42:20,610][1651669] Updated weights for policy 0, policy_version 264125 (0.0014) [2024-06-15 14:42:20,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.1, 300 sec: 48654.7). Total num frames: 540934144. Throughput: 0: 11741.8. Samples: 135276544. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:20,767][1648981] Avg episode reward: [(0, '461.460')] [2024-06-15 14:42:25,777][1648981] Fps is (10 sec: 45899.4, 60 sec: 45321.3, 300 sec: 47872.9). Total num frames: 541032448. Throughput: 0: 11938.0. Samples: 135324672. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:25,777][1648981] Avg episode reward: [(0, '469.750')] [2024-06-15 14:42:25,858][1651669] Updated weights for policy 0, policy_version 264186 (0.0014) [2024-06-15 14:42:26,915][1651669] Updated weights for policy 0, policy_version 264226 (0.0015) [2024-06-15 14:42:29,168][1651669] Updated weights for policy 0, policy_version 264273 (0.0014) [2024-06-15 14:42:29,178][1651274] Signal inference workers to stop experience collection... 
(13850 times) [2024-06-15 14:42:29,201][1651669] InferenceWorker_p0-w0: stopping experience collection (13850 times) [2024-06-15 14:42:29,439][1651274] Signal inference workers to resume experience collection... (13850 times) [2024-06-15 14:42:29,440][1651669] InferenceWorker_p0-w0: resuming experience collection (13850 times) [2024-06-15 14:42:30,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 541360128. Throughput: 0: 11867.0. Samples: 135391232. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:30,767][1648981] Avg episode reward: [(0, '491.870')] [2024-06-15 14:42:30,917][1651669] Updated weights for policy 0, policy_version 264340 (0.0012) [2024-06-15 14:42:35,766][1648981] Fps is (10 sec: 42643.0, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 541458432. Throughput: 0: 11810.7. Samples: 135467008. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:35,767][1648981] Avg episode reward: [(0, '470.270')] [2024-06-15 14:42:36,034][1651669] Updated weights for policy 0, policy_version 264385 (0.0019) [2024-06-15 14:42:37,898][1651669] Updated weights for policy 0, policy_version 264464 (0.0012) [2024-06-15 14:42:39,107][1651669] Updated weights for policy 0, policy_version 264511 (0.0015) [2024-06-15 14:42:40,775][1648981] Fps is (10 sec: 42562.9, 60 sec: 48069.5, 300 sec: 48206.5). Total num frames: 541786112. Throughput: 0: 11773.8. Samples: 135498752. Policy #0 lag: (min: 15.0, avg: 101.4, max: 271.0) [2024-06-15 14:42:40,779][1648981] Avg episode reward: [(0, '445.470')] [2024-06-15 14:42:41,032][1651669] Updated weights for policy 0, policy_version 264563 (0.0012) [2024-06-15 14:42:42,674][1651669] Updated weights for policy 0, policy_version 264630 (0.0011) [2024-06-15 14:42:45,786][1648981] Fps is (10 sec: 52325.5, 60 sec: 46406.1, 300 sec: 47760.3). Total num frames: 541982720. Throughput: 0: 11658.2. Samples: 135564800. 
Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:42:45,787][1648981] Avg episode reward: [(0, '445.710')] [2024-06-15 14:42:48,414][1651669] Updated weights for policy 0, policy_version 264675 (0.0013) [2024-06-15 14:42:50,475][1651669] Updated weights for policy 0, policy_version 264768 (0.0013) [2024-06-15 14:42:50,770][1648981] Fps is (10 sec: 45895.9, 60 sec: 48058.5, 300 sec: 47985.1). Total num frames: 542244864. Throughput: 0: 11697.1. Samples: 135636992. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:42:50,771][1648981] Avg episode reward: [(0, '439.190')] [2024-06-15 14:42:52,636][1651669] Updated weights for policy 0, policy_version 264836 (0.0014) [2024-06-15 14:42:53,934][1651669] Updated weights for policy 0, policy_version 264884 (0.0025) [2024-06-15 14:42:55,770][1648981] Fps is (10 sec: 52514.7, 60 sec: 47511.0, 300 sec: 47876.2). Total num frames: 542507008. Throughput: 0: 11672.8. Samples: 135665664. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:42:55,770][1648981] Avg episode reward: [(0, '454.820')] [2024-06-15 14:42:55,792][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000264896_542507008.pth... [2024-06-15 14:42:55,848][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000259264_530972672.pth [2024-06-15 14:42:59,524][1651669] Updated weights for policy 0, policy_version 264935 (0.0011) [2024-06-15 14:43:00,766][1648981] Fps is (10 sec: 45892.6, 60 sec: 46970.6, 300 sec: 48096.8). Total num frames: 542703616. Throughput: 0: 11791.5. Samples: 135747072. 
Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:00,767][1648981] Avg episode reward: [(0, '464.680')] [2024-06-15 14:43:01,326][1651669] Updated weights for policy 0, policy_version 265023 (0.0012) [2024-06-15 14:43:04,137][1651669] Updated weights for policy 0, policy_version 265089 (0.0014) [2024-06-15 14:43:05,343][1651669] Updated weights for policy 0, policy_version 265147 (0.0012) [2024-06-15 14:43:05,766][1648981] Fps is (10 sec: 52446.2, 60 sec: 48065.2, 300 sec: 48096.8). Total num frames: 543031296. Throughput: 0: 11764.7. Samples: 135805952. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:05,767][1648981] Avg episode reward: [(0, '459.830')] [2024-06-15 14:43:10,559][1651274] Signal inference workers to stop experience collection... (13900 times) [2024-06-15 14:43:10,605][1651669] InferenceWorker_p0-w0: stopping experience collection (13900 times) [2024-06-15 14:43:10,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 44783.0, 300 sec: 47763.6). Total num frames: 543096832. Throughput: 0: 11721.8. Samples: 135852032. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:10,767][1648981] Avg episode reward: [(0, '466.590')] [2024-06-15 14:43:10,785][1651274] Signal inference workers to resume experience collection... (13900 times) [2024-06-15 14:43:10,785][1651669] InferenceWorker_p0-w0: resuming experience collection (13900 times) [2024-06-15 14:43:11,317][1651669] Updated weights for policy 0, policy_version 265217 (0.0012) [2024-06-15 14:43:12,431][1651669] Updated weights for policy 0, policy_version 265279 (0.0153) [2024-06-15 14:43:14,977][1651669] Updated weights for policy 0, policy_version 265347 (0.0013) [2024-06-15 14:43:15,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48618.5, 300 sec: 48207.8). Total num frames: 543490048. Throughput: 0: 11798.7. Samples: 135922176. 
Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:15,767][1648981] Avg episode reward: [(0, '473.100')] [2024-06-15 14:43:20,624][1651669] Updated weights for policy 0, policy_version 265410 (0.0012) [2024-06-15 14:43:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 43690.8, 300 sec: 47542.8). Total num frames: 543555584. Throughput: 0: 11810.1. Samples: 135998464. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:20,767][1648981] Avg episode reward: [(0, '455.970')] [2024-06-15 14:43:22,157][1651669] Updated weights for policy 0, policy_version 265475 (0.0015) [2024-06-15 14:43:24,246][1651669] Updated weights for policy 0, policy_version 265540 (0.0017) [2024-06-15 14:43:25,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 48068.1, 300 sec: 47985.7). Total num frames: 543916032. Throughput: 0: 11766.8. Samples: 136028160. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:25,767][1648981] Avg episode reward: [(0, '444.170')] [2024-06-15 14:43:26,213][1651669] Updated weights for policy 0, policy_version 265616 (0.0012) [2024-06-15 14:43:27,458][1651669] Updated weights for policy 0, policy_version 265662 (0.0037) [2024-06-15 14:43:30,769][1648981] Fps is (10 sec: 52415.3, 60 sec: 45327.1, 300 sec: 47540.9). Total num frames: 544079872. Throughput: 0: 11962.6. Samples: 136102912. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:30,770][1648981] Avg episode reward: [(0, '435.830')] [2024-06-15 14:43:33,632][1651669] Updated weights for policy 0, policy_version 265736 (0.0014) [2024-06-15 14:43:34,548][1651669] Updated weights for policy 0, policy_version 265786 (0.0131) [2024-06-15 14:43:35,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48605.8, 300 sec: 48097.4). Total num frames: 544374784. Throughput: 0: 11788.4. Samples: 136167424. 
Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:35,767][1648981] Avg episode reward: [(0, '435.830')] [2024-06-15 14:43:36,432][1651669] Updated weights for policy 0, policy_version 265842 (0.0014) [2024-06-15 14:43:37,226][1651669] Updated weights for policy 0, policy_version 265875 (0.0011) [2024-06-15 14:43:38,345][1651669] Updated weights for policy 0, policy_version 265920 (0.0011) [2024-06-15 14:43:40,766][1648981] Fps is (10 sec: 52442.3, 60 sec: 46974.0, 300 sec: 47763.5). Total num frames: 544604160. Throughput: 0: 11879.3. Samples: 136200192. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:40,767][1648981] Avg episode reward: [(0, '425.400')] [2024-06-15 14:43:43,809][1651669] Updated weights for policy 0, policy_version 265984 (0.0013) [2024-06-15 14:43:45,505][1651669] Updated weights for policy 0, policy_version 266050 (0.0013) [2024-06-15 14:43:45,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48075.6, 300 sec: 47985.7). Total num frames: 544866304. Throughput: 0: 11855.7. Samples: 136280576. Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:45,767][1648981] Avg episode reward: [(0, '418.860')] [2024-06-15 14:43:46,292][1651274] Signal inference workers to stop experience collection... (13950 times) [2024-06-15 14:43:46,360][1651669] InferenceWorker_p0-w0: stopping experience collection (13950 times) [2024-06-15 14:43:46,546][1651274] Signal inference workers to resume experience collection... (13950 times) [2024-06-15 14:43:46,570][1651669] InferenceWorker_p0-w0: resuming experience collection (13950 times) [2024-06-15 14:43:47,535][1651669] Updated weights for policy 0, policy_version 266114 (0.0014) [2024-06-15 14:43:48,757][1651669] Updated weights for policy 0, policy_version 266163 (0.0012) [2024-06-15 14:43:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48062.8, 300 sec: 47985.9). Total num frames: 545128448. Throughput: 0: 12094.6. Samples: 136350208. 
Policy #0 lag: (min: 47.0, avg: 174.5, max: 303.0) [2024-06-15 14:43:50,767][1648981] Avg episode reward: [(0, '447.480')] [2024-06-15 14:43:54,302][1651669] Updated weights for policy 0, policy_version 266208 (0.0012) [2024-06-15 14:43:55,740][1651669] Updated weights for policy 0, policy_version 266272 (0.0018) [2024-06-15 14:43:55,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46970.1, 300 sec: 47763.5). Total num frames: 545325056. Throughput: 0: 12026.3. Samples: 136393216. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:43:55,767][1648981] Avg episode reward: [(0, '466.860')] [2024-06-15 14:43:58,104][1651669] Updated weights for policy 0, policy_version 266368 (0.0094) [2024-06-15 14:43:59,486][1651669] Updated weights for policy 0, policy_version 266423 (0.0013) [2024-06-15 14:44:00,774][1648981] Fps is (10 sec: 52391.2, 60 sec: 49146.2, 300 sec: 48206.7). Total num frames: 545652736. Throughput: 0: 11740.0. Samples: 136450560. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:00,774][1648981] Avg episode reward: [(0, '478.330')] [2024-06-15 14:44:04,507][1651669] Updated weights for policy 0, policy_version 266459 (0.0017) [2024-06-15 14:44:05,642][1651669] Updated weights for policy 0, policy_version 266512 (0.0097) [2024-06-15 14:44:05,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 545816576. Throughput: 0: 12105.9. Samples: 136543232. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:05,767][1648981] Avg episode reward: [(0, '480.700')] [2024-06-15 14:44:07,386][1651669] Updated weights for policy 0, policy_version 266592 (0.0095) [2024-06-15 14:44:08,175][1651669] Updated weights for policy 0, policy_version 266624 (0.0013) [2024-06-15 14:44:09,715][1651669] Updated weights for policy 0, policy_version 266684 (0.0012) [2024-06-15 14:44:10,766][1648981] Fps is (10 sec: 52466.2, 60 sec: 51336.5, 300 sec: 48207.9). Total num frames: 546177024. 
Throughput: 0: 12094.6. Samples: 136572416. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:10,767][1648981] Avg episode reward: [(0, '493.460')] [2024-06-15 14:44:15,353][1651669] Updated weights for policy 0, policy_version 266738 (0.0012) [2024-06-15 14:44:15,770][1648981] Fps is (10 sec: 49134.2, 60 sec: 46964.6, 300 sec: 47763.0). Total num frames: 546308096. Throughput: 0: 12321.8. Samples: 136657408. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:15,771][1648981] Avg episode reward: [(0, '502.930')] [2024-06-15 14:44:16,490][1651669] Updated weights for policy 0, policy_version 266800 (0.0020) [2024-06-15 14:44:18,215][1651669] Updated weights for policy 0, policy_version 266868 (0.0013) [2024-06-15 14:44:19,524][1651669] Updated weights for policy 0, policy_version 266896 (0.0012) [2024-06-15 14:44:20,606][1651669] Updated weights for policy 0, policy_version 266943 (0.0026) [2024-06-15 14:44:20,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 52428.7, 300 sec: 48430.0). Total num frames: 546701312. Throughput: 0: 12299.4. Samples: 136720896. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:20,767][1648981] Avg episode reward: [(0, '488.590')] [2024-06-15 14:44:25,606][1651669] Updated weights for policy 0, policy_version 266993 (0.0012) [2024-06-15 14:44:25,766][1648981] Fps is (10 sec: 49170.7, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 546799616. Throughput: 0: 12595.2. Samples: 136766976. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:25,767][1648981] Avg episode reward: [(0, '460.380')] [2024-06-15 14:44:26,289][1651274] Signal inference workers to stop experience collection... (14000 times) [2024-06-15 14:44:26,365][1651669] InferenceWorker_p0-w0: stopping experience collection (14000 times) [2024-06-15 14:44:26,519][1651274] Signal inference workers to resume experience collection... 
(14000 times) [2024-06-15 14:44:26,520][1651669] InferenceWorker_p0-w0: resuming experience collection (14000 times) [2024-06-15 14:44:27,211][1651669] Updated weights for policy 0, policy_version 267072 (0.0227) [2024-06-15 14:44:28,020][1651669] Updated weights for policy 0, policy_version 267107 (0.0094) [2024-06-15 14:44:30,073][1651669] Updated weights for policy 0, policy_version 267168 (0.0026) [2024-06-15 14:44:30,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 52431.1, 300 sec: 48430.0). Total num frames: 547225600. Throughput: 0: 12447.3. Samples: 136840704. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:30,767][1648981] Avg episode reward: [(0, '458.750')] [2024-06-15 14:44:30,783][1651669] Updated weights for policy 0, policy_version 267200 (0.0027) [2024-06-15 14:44:35,772][1648981] Fps is (10 sec: 45850.9, 60 sec: 48055.6, 300 sec: 47652.1). Total num frames: 547258368. Throughput: 0: 12741.6. Samples: 136923648. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:35,772][1648981] Avg episode reward: [(0, '459.810')] [2024-06-15 14:44:36,439][1651669] Updated weights for policy 0, policy_version 267253 (0.0013) [2024-06-15 14:44:37,552][1651669] Updated weights for policy 0, policy_version 267312 (0.0013) [2024-06-15 14:44:39,159][1651669] Updated weights for policy 0, policy_version 267376 (0.0182) [2024-06-15 14:44:40,763][1651669] Updated weights for policy 0, policy_version 267413 (0.0012) [2024-06-15 14:44:40,768][1648981] Fps is (10 sec: 42592.0, 60 sec: 50789.1, 300 sec: 48207.6). Total num frames: 547651584. Throughput: 0: 12537.9. Samples: 136957440. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:40,769][1648981] Avg episode reward: [(0, '464.240')] [2024-06-15 14:44:45,766][1648981] Fps is (10 sec: 49177.6, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 547749888. Throughput: 0: 12847.5. Samples: 137028608. 
Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:45,767][1648981] Avg episode reward: [(0, '458.490')] [2024-06-15 14:44:45,859][1651669] Updated weights for policy 0, policy_version 267458 (0.0012) [2024-06-15 14:44:47,885][1651669] Updated weights for policy 0, policy_version 267546 (0.0039) [2024-06-15 14:44:48,374][1651669] Updated weights for policy 0, policy_version 267578 (0.0017) [2024-06-15 14:44:49,921][1651669] Updated weights for policy 0, policy_version 267635 (0.0011) [2024-06-15 14:44:50,774][1648981] Fps is (10 sec: 49121.0, 60 sec: 50237.7, 300 sec: 48097.0). Total num frames: 548143104. Throughput: 0: 12354.2. Samples: 137099264. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:50,775][1648981] Avg episode reward: [(0, '464.100')] [2024-06-15 14:44:51,757][1651669] Updated weights for policy 0, policy_version 267651 (0.0012) [2024-06-15 14:44:53,120][1651669] Updated weights for policy 0, policy_version 267705 (0.0094) [2024-06-15 14:44:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 548274176. Throughput: 0: 12492.8. Samples: 137134592. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:44:55,767][1648981] Avg episode reward: [(0, '440.900')] [2024-06-15 14:44:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000267712_548274176.pth... [2024-06-15 14:44:55,925][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000262096_536772608.pth [2024-06-15 14:44:56,859][1651669] Updated weights for policy 0, policy_version 267748 (0.0012) [2024-06-15 14:44:58,813][1651669] Updated weights for policy 0, policy_version 267808 (0.0012) [2024-06-15 14:45:00,156][1651669] Updated weights for policy 0, policy_version 267872 (0.0013) [2024-06-15 14:45:00,767][1648981] Fps is (10 sec: 49189.9, 60 sec: 49704.0, 300 sec: 48318.9). Total num frames: 548634624. Throughput: 0: 12072.8. Samples: 137200640. 
Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:45:00,767][1648981] Avg episode reward: [(0, '414.750')] [2024-06-15 14:45:02,653][1651669] Updated weights for policy 0, policy_version 267920 (0.0015) [2024-06-15 14:45:03,819][1651669] Updated weights for policy 0, policy_version 267964 (0.0014) [2024-06-15 14:45:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 548798464. Throughput: 0: 12515.6. Samples: 137284096. Policy #0 lag: (min: 10.0, avg: 69.3, max: 266.0) [2024-06-15 14:45:05,767][1648981] Avg episode reward: [(0, '409.940')] [2024-06-15 14:45:07,155][1651274] Signal inference workers to stop experience collection... (14050 times) [2024-06-15 14:45:07,191][1651669] Updated weights for policy 0, policy_version 268004 (0.0015) [2024-06-15 14:45:07,215][1651669] InferenceWorker_p0-w0: stopping experience collection (14050 times) [2024-06-15 14:45:07,477][1651274] Signal inference workers to resume experience collection... (14050 times) [2024-06-15 14:45:07,477][1651669] InferenceWorker_p0-w0: resuming experience collection (14050 times) [2024-06-15 14:45:09,115][1651669] Updated weights for policy 0, policy_version 268048 (0.0013) [2024-06-15 14:45:10,607][1651669] Updated weights for policy 0, policy_version 268117 (0.0127) [2024-06-15 14:45:10,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 48606.0, 300 sec: 48096.8). Total num frames: 549093376. Throughput: 0: 12333.5. Samples: 137321984. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:10,767][1648981] Avg episode reward: [(0, '407.570')] [2024-06-15 14:45:13,363][1651669] Updated weights for policy 0, policy_version 268176 (0.0077) [2024-06-15 14:45:14,669][1651669] Updated weights for policy 0, policy_version 268224 (0.0039) [2024-06-15 14:45:15,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 50247.3, 300 sec: 47985.7). Total num frames: 549322752. Throughput: 0: 12185.6. Samples: 137389056. 
Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:15,767][1648981] Avg episode reward: [(0, '471.850')] [2024-06-15 14:45:18,789][1651669] Updated weights for policy 0, policy_version 268288 (0.0014) [2024-06-15 14:45:20,776][1648981] Fps is (10 sec: 42556.4, 60 sec: 46959.9, 300 sec: 48206.4). Total num frames: 549519360. Throughput: 0: 12002.4. Samples: 137463808. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:20,777][1648981] Avg episode reward: [(0, '472.430')] [2024-06-15 14:45:21,787][1651669] Updated weights for policy 0, policy_version 268372 (0.0014) [2024-06-15 14:45:22,650][1651669] Updated weights for policy 0, policy_version 268411 (0.0023) [2024-06-15 14:45:25,597][1651669] Updated weights for policy 0, policy_version 268475 (0.0013) [2024-06-15 14:45:25,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50790.3, 300 sec: 48210.4). Total num frames: 549847040. Throughput: 0: 12083.6. Samples: 137501184. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:25,767][1648981] Avg episode reward: [(0, '464.900')] [2024-06-15 14:45:28,459][1651669] Updated weights for policy 0, policy_version 268515 (0.0108) [2024-06-15 14:45:29,058][1651669] Updated weights for policy 0, policy_version 268544 (0.0035) [2024-06-15 14:45:30,779][1648981] Fps is (10 sec: 49138.7, 60 sec: 46411.7, 300 sec: 48094.7). Total num frames: 550010880. Throughput: 0: 12205.0. Samples: 137577984. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:30,779][1648981] Avg episode reward: [(0, '488.090')] [2024-06-15 14:45:31,403][1651669] Updated weights for policy 0, policy_version 268608 (0.0015) [2024-06-15 14:45:32,654][1651669] Updated weights for policy 0, policy_version 268664 (0.0110) [2024-06-15 14:45:35,694][1651669] Updated weights for policy 0, policy_version 268732 (0.0147) [2024-06-15 14:45:35,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 51886.9, 300 sec: 48429.9). Total num frames: 550371328. 
Throughput: 0: 12039.7. Samples: 137640960. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:35,768][1648981] Avg episode reward: [(0, '495.850')] [2024-06-15 14:45:39,611][1651669] Updated weights for policy 0, policy_version 268784 (0.0014) [2024-06-15 14:45:40,770][1648981] Fps is (10 sec: 49194.3, 60 sec: 47511.7, 300 sec: 48096.1). Total num frames: 550502400. Throughput: 0: 12207.3. Samples: 137683968. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:40,771][1648981] Avg episode reward: [(0, '492.400')] [2024-06-15 14:45:41,217][1651669] Updated weights for policy 0, policy_version 268816 (0.0012) [2024-06-15 14:45:42,485][1651669] Updated weights for policy 0, policy_version 268867 (0.0012) [2024-06-15 14:45:43,897][1651669] Updated weights for policy 0, policy_version 268924 (0.0012) [2024-06-15 14:45:45,766][1648981] Fps is (10 sec: 42600.0, 60 sec: 50790.4, 300 sec: 48208.5). Total num frames: 550797312. Throughput: 0: 12299.4. Samples: 137754112. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:45,767][1648981] Avg episode reward: [(0, '487.030')] [2024-06-15 14:45:46,139][1651669] Updated weights for policy 0, policy_version 268961 (0.0013) [2024-06-15 14:45:49,518][1651274] Signal inference workers to stop experience collection... (14100 times) [2024-06-15 14:45:49,566][1651669] InferenceWorker_p0-w0: stopping experience collection (14100 times) [2024-06-15 14:45:49,827][1651274] Signal inference workers to resume experience collection... (14100 times) [2024-06-15 14:45:49,828][1651669] InferenceWorker_p0-w0: resuming experience collection (14100 times) [2024-06-15 14:45:49,830][1651669] Updated weights for policy 0, policy_version 269024 (0.0012) [2024-06-15 14:45:50,766][1648981] Fps is (10 sec: 52449.1, 60 sec: 48066.0, 300 sec: 48430.0). Total num frames: 551026688. Throughput: 0: 12026.3. Samples: 137825280. 
Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:50,767][1648981] Avg episode reward: [(0, '494.830')] [2024-06-15 14:45:52,330][1651669] Updated weights for policy 0, policy_version 269072 (0.0054) [2024-06-15 14:45:54,186][1651669] Updated weights for policy 0, policy_version 269136 (0.0012) [2024-06-15 14:45:55,287][1651669] Updated weights for policy 0, policy_version 269178 (0.0031) [2024-06-15 14:45:55,776][1648981] Fps is (10 sec: 49104.4, 60 sec: 50236.2, 300 sec: 48317.3). Total num frames: 551288832. Throughput: 0: 12057.8. Samples: 137864704. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:45:55,777][1648981] Avg episode reward: [(0, '500.970')] [2024-06-15 14:45:57,698][1651669] Updated weights for policy 0, policy_version 269239 (0.0016) [2024-06-15 14:46:00,483][1651669] Updated weights for policy 0, policy_version 269269 (0.0015) [2024-06-15 14:46:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.7, 300 sec: 48208.0). Total num frames: 551485440. Throughput: 0: 12219.8. Samples: 137938944. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:46:00,767][1648981] Avg episode reward: [(0, '499.590')] [2024-06-15 14:46:03,477][1651669] Updated weights for policy 0, policy_version 269330 (0.0023) [2024-06-15 14:46:04,307][1651669] Updated weights for policy 0, policy_version 269376 (0.0012) [2024-06-15 14:46:05,768][1648981] Fps is (10 sec: 45914.2, 60 sec: 49151.0, 300 sec: 48318.7). Total num frames: 551747584. Throughput: 0: 12028.6. Samples: 138004992. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:46:05,769][1648981] Avg episode reward: [(0, '508.610')] [2024-06-15 14:46:06,148][1651669] Updated weights for policy 0, policy_version 269432 (0.0011) [2024-06-15 14:46:08,437][1651669] Updated weights for policy 0, policy_version 269474 (0.0016) [2024-06-15 14:46:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 551944192. 
Throughput: 0: 11923.9. Samples: 138037760. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:46:10,767][1648981] Avg episode reward: [(0, '509.640')] [2024-06-15 14:46:12,793][1651669] Updated weights for policy 0, policy_version 269568 (0.0013) [2024-06-15 14:46:15,766][1648981] Fps is (10 sec: 45880.8, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 552206336. Throughput: 0: 11870.3. Samples: 138112000. Policy #0 lag: (min: 12.0, avg: 93.0, max: 268.0) [2024-06-15 14:46:15,767][1648981] Avg episode reward: [(0, '492.080')] [2024-06-15 14:46:16,899][1651669] Updated weights for policy 0, policy_version 269664 (0.0137) [2024-06-15 14:46:19,580][1651669] Updated weights for policy 0, policy_version 269728 (0.0013) [2024-06-15 14:46:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49160.0, 300 sec: 47985.7). Total num frames: 552468480. Throughput: 0: 11912.6. Samples: 138177024. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:20,767][1648981] Avg episode reward: [(0, '494.950')] [2024-06-15 14:46:23,258][1651669] Updated weights for policy 0, policy_version 269777 (0.0014) [2024-06-15 14:46:25,533][1651669] Updated weights for policy 0, policy_version 269827 (0.0015) [2024-06-15 14:46:25,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 552632320. Throughput: 0: 11947.7. Samples: 138221568. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:25,767][1648981] Avg episode reward: [(0, '498.800')] [2024-06-15 14:46:27,388][1651669] Updated weights for policy 0, policy_version 269908 (0.0021) [2024-06-15 14:46:28,207][1651669] Updated weights for policy 0, policy_version 269952 (0.0013) [2024-06-15 14:46:30,551][1651669] Updated weights for policy 0, policy_version 270005 (0.0023) [2024-06-15 14:46:30,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49162.2, 300 sec: 48318.9). Total num frames: 552960000. Throughput: 0: 12026.3. Samples: 138295296. 
Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:30,767][1648981] Avg episode reward: [(0, '491.020')] [2024-06-15 14:46:33,638][1651274] Signal inference workers to stop experience collection... (14150 times) [2024-06-15 14:46:33,669][1651669] InferenceWorker_p0-w0: stopping experience collection (14150 times) [2024-06-15 14:46:33,945][1651274] Signal inference workers to resume experience collection... (14150 times) [2024-06-15 14:46:33,946][1651669] InferenceWorker_p0-w0: resuming experience collection (14150 times) [2024-06-15 14:46:34,112][1651669] Updated weights for policy 0, policy_version 270052 (0.0012) [2024-06-15 14:46:35,767][1648981] Fps is (10 sec: 49150.4, 60 sec: 45875.3, 300 sec: 48211.1). Total num frames: 553123840. Throughput: 0: 12071.7. Samples: 138368512. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:35,768][1648981] Avg episode reward: [(0, '481.030')] [2024-06-15 14:46:36,625][1651669] Updated weights for policy 0, policy_version 270082 (0.0015) [2024-06-15 14:46:38,607][1651669] Updated weights for policy 0, policy_version 270160 (0.0013) [2024-06-15 14:46:40,161][1651669] Updated weights for policy 0, policy_version 270224 (0.0012) [2024-06-15 14:46:40,794][1648981] Fps is (10 sec: 49015.4, 60 sec: 49132.3, 300 sec: 48314.4). Total num frames: 553451520. Throughput: 0: 11964.6. Samples: 138403328. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:40,795][1648981] Avg episode reward: [(0, '495.890')] [2024-06-15 14:46:44,051][1651669] Updated weights for policy 0, policy_version 270273 (0.0018) [2024-06-15 14:46:45,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 47513.6, 300 sec: 48430.4). Total num frames: 553648128. Throughput: 0: 11901.2. Samples: 138474496. 
Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:45,767][1648981] Avg episode reward: [(0, '496.070')] [2024-06-15 14:46:48,106][1651669] Updated weights for policy 0, policy_version 270352 (0.0096) [2024-06-15 14:46:49,953][1651669] Updated weights for policy 0, policy_version 270418 (0.0011) [2024-06-15 14:46:50,794][1648981] Fps is (10 sec: 42598.8, 60 sec: 47491.6, 300 sec: 48203.3). Total num frames: 553877504. Throughput: 0: 11837.3. Samples: 138537984. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:50,795][1648981] Avg episode reward: [(0, '501.610')] [2024-06-15 14:46:51,763][1651669] Updated weights for policy 0, policy_version 270466 (0.0045) [2024-06-15 14:46:55,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 45882.6, 300 sec: 47986.3). Total num frames: 554041344. Throughput: 0: 11832.9. Samples: 138570240. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:46:55,767][1648981] Avg episode reward: [(0, '484.980')] [2024-06-15 14:46:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000270528_554041344.pth... [2024-06-15 14:46:55,974][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000264896_542507008.pth [2024-06-15 14:46:56,675][1651669] Updated weights for policy 0, policy_version 270560 (0.0012) [2024-06-15 14:46:59,502][1651669] Updated weights for policy 0, policy_version 270598 (0.0014) [2024-06-15 14:47:00,774][1648981] Fps is (10 sec: 39399.4, 60 sec: 46415.1, 300 sec: 47874.4). Total num frames: 554270720. Throughput: 0: 11796.7. Samples: 138642944. 
Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:47:00,775][1648981] Avg episode reward: [(0, '487.020')] [2024-06-15 14:47:01,107][1651669] Updated weights for policy 0, policy_version 270656 (0.0022) [2024-06-15 14:47:02,678][1651669] Updated weights for policy 0, policy_version 270715 (0.0013) [2024-06-15 14:47:04,796][1651669] Updated weights for policy 0, policy_version 270754 (0.0013) [2024-06-15 14:47:05,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 46968.4, 300 sec: 47985.7). Total num frames: 554565632. Throughput: 0: 11741.9. Samples: 138705408. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:47:05,767][1648981] Avg episode reward: [(0, '483.510')] [2024-06-15 14:47:07,810][1651669] Updated weights for policy 0, policy_version 270802 (0.0013) [2024-06-15 14:47:10,770][1648981] Fps is (10 sec: 42616.1, 60 sec: 45872.2, 300 sec: 47876.5). Total num frames: 554696704. Throughput: 0: 11536.1. Samples: 138740736. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:47:10,771][1648981] Avg episode reward: [(0, '492.550')] [2024-06-15 14:47:11,646][1651669] Updated weights for policy 0, policy_version 270880 (0.0017) [2024-06-15 14:47:13,419][1651669] Updated weights for policy 0, policy_version 270944 (0.0011) [2024-06-15 14:47:15,137][1651669] Updated weights for policy 0, policy_version 270979 (0.0013) [2024-06-15 14:47:15,774][1648981] Fps is (10 sec: 45839.0, 60 sec: 46961.3, 300 sec: 47762.3). Total num frames: 555024384. Throughput: 0: 11409.9. Samples: 138808832. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:47:15,775][1648981] Avg episode reward: [(0, '484.370')] [2024-06-15 14:47:15,780][1651274] Signal inference workers to stop experience collection... (14200 times) [2024-06-15 14:47:15,808][1651669] InferenceWorker_p0-w0: stopping experience collection (14200 times) [2024-06-15 14:47:15,928][1651274] Signal inference workers to resume experience collection... 
(14200 times) [2024-06-15 14:47:15,960][1651669] InferenceWorker_p0-w0: resuming experience collection (14200 times) [2024-06-15 14:47:19,148][1651669] Updated weights for policy 0, policy_version 271056 (0.0015) [2024-06-15 14:47:20,086][1651669] Updated weights for policy 0, policy_version 271101 (0.0012) [2024-06-15 14:47:20,767][1648981] Fps is (10 sec: 52448.1, 60 sec: 45875.0, 300 sec: 48098.4). Total num frames: 555220992. Throughput: 0: 11400.6. Samples: 138881536. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:47:20,767][1648981] Avg episode reward: [(0, '472.270')] [2024-06-15 14:47:23,511][1651669] Updated weights for policy 0, policy_version 271168 (0.0013) [2024-06-15 14:47:25,070][1651669] Updated weights for policy 0, policy_version 271232 (0.0013) [2024-06-15 14:47:25,766][1648981] Fps is (10 sec: 45911.1, 60 sec: 47513.5, 300 sec: 47874.6). Total num frames: 555483136. Throughput: 0: 11327.9. Samples: 138912768. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:47:25,767][1648981] Avg episode reward: [(0, '507.080')] [2024-06-15 14:47:30,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 44236.8, 300 sec: 47985.7). Total num frames: 555614208. Throughput: 0: 11286.8. Samples: 138982400. Policy #0 lag: (min: 0.0, avg: 127.2, max: 256.0) [2024-06-15 14:47:30,767][1648981] Avg episode reward: [(0, '498.470')] [2024-06-15 14:47:30,794][1651669] Updated weights for policy 0, policy_version 271312 (0.0021) [2024-06-15 14:47:31,703][1651669] Updated weights for policy 0, policy_version 271360 (0.0013) [2024-06-15 14:47:34,508][1651669] Updated weights for policy 0, policy_version 271413 (0.0014) [2024-06-15 14:47:35,562][1651669] Updated weights for policy 0, policy_version 271456 (0.0013) [2024-06-15 14:47:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46967.7, 300 sec: 47987.0). Total num frames: 555941888. Throughput: 0: 11487.3. Samples: 139054592. 
Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:47:35,767][1648981] Avg episode reward: [(0, '498.280')] [2024-06-15 14:47:38,564][1651669] Updated weights for policy 0, policy_version 271527 (0.0015) [2024-06-15 14:47:40,768][1648981] Fps is (10 sec: 52426.8, 60 sec: 44803.5, 300 sec: 47988.8). Total num frames: 556138496. Throughput: 0: 11502.9. Samples: 139087872. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:47:40,771][1648981] Avg episode reward: [(0, '505.830')] [2024-06-15 14:47:40,869][1651669] Updated weights for policy 0, policy_version 271554 (0.0014) [2024-06-15 14:47:44,068][1651669] Updated weights for policy 0, policy_version 271634 (0.0013) [2024-06-15 14:47:45,034][1651669] Updated weights for policy 0, policy_version 271677 (0.0013) [2024-06-15 14:47:45,770][1648981] Fps is (10 sec: 45857.8, 60 sec: 45872.3, 300 sec: 47985.7). Total num frames: 556400640. Throughput: 0: 11572.3. Samples: 139163648. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:47:45,771][1648981] Avg episode reward: [(0, '494.380')] [2024-06-15 14:47:47,023][1651669] Updated weights for policy 0, policy_version 271735 (0.0153) [2024-06-15 14:47:50,203][1651669] Updated weights for policy 0, policy_version 271793 (0.0012) [2024-06-15 14:47:50,775][1648981] Fps is (10 sec: 52386.7, 60 sec: 46436.3, 300 sec: 47984.9). Total num frames: 556662784. Throughput: 0: 11739.7. Samples: 139233792. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:47:50,775][1648981] Avg episode reward: [(0, '505.300')] [2024-06-15 14:47:52,505][1651669] Updated weights for policy 0, policy_version 271856 (0.0056) [2024-06-15 14:47:54,825][1651669] Updated weights for policy 0, policy_version 271904 (0.0017) [2024-06-15 14:47:55,766][1648981] Fps is (10 sec: 52448.4, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 556924928. Throughput: 0: 11765.6. Samples: 139270144. 
Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:47:55,767][1648981] Avg episode reward: [(0, '515.350')] [2024-06-15 14:47:55,793][1651274] Saving new best policy, reward=515.350! [2024-06-15 14:47:57,821][1651669] Updated weights for policy 0, policy_version 271968 (0.0014) [2024-06-15 14:48:00,697][1651669] Updated weights for policy 0, policy_version 272002 (0.0018) [2024-06-15 14:48:00,767][1648981] Fps is (10 sec: 39354.2, 60 sec: 46427.4, 300 sec: 47541.4). Total num frames: 557056000. Throughput: 0: 11823.6. Samples: 139340800. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:00,767][1648981] Avg episode reward: [(0, '527.240')] [2024-06-15 14:48:01,458][1651274] Saving new best policy, reward=527.240! [2024-06-15 14:48:02,164][1651669] Updated weights for policy 0, policy_version 272064 (0.0012) [2024-06-15 14:48:02,843][1651274] Signal inference workers to stop experience collection... (14250 times) [2024-06-15 14:48:02,915][1651669] InferenceWorker_p0-w0: stopping experience collection (14250 times) [2024-06-15 14:48:03,056][1651274] Signal inference workers to resume experience collection... (14250 times) [2024-06-15 14:48:03,057][1651669] InferenceWorker_p0-w0: resuming experience collection (14250 times) [2024-06-15 14:48:03,813][1651669] Updated weights for policy 0, policy_version 272115 (0.0014) [2024-06-15 14:48:05,768][1648981] Fps is (10 sec: 45873.8, 60 sec: 46967.2, 300 sec: 48429.9). Total num frames: 557383680. Throughput: 0: 11787.3. Samples: 139411968. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:05,770][1648981] Avg episode reward: [(0, '529.980')] [2024-06-15 14:48:06,020][1651274] Saving new best policy, reward=529.980! 
[2024-06-15 14:48:06,232][1651669] Updated weights for policy 0, policy_version 272179 (0.0011) [2024-06-15 14:48:09,256][1651669] Updated weights for policy 0, policy_version 272225 (0.0082) [2024-06-15 14:48:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48062.7, 300 sec: 47763.5). Total num frames: 557580288. Throughput: 0: 11912.5. Samples: 139448832. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:10,767][1648981] Avg episode reward: [(0, '501.430')] [2024-06-15 14:48:11,804][1651669] Updated weights for policy 0, policy_version 272272 (0.0015) [2024-06-15 14:48:13,135][1651669] Updated weights for policy 0, policy_version 272319 (0.0013) [2024-06-15 14:48:15,002][1651669] Updated weights for policy 0, policy_version 272378 (0.0013) [2024-06-15 14:48:15,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 46973.6, 300 sec: 48430.0). Total num frames: 557842432. Throughput: 0: 11878.4. Samples: 139516928. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:15,767][1648981] Avg episode reward: [(0, '490.990')] [2024-06-15 14:48:17,213][1651669] Updated weights for policy 0, policy_version 272437 (0.0133) [2024-06-15 14:48:20,250][1651669] Updated weights for policy 0, policy_version 272481 (0.0016) [2024-06-15 14:48:20,774][1648981] Fps is (10 sec: 52388.5, 60 sec: 48053.6, 300 sec: 48095.5). Total num frames: 558104576. Throughput: 0: 11944.6. Samples: 139592192. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:20,775][1648981] Avg episode reward: [(0, '463.400')] [2024-06-15 14:48:23,072][1651669] Updated weights for policy 0, policy_version 272544 (0.0144) [2024-06-15 14:48:25,352][1651669] Updated weights for policy 0, policy_version 272600 (0.0014) [2024-06-15 14:48:25,771][1648981] Fps is (10 sec: 45854.3, 60 sec: 46963.9, 300 sec: 48207.5). Total num frames: 558301184. Throughput: 0: 11922.8. Samples: 139624448. 
Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:25,772][1648981] Avg episode reward: [(0, '482.810')] [2024-06-15 14:48:26,225][1651669] Updated weights for policy 0, policy_version 272640 (0.0012) [2024-06-15 14:48:27,850][1651669] Updated weights for policy 0, policy_version 272699 (0.0014) [2024-06-15 14:48:30,775][1648981] Fps is (10 sec: 42593.4, 60 sec: 48598.6, 300 sec: 47984.2). Total num frames: 558530560. Throughput: 0: 11968.0. Samples: 139702272. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:30,776][1648981] Avg episode reward: [(0, '470.820')] [2024-06-15 14:48:31,393][1651669] Updated weights for policy 0, policy_version 272760 (0.0014) [2024-06-15 14:48:34,010][1651669] Updated weights for policy 0, policy_version 272820 (0.0014) [2024-06-15 14:48:35,494][1651669] Updated weights for policy 0, policy_version 272850 (0.0012) [2024-06-15 14:48:35,782][1648981] Fps is (10 sec: 52370.4, 60 sec: 48047.1, 300 sec: 48205.3). Total num frames: 558825472. Throughput: 0: 12081.2. Samples: 139777536. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:35,783][1648981] Avg episode reward: [(0, '468.640')] [2024-06-15 14:48:37,726][1651669] Updated weights for policy 0, policy_version 272900 (0.0013) [2024-06-15 14:48:38,997][1651669] Updated weights for policy 0, policy_version 272957 (0.0012) [2024-06-15 14:48:40,767][1648981] Fps is (10 sec: 52470.8, 60 sec: 48605.4, 300 sec: 48096.6). Total num frames: 559054848. Throughput: 0: 12037.4. Samples: 139811840. Policy #0 lag: (min: 15.0, avg: 113.8, max: 271.0) [2024-06-15 14:48:40,768][1648981] Avg episode reward: [(0, '464.160')] [2024-06-15 14:48:41,277][1651669] Updated weights for policy 0, policy_version 273011 (0.0013) [2024-06-15 14:48:43,582][1651669] Updated weights for policy 0, policy_version 273056 (0.0012) [2024-06-15 14:48:45,766][1648981] Fps is (10 sec: 45947.8, 60 sec: 48062.8, 300 sec: 47985.7). Total num frames: 559284224. 
Throughput: 0: 12265.3. Samples: 139892736. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:48:45,767][1648981] Avg episode reward: [(0, '459.540')] [2024-06-15 14:48:45,910][1651669] Updated weights for policy 0, policy_version 273104 (0.0012) [2024-06-15 14:48:46,490][1651274] Signal inference workers to stop experience collection... (14300 times) [2024-06-15 14:48:46,524][1651669] InferenceWorker_p0-w0: stopping experience collection (14300 times) [2024-06-15 14:48:46,719][1651274] Signal inference workers to resume experience collection... (14300 times) [2024-06-15 14:48:46,720][1651669] InferenceWorker_p0-w0: resuming experience collection (14300 times) [2024-06-15 14:48:47,112][1651669] Updated weights for policy 0, policy_version 273149 (0.0014) [2024-06-15 14:48:49,459][1651669] Updated weights for policy 0, policy_version 273202 (0.0012) [2024-06-15 14:48:50,766][1648981] Fps is (10 sec: 49156.3, 60 sec: 48066.4, 300 sec: 48207.8). Total num frames: 559546368. Throughput: 0: 12276.7. Samples: 139964416. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:48:50,767][1648981] Avg episode reward: [(0, '455.360')] [2024-06-15 14:48:51,692][1651669] Updated weights for policy 0, policy_version 273249 (0.0026) [2024-06-15 14:48:54,125][1651669] Updated weights for policy 0, policy_version 273312 (0.0014) [2024-06-15 14:48:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47986.8). Total num frames: 559808512. Throughput: 0: 12344.9. Samples: 140004352. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:48:55,767][1648981] Avg episode reward: [(0, '480.570')] [2024-06-15 14:48:55,794][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000273344_559808512.pth... 
[2024-06-15 14:48:55,862][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000267712_548274176.pth [2024-06-15 14:48:56,564][1651669] Updated weights for policy 0, policy_version 273349 (0.0013) [2024-06-15 14:48:57,915][1651669] Updated weights for policy 0, policy_version 273408 (0.0014) [2024-06-15 14:49:00,412][1651669] Updated weights for policy 0, policy_version 273472 (0.0114) [2024-06-15 14:49:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 48318.9). Total num frames: 560070656. Throughput: 0: 12322.1. Samples: 140071424. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:00,767][1648981] Avg episode reward: [(0, '477.300')] [2024-06-15 14:49:02,992][1651669] Updated weights for policy 0, policy_version 273527 (0.0016) [2024-06-15 14:49:05,338][1651669] Updated weights for policy 0, policy_version 273569 (0.0011) [2024-06-15 14:49:05,767][1648981] Fps is (10 sec: 49147.3, 60 sec: 48605.4, 300 sec: 47874.4). Total num frames: 560300032. Throughput: 0: 12324.0. Samples: 140146688. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:05,768][1648981] Avg episode reward: [(0, '467.950')] [2024-06-15 14:49:07,910][1651669] Updated weights for policy 0, policy_version 273632 (0.0017) [2024-06-15 14:49:09,339][1651669] Updated weights for policy 0, policy_version 273680 (0.0070) [2024-06-15 14:49:10,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 48430.6). Total num frames: 560594944. Throughput: 0: 12505.4. Samples: 140187136. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:10,767][1648981] Avg episode reward: [(0, '482.070')] [2024-06-15 14:49:12,795][1651669] Updated weights for policy 0, policy_version 273744 (0.0014) [2024-06-15 14:49:15,370][1651669] Updated weights for policy 0, policy_version 273794 (0.0012) [2024-06-15 14:49:15,767][1648981] Fps is (10 sec: 45878.1, 60 sec: 48605.6, 300 sec: 47652.4). Total num frames: 560758784. 
Throughput: 0: 12222.1. Samples: 140252160. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:15,768][1648981] Avg episode reward: [(0, '475.730')] [2024-06-15 14:49:18,564][1651669] Updated weights for policy 0, policy_version 273882 (0.0106) [2024-06-15 14:49:20,175][1651669] Updated weights for policy 0, policy_version 273952 (0.0012) [2024-06-15 14:49:20,768][1648981] Fps is (10 sec: 49147.1, 60 sec: 49703.7, 300 sec: 48429.8). Total num frames: 561086464. Throughput: 0: 12246.5. Samples: 140328448. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:20,769][1648981] Avg episode reward: [(0, '478.280')] [2024-06-15 14:49:24,111][1651669] Updated weights for policy 0, policy_version 273987 (0.0015) [2024-06-15 14:49:25,495][1651669] Updated weights for policy 0, policy_version 274048 (0.0068) [2024-06-15 14:49:25,766][1648981] Fps is (10 sec: 49153.2, 60 sec: 49155.7, 300 sec: 47541.4). Total num frames: 561250304. Throughput: 0: 12333.7. Samples: 140366848. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:25,767][1648981] Avg episode reward: [(0, '466.120')] [2024-06-15 14:49:27,646][1651669] Updated weights for policy 0, policy_version 274106 (0.0027) [2024-06-15 14:49:29,787][1651669] Updated weights for policy 0, policy_version 274154 (0.0015) [2024-06-15 14:49:29,979][1651274] Signal inference workers to stop experience collection... (14350 times) [2024-06-15 14:49:30,020][1651669] InferenceWorker_p0-w0: stopping experience collection (14350 times) [2024-06-15 14:49:30,203][1651274] Signal inference workers to resume experience collection... (14350 times) [2024-06-15 14:49:30,204][1651669] InferenceWorker_p0-w0: resuming experience collection (14350 times) [2024-06-15 14:49:30,790][1648981] Fps is (10 sec: 45771.4, 60 sec: 50231.9, 300 sec: 48427.0). Total num frames: 561545216. Throughput: 0: 12167.8. Samples: 140440576. 
Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:30,791][1648981] Avg episode reward: [(0, '475.290')] [2024-06-15 14:49:31,401][1651669] Updated weights for policy 0, policy_version 274224 (0.0085) [2024-06-15 14:49:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48072.3, 300 sec: 47652.7). Total num frames: 561709056. Throughput: 0: 12162.8. Samples: 140511744. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:35,767][1648981] Avg episode reward: [(0, '458.730')] [2024-06-15 14:49:36,164][1651669] Updated weights for policy 0, policy_version 274292 (0.0014) [2024-06-15 14:49:37,358][1651669] Updated weights for policy 0, policy_version 274305 (0.0030) [2024-06-15 14:49:39,126][1651669] Updated weights for policy 0, policy_version 274372 (0.0013) [2024-06-15 14:49:40,356][1651669] Updated weights for policy 0, policy_version 274421 (0.0012) [2024-06-15 14:49:40,790][1648981] Fps is (10 sec: 49151.4, 60 sec: 49679.2, 300 sec: 48426.1). Total num frames: 562036736. Throughput: 0: 12110.9. Samples: 140549632. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:40,791][1648981] Avg episode reward: [(0, '457.590')] [2024-06-15 14:49:42,046][1651669] Updated weights for policy 0, policy_version 274496 (0.0012) [2024-06-15 14:49:45,782][1648981] Fps is (10 sec: 45803.1, 60 sec: 48047.0, 300 sec: 47540.1). Total num frames: 562167808. Throughput: 0: 12283.7. Samples: 140624384. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:45,783][1648981] Avg episode reward: [(0, '479.280')] [2024-06-15 14:49:47,528][1651669] Updated weights for policy 0, policy_version 274556 (0.0012) [2024-06-15 14:49:50,332][1651669] Updated weights for policy 0, policy_version 274626 (0.0014) [2024-06-15 14:49:50,776][1648981] Fps is (10 sec: 42665.3, 60 sec: 48599.3, 300 sec: 48095.4). Total num frames: 562462720. Throughput: 0: 12138.1. Samples: 140692992. 
Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:50,783][1648981] Avg episode reward: [(0, '458.320')] [2024-06-15 14:49:51,629][1651669] Updated weights for policy 0, policy_version 274677 (0.0013) [2024-06-15 14:49:53,060][1651669] Updated weights for policy 0, policy_version 274746 (0.0014) [2024-06-15 14:49:55,771][1648981] Fps is (10 sec: 52488.4, 60 sec: 48056.2, 300 sec: 47651.7). Total num frames: 562692096. Throughput: 0: 11843.1. Samples: 140720128. Policy #0 lag: (min: 15.0, avg: 122.3, max: 271.0) [2024-06-15 14:49:55,771][1648981] Avg episode reward: [(0, '453.700')] [2024-06-15 14:49:58,564][1651669] Updated weights for policy 0, policy_version 274807 (0.0013) [2024-06-15 14:50:00,766][1648981] Fps is (10 sec: 36074.4, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 562823168. Throughput: 0: 12015.0. Samples: 140792832. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:00,767][1648981] Avg episode reward: [(0, '457.820')] [2024-06-15 14:50:01,336][1651669] Updated weights for policy 0, policy_version 274848 (0.0013) [2024-06-15 14:50:02,353][1651669] Updated weights for policy 0, policy_version 274896 (0.0012) [2024-06-15 14:50:04,508][1651669] Updated weights for policy 0, policy_version 274992 (0.0155) [2024-06-15 14:50:05,786][1648981] Fps is (10 sec: 52348.6, 60 sec: 48590.6, 300 sec: 47871.4). Total num frames: 563216384. Throughput: 0: 11737.0. Samples: 140856832. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:05,787][1648981] Avg episode reward: [(0, '461.660')] [2024-06-15 14:50:09,751][1651669] Updated weights for policy 0, policy_version 275042 (0.0014) [2024-06-15 14:50:10,778][1648981] Fps is (10 sec: 52366.8, 60 sec: 45866.2, 300 sec: 47539.5). Total num frames: 563347456. Throughput: 0: 11784.3. Samples: 140897280. 
Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:10,779][1648981] Avg episode reward: [(0, '466.140')] [2024-06-15 14:50:12,245][1651669] Updated weights for policy 0, policy_version 275077 (0.0013) [2024-06-15 14:50:13,241][1651274] Signal inference workers to stop experience collection... (14400 times) [2024-06-15 14:50:13,391][1651669] InferenceWorker_p0-w0: stopping experience collection (14400 times) [2024-06-15 14:50:13,589][1651274] Signal inference workers to resume experience collection... (14400 times) [2024-06-15 14:50:13,590][1651669] InferenceWorker_p0-w0: resuming experience collection (14400 times) [2024-06-15 14:50:14,000][1651669] Updated weights for policy 0, policy_version 275154 (0.0011) [2024-06-15 14:50:15,600][1651669] Updated weights for policy 0, policy_version 275218 (0.0011) [2024-06-15 14:50:15,773][1648981] Fps is (10 sec: 42655.3, 60 sec: 48054.8, 300 sec: 47875.1). Total num frames: 563642368. Throughput: 0: 11712.2. Samples: 140967424. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:15,774][1648981] Avg episode reward: [(0, '493.910')] [2024-06-15 14:50:20,108][1651669] Updated weights for policy 0, policy_version 275270 (0.0013) [2024-06-15 14:50:20,767][1648981] Fps is (10 sec: 45928.7, 60 sec: 45329.7, 300 sec: 47319.2). Total num frames: 563806208. Throughput: 0: 11707.7. Samples: 141038592. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:20,767][1648981] Avg episode reward: [(0, '469.540')] [2024-06-15 14:50:21,317][1651669] Updated weights for policy 0, policy_version 275324 (0.0211) [2024-06-15 14:50:24,807][1651669] Updated weights for policy 0, policy_version 275377 (0.0026) [2024-06-15 14:50:25,766][1648981] Fps is (10 sec: 42625.7, 60 sec: 46967.5, 300 sec: 47654.5). Total num frames: 564068352. Throughput: 0: 11657.0. Samples: 141073920. 
Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:25,767][1648981] Avg episode reward: [(0, '489.630')] [2024-06-15 14:50:26,391][1651669] Updated weights for policy 0, policy_version 275441 (0.0010) [2024-06-15 14:50:27,865][1651669] Updated weights for policy 0, policy_version 275514 (0.0023) [2024-06-15 14:50:30,767][1648981] Fps is (10 sec: 45871.3, 60 sec: 45346.2, 300 sec: 47097.0). Total num frames: 564264960. Throughput: 0: 11472.6. Samples: 141140480. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:30,768][1648981] Avg episode reward: [(0, '499.640')] [2024-06-15 14:50:32,383][1651669] Updated weights for policy 0, policy_version 275575 (0.0010) [2024-06-15 14:50:35,232][1651669] Updated weights for policy 0, policy_version 275616 (0.0012) [2024-06-15 14:50:35,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46421.4, 300 sec: 47430.9). Total num frames: 564494336. Throughput: 0: 11641.6. Samples: 141216768. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:35,767][1648981] Avg episode reward: [(0, '486.880')] [2024-06-15 14:50:36,528][1651669] Updated weights for policy 0, policy_version 275666 (0.0011) [2024-06-15 14:50:37,865][1651669] Updated weights for policy 0, policy_version 275728 (0.0011) [2024-06-15 14:50:40,787][1648981] Fps is (10 sec: 52327.2, 60 sec: 45877.8, 300 sec: 47427.0). Total num frames: 564789248. Throughput: 0: 11669.5. Samples: 141245440. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:40,787][1648981] Avg episode reward: [(0, '491.170')] [2024-06-15 14:50:42,624][1651669] Updated weights for policy 0, policy_version 275777 (0.0037) [2024-06-15 14:50:43,811][1651669] Updated weights for policy 0, policy_version 275836 (0.0015) [2024-06-15 14:50:45,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 45887.3, 300 sec: 47097.1). Total num frames: 564920320. Throughput: 0: 11798.7. Samples: 141323776. 
Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:45,767][1648981] Avg episode reward: [(0, '480.730')] [2024-06-15 14:50:47,291][1651669] Updated weights for policy 0, policy_version 275904 (0.0011) [2024-06-15 14:50:49,026][1651669] Updated weights for policy 0, policy_version 275969 (0.0012) [2024-06-15 14:50:50,523][1651669] Updated weights for policy 0, policy_version 276028 (0.0026) [2024-06-15 14:50:50,766][1648981] Fps is (10 sec: 52536.1, 60 sec: 47520.1, 300 sec: 47542.9). Total num frames: 565313536. Throughput: 0: 11678.7. Samples: 141382144. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:50,767][1648981] Avg episode reward: [(0, '465.000')] [2024-06-15 14:50:53,511][1651274] Signal inference workers to stop experience collection... (14450 times) [2024-06-15 14:50:53,568][1651669] InferenceWorker_p0-w0: stopping experience collection (14450 times) [2024-06-15 14:50:53,734][1651274] Signal inference workers to resume experience collection... (14450 times) [2024-06-15 14:50:53,734][1651669] InferenceWorker_p0-w0: resuming experience collection (14450 times) [2024-06-15 14:50:54,612][1651669] Updated weights for policy 0, policy_version 276092 (0.0107) [2024-06-15 14:50:55,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 45878.4, 300 sec: 47319.2). Total num frames: 565444608. Throughput: 0: 11847.3. Samples: 141430272. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:50:55,768][1648981] Avg episode reward: [(0, '485.650')] [2024-06-15 14:50:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000276096_565444608.pth... 
[2024-06-15 14:50:55,868][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000270528_554041344.pth [2024-06-15 14:50:55,875][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000276096_565444608.pth [2024-06-15 14:50:57,984][1651669] Updated weights for policy 0, policy_version 276160 (0.0013) [2024-06-15 14:50:59,654][1651669] Updated weights for policy 0, policy_version 276226 (0.0015) [2024-06-15 14:51:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49151.9, 300 sec: 47541.6). Total num frames: 565772288. Throughput: 0: 11698.0. Samples: 141493760. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:51:00,767][1648981] Avg episode reward: [(0, '470.240')] [2024-06-15 14:51:01,130][1651669] Updated weights for policy 0, policy_version 276282 (0.0098) [2024-06-15 14:51:04,868][1651669] Updated weights for policy 0, policy_version 276322 (0.0012) [2024-06-15 14:51:05,769][1648981] Fps is (10 sec: 52419.1, 60 sec: 45888.7, 300 sec: 47541.0). Total num frames: 565968896. Throughput: 0: 11821.0. Samples: 141570560. Policy #0 lag: (min: 4.0, avg: 101.2, max: 260.0) [2024-06-15 14:51:05,769][1648981] Avg episode reward: [(0, '434.220')] [2024-06-15 14:51:08,543][1651669] Updated weights for policy 0, policy_version 276401 (0.0011) [2024-06-15 14:51:10,599][1651669] Updated weights for policy 0, policy_version 276500 (0.0131) [2024-06-15 14:51:10,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 48615.5, 300 sec: 47652.5). Total num frames: 566263808. Throughput: 0: 11912.6. Samples: 141609984. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:10,767][1648981] Avg episode reward: [(0, '438.680')] [2024-06-15 14:51:11,825][1651669] Updated weights for policy 0, policy_version 276544 (0.0016) [2024-06-15 14:51:15,766][1648981] Fps is (10 sec: 42607.3, 60 sec: 45880.1, 300 sec: 47208.1). Total num frames: 566394880. Throughput: 0: 11833.1. Samples: 141672960. 
Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:15,767][1648981] Avg episode reward: [(0, '481.260')] [2024-06-15 14:51:18,789][1651669] Updated weights for policy 0, policy_version 276612 (0.0016) [2024-06-15 14:51:20,349][1651669] Updated weights for policy 0, policy_version 276691 (0.0014) [2024-06-15 14:51:20,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 566689792. Throughput: 0: 11821.5. Samples: 141748736. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:20,767][1648981] Avg episode reward: [(0, '477.510')] [2024-06-15 14:51:22,005][1651669] Updated weights for policy 0, policy_version 276768 (0.0107) [2024-06-15 14:51:25,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 566886400. Throughput: 0: 11781.4. Samples: 141775360. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:25,767][1648981] Avg episode reward: [(0, '473.800')] [2024-06-15 14:51:26,439][1651669] Updated weights for policy 0, policy_version 276816 (0.0014) [2024-06-15 14:51:27,447][1651669] Updated weights for policy 0, policy_version 276860 (0.0013) [2024-06-15 14:51:30,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 46968.2, 300 sec: 47319.3). Total num frames: 567083008. Throughput: 0: 11889.8. Samples: 141858816. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:30,767][1648981] Avg episode reward: [(0, '465.550')] [2024-06-15 14:51:31,316][1651669] Updated weights for policy 0, policy_version 276944 (0.0017) [2024-06-15 14:51:31,448][1651274] Signal inference workers to stop experience collection... (14500 times) [2024-06-15 14:51:31,495][1651669] InferenceWorker_p0-w0: stopping experience collection (14500 times) [2024-06-15 14:51:31,626][1651274] Signal inference workers to resume experience collection... 
(14500 times) [2024-06-15 14:51:31,630][1651669] InferenceWorker_p0-w0: resuming experience collection (14500 times) [2024-06-15 14:51:32,407][1651669] Updated weights for policy 0, policy_version 276995 (0.0116) [2024-06-15 14:51:35,770][1648981] Fps is (10 sec: 52407.1, 60 sec: 48602.5, 300 sec: 47323.0). Total num frames: 567410688. Throughput: 0: 12059.3. Samples: 141924864. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:35,771][1648981] Avg episode reward: [(0, '465.270')] [2024-06-15 14:51:37,397][1651669] Updated weights for policy 0, policy_version 277072 (0.0014) [2024-06-15 14:51:40,742][1651669] Updated weights for policy 0, policy_version 277138 (0.0108) [2024-06-15 14:51:40,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 46437.0, 300 sec: 47208.1). Total num frames: 567574528. Throughput: 0: 11719.2. Samples: 141957632. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:40,767][1648981] Avg episode reward: [(0, '486.130')] [2024-06-15 14:51:41,828][1651669] Updated weights for policy 0, policy_version 277188 (0.0013) [2024-06-15 14:51:42,876][1651669] Updated weights for policy 0, policy_version 277248 (0.0014) [2024-06-15 14:51:44,019][1651669] Updated weights for policy 0, policy_version 277303 (0.0012) [2024-06-15 14:51:45,766][1648981] Fps is (10 sec: 52450.1, 60 sec: 50244.2, 300 sec: 47656.9). Total num frames: 567934976. Throughput: 0: 12094.6. Samples: 142038016. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:45,767][1648981] Avg episode reward: [(0, '477.860')] [2024-06-15 14:51:48,588][1651669] Updated weights for policy 0, policy_version 277367 (0.0013) [2024-06-15 14:51:50,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 568066048. Throughput: 0: 12197.6. Samples: 142119424. 
Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:50,767][1648981] Avg episode reward: [(0, '513.320')] [2024-06-15 14:51:51,465][1651669] Updated weights for policy 0, policy_version 277410 (0.0014) [2024-06-15 14:51:52,968][1651669] Updated weights for policy 0, policy_version 277477 (0.0014) [2024-06-15 14:51:54,720][1651669] Updated weights for policy 0, policy_version 277552 (0.0017) [2024-06-15 14:51:55,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.5, 300 sec: 48098.1). Total num frames: 568459264. Throughput: 0: 12037.7. Samples: 142151680. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:51:55,767][1648981] Avg episode reward: [(0, '470.860')] [2024-06-15 14:51:58,447][1651669] Updated weights for policy 0, policy_version 277587 (0.0012) [2024-06-15 14:52:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 568590336. Throughput: 0: 12265.3. Samples: 142224896. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:52:00,767][1648981] Avg episode reward: [(0, '453.800')] [2024-06-15 14:52:02,213][1651669] Updated weights for policy 0, policy_version 277649 (0.0012) [2024-06-15 14:52:03,717][1651669] Updated weights for policy 0, policy_version 277712 (0.0012) [2024-06-15 14:52:05,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48607.7, 300 sec: 48097.4). Total num frames: 568885248. Throughput: 0: 12060.5. Samples: 142291456. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:52:05,767][1648981] Avg episode reward: [(0, '455.420')] [2024-06-15 14:52:05,900][1651669] Updated weights for policy 0, policy_version 277792 (0.0012) [2024-06-15 14:52:09,849][1651669] Updated weights for policy 0, policy_version 277840 (0.0016) [2024-06-15 14:52:10,315][1651274] Signal inference workers to stop experience collection... 
(14550 times) [2024-06-15 14:52:10,367][1651669] InferenceWorker_p0-w0: stopping experience collection (14550 times) [2024-06-15 14:52:10,621][1651274] Signal inference workers to resume experience collection... (14550 times) [2024-06-15 14:52:10,622][1651669] InferenceWorker_p0-w0: resuming experience collection (14550 times) [2024-06-15 14:52:10,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 47653.7). Total num frames: 569081856. Throughput: 0: 12333.5. Samples: 142330368. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:52:10,767][1648981] Avg episode reward: [(0, '471.520')] [2024-06-15 14:52:13,046][1651669] Updated weights for policy 0, policy_version 277904 (0.0014) [2024-06-15 14:52:14,964][1651669] Updated weights for policy 0, policy_version 277984 (0.0012) [2024-06-15 14:52:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.1, 300 sec: 47874.6). Total num frames: 569344000. Throughput: 0: 11980.8. Samples: 142397952. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:52:15,767][1648981] Avg episode reward: [(0, '478.850')] [2024-06-15 14:52:17,110][1651669] Updated weights for policy 0, policy_version 278064 (0.0025) [2024-06-15 14:52:20,802][1648981] Fps is (10 sec: 42446.4, 60 sec: 46939.5, 300 sec: 47535.6). Total num frames: 569507840. Throughput: 0: 11904.2. Samples: 142460928. Policy #0 lag: (min: 95.0, avg: 170.2, max: 335.0) [2024-06-15 14:52:20,803][1648981] Avg episode reward: [(0, '473.660')] [2024-06-15 14:52:22,485][1651669] Updated weights for policy 0, policy_version 278128 (0.0022) [2024-06-15 14:52:25,357][1651669] Updated weights for policy 0, policy_version 278176 (0.0024) [2024-06-15 14:52:25,767][1648981] Fps is (10 sec: 36043.8, 60 sec: 46967.3, 300 sec: 47763.5). Total num frames: 569704448. Throughput: 0: 11958.0. Samples: 142495744. 
Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:52:25,767][1648981] Avg episode reward: [(0, '460.640')] [2024-06-15 14:52:26,620][1651669] Updated weights for policy 0, policy_version 278224 (0.0014) [2024-06-15 14:52:28,573][1651669] Updated weights for policy 0, policy_version 278290 (0.0013) [2024-06-15 14:52:29,602][1651669] Updated weights for policy 0, policy_version 278335 (0.0014) [2024-06-15 14:52:30,779][1648981] Fps is (10 sec: 52552.8, 60 sec: 49142.0, 300 sec: 47761.5). Total num frames: 570032128. Throughput: 0: 11465.7. Samples: 142554112. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:52:30,779][1648981] Avg episode reward: [(0, '447.580')] [2024-06-15 14:52:34,561][1651669] Updated weights for policy 0, policy_version 278370 (0.0014) [2024-06-15 14:52:35,772][1648981] Fps is (10 sec: 45850.5, 60 sec: 45874.0, 300 sec: 47540.5). Total num frames: 570163200. Throughput: 0: 11490.1. Samples: 142636544. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:52:35,773][1648981] Avg episode reward: [(0, '481.640')] [2024-06-15 14:52:36,552][1651669] Updated weights for policy 0, policy_version 278420 (0.0022) [2024-06-15 14:52:38,999][1651669] Updated weights for policy 0, policy_version 278513 (0.0013) [2024-06-15 14:52:40,766][1648981] Fps is (10 sec: 45931.0, 60 sec: 48605.9, 300 sec: 47764.1). Total num frames: 570490880. Throughput: 0: 11377.8. Samples: 142663680. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:52:40,767][1648981] Avg episode reward: [(0, '467.640')] [2024-06-15 14:52:40,937][1651669] Updated weights for policy 0, policy_version 278579 (0.0012) [2024-06-15 14:52:45,441][1651669] Updated weights for policy 0, policy_version 278613 (0.0028) [2024-06-15 14:52:45,767][1648981] Fps is (10 sec: 45896.4, 60 sec: 44782.2, 300 sec: 47320.4). Total num frames: 570621952. Throughput: 0: 11354.8. Samples: 142735872. 
Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:52:45,768][1648981] Avg episode reward: [(0, '508.880')] [2024-06-15 14:52:47,659][1651669] Updated weights for policy 0, policy_version 278663 (0.0013) [2024-06-15 14:52:49,643][1651669] Updated weights for policy 0, policy_version 278740 (0.0016) [2024-06-15 14:52:50,524][1651274] Signal inference workers to stop experience collection... (14600 times) [2024-06-15 14:52:50,619][1651669] InferenceWorker_p0-w0: stopping experience collection (14600 times) [2024-06-15 14:52:50,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 570916864. Throughput: 0: 11138.8. Samples: 142792704. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:52:50,767][1648981] Avg episode reward: [(0, '512.660')] [2024-06-15 14:52:50,816][1651274] Signal inference workers to resume experience collection... (14600 times) [2024-06-15 14:52:50,818][1651669] InferenceWorker_p0-w0: resuming experience collection (14600 times) [2024-06-15 14:52:51,425][1651669] Updated weights for policy 0, policy_version 278802 (0.0016) [2024-06-15 14:52:55,768][1648981] Fps is (10 sec: 45873.5, 60 sec: 43689.6, 300 sec: 47541.2). Total num frames: 571080704. Throughput: 0: 11081.6. Samples: 142829056. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:52:55,768][1648981] Avg episode reward: [(0, '485.760')] [2024-06-15 14:52:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000278848_571080704.pth... [2024-06-15 14:52:55,885][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000273344_559808512.pth [2024-06-15 14:52:56,944][1651669] Updated weights for policy 0, policy_version 278865 (0.0012) [2024-06-15 14:52:59,711][1651669] Updated weights for policy 0, policy_version 278947 (0.0013) [2024-06-15 14:53:00,778][1648981] Fps is (10 sec: 42547.0, 60 sec: 45865.9, 300 sec: 47317.3). Total num frames: 571342848. 
Throughput: 0: 11238.2. Samples: 142903808. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:53:00,779][1648981] Avg episode reward: [(0, '498.980')] [2024-06-15 14:53:02,172][1651669] Updated weights for policy 0, policy_version 279009 (0.0013) [2024-06-15 14:53:04,052][1651669] Updated weights for policy 0, policy_version 279088 (0.0011) [2024-06-15 14:53:05,766][1648981] Fps is (10 sec: 52435.9, 60 sec: 45329.0, 300 sec: 47541.4). Total num frames: 571604992. Throughput: 0: 11238.8. Samples: 142966272. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:53:05,767][1648981] Avg episode reward: [(0, '488.200')] [2024-06-15 14:53:08,111][1651669] Updated weights for policy 0, policy_version 279120 (0.0021) [2024-06-15 14:53:09,033][1651669] Updated weights for policy 0, policy_version 279168 (0.0013) [2024-06-15 14:53:10,767][1648981] Fps is (10 sec: 42649.2, 60 sec: 44782.8, 300 sec: 47208.1). Total num frames: 571768832. Throughput: 0: 11389.2. Samples: 143008256. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:53:10,767][1648981] Avg episode reward: [(0, '448.190')] [2024-06-15 14:53:11,262][1651669] Updated weights for policy 0, policy_version 279218 (0.0013) [2024-06-15 14:53:13,232][1651669] Updated weights for policy 0, policy_version 279284 (0.0103) [2024-06-15 14:53:14,851][1651669] Updated weights for policy 0, policy_version 279351 (0.0095) [2024-06-15 14:53:15,774][1648981] Fps is (10 sec: 52387.4, 60 sec: 46415.1, 300 sec: 47541.3). Total num frames: 572129280. Throughput: 0: 11538.2. Samples: 143073280. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:53:15,775][1648981] Avg episode reward: [(0, '465.920')] [2024-06-15 14:53:19,511][1651669] Updated weights for policy 0, policy_version 279395 (0.0012) [2024-06-15 14:53:20,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 45902.5, 300 sec: 47319.9). Total num frames: 572260352. Throughput: 0: 11367.8. Samples: 143148032. 
Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:53:20,767][1648981] Avg episode reward: [(0, '427.820')] [2024-06-15 14:53:21,319][1651669] Updated weights for policy 0, policy_version 279444 (0.0012) [2024-06-15 14:53:23,329][1651669] Updated weights for policy 0, policy_version 279491 (0.0016) [2024-06-15 14:53:25,084][1651669] Updated weights for policy 0, policy_version 279569 (0.0014) [2024-06-15 14:53:25,766][1648981] Fps is (10 sec: 49191.1, 60 sec: 48606.1, 300 sec: 47765.0). Total num frames: 572620800. Throughput: 0: 11605.3. Samples: 143185920. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:53:25,767][1648981] Avg episode reward: [(0, '436.990')] [2024-06-15 14:53:25,844][1651669] Updated weights for policy 0, policy_version 279609 (0.0013) [2024-06-15 14:53:30,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 44792.0, 300 sec: 47099.6). Total num frames: 572719104. Throughput: 0: 11708.0. Samples: 143262720. Policy #0 lag: (min: 15.0, avg: 103.5, max: 271.0) [2024-06-15 14:53:30,767][1648981] Avg episode reward: [(0, '418.800')] [2024-06-15 14:53:31,014][1651669] Updated weights for policy 0, policy_version 279666 (0.0014) [2024-06-15 14:53:32,321][1651669] Updated weights for policy 0, policy_version 279714 (0.0015) [2024-06-15 14:53:34,270][1651274] Signal inference workers to stop experience collection... (14650 times) [2024-06-15 14:53:34,356][1651669] InferenceWorker_p0-w0: stopping experience collection (14650 times) [2024-06-15 14:53:34,518][1651274] Signal inference workers to resume experience collection... (14650 times) [2024-06-15 14:53:34,519][1651669] InferenceWorker_p0-w0: resuming experience collection (14650 times) [2024-06-15 14:53:34,680][1651669] Updated weights for policy 0, policy_version 279777 (0.0014) [2024-06-15 14:53:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48610.5, 300 sec: 47541.5). Total num frames: 573079552. Throughput: 0: 11844.3. Samples: 143325696. 
Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:53:35,767][1648981] Avg episode reward: [(0, '417.070')] [2024-06-15 14:53:36,191][1651669] Updated weights for policy 0, policy_version 279842 (0.0080) [2024-06-15 14:53:40,287][1651669] Updated weights for policy 0, policy_version 279875 (0.0014) [2024-06-15 14:53:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45329.1, 300 sec: 47208.1). Total num frames: 573210624. Throughput: 0: 12129.1. Samples: 143374848. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:53:40,767][1648981] Avg episode reward: [(0, '410.140')] [2024-06-15 14:53:41,690][1651669] Updated weights for policy 0, policy_version 279933 (0.0034) [2024-06-15 14:53:43,625][1651669] Updated weights for policy 0, policy_version 279985 (0.0044) [2024-06-15 14:53:45,020][1651669] Updated weights for policy 0, policy_version 280016 (0.0019) [2024-06-15 14:53:45,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48060.5, 300 sec: 47319.2). Total num frames: 573505536. Throughput: 0: 11949.9. Samples: 143441408. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:53:45,767][1648981] Avg episode reward: [(0, '432.800')] [2024-06-15 14:53:46,661][1651669] Updated weights for policy 0, policy_version 280082 (0.0029) [2024-06-15 14:53:50,515][1651669] Updated weights for policy 0, policy_version 280129 (0.0016) [2024-06-15 14:53:50,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 46421.2, 300 sec: 47097.0). Total num frames: 573702144. Throughput: 0: 12401.8. Samples: 143524352. 
Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:53:50,767][1648981] Avg episode reward: [(0, '445.040')] [2024-06-15 14:53:51,981][1651669] Updated weights for policy 0, policy_version 280191 (0.0014) [2024-06-15 14:53:54,741][1651669] Updated weights for policy 0, policy_version 280244 (0.0108) [2024-06-15 14:53:55,481][1651669] Updated weights for policy 0, policy_version 280273 (0.0037) [2024-06-15 14:53:55,770][1648981] Fps is (10 sec: 52408.6, 60 sec: 49150.0, 300 sec: 47318.6). Total num frames: 574029824. Throughput: 0: 12196.0. Samples: 143557120. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:53:55,771][1648981] Avg episode reward: [(0, '476.040')] [2024-06-15 14:53:57,485][1651669] Updated weights for policy 0, policy_version 280368 (0.0013) [2024-06-15 14:54:00,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 48069.4, 300 sec: 47208.3). Total num frames: 574226432. Throughput: 0: 12199.1. Samples: 143622144. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:00,767][1648981] Avg episode reward: [(0, '453.690')] [2024-06-15 14:54:02,498][1651669] Updated weights for policy 0, policy_version 280400 (0.0013) [2024-06-15 14:54:03,548][1651669] Updated weights for policy 0, policy_version 280441 (0.0129) [2024-06-15 14:54:05,120][1651669] Updated weights for policy 0, policy_version 280507 (0.0013) [2024-06-15 14:54:05,770][1648981] Fps is (10 sec: 45875.9, 60 sec: 48056.8, 300 sec: 47096.5). Total num frames: 574488576. Throughput: 0: 12184.6. Samples: 143696384. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:05,771][1648981] Avg episode reward: [(0, '465.560')] [2024-06-15 14:54:07,193][1651669] Updated weights for policy 0, policy_version 280572 (0.0015) [2024-06-15 14:54:08,406][1651669] Updated weights for policy 0, policy_version 280624 (0.0017) [2024-06-15 14:54:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49698.3, 300 sec: 47430.3). Total num frames: 574750720. 
Throughput: 0: 11923.9. Samples: 143722496. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:10,767][1648981] Avg episode reward: [(0, '471.590')] [2024-06-15 14:54:14,582][1651669] Updated weights for policy 0, policy_version 280697 (0.0014) [2024-06-15 14:54:15,263][1651274] Signal inference workers to stop experience collection... (14700 times) [2024-06-15 14:54:15,339][1651669] InferenceWorker_p0-w0: stopping experience collection (14700 times) [2024-06-15 14:54:15,483][1651274] Signal inference workers to resume experience collection... (14700 times) [2024-06-15 14:54:15,484][1651669] InferenceWorker_p0-w0: resuming experience collection (14700 times) [2024-06-15 14:54:15,768][1648981] Fps is (10 sec: 45882.6, 60 sec: 46972.0, 300 sec: 46985.8). Total num frames: 574947328. Throughput: 0: 12150.9. Samples: 143809536. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:15,769][1648981] Avg episode reward: [(0, '462.080')] [2024-06-15 14:54:15,957][1651669] Updated weights for policy 0, policy_version 280752 (0.0080) [2024-06-15 14:54:17,875][1651669] Updated weights for policy 0, policy_version 280825 (0.0013) [2024-06-15 14:54:18,893][1651669] Updated weights for policy 0, policy_version 280864 (0.0013) [2024-06-15 14:54:20,767][1648981] Fps is (10 sec: 52425.3, 60 sec: 50243.7, 300 sec: 47541.3). Total num frames: 575275008. Throughput: 0: 12083.0. Samples: 143869440. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:20,768][1648981] Avg episode reward: [(0, '465.110')] [2024-06-15 14:54:25,266][1651669] Updated weights for policy 0, policy_version 280944 (0.0014) [2024-06-15 14:54:25,766][1648981] Fps is (10 sec: 45885.0, 60 sec: 46421.3, 300 sec: 46989.8). Total num frames: 575406080. Throughput: 0: 12049.1. Samples: 143917056. 
Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:25,767][1648981] Avg episode reward: [(0, '473.520')] [2024-06-15 14:54:27,332][1651669] Updated weights for policy 0, policy_version 281008 (0.0013) [2024-06-15 14:54:29,363][1651669] Updated weights for policy 0, policy_version 281072 (0.0014) [2024-06-15 14:54:30,766][1648981] Fps is (10 sec: 45878.2, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 575733760. Throughput: 0: 11935.3. Samples: 143978496. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:30,767][1648981] Avg episode reward: [(0, '454.090')] [2024-06-15 14:54:31,169][1651669] Updated weights for policy 0, policy_version 281145 (0.0014) [2024-06-15 14:54:35,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45329.0, 300 sec: 46656.5). Total num frames: 575799296. Throughput: 0: 11878.4. Samples: 144058880. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:35,767][1648981] Avg episode reward: [(0, '441.170')] [2024-06-15 14:54:36,427][1651669] Updated weights for policy 0, policy_version 281185 (0.0012) [2024-06-15 14:54:37,697][1651669] Updated weights for policy 0, policy_version 281218 (0.0011) [2024-06-15 14:54:39,519][1651669] Updated weights for policy 0, policy_version 281283 (0.0012) [2024-06-15 14:54:40,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 48605.9, 300 sec: 47321.7). Total num frames: 576126976. Throughput: 0: 12016.0. Samples: 144097792. Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:40,767][1648981] Avg episode reward: [(0, '441.160')] [2024-06-15 14:54:41,311][1651669] Updated weights for policy 0, policy_version 281344 (0.0012) [2024-06-15 14:54:42,690][1651669] Updated weights for policy 0, policy_version 281406 (0.0012) [2024-06-15 14:54:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46967.5, 300 sec: 46987.3). Total num frames: 576323584. Throughput: 0: 11844.3. Samples: 144155136. 
Policy #0 lag: (min: 3.0, avg: 113.2, max: 227.0) [2024-06-15 14:54:45,767][1648981] Avg episode reward: [(0, '454.890')] [2024-06-15 14:54:47,245][1651669] Updated weights for policy 0, policy_version 281461 (0.0014) [2024-06-15 14:54:49,656][1651669] Updated weights for policy 0, policy_version 281520 (0.0013) [2024-06-15 14:54:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48606.0, 300 sec: 47208.9). Total num frames: 576618496. Throughput: 0: 11970.4. Samples: 144235008. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:54:50,767][1648981] Avg episode reward: [(0, '448.070')] [2024-06-15 14:54:51,657][1651669] Updated weights for policy 0, policy_version 281585 (0.0012) [2024-06-15 14:54:52,620][1651274] Signal inference workers to stop experience collection... (14750 times) [2024-06-15 14:54:52,700][1651669] InferenceWorker_p0-w0: stopping experience collection (14750 times) [2024-06-15 14:54:52,955][1651274] Signal inference workers to resume experience collection... (14750 times) [2024-06-15 14:54:52,956][1651669] InferenceWorker_p0-w0: resuming experience collection (14750 times) [2024-06-15 14:54:53,561][1651669] Updated weights for policy 0, policy_version 281655 (0.0022) [2024-06-15 14:54:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 46970.4, 300 sec: 47541.4). Total num frames: 576847872. Throughput: 0: 11832.9. Samples: 144254976. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:54:55,767][1648981] Avg episode reward: [(0, '439.660')] [2024-06-15 14:54:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000281664_576847872.pth... [2024-06-15 14:54:55,832][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000276096_565444608.pth [2024-06-15 14:54:58,805][1651669] Updated weights for policy 0, policy_version 281712 (0.0012) [2024-06-15 14:55:00,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 47513.4, 300 sec: 46989.1). Total num frames: 577077248. 
Throughput: 0: 11856.1. Samples: 144343040. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:00,768][1648981] Avg episode reward: [(0, '437.230')] [2024-06-15 14:55:00,808][1651669] Updated weights for policy 0, policy_version 281783 (0.0024) [2024-06-15 14:55:02,381][1651669] Updated weights for policy 0, policy_version 281840 (0.0012) [2024-06-15 14:55:04,190][1651669] Updated weights for policy 0, policy_version 281907 (0.0016) [2024-06-15 14:55:05,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48062.6, 300 sec: 47543.3). Total num frames: 577372160. Throughput: 0: 11730.6. Samples: 144397312. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:05,767][1648981] Avg episode reward: [(0, '439.900')] [2024-06-15 14:55:09,599][1651669] Updated weights for policy 0, policy_version 281940 (0.0012) [2024-06-15 14:55:10,767][1648981] Fps is (10 sec: 42598.3, 60 sec: 45875.0, 300 sec: 46987.0). Total num frames: 577503232. Throughput: 0: 11605.3. Samples: 144439296. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:10,767][1648981] Avg episode reward: [(0, '407.280')] [2024-06-15 14:55:11,115][1651669] Updated weights for policy 0, policy_version 281986 (0.0016) [2024-06-15 14:55:12,880][1651669] Updated weights for policy 0, policy_version 282051 (0.0013) [2024-06-15 14:55:15,044][1651669] Updated weights for policy 0, policy_version 282128 (0.0149) [2024-06-15 14:55:15,775][1648981] Fps is (10 sec: 45835.7, 60 sec: 48054.4, 300 sec: 47540.0). Total num frames: 577830912. Throughput: 0: 11660.0. Samples: 144503296. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:15,776][1648981] Avg episode reward: [(0, '424.730')] [2024-06-15 14:55:16,087][1651669] Updated weights for policy 0, policy_version 282176 (0.0014) [2024-06-15 14:55:20,766][1648981] Fps is (10 sec: 39322.6, 60 sec: 43691.2, 300 sec: 46874.9). Total num frames: 577896448. Throughput: 0: 11514.3. Samples: 144577024. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:20,767][1648981] Avg episode reward: [(0, '425.430')] [2024-06-15 14:55:21,742][1651669] Updated weights for policy 0, policy_version 282224 (0.0015) [2024-06-15 14:55:24,277][1651669] Updated weights for policy 0, policy_version 282304 (0.0132) [2024-06-15 14:55:25,323][1651669] Updated weights for policy 0, policy_version 282339 (0.0017) [2024-06-15 14:55:25,766][1648981] Fps is (10 sec: 42635.5, 60 sec: 47513.6, 300 sec: 47430.4). Total num frames: 578256896. Throughput: 0: 11423.3. Samples: 144611840. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:25,767][1648981] Avg episode reward: [(0, '412.210')] [2024-06-15 14:55:26,836][1651669] Updated weights for policy 0, policy_version 282400 (0.0013) [2024-06-15 14:55:30,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 47208.1). Total num frames: 578420736. Throughput: 0: 11605.3. Samples: 144677376. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:30,767][1648981] Avg episode reward: [(0, '418.170')] [2024-06-15 14:55:31,556][1651669] Updated weights for policy 0, policy_version 282435 (0.0054) [2024-06-15 14:55:34,764][1651669] Updated weights for policy 0, policy_version 282528 (0.0014) [2024-06-15 14:55:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 47100.3). Total num frames: 578682880. Throughput: 0: 11355.0. Samples: 144745984. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:35,767][1648981] Avg episode reward: [(0, '417.970')] [2024-06-15 14:55:36,556][1651274] Signal inference workers to stop experience collection... (14800 times) [2024-06-15 14:55:36,609][1651669] InferenceWorker_p0-w0: stopping experience collection (14800 times) [2024-06-15 14:55:36,627][1651669] Updated weights for policy 0, policy_version 282578 (0.0012) [2024-06-15 14:55:36,904][1651274] Signal inference workers to resume experience collection... 
(14800 times) [2024-06-15 14:55:36,905][1651669] InferenceWorker_p0-w0: resuming experience collection (14800 times) [2024-06-15 14:55:38,824][1651669] Updated weights for policy 0, policy_version 282660 (0.0012) [2024-06-15 14:55:40,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 46967.2, 300 sec: 47541.3). Total num frames: 578945024. Throughput: 0: 11502.9. Samples: 144772608. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:40,768][1648981] Avg episode reward: [(0, '409.280')] [2024-06-15 14:55:43,076][1651669] Updated weights for policy 0, policy_version 282707 (0.0016) [2024-06-15 14:55:43,721][1651669] Updated weights for policy 0, policy_version 282748 (0.0012) [2024-06-15 14:55:45,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 579108864. Throughput: 0: 11400.6. Samples: 144856064. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:45,767][1648981] Avg episode reward: [(0, '416.080')] [2024-06-15 14:55:46,617][1651669] Updated weights for policy 0, policy_version 282809 (0.0094) [2024-06-15 14:55:47,970][1651669] Updated weights for policy 0, policy_version 282850 (0.0013) [2024-06-15 14:55:49,697][1651669] Updated weights for policy 0, policy_version 282932 (0.0015) [2024-06-15 14:55:50,766][1648981] Fps is (10 sec: 52430.9, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 579469312. Throughput: 0: 11741.9. Samples: 144925696. Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:50,767][1648981] Avg episode reward: [(0, '411.890')] [2024-06-15 14:55:53,158][1651669] Updated weights for policy 0, policy_version 282963 (0.0012) [2024-06-15 14:55:55,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 45875.0, 300 sec: 46874.9). Total num frames: 579600384. Throughput: 0: 11821.5. Samples: 144971264. 
Policy #0 lag: (min: 15.0, avg: 102.5, max: 271.0) [2024-06-15 14:55:55,767][1648981] Avg episode reward: [(0, '414.430')] [2024-06-15 14:55:56,542][1651669] Updated weights for policy 0, policy_version 283011 (0.0068) [2024-06-15 14:55:58,351][1651669] Updated weights for policy 0, policy_version 283094 (0.0136) [2024-06-15 14:56:00,093][1651669] Updated weights for policy 0, policy_version 283170 (0.0102) [2024-06-15 14:56:00,771][1648981] Fps is (10 sec: 52404.3, 60 sec: 48602.3, 300 sec: 47541.0). Total num frames: 579993600. Throughput: 0: 11856.7. Samples: 145036800. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:00,771][1648981] Avg episode reward: [(0, '416.350')] [2024-06-15 14:56:03,652][1651669] Updated weights for policy 0, policy_version 283216 (0.0013) [2024-06-15 14:56:04,760][1651669] Updated weights for policy 0, policy_version 283262 (0.0013) [2024-06-15 14:56:05,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 45875.3, 300 sec: 46986.0). Total num frames: 580124672. Throughput: 0: 12197.0. Samples: 145125888. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:05,767][1648981] Avg episode reward: [(0, '431.020')] [2024-06-15 14:56:07,245][1651669] Updated weights for policy 0, policy_version 283312 (0.0013) [2024-06-15 14:56:08,963][1651669] Updated weights for policy 0, policy_version 283376 (0.0163) [2024-06-15 14:56:10,261][1651669] Updated weights for policy 0, policy_version 283448 (0.0015) [2024-06-15 14:56:10,766][1648981] Fps is (10 sec: 52453.0, 60 sec: 50244.5, 300 sec: 47874.6). Total num frames: 580517888. Throughput: 0: 12162.9. Samples: 145159168. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:10,767][1648981] Avg episode reward: [(0, '438.920')] [2024-06-15 14:56:15,403][1651669] Updated weights for policy 0, policy_version 283516 (0.0012) [2024-06-15 14:56:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46974.3, 300 sec: 47319.2). Total num frames: 580648960. 
Throughput: 0: 12424.6. Samples: 145236480. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:15,767][1648981] Avg episode reward: [(0, '426.410')] [2024-06-15 14:56:16,441][1651274] Signal inference workers to stop experience collection... (14850 times) [2024-06-15 14:56:16,505][1651669] InferenceWorker_p0-w0: stopping experience collection (14850 times) [2024-06-15 14:56:16,787][1651274] Signal inference workers to resume experience collection... (14850 times) [2024-06-15 14:56:16,788][1651669] InferenceWorker_p0-w0: resuming experience collection (14850 times) [2024-06-15 14:56:17,581][1651669] Updated weights for policy 0, policy_version 283568 (0.0012) [2024-06-15 14:56:20,196][1651669] Updated weights for policy 0, policy_version 283664 (0.0010) [2024-06-15 14:56:20,770][1648981] Fps is (10 sec: 45858.0, 60 sec: 51333.3, 300 sec: 47762.9). Total num frames: 580976640. Throughput: 0: 12389.4. Samples: 145303552. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:20,771][1648981] Avg episode reward: [(0, '420.770')] [2024-06-15 14:56:25,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46421.4, 300 sec: 47319.2). Total num frames: 581042176. Throughput: 0: 12527.0. Samples: 145336320. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:25,767][1648981] Avg episode reward: [(0, '421.450')] [2024-06-15 14:56:26,422][1651669] Updated weights for policy 0, policy_version 283744 (0.0013) [2024-06-15 14:56:27,098][1651669] Updated weights for policy 0, policy_version 283772 (0.0010) [2024-06-15 14:56:28,750][1651669] Updated weights for policy 0, policy_version 283810 (0.0011) [2024-06-15 14:56:30,366][1651669] Updated weights for policy 0, policy_version 283875 (0.0013) [2024-06-15 14:56:30,767][1648981] Fps is (10 sec: 42613.5, 60 sec: 49698.0, 300 sec: 47430.9). Total num frames: 581402624. Throughput: 0: 12356.2. Samples: 145412096. 
Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:30,767][1648981] Avg episode reward: [(0, '449.200')] [2024-06-15 14:56:31,960][1651669] Updated weights for policy 0, policy_version 283939 (0.0029) [2024-06-15 14:56:35,790][1648981] Fps is (10 sec: 52304.1, 60 sec: 48040.7, 300 sec: 47426.5). Total num frames: 581566464. Throughput: 0: 12315.6. Samples: 145480192. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:35,791][1648981] Avg episode reward: [(0, '471.650')] [2024-06-15 14:56:38,389][1651669] Updated weights for policy 0, policy_version 284020 (0.0014) [2024-06-15 14:56:39,775][1651669] Updated weights for policy 0, policy_version 284064 (0.0011) [2024-06-15 14:56:40,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 48060.0, 300 sec: 47097.1). Total num frames: 581828608. Throughput: 0: 12026.4. Samples: 145512448. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:40,767][1648981] Avg episode reward: [(0, '472.180')] [2024-06-15 14:56:41,560][1651669] Updated weights for policy 0, policy_version 284128 (0.0012) [2024-06-15 14:56:43,599][1651669] Updated weights for policy 0, policy_version 284197 (0.0012) [2024-06-15 14:56:45,771][1648981] Fps is (10 sec: 52531.7, 60 sec: 49694.6, 300 sec: 47540.7). Total num frames: 582090752. Throughput: 0: 11821.6. Samples: 145568768. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:45,771][1648981] Avg episode reward: [(0, '465.750')] [2024-06-15 14:56:49,987][1651669] Updated weights for policy 0, policy_version 284256 (0.0033) [2024-06-15 14:56:50,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 582221824. Throughput: 0: 11571.2. Samples: 145646592. 
Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:50,767][1648981] Avg episode reward: [(0, '468.510')] [2024-06-15 14:56:51,269][1651669] Updated weights for policy 0, policy_version 284292 (0.0013) [2024-06-15 14:56:53,714][1651669] Updated weights for policy 0, policy_version 284388 (0.0014) [2024-06-15 14:56:54,450][1651274] Signal inference workers to stop experience collection... (14900 times) [2024-06-15 14:56:54,492][1651669] InferenceWorker_p0-w0: stopping experience collection (14900 times) [2024-06-15 14:56:54,777][1651274] Signal inference workers to resume experience collection... (14900 times) [2024-06-15 14:56:54,778][1651669] InferenceWorker_p0-w0: resuming experience collection (14900 times) [2024-06-15 14:56:55,446][1651669] Updated weights for policy 0, policy_version 284453 (0.0015) [2024-06-15 14:56:55,766][1648981] Fps is (10 sec: 49172.7, 60 sec: 49698.4, 300 sec: 47430.3). Total num frames: 582582272. Throughput: 0: 11537.1. Samples: 145678336. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:56:55,767][1648981] Avg episode reward: [(0, '479.260')] [2024-06-15 14:56:55,854][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000284480_582615040.pth... [2024-06-15 14:56:55,901][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000278848_571080704.pth [2024-06-15 14:57:00,769][1648981] Fps is (10 sec: 39310.6, 60 sec: 43692.0, 300 sec: 46541.2). Total num frames: 582615040. Throughput: 0: 11286.0. Samples: 145744384. 
Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:57:00,772][1648981] Avg episode reward: [(0, '455.350')] [2024-06-15 14:57:01,917][1651669] Updated weights for policy 0, policy_version 284528 (0.0025) [2024-06-15 14:57:03,609][1651669] Updated weights for policy 0, policy_version 284576 (0.0022) [2024-06-15 14:57:05,400][1651669] Updated weights for policy 0, policy_version 284640 (0.0036) [2024-06-15 14:57:05,782][1648981] Fps is (10 sec: 35988.4, 60 sec: 46955.2, 300 sec: 46983.5). Total num frames: 582942720. Throughput: 0: 11238.3. Samples: 145809408. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:57:05,783][1648981] Avg episode reward: [(0, '462.930')] [2024-06-15 14:57:07,588][1651669] Updated weights for policy 0, policy_version 284720 (0.0013) [2024-06-15 14:57:10,767][1648981] Fps is (10 sec: 52442.3, 60 sec: 43690.5, 300 sec: 46763.8). Total num frames: 583139328. Throughput: 0: 11161.5. Samples: 145838592. Policy #0 lag: (min: 114.0, avg: 178.4, max: 309.0) [2024-06-15 14:57:10,768][1648981] Avg episode reward: [(0, '484.280')] [2024-06-15 14:57:12,924][1651669] Updated weights for policy 0, policy_version 284768 (0.0014) [2024-06-15 14:57:14,940][1651669] Updated weights for policy 0, policy_version 284818 (0.0017) [2024-06-15 14:57:15,782][1648981] Fps is (10 sec: 42598.3, 60 sec: 45317.1, 300 sec: 46989.2). Total num frames: 583368704. Throughput: 0: 11157.7. Samples: 145914368. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:15,783][1648981] Avg episode reward: [(0, '491.170')] [2024-06-15 14:57:17,029][1651669] Updated weights for policy 0, policy_version 284897 (0.0110) [2024-06-15 14:57:18,533][1651669] Updated weights for policy 0, policy_version 284960 (0.0012) [2024-06-15 14:57:19,346][1651669] Updated weights for policy 0, policy_version 284992 (0.0073) [2024-06-15 14:57:20,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 44785.8, 300 sec: 47319.3). Total num frames: 583663616. 
Throughput: 0: 11099.2. Samples: 145979392. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:20,767][1648981] Avg episode reward: [(0, '497.390')] [2024-06-15 14:57:24,896][1651669] Updated weights for policy 0, policy_version 285040 (0.0020) [2024-06-15 14:57:25,767][1648981] Fps is (10 sec: 42662.5, 60 sec: 45874.7, 300 sec: 46654.6). Total num frames: 583794688. Throughput: 0: 11241.1. Samples: 146018304. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:25,768][1648981] Avg episode reward: [(0, '457.280')] [2024-06-15 14:57:26,347][1651669] Updated weights for policy 0, policy_version 285090 (0.0012) [2024-06-15 14:57:27,982][1651669] Updated weights for policy 0, policy_version 285155 (0.0013) [2024-06-15 14:57:29,139][1651669] Updated weights for policy 0, policy_version 285201 (0.0012) [2024-06-15 14:57:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46421.5, 300 sec: 47542.3). Total num frames: 584187904. Throughput: 0: 11401.6. Samples: 146081792. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:30,767][1648981] Avg episode reward: [(0, '449.830')] [2024-06-15 14:57:34,776][1651669] Updated weights for policy 0, policy_version 285249 (0.0011) [2024-06-15 14:57:35,766][1648981] Fps is (10 sec: 49155.3, 60 sec: 45347.1, 300 sec: 46763.8). Total num frames: 584286208. Throughput: 0: 11628.1. Samples: 146169856. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:35,767][1648981] Avg episode reward: [(0, '453.110')] [2024-06-15 14:57:35,772][1651669] Updated weights for policy 0, policy_version 285309 (0.0020) [2024-06-15 14:57:36,256][1651274] Signal inference workers to stop experience collection... (14950 times) [2024-06-15 14:57:36,327][1651669] InferenceWorker_p0-w0: stopping experience collection (14950 times) [2024-06-15 14:57:36,444][1651274] Signal inference workers to resume experience collection... 
(14950 times) [2024-06-15 14:57:36,445][1651669] InferenceWorker_p0-w0: resuming experience collection (14950 times) [2024-06-15 14:57:36,710][1651669] Updated weights for policy 0, policy_version 285348 (0.0099) [2024-06-15 14:57:38,118][1651669] Updated weights for policy 0, policy_version 285408 (0.0011) [2024-06-15 14:57:39,726][1651669] Updated weights for policy 0, policy_version 285457 (0.0011) [2024-06-15 14:57:40,756][1651669] Updated weights for policy 0, policy_version 285504 (0.0011) [2024-06-15 14:57:40,790][1648981] Fps is (10 sec: 52307.1, 60 sec: 48041.2, 300 sec: 47759.9). Total num frames: 584712192. Throughput: 0: 11542.5. Samples: 146198016. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:40,790][1648981] Avg episode reward: [(0, '429.500')] [2024-06-15 14:57:45,774][1648981] Fps is (10 sec: 45839.6, 60 sec: 44234.2, 300 sec: 46873.7). Total num frames: 584744960. Throughput: 0: 11808.8. Samples: 146275840. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:45,775][1648981] Avg episode reward: [(0, '435.390')] [2024-06-15 14:57:46,568][1651669] Updated weights for policy 0, policy_version 285563 (0.0019) [2024-06-15 14:57:47,435][1651669] Updated weights for policy 0, policy_version 285600 (0.0105) [2024-06-15 14:57:48,845][1651669] Updated weights for policy 0, policy_version 285664 (0.0013) [2024-06-15 14:57:49,879][1651669] Updated weights for policy 0, policy_version 285701 (0.0014) [2024-06-15 14:57:50,766][1648981] Fps is (10 sec: 45981.9, 60 sec: 49152.0, 300 sec: 47763.8). Total num frames: 585170944. Throughput: 0: 11893.9. Samples: 146344448. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:50,767][1648981] Avg episode reward: [(0, '436.240')] [2024-06-15 14:57:55,766][1648981] Fps is (10 sec: 49190.3, 60 sec: 44236.8, 300 sec: 47099.0). Total num frames: 585236480. Throughput: 0: 12071.9. Samples: 146381824. 
Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:57:55,767][1648981] Avg episode reward: [(0, '434.820')] [2024-06-15 14:57:56,574][1651669] Updated weights for policy 0, policy_version 285777 (0.0013) [2024-06-15 14:57:57,964][1651669] Updated weights for policy 0, policy_version 285840 (0.0020) [2024-06-15 14:57:58,979][1651669] Updated weights for policy 0, policy_version 285888 (0.0031) [2024-06-15 14:58:00,239][1651669] Updated weights for policy 0, policy_version 285944 (0.0015) [2024-06-15 14:58:00,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 50246.7, 300 sec: 47541.4). Total num frames: 585629696. Throughput: 0: 12076.1. Samples: 146457600. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:58:00,767][1648981] Avg episode reward: [(0, '414.060')] [2024-06-15 14:58:02,017][1651669] Updated weights for policy 0, policy_version 286015 (0.0017) [2024-06-15 14:58:05,796][1648981] Fps is (10 sec: 52285.8, 60 sec: 46958.3, 300 sec: 47425.9). Total num frames: 585760768. Throughput: 0: 12189.5. Samples: 146528256. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:58:05,797][1648981] Avg episode reward: [(0, '395.770')] [2024-06-15 14:58:08,205][1651669] Updated weights for policy 0, policy_version 286064 (0.0011) [2024-06-15 14:58:09,413][1651669] Updated weights for policy 0, policy_version 286113 (0.0011) [2024-06-15 14:58:10,702][1651669] Updated weights for policy 0, policy_version 286178 (0.0107) [2024-06-15 14:58:10,791][1648981] Fps is (10 sec: 45762.0, 60 sec: 49132.0, 300 sec: 47316.5). Total num frames: 586088448. Throughput: 0: 12327.0. Samples: 146573312. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:58:10,792][1648981] Avg episode reward: [(0, '397.570')] [2024-06-15 14:58:11,359][1651274] Signal inference workers to stop experience collection... 
(15000 times) [2024-06-15 14:58:11,430][1651669] InferenceWorker_p0-w0: stopping experience collection (15000 times) [2024-06-15 14:58:11,584][1651274] Signal inference workers to resume experience collection... (15000 times) [2024-06-15 14:58:11,584][1651669] InferenceWorker_p0-w0: resuming experience collection (15000 times) [2024-06-15 14:58:11,887][1651669] Updated weights for policy 0, policy_version 286240 (0.0012) [2024-06-15 14:58:12,668][1651669] Updated weights for policy 0, policy_version 286272 (0.0012) [2024-06-15 14:58:15,766][1648981] Fps is (10 sec: 52573.1, 60 sec: 48618.7, 300 sec: 47541.4). Total num frames: 586285056. Throughput: 0: 12401.8. Samples: 146639872. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:58:15,767][1648981] Avg episode reward: [(0, '408.310')] [2024-06-15 14:58:19,535][1651669] Updated weights for policy 0, policy_version 286341 (0.0127) [2024-06-15 14:58:20,753][1651669] Updated weights for policy 0, policy_version 286406 (0.0016) [2024-06-15 14:58:20,766][1648981] Fps is (10 sec: 45988.3, 60 sec: 48059.7, 300 sec: 47208.1). Total num frames: 586547200. Throughput: 0: 12117.3. Samples: 146715136. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:58:20,767][1648981] Avg episode reward: [(0, '395.540')] [2024-06-15 14:58:22,698][1651669] Updated weights for policy 0, policy_version 286512 (0.0017) [2024-06-15 14:58:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.8, 300 sec: 47763.5). Total num frames: 586809344. Throughput: 0: 12203.3. Samples: 146746880. Policy #0 lag: (min: 15.0, avg: 93.7, max: 271.0) [2024-06-15 14:58:25,767][1648981] Avg episode reward: [(0, '397.590')] [2024-06-15 14:58:29,060][1651669] Updated weights for policy 0, policy_version 286545 (0.0013) [2024-06-15 14:58:30,609][1651669] Updated weights for policy 0, policy_version 286608 (0.0012) [2024-06-15 14:58:30,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 46421.1, 300 sec: 47097.0). Total num frames: 586973184. 
Throughput: 0: 12347.0. Samples: 146831360. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:58:30,767][1648981] Avg episode reward: [(0, '399.900')] [2024-06-15 14:58:31,701][1651669] Updated weights for policy 0, policy_version 286657 (0.0017) [2024-06-15 14:58:33,583][1651669] Updated weights for policy 0, policy_version 286752 (0.0013) [2024-06-15 14:58:35,784][1648981] Fps is (10 sec: 52335.0, 60 sec: 50775.3, 300 sec: 47871.7). Total num frames: 587333632. Throughput: 0: 12158.0. Samples: 146891776. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:58:35,785][1648981] Avg episode reward: [(0, '408.250')] [2024-06-15 14:58:40,477][1651669] Updated weights for policy 0, policy_version 286817 (0.0179) [2024-06-15 14:58:40,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 44800.3, 300 sec: 47097.1). Total num frames: 587399168. Throughput: 0: 12288.0. Samples: 146934784. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:58:40,767][1648981] Avg episode reward: [(0, '379.500')] [2024-06-15 14:58:42,100][1651669] Updated weights for policy 0, policy_version 286880 (0.0012) [2024-06-15 14:58:43,265][1651669] Updated weights for policy 0, policy_version 286934 (0.0013) [2024-06-15 14:58:45,625][1651669] Updated weights for policy 0, policy_version 287033 (0.0014) [2024-06-15 14:58:45,770][1648981] Fps is (10 sec: 52503.0, 60 sec: 51886.1, 300 sec: 47985.1). Total num frames: 587857920. Throughput: 0: 11922.9. Samples: 146994176. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:58:45,771][1648981] Avg episode reward: [(0, '381.570')] [2024-06-15 14:58:50,767][1648981] Fps is (10 sec: 45874.8, 60 sec: 44782.9, 300 sec: 46875.5). Total num frames: 587857920. Throughput: 0: 12136.1. Samples: 147074048. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:58:50,768][1648981] Avg episode reward: [(0, '382.590')] [2024-06-15 14:58:52,067][1651274] Signal inference workers to stop experience collection... 
(15050 times) [2024-06-15 14:58:52,124][1651669] InferenceWorker_p0-w0: stopping experience collection (15050 times) [2024-06-15 14:58:52,126][1651669] Updated weights for policy 0, policy_version 287073 (0.0013) [2024-06-15 14:58:52,378][1651274] Signal inference workers to resume experience collection... (15050 times) [2024-06-15 14:58:52,379][1651669] InferenceWorker_p0-w0: resuming experience collection (15050 times) [2024-06-15 14:58:53,716][1651669] Updated weights for policy 0, policy_version 287140 (0.0089) [2024-06-15 14:58:55,766][1648981] Fps is (10 sec: 36058.5, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 588218368. Throughput: 0: 11759.7. Samples: 147102208. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:58:55,767][1648981] Avg episode reward: [(0, '380.800')] [2024-06-15 14:58:55,808][1651669] Updated weights for policy 0, policy_version 287232 (0.0012) [2024-06-15 14:58:56,224][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000287248_588283904.pth... [2024-06-15 14:58:56,395][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000281664_576847872.pth [2024-06-15 14:59:00,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 45874.9, 300 sec: 47097.6). Total num frames: 588382208. Throughput: 0: 11616.6. Samples: 147162624. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:00,767][1648981] Avg episode reward: [(0, '362.180')] [2024-06-15 14:59:03,031][1651669] Updated weights for policy 0, policy_version 287299 (0.0016) [2024-06-15 14:59:05,127][1651669] Updated weights for policy 0, policy_version 287392 (0.0011) [2024-06-15 14:59:05,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 47535.3, 300 sec: 46986.0). Total num frames: 588611584. Throughput: 0: 11457.4. Samples: 147230720. 
Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:05,767][1648981] Avg episode reward: [(0, '380.450')] [2024-06-15 14:59:06,733][1651669] Updated weights for policy 0, policy_version 287459 (0.0049) [2024-06-15 14:59:07,706][1651669] Updated weights for policy 0, policy_version 287504 (0.0012) [2024-06-15 14:59:10,783][1648981] Fps is (10 sec: 52341.0, 60 sec: 46973.4, 300 sec: 47316.8). Total num frames: 588906496. Throughput: 0: 11407.6. Samples: 147260416. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:10,784][1648981] Avg episode reward: [(0, '380.600')] [2024-06-15 14:59:14,636][1651669] Updated weights for policy 0, policy_version 287568 (0.0014) [2024-06-15 14:59:15,767][1648981] Fps is (10 sec: 42597.4, 60 sec: 45875.0, 300 sec: 46652.8). Total num frames: 589037568. Throughput: 0: 11434.7. Samples: 147345920. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:15,767][1648981] Avg episode reward: [(0, '371.230')] [2024-06-15 14:59:15,919][1651669] Updated weights for policy 0, policy_version 287620 (0.0015) [2024-06-15 14:59:17,702][1651669] Updated weights for policy 0, policy_version 287686 (0.0013) [2024-06-15 14:59:19,321][1651669] Updated weights for policy 0, policy_version 287760 (0.0013) [2024-06-15 14:59:20,391][1651669] Updated weights for policy 0, policy_version 287808 (0.0014) [2024-06-15 14:59:20,766][1648981] Fps is (10 sec: 52518.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 589430784. Throughput: 0: 11200.2. Samples: 147395584. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:20,767][1648981] Avg episode reward: [(0, '369.930')] [2024-06-15 14:59:25,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 43690.7, 300 sec: 46430.6). Total num frames: 589430784. Throughput: 0: 11161.6. Samples: 147437056. 
Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:25,767][1648981] Avg episode reward: [(0, '399.860')] [2024-06-15 14:59:26,921][1651669] Updated weights for policy 0, policy_version 287856 (0.0015) [2024-06-15 14:59:28,725][1651669] Updated weights for policy 0, policy_version 287909 (0.0012) [2024-06-15 14:59:29,359][1651274] Signal inference workers to stop experience collection... (15100 times) [2024-06-15 14:59:29,389][1651669] InferenceWorker_p0-w0: stopping experience collection (15100 times) [2024-06-15 14:59:29,561][1651274] Signal inference workers to resume experience collection... (15100 times) [2024-06-15 14:59:29,563][1651669] InferenceWorker_p0-w0: resuming experience collection (15100 times) [2024-06-15 14:59:30,770][1648981] Fps is (10 sec: 39309.1, 60 sec: 47511.3, 300 sec: 47540.9). Total num frames: 589824000. Throughput: 0: 11400.7. Samples: 147507200. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:30,770][1648981] Avg episode reward: [(0, '415.020')] [2024-06-15 14:59:30,988][1651669] Updated weights for policy 0, policy_version 288016 (0.0129) [2024-06-15 14:59:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 43703.7, 300 sec: 46874.9). Total num frames: 589955072. Throughput: 0: 11161.6. Samples: 147576320. Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:35,767][1648981] Avg episode reward: [(0, '420.680')] [2024-06-15 14:59:36,888][1651669] Updated weights for policy 0, policy_version 288065 (0.0016) [2024-06-15 14:59:38,177][1651669] Updated weights for policy 0, policy_version 288121 (0.0106) [2024-06-15 14:59:40,173][1651669] Updated weights for policy 0, policy_version 288176 (0.0115) [2024-06-15 14:59:40,766][1648981] Fps is (10 sec: 39333.9, 60 sec: 46967.4, 300 sec: 47097.0). Total num frames: 590217216. Throughput: 0: 11286.7. Samples: 147610112. 
Policy #0 lag: (min: 15.0, avg: 73.5, max: 271.0) [2024-06-15 14:59:40,767][1648981] Avg episode reward: [(0, '421.420')] [2024-06-15 14:59:41,611][1651669] Updated weights for policy 0, policy_version 288224 (0.0012) [2024-06-15 14:59:43,626][1651669] Updated weights for policy 0, policy_version 288308 (0.0175) [2024-06-15 14:59:45,774][1648981] Fps is (10 sec: 52388.1, 60 sec: 43687.8, 300 sec: 46984.7). Total num frames: 590479360. Throughput: 0: 11296.2. Samples: 147671040. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 14:59:45,775][1648981] Avg episode reward: [(0, '413.470')] [2024-06-15 14:59:48,592][1651669] Updated weights for policy 0, policy_version 288339 (0.0021) [2024-06-15 14:59:49,544][1651669] Updated weights for policy 0, policy_version 288380 (0.0027) [2024-06-15 14:59:50,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 590643200. Throughput: 0: 11605.3. Samples: 147752960. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 14:59:50,767][1648981] Avg episode reward: [(0, '427.710')] [2024-06-15 14:59:52,565][1651669] Updated weights for policy 0, policy_version 288470 (0.0105) [2024-06-15 14:59:53,754][1651669] Updated weights for policy 0, policy_version 288516 (0.0019) [2024-06-15 14:59:54,749][1651669] Updated weights for policy 0, policy_version 288573 (0.0014) [2024-06-15 14:59:55,766][1648981] Fps is (10 sec: 52470.0, 60 sec: 46421.4, 300 sec: 47208.2). Total num frames: 591003648. Throughput: 0: 11518.7. Samples: 147778560. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 14:59:55,767][1648981] Avg episode reward: [(0, '406.270')] [2024-06-15 14:59:59,443][1651669] Updated weights for policy 0, policy_version 288610 (0.0013) [2024-06-15 15:00:00,786][1648981] Fps is (10 sec: 49054.8, 60 sec: 45860.3, 300 sec: 46649.6). Total num frames: 591134720. Throughput: 0: 11452.4. Samples: 147861504. 
Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:00,787][1648981] Avg episode reward: [(0, '399.400')] [2024-06-15 15:00:01,828][1651669] Updated weights for policy 0, policy_version 288672 (0.0011) [2024-06-15 15:00:04,248][1651669] Updated weights for policy 0, policy_version 288768 (0.0015) [2024-06-15 15:00:05,623][1651669] Updated weights for policy 0, policy_version 288829 (0.0012) [2024-06-15 15:00:05,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48605.7, 300 sec: 47541.4). Total num frames: 591527936. Throughput: 0: 11525.7. Samples: 147914240. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:05,767][1648981] Avg episode reward: [(0, '407.410')] [2024-06-15 15:00:10,766][1648981] Fps is (10 sec: 39399.6, 60 sec: 43703.1, 300 sec: 46432.0). Total num frames: 591527936. Throughput: 0: 11628.1. Samples: 147960320. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:10,767][1648981] Avg episode reward: [(0, '393.920')] [2024-06-15 15:00:11,589][1651274] Signal inference workers to stop experience collection... (15150 times) [2024-06-15 15:00:11,637][1651669] InferenceWorker_p0-w0: stopping experience collection (15150 times) [2024-06-15 15:00:11,910][1651274] Signal inference workers to resume experience collection... (15150 times) [2024-06-15 15:00:11,911][1651669] InferenceWorker_p0-w0: resuming experience collection (15150 times) [2024-06-15 15:00:12,092][1651669] Updated weights for policy 0, policy_version 288883 (0.0012) [2024-06-15 15:00:13,353][1651669] Updated weights for policy 0, policy_version 288931 (0.0012) [2024-06-15 15:00:15,554][1651669] Updated weights for policy 0, policy_version 289018 (0.0018) [2024-06-15 15:00:15,775][1648981] Fps is (10 sec: 39289.4, 60 sec: 48053.2, 300 sec: 47540.0). Total num frames: 591921152. Throughput: 0: 11422.0. Samples: 148021248. 
Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:15,775][1648981] Avg episode reward: [(0, '357.500')] [2024-06-15 15:00:17,706][1651669] Updated weights for policy 0, policy_version 289058 (0.0012) [2024-06-15 15:00:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 46763.8). Total num frames: 592052224. Throughput: 0: 11525.7. Samples: 148094976. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:20,767][1648981] Avg episode reward: [(0, '356.610')] [2024-06-15 15:00:22,259][1651669] Updated weights for policy 0, policy_version 289093 (0.0014) [2024-06-15 15:00:23,972][1651669] Updated weights for policy 0, policy_version 289156 (0.0353) [2024-06-15 15:00:25,213][1651669] Updated weights for policy 0, policy_version 289215 (0.0013) [2024-06-15 15:00:25,766][1648981] Fps is (10 sec: 39354.6, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 592314368. Throughput: 0: 11650.9. Samples: 148134400. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:25,767][1648981] Avg episode reward: [(0, '363.830')] [2024-06-15 15:00:26,895][1651669] Updated weights for policy 0, policy_version 289276 (0.0133) [2024-06-15 15:00:29,359][1651669] Updated weights for policy 0, policy_version 289338 (0.0012) [2024-06-15 15:00:30,786][1648981] Fps is (10 sec: 52324.9, 60 sec: 45862.5, 300 sec: 47093.9). Total num frames: 592576512. Throughput: 0: 11522.6. Samples: 148189696. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:30,787][1648981] Avg episode reward: [(0, '359.840')] [2024-06-15 15:00:35,437][1651669] Updated weights for policy 0, policy_version 289412 (0.0099) [2024-06-15 15:00:35,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 46421.2, 300 sec: 46763.8). Total num frames: 592740352. Throughput: 0: 11400.5. Samples: 148265984. 
Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:35,767][1648981] Avg episode reward: [(0, '364.580')] [2024-06-15 15:00:36,541][1651669] Updated weights for policy 0, policy_version 289459 (0.0011) [2024-06-15 15:00:37,431][1651669] Updated weights for policy 0, policy_version 289493 (0.0012) [2024-06-15 15:00:39,420][1651669] Updated weights for policy 0, policy_version 289552 (0.0013) [2024-06-15 15:00:40,766][1648981] Fps is (10 sec: 52532.8, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 593100800. Throughput: 0: 11514.3. Samples: 148296704. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:40,767][1648981] Avg episode reward: [(0, '366.400')] [2024-06-15 15:00:44,769][1651669] Updated weights for policy 0, policy_version 289602 (0.0021) [2024-06-15 15:00:45,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 45335.0, 300 sec: 46541.7). Total num frames: 593199104. Throughput: 0: 11508.0. Samples: 148379136. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:45,767][1648981] Avg episode reward: [(0, '347.120')] [2024-06-15 15:00:46,085][1651669] Updated weights for policy 0, policy_version 289659 (0.0061) [2024-06-15 15:00:47,172][1651669] Updated weights for policy 0, policy_version 289712 (0.0012) [2024-06-15 15:00:49,540][1651669] Updated weights for policy 0, policy_version 289761 (0.0010) [2024-06-15 15:00:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 593526784. Throughput: 0: 11855.7. Samples: 148447744. Policy #0 lag: (min: 39.0, avg: 165.4, max: 327.0) [2024-06-15 15:00:50,767][1648981] Avg episode reward: [(0, '352.010')] [2024-06-15 15:00:50,810][1651274] Signal inference workers to stop experience collection... (15200 times) [2024-06-15 15:00:50,901][1651669] InferenceWorker_p0-w0: stopping experience collection (15200 times) [2024-06-15 15:00:51,016][1651274] Signal inference workers to resume experience collection... 
(15200 times) [2024-06-15 15:00:51,016][1651669] InferenceWorker_p0-w0: resuming experience collection (15200 times) [2024-06-15 15:00:51,018][1651669] Updated weights for policy 0, policy_version 289824 (0.0013) [2024-06-15 15:00:55,677][1651669] Updated weights for policy 0, policy_version 289888 (0.0014) [2024-06-15 15:00:55,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 44782.9, 300 sec: 46431.3). Total num frames: 593690624. Throughput: 0: 11639.5. Samples: 148484096. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:00:55,767][1648981] Avg episode reward: [(0, '351.620')] [2024-06-15 15:00:56,314][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000289920_593756160.pth... [2024-06-15 15:00:56,428][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000284480_582615040.pth [2024-06-15 15:00:56,817][1651669] Updated weights for policy 0, policy_version 289938 (0.0012) [2024-06-15 15:00:59,425][1651669] Updated weights for policy 0, policy_version 289985 (0.0015) [2024-06-15 15:01:00,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 47529.3, 300 sec: 46986.0). Total num frames: 593985536. Throughput: 0: 12028.6. Samples: 148562432. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:00,767][1648981] Avg episode reward: [(0, '363.050')] [2024-06-15 15:01:01,666][1651669] Updated weights for policy 0, policy_version 290067 (0.0126) [2024-06-15 15:01:05,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 43690.8, 300 sec: 46208.4). Total num frames: 594149376. Throughput: 0: 11787.4. Samples: 148625408. 
Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:05,767][1648981] Avg episode reward: [(0, '379.660')] [2024-06-15 15:01:07,375][1651669] Updated weights for policy 0, policy_version 290144 (0.0014) [2024-06-15 15:01:08,743][1651669] Updated weights for policy 0, policy_version 290213 (0.0109) [2024-06-15 15:01:10,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 594411520. Throughput: 0: 11696.3. Samples: 148660736. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:10,767][1648981] Avg episode reward: [(0, '385.040')] [2024-06-15 15:01:11,048][1651669] Updated weights for policy 0, policy_version 290247 (0.0012) [2024-06-15 15:01:13,363][1651669] Updated weights for policy 0, policy_version 290336 (0.0097) [2024-06-15 15:01:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45881.5, 300 sec: 46431.2). Total num frames: 594673664. Throughput: 0: 11849.5. Samples: 148722688. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:15,767][1648981] Avg episode reward: [(0, '387.960')] [2024-06-15 15:01:17,900][1651669] Updated weights for policy 0, policy_version 290384 (0.0011) [2024-06-15 15:01:19,873][1651669] Updated weights for policy 0, policy_version 290464 (0.0094) [2024-06-15 15:01:20,774][1648981] Fps is (10 sec: 52388.2, 60 sec: 48053.5, 300 sec: 47095.8). Total num frames: 594935808. Throughput: 0: 11774.0. Samples: 148795904. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:20,775][1648981] Avg episode reward: [(0, '373.730')] [2024-06-15 15:01:23,247][1651669] Updated weights for policy 0, policy_version 290515 (0.0024) [2024-06-15 15:01:25,044][1651669] Updated weights for policy 0, policy_version 290592 (0.0098) [2024-06-15 15:01:25,770][1648981] Fps is (10 sec: 49133.6, 60 sec: 47510.6, 300 sec: 46652.2). Total num frames: 595165184. Throughput: 0: 11957.0. Samples: 148834816. 
Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:25,771][1648981] Avg episode reward: [(0, '364.820')] [2024-06-15 15:01:28,212][1651669] Updated weights for policy 0, policy_version 290628 (0.0012) [2024-06-15 15:01:30,248][1651669] Updated weights for policy 0, policy_version 290704 (0.0120) [2024-06-15 15:01:30,766][1648981] Fps is (10 sec: 45910.8, 60 sec: 46983.0, 300 sec: 46878.7). Total num frames: 595394560. Throughput: 0: 11628.1. Samples: 148902400. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:30,767][1648981] Avg episode reward: [(0, '382.620')] [2024-06-15 15:01:34,751][1651274] Signal inference workers to stop experience collection... (15250 times) [2024-06-15 15:01:34,804][1651669] InferenceWorker_p0-w0: stopping experience collection (15250 times) [2024-06-15 15:01:35,038][1651274] Signal inference workers to resume experience collection... (15250 times) [2024-06-15 15:01:35,039][1651669] InferenceWorker_p0-w0: resuming experience collection (15250 times) [2024-06-15 15:01:35,041][1651669] Updated weights for policy 0, policy_version 290784 (0.0017) [2024-06-15 15:01:35,766][1648981] Fps is (10 sec: 39336.7, 60 sec: 46967.6, 300 sec: 46541.7). Total num frames: 595558400. Throughput: 0: 11696.4. Samples: 148974080. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:35,767][1648981] Avg episode reward: [(0, '383.290')] [2024-06-15 15:01:36,832][1651669] Updated weights for policy 0, policy_version 290849 (0.0065) [2024-06-15 15:01:37,540][1651669] Updated weights for policy 0, policy_version 290880 (0.0038) [2024-06-15 15:01:40,697][1651669] Updated weights for policy 0, policy_version 290944 (0.0018) [2024-06-15 15:01:40,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 45875.0, 300 sec: 46653.4). Total num frames: 595853312. Throughput: 0: 11605.3. Samples: 149006336. 
Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:40,768][1648981] Avg episode reward: [(0, '391.130')] [2024-06-15 15:01:42,379][1651669] Updated weights for policy 0, policy_version 291008 (0.0012) [2024-06-15 15:01:45,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 47513.5, 300 sec: 46874.9). Total num frames: 596049920. Throughput: 0: 11514.3. Samples: 149080576. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:45,767][1648981] Avg episode reward: [(0, '374.430')] [2024-06-15 15:01:46,344][1651669] Updated weights for policy 0, policy_version 291063 (0.0028) [2024-06-15 15:01:48,030][1651669] Updated weights for policy 0, policy_version 291129 (0.0012) [2024-06-15 15:01:50,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 45875.2, 300 sec: 46430.6). Total num frames: 596279296. Throughput: 0: 11741.9. Samples: 149153792. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:50,767][1648981] Avg episode reward: [(0, '330.360')] [2024-06-15 15:01:50,868][1651669] Updated weights for policy 0, policy_version 291168 (0.0014) [2024-06-15 15:01:51,548][1651669] Updated weights for policy 0, policy_version 291200 (0.0042) [2024-06-15 15:01:53,241][1651669] Updated weights for policy 0, policy_version 291264 (0.0012) [2024-06-15 15:01:55,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 47097.5). Total num frames: 596508672. Throughput: 0: 11730.5. Samples: 149188608. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:01:55,767][1648981] Avg episode reward: [(0, '306.340')] [2024-06-15 15:01:57,431][1651669] Updated weights for policy 0, policy_version 291327 (0.0012) [2024-06-15 15:01:59,033][1651669] Updated weights for policy 0, policy_version 291382 (0.0012) [2024-06-15 15:02:00,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 46421.3, 300 sec: 46877.4). Total num frames: 596770816. Throughput: 0: 11980.8. Samples: 149261824. 
Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:02:00,767][1648981] Avg episode reward: [(0, '322.120')] [2024-06-15 15:02:01,426][1651669] Updated weights for policy 0, policy_version 291413 (0.0014) [2024-06-15 15:02:02,362][1651669] Updated weights for policy 0, policy_version 291456 (0.0013) [2024-06-15 15:02:03,766][1651669] Updated weights for policy 0, policy_version 291520 (0.0010) [2024-06-15 15:02:05,770][1648981] Fps is (10 sec: 52409.1, 60 sec: 48056.7, 300 sec: 47096.5). Total num frames: 597032960. Throughput: 0: 12129.8. Samples: 149341696. Policy #0 lag: (min: 12.0, avg: 97.3, max: 268.0) [2024-06-15 15:02:05,771][1648981] Avg episode reward: [(0, '322.540')] [2024-06-15 15:02:07,526][1651669] Updated weights for policy 0, policy_version 291584 (0.0011) [2024-06-15 15:02:09,748][1651669] Updated weights for policy 0, policy_version 291638 (0.0016) [2024-06-15 15:02:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47210.7). Total num frames: 597295104. Throughput: 0: 12015.9. Samples: 149375488. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:10,767][1648981] Avg episode reward: [(0, '347.480')] [2024-06-15 15:02:11,908][1651669] Updated weights for policy 0, policy_version 291666 (0.0012) [2024-06-15 15:02:13,355][1651669] Updated weights for policy 0, policy_version 291728 (0.0012) [2024-06-15 15:02:15,766][1648981] Fps is (10 sec: 52448.5, 60 sec: 48059.7, 300 sec: 47097.0). Total num frames: 597557248. Throughput: 0: 11992.2. Samples: 149442048. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:15,767][1648981] Avg episode reward: [(0, '340.800')] [2024-06-15 15:02:17,101][1651274] Signal inference workers to stop experience collection... 
(15300 times) [2024-06-15 15:02:17,109][1651669] Updated weights for policy 0, policy_version 291777 (0.0020) [2024-06-15 15:02:17,161][1651669] InferenceWorker_p0-w0: stopping experience collection (15300 times) [2024-06-15 15:02:17,365][1651274] Signal inference workers to resume experience collection... (15300 times) [2024-06-15 15:02:17,365][1651669] InferenceWorker_p0-w0: resuming experience collection (15300 times) [2024-06-15 15:02:18,281][1651669] Updated weights for policy 0, policy_version 291834 (0.0041) [2024-06-15 15:02:20,770][1648981] Fps is (10 sec: 49133.6, 60 sec: 47516.8, 300 sec: 47429.8). Total num frames: 597786624. Throughput: 0: 12048.0. Samples: 149516288. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:20,771][1648981] Avg episode reward: [(0, '346.680')] [2024-06-15 15:02:20,895][1651669] Updated weights for policy 0, policy_version 291895 (0.0114) [2024-06-15 15:02:23,447][1651669] Updated weights for policy 0, policy_version 291939 (0.0015) [2024-06-15 15:02:24,930][1651669] Updated weights for policy 0, policy_version 292000 (0.0022) [2024-06-15 15:02:25,612][1651669] Updated weights for policy 0, policy_version 292032 (0.0040) [2024-06-15 15:02:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48608.9, 300 sec: 47097.1). Total num frames: 598081536. Throughput: 0: 12219.8. Samples: 149556224. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:25,767][1648981] Avg episode reward: [(0, '369.960')] [2024-06-15 15:02:29,164][1651669] Updated weights for policy 0, policy_version 292092 (0.0015) [2024-06-15 15:02:30,766][1648981] Fps is (10 sec: 45892.6, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 598245376. Throughput: 0: 12106.0. Samples: 149625344. 
Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:30,767][1648981] Avg episode reward: [(0, '366.710')] [2024-06-15 15:02:31,953][1651669] Updated weights for policy 0, policy_version 292155 (0.0014) [2024-06-15 15:02:33,966][1651669] Updated weights for policy 0, policy_version 292199 (0.0013) [2024-06-15 15:02:35,268][1651669] Updated weights for policy 0, policy_version 292240 (0.0012) [2024-06-15 15:02:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49698.1, 300 sec: 46878.6). Total num frames: 598540288. Throughput: 0: 12094.6. Samples: 149698048. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:35,767][1648981] Avg episode reward: [(0, '391.540')] [2024-06-15 15:02:36,260][1651669] Updated weights for policy 0, policy_version 292288 (0.0014) [2024-06-15 15:02:39,162][1651669] Updated weights for policy 0, policy_version 292342 (0.0096) [2024-06-15 15:02:40,800][1648981] Fps is (10 sec: 48986.0, 60 sec: 48032.9, 300 sec: 47426.1). Total num frames: 598736896. Throughput: 0: 12290.1. Samples: 149742080. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:40,801][1648981] Avg episode reward: [(0, '399.500')] [2024-06-15 15:02:41,584][1651669] Updated weights for policy 0, policy_version 292384 (0.0012) [2024-06-15 15:02:44,223][1651669] Updated weights for policy 0, policy_version 292455 (0.0013) [2024-06-15 15:02:44,641][1651669] Updated weights for policy 0, policy_version 292473 (0.0013) [2024-06-15 15:02:45,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 49698.1, 300 sec: 46986.0). Total num frames: 599031808. Throughput: 0: 12288.0. Samples: 149814784. 
Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:45,767][1648981] Avg episode reward: [(0, '415.670')] [2024-06-15 15:02:46,164][1651669] Updated weights for policy 0, policy_version 292528 (0.0013) [2024-06-15 15:02:49,392][1651669] Updated weights for policy 0, policy_version 292592 (0.0015) [2024-06-15 15:02:50,766][1648981] Fps is (10 sec: 52606.7, 60 sec: 49698.1, 300 sec: 47541.4). Total num frames: 599261184. Throughput: 0: 12289.0. Samples: 149894656. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:50,767][1648981] Avg episode reward: [(0, '432.630')] [2024-06-15 15:02:52,355][1651669] Updated weights for policy 0, policy_version 292667 (0.0013) [2024-06-15 15:02:54,553][1651669] Updated weights for policy 0, policy_version 292705 (0.0012) [2024-06-15 15:02:55,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 50790.2, 300 sec: 47208.1). Total num frames: 599556096. Throughput: 0: 12367.6. Samples: 149932032. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:02:55,767][1648981] Avg episode reward: [(0, '411.950')] [2024-06-15 15:02:56,210][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000292768_599588864.pth... [2024-06-15 15:02:56,224][1651669] Updated weights for policy 0, policy_version 292768 (0.0020) [2024-06-15 15:02:56,363][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000287248_588283904.pth [2024-06-15 15:02:59,696][1651274] Signal inference workers to stop experience collection... (15350 times) [2024-06-15 15:02:59,752][1651669] InferenceWorker_p0-w0: stopping experience collection (15350 times) [2024-06-15 15:02:59,979][1651274] Signal inference workers to resume experience collection... 
(15350 times) [2024-06-15 15:02:59,980][1651669] InferenceWorker_p0-w0: resuming experience collection (15350 times) [2024-06-15 15:03:00,716][1651669] Updated weights for policy 0, policy_version 292848 (0.0013) [2024-06-15 15:03:00,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49698.1, 300 sec: 47434.7). Total num frames: 599752704. Throughput: 0: 12515.6. Samples: 150005248. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:03:00,767][1648981] Avg episode reward: [(0, '405.970')] [2024-06-15 15:03:02,476][1651669] Updated weights for policy 0, policy_version 292888 (0.0011) [2024-06-15 15:03:03,348][1651669] Updated weights for policy 0, policy_version 292928 (0.0012) [2024-06-15 15:03:05,236][1651669] Updated weights for policy 0, policy_version 292983 (0.0013) [2024-06-15 15:03:05,766][1648981] Fps is (10 sec: 49153.3, 60 sec: 50247.5, 300 sec: 47323.2). Total num frames: 600047616. Throughput: 0: 12482.5. Samples: 150077952. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:03:05,767][1648981] Avg episode reward: [(0, '409.250')] [2024-06-15 15:03:06,570][1651669] Updated weights for policy 0, policy_version 293027 (0.0011) [2024-06-15 15:03:10,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 600178688. Throughput: 0: 12367.6. Samples: 150112768. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:03:10,767][1648981] Avg episode reward: [(0, '421.350')] [2024-06-15 15:03:11,103][1651669] Updated weights for policy 0, policy_version 293088 (0.0013) [2024-06-15 15:03:12,881][1651669] Updated weights for policy 0, policy_version 293123 (0.0012) [2024-06-15 15:03:15,068][1651669] Updated weights for policy 0, policy_version 293185 (0.0100) [2024-06-15 15:03:15,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48606.0, 300 sec: 47208.2). Total num frames: 600473600. Throughput: 0: 12526.9. Samples: 150189056. 
Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:03:15,767][1648981] Avg episode reward: [(0, '422.320')] [2024-06-15 15:03:16,949][1651669] Updated weights for policy 0, policy_version 293265 (0.0013) [2024-06-15 15:03:17,766][1651669] Updated weights for policy 0, policy_version 293304 (0.0011) [2024-06-15 15:03:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48608.9, 300 sec: 47097.1). Total num frames: 600702976. Throughput: 0: 12492.8. Samples: 150260224. Policy #0 lag: (min: 6.0, avg: 129.1, max: 262.0) [2024-06-15 15:03:20,767][1648981] Avg episode reward: [(0, '391.090')] [2024-06-15 15:03:22,307][1651669] Updated weights for policy 0, policy_version 293360 (0.0019) [2024-06-15 15:03:25,041][1651669] Updated weights for policy 0, policy_version 293424 (0.0098) [2024-06-15 15:03:25,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 600965120. Throughput: 0: 12263.1. Samples: 150293504. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:03:25,767][1648981] Avg episode reward: [(0, '415.800')] [2024-06-15 15:03:26,551][1651669] Updated weights for policy 0, policy_version 293472 (0.0011) [2024-06-15 15:03:27,979][1651669] Updated weights for policy 0, policy_version 293508 (0.0016) [2024-06-15 15:03:30,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 49697.9, 300 sec: 47099.9). Total num frames: 601227264. Throughput: 0: 12117.3. Samples: 150360064. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:03:30,767][1648981] Avg episode reward: [(0, '427.140')] [2024-06-15 15:03:32,304][1651669] Updated weights for policy 0, policy_version 293570 (0.0101) [2024-06-15 15:03:35,336][1651669] Updated weights for policy 0, policy_version 293650 (0.0083) [2024-06-15 15:03:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 601423872. Throughput: 0: 11969.4. Samples: 150433280. 
Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:03:35,767][1648981] Avg episode reward: [(0, '418.650')] [2024-06-15 15:03:37,525][1651669] Updated weights for policy 0, policy_version 293712 (0.0013) [2024-06-15 15:03:38,623][1651669] Updated weights for policy 0, policy_version 293756 (0.0087) [2024-06-15 15:03:40,209][1651669] Updated weights for policy 0, policy_version 293813 (0.0014) [2024-06-15 15:03:40,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 50272.5, 300 sec: 47097.6). Total num frames: 601751552. Throughput: 0: 11969.5. Samples: 150470656. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:03:40,767][1648981] Avg episode reward: [(0, '387.260')] [2024-06-15 15:03:43,499][1651274] Signal inference workers to stop experience collection... (15400 times) [2024-06-15 15:03:43,551][1651669] InferenceWorker_p0-w0: stopping experience collection (15400 times) [2024-06-15 15:03:43,697][1651274] Signal inference workers to resume experience collection... (15400 times) [2024-06-15 15:03:43,698][1651669] InferenceWorker_p0-w0: resuming experience collection (15400 times) [2024-06-15 15:03:44,091][1651669] Updated weights for policy 0, policy_version 293872 (0.0022) [2024-06-15 15:03:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 601915392. Throughput: 0: 12094.6. Samples: 150549504. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:03:45,767][1648981] Avg episode reward: [(0, '381.280')] [2024-06-15 15:03:45,773][1651669] Updated weights for policy 0, policy_version 293920 (0.0027) [2024-06-15 15:03:48,031][1651669] Updated weights for policy 0, policy_version 293953 (0.0030) [2024-06-15 15:03:49,124][1651669] Updated weights for policy 0, policy_version 294010 (0.0012) [2024-06-15 15:03:50,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49698.2, 300 sec: 47541.4). Total num frames: 602243072. Throughput: 0: 11946.7. Samples: 150615552. 
Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:03:50,767][1648981] Avg episode reward: [(0, '386.930')] [2024-06-15 15:03:50,791][1651669] Updated weights for policy 0, policy_version 294073 (0.0019) [2024-06-15 15:03:54,720][1651669] Updated weights for policy 0, policy_version 294128 (0.0014) [2024-06-15 15:03:55,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 602439680. Throughput: 0: 12140.1. Samples: 150659072. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:03:55,767][1648981] Avg episode reward: [(0, '377.750')] [2024-06-15 15:03:56,386][1651669] Updated weights for policy 0, policy_version 294192 (0.0012) [2024-06-15 15:03:58,825][1651669] Updated weights for policy 0, policy_version 294210 (0.0012) [2024-06-15 15:04:00,263][1651669] Updated weights for policy 0, policy_version 294263 (0.0014) [2024-06-15 15:04:00,776][1648981] Fps is (10 sec: 42556.0, 60 sec: 48597.8, 300 sec: 47650.8). Total num frames: 602669056. Throughput: 0: 12069.1. Samples: 150732288. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:00,777][1648981] Avg episode reward: [(0, '379.360')] [2024-06-15 15:04:01,612][1651669] Updated weights for policy 0, policy_version 294307 (0.0014) [2024-06-15 15:04:02,130][1651669] Updated weights for policy 0, policy_version 294335 (0.0012) [2024-06-15 15:04:05,608][1651669] Updated weights for policy 0, policy_version 294400 (0.0012) [2024-06-15 15:04:05,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48059.6, 300 sec: 47544.1). Total num frames: 602931200. Throughput: 0: 12094.6. Samples: 150804480. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:05,767][1648981] Avg episode reward: [(0, '379.010')] [2024-06-15 15:04:09,607][1651669] Updated weights for policy 0, policy_version 294480 (0.0012) [2024-06-15 15:04:10,766][1648981] Fps is (10 sec: 52481.0, 60 sec: 50244.4, 300 sec: 47985.7). Total num frames: 603193344. 
Throughput: 0: 12140.1. Samples: 150839808. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:10,767][1648981] Avg episode reward: [(0, '396.040')] [2024-06-15 15:04:10,779][1651669] Updated weights for policy 0, policy_version 294528 (0.0034) [2024-06-15 15:04:13,106][1651669] Updated weights for policy 0, policy_version 294583 (0.0048) [2024-06-15 15:04:15,467][1651669] Updated weights for policy 0, policy_version 294611 (0.0013) [2024-06-15 15:04:15,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 48605.9, 300 sec: 47319.2). Total num frames: 603389952. Throughput: 0: 12413.2. Samples: 150918656. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:15,767][1648981] Avg episode reward: [(0, '394.580')] [2024-06-15 15:04:17,134][1651669] Updated weights for policy 0, policy_version 294688 (0.0118) [2024-06-15 15:04:20,314][1651669] Updated weights for policy 0, policy_version 294738 (0.0014) [2024-06-15 15:04:20,772][1648981] Fps is (10 sec: 45847.4, 60 sec: 49147.0, 300 sec: 48206.8). Total num frames: 603652096. Throughput: 0: 12309.1. Samples: 150987264. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:20,773][1648981] Avg episode reward: [(0, '383.050')] [2024-06-15 15:04:21,438][1651669] Updated weights for policy 0, policy_version 294783 (0.0011) [2024-06-15 15:04:23,881][1651669] Updated weights for policy 0, policy_version 294839 (0.0013) [2024-06-15 15:04:25,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 47541.9). Total num frames: 603848704. Throughput: 0: 12219.8. Samples: 151020544. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:25,767][1648981] Avg episode reward: [(0, '384.170')] [2024-06-15 15:04:27,107][1651274] Signal inference workers to stop experience collection... 
(15450 times) [2024-06-15 15:04:27,137][1651669] InferenceWorker_p0-w0: stopping experience collection (15450 times) [2024-06-15 15:04:27,347][1651274] Signal inference workers to resume experience collection... (15450 times) [2024-06-15 15:04:27,348][1651669] InferenceWorker_p0-w0: resuming experience collection (15450 times) [2024-06-15 15:04:27,351][1651669] Updated weights for policy 0, policy_version 294896 (0.0014) [2024-06-15 15:04:28,973][1651669] Updated weights for policy 0, policy_version 294973 (0.0015) [2024-06-15 15:04:30,767][1648981] Fps is (10 sec: 45901.6, 60 sec: 48059.7, 300 sec: 47985.6). Total num frames: 604110848. Throughput: 0: 12060.4. Samples: 151092224. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:30,768][1648981] Avg episode reward: [(0, '399.570')] [2024-06-15 15:04:32,475][1651669] Updated weights for policy 0, policy_version 295026 (0.0013) [2024-06-15 15:04:33,311][1651669] Updated weights for policy 0, policy_version 295057 (0.0016) [2024-06-15 15:04:35,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 604372992. Throughput: 0: 12197.0. Samples: 151164416. Policy #0 lag: (min: 20.0, avg: 126.2, max: 276.0) [2024-06-15 15:04:35,767][1648981] Avg episode reward: [(0, '419.060')] [2024-06-15 15:04:37,262][1651669] Updated weights for policy 0, policy_version 295120 (0.0018) [2024-06-15 15:04:38,728][1651669] Updated weights for policy 0, policy_version 295184 (0.0012) [2024-06-15 15:04:39,781][1651669] Updated weights for policy 0, policy_version 295227 (0.0012) [2024-06-15 15:04:40,766][1648981] Fps is (10 sec: 52430.6, 60 sec: 48059.8, 300 sec: 47987.0). Total num frames: 604635136. Throughput: 0: 12219.8. Samples: 151208960. 
Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:04:40,767][1648981] Avg episode reward: [(0, '420.650')] [2024-06-15 15:04:42,212][1651669] Updated weights for policy 0, policy_version 295269 (0.0013) [2024-06-15 15:04:43,261][1651669] Updated weights for policy 0, policy_version 295299 (0.0021) [2024-06-15 15:04:44,605][1651669] Updated weights for policy 0, policy_version 295353 (0.0013) [2024-06-15 15:04:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 604897280. Throughput: 0: 12120.0. Samples: 151277568. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:04:45,767][1648981] Avg episode reward: [(0, '421.280')] [2024-06-15 15:04:48,644][1651669] Updated weights for policy 0, policy_version 295394 (0.0012) [2024-06-15 15:04:50,134][1651669] Updated weights for policy 0, policy_version 295472 (0.0012) [2024-06-15 15:04:50,767][1648981] Fps is (10 sec: 52426.0, 60 sec: 48605.5, 300 sec: 47985.6). Total num frames: 605159424. Throughput: 0: 12231.0. Samples: 151354880. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:04:50,768][1648981] Avg episode reward: [(0, '423.740')] [2024-06-15 15:04:52,612][1651669] Updated weights for policy 0, policy_version 295536 (0.0017) [2024-06-15 15:04:55,644][1651669] Updated weights for policy 0, policy_version 295600 (0.0016) [2024-06-15 15:04:55,768][1648981] Fps is (10 sec: 49141.5, 60 sec: 49150.4, 300 sec: 48321.8). Total num frames: 605388800. Throughput: 0: 12230.5. Samples: 151390208. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:04:55,769][1648981] Avg episode reward: [(0, '385.370')] [2024-06-15 15:04:55,987][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000295616_605421568.pth... 
[2024-06-15 15:04:56,033][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000289920_593756160.pth [2024-06-15 15:04:59,241][1651669] Updated weights for policy 0, policy_version 295648 (0.0014) [2024-06-15 15:05:00,469][1651669] Updated weights for policy 0, policy_version 295696 (0.0013) [2024-06-15 15:05:00,780][1648981] Fps is (10 sec: 42544.3, 60 sec: 48603.2, 300 sec: 47650.3). Total num frames: 605585408. Throughput: 0: 12056.9. Samples: 151461376. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:00,780][1648981] Avg episode reward: [(0, '368.870')] [2024-06-15 15:05:02,728][1651669] Updated weights for policy 0, policy_version 295760 (0.0015) [2024-06-15 15:05:05,133][1651669] Updated weights for policy 0, policy_version 295814 (0.0120) [2024-06-15 15:05:05,766][1648981] Fps is (10 sec: 49162.2, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 605880320. Throughput: 0: 12164.5. Samples: 151534592. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:05,767][1648981] Avg episode reward: [(0, '374.600')] [2024-06-15 15:05:06,151][1651669] Updated weights for policy 0, policy_version 295868 (0.0012) [2024-06-15 15:05:08,913][1651274] Signal inference workers to stop experience collection... (15500 times) [2024-06-15 15:05:08,941][1651669] InferenceWorker_p0-w0: stopping experience collection (15500 times) [2024-06-15 15:05:09,255][1651274] Signal inference workers to resume experience collection... (15500 times) [2024-06-15 15:05:09,256][1651669] InferenceWorker_p0-w0: resuming experience collection (15500 times) [2024-06-15 15:05:09,995][1651669] Updated weights for policy 0, policy_version 295929 (0.0036) [2024-06-15 15:05:10,766][1648981] Fps is (10 sec: 52498.2, 60 sec: 48605.9, 300 sec: 48098.1). Total num frames: 606109696. Throughput: 0: 12390.4. Samples: 151578112. 
Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:10,767][1648981] Avg episode reward: [(0, '371.310')] [2024-06-15 15:05:11,353][1651669] Updated weights for policy 0, policy_version 295984 (0.0015) [2024-06-15 15:05:13,290][1651669] Updated weights for policy 0, policy_version 296023 (0.0115) [2024-06-15 15:05:15,779][1648981] Fps is (10 sec: 45816.7, 60 sec: 49141.5, 300 sec: 48427.9). Total num frames: 606339072. Throughput: 0: 12330.1. Samples: 151647232. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:15,780][1648981] Avg episode reward: [(0, '369.310')] [2024-06-15 15:05:15,827][1651669] Updated weights for policy 0, policy_version 296070 (0.0017) [2024-06-15 15:05:19,432][1651669] Updated weights for policy 0, policy_version 296130 (0.0130) [2024-06-15 15:05:20,603][1651669] Updated weights for policy 0, policy_version 296188 (0.0014) [2024-06-15 15:05:20,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49157.0, 300 sec: 48430.0). Total num frames: 606601216. Throughput: 0: 12515.6. Samples: 151727616. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:20,767][1648981] Avg episode reward: [(0, '357.960')] [2024-06-15 15:05:21,936][1651669] Updated weights for policy 0, policy_version 296254 (0.0013) [2024-06-15 15:05:24,468][1651669] Updated weights for policy 0, policy_version 296304 (0.0013) [2024-06-15 15:05:25,766][1648981] Fps is (10 sec: 52496.1, 60 sec: 50244.2, 300 sec: 48433.3). Total num frames: 606863360. Throughput: 0: 12299.4. Samples: 151762432. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:25,767][1648981] Avg episode reward: [(0, '354.010')] [2024-06-15 15:05:26,940][1651669] Updated weights for policy 0, policy_version 296369 (0.0013) [2024-06-15 15:05:30,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48606.1, 300 sec: 48430.0). Total num frames: 607027200. Throughput: 0: 12492.8. Samples: 151839744. 
Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:30,767][1648981] Avg episode reward: [(0, '374.960')] [2024-06-15 15:05:31,313][1651669] Updated weights for policy 0, policy_version 296418 (0.0012) [2024-06-15 15:05:32,705][1651669] Updated weights for policy 0, policy_version 296482 (0.0011) [2024-06-15 15:05:34,665][1651669] Updated weights for policy 0, policy_version 296546 (0.0030) [2024-06-15 15:05:35,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 607387648. Throughput: 0: 12185.7. Samples: 151903232. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:35,767][1648981] Avg episode reward: [(0, '386.750')] [2024-06-15 15:05:37,147][1651669] Updated weights for policy 0, policy_version 296592 (0.0014) [2024-06-15 15:05:38,409][1651669] Updated weights for policy 0, policy_version 296638 (0.0037) [2024-06-15 15:05:40,767][1648981] Fps is (10 sec: 49150.4, 60 sec: 48059.5, 300 sec: 48541.0). Total num frames: 607518720. Throughput: 0: 12117.8. Samples: 151935488. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:40,768][1648981] Avg episode reward: [(0, '365.840')] [2024-06-15 15:05:43,092][1651669] Updated weights for policy 0, policy_version 296698 (0.0012) [2024-06-15 15:05:44,303][1651669] Updated weights for policy 0, policy_version 296756 (0.0016) [2024-06-15 15:05:45,312][1651669] Updated weights for policy 0, policy_version 296790 (0.0034) [2024-06-15 15:05:45,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 49151.8, 300 sec: 48541.0). Total num frames: 607846400. Throughput: 0: 12325.7. Samples: 152015872. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:45,767][1648981] Avg episode reward: [(0, '396.050')] [2024-06-15 15:05:48,206][1651669] Updated weights for policy 0, policy_version 296852 (0.0012) [2024-06-15 15:05:48,496][1651274] Signal inference workers to stop experience collection... 
(15550 times) [2024-06-15 15:05:48,573][1651669] InferenceWorker_p0-w0: stopping experience collection (15550 times) [2024-06-15 15:05:48,919][1651274] Signal inference workers to resume experience collection... (15550 times) [2024-06-15 15:05:48,919][1651669] InferenceWorker_p0-w0: resuming experience collection (15550 times) [2024-06-15 15:05:50,767][1648981] Fps is (10 sec: 52425.4, 60 sec: 48059.4, 300 sec: 48652.0). Total num frames: 608043008. Throughput: 0: 12151.2. Samples: 152081408. Policy #0 lag: (min: 11.0, avg: 96.4, max: 267.0) [2024-06-15 15:05:50,768][1648981] Avg episode reward: [(0, '398.080')] [2024-06-15 15:05:52,889][1651669] Updated weights for policy 0, policy_version 296898 (0.0012) [2024-06-15 15:05:54,348][1651669] Updated weights for policy 0, policy_version 296968 (0.0013) [2024-06-15 15:05:55,606][1651669] Updated weights for policy 0, policy_version 297020 (0.0100) [2024-06-15 15:05:55,772][1648981] Fps is (10 sec: 45850.1, 60 sec: 48603.0, 300 sec: 48540.1). Total num frames: 608305152. Throughput: 0: 12172.7. Samples: 152125952. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:05:55,773][1648981] Avg episode reward: [(0, '433.840')] [2024-06-15 15:05:57,072][1651669] Updated weights for policy 0, policy_version 297083 (0.0013) [2024-06-15 15:05:59,939][1651669] Updated weights for policy 0, policy_version 297141 (0.0012) [2024-06-15 15:06:00,768][1648981] Fps is (10 sec: 52425.3, 60 sec: 49707.8, 300 sec: 48874.0). Total num frames: 608567296. Throughput: 0: 12063.4. Samples: 152189952. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:00,769][1648981] Avg episode reward: [(0, '429.680')] [2024-06-15 15:06:04,510][1651669] Updated weights for policy 0, policy_version 297168 (0.0013) [2024-06-15 15:06:05,766][1648981] Fps is (10 sec: 39343.8, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 608698368. Throughput: 0: 11980.8. Samples: 152266752. 
Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:05,767][1648981] Avg episode reward: [(0, '423.340')] [2024-06-15 15:06:05,960][1651669] Updated weights for policy 0, policy_version 297232 (0.0013) [2024-06-15 15:06:07,484][1651669] Updated weights for policy 0, policy_version 297285 (0.0015) [2024-06-15 15:06:08,812][1651669] Updated weights for policy 0, policy_version 297344 (0.0012) [2024-06-15 15:06:10,766][1648981] Fps is (10 sec: 45882.6, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 609026048. Throughput: 0: 11787.4. Samples: 152292864. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:10,767][1648981] Avg episode reward: [(0, '425.690')] [2024-06-15 15:06:11,109][1651669] Updated weights for policy 0, policy_version 297400 (0.0013) [2024-06-15 15:06:15,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 45885.0, 300 sec: 47986.9). Total num frames: 609091584. Throughput: 0: 11821.5. Samples: 152371712. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:15,767][1648981] Avg episode reward: [(0, '407.910')] [2024-06-15 15:06:16,435][1651669] Updated weights for policy 0, policy_version 297456 (0.0013) [2024-06-15 15:06:17,779][1651669] Updated weights for policy 0, policy_version 297506 (0.0013) [2024-06-15 15:06:19,787][1651669] Updated weights for policy 0, policy_version 297584 (0.0013) [2024-06-15 15:06:20,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 48541.7). Total num frames: 609484800. Throughput: 0: 11730.5. Samples: 152431104. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:20,767][1648981] Avg episode reward: [(0, '407.550')] [2024-06-15 15:06:21,817][1651669] Updated weights for policy 0, policy_version 297634 (0.0012) [2024-06-15 15:06:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 45875.1, 300 sec: 48207.8). Total num frames: 609615872. Throughput: 0: 11821.6. Samples: 152467456. 
Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:25,767][1648981] Avg episode reward: [(0, '388.890')] [2024-06-15 15:06:27,865][1651669] Updated weights for policy 0, policy_version 297680 (0.0014) [2024-06-15 15:06:29,536][1651669] Updated weights for policy 0, policy_version 297744 (0.0011) [2024-06-15 15:06:30,413][1651274] Signal inference workers to stop experience collection... (15600 times) [2024-06-15 15:06:30,467][1651669] InferenceWorker_p0-w0: stopping experience collection (15600 times) [2024-06-15 15:06:30,601][1651274] Signal inference workers to resume experience collection... (15600 times) [2024-06-15 15:06:30,603][1651669] InferenceWorker_p0-w0: resuming experience collection (15600 times) [2024-06-15 15:06:30,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 609878016. Throughput: 0: 11696.4. Samples: 152542208. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:30,767][1648981] Avg episode reward: [(0, '368.630')] [2024-06-15 15:06:30,788][1651669] Updated weights for policy 0, policy_version 297794 (0.0014) [2024-06-15 15:06:32,177][1651669] Updated weights for policy 0, policy_version 297846 (0.0012) [2024-06-15 15:06:33,389][1651669] Updated weights for policy 0, policy_version 297904 (0.0013) [2024-06-15 15:06:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 610140160. Throughput: 0: 11764.9. Samples: 152610816. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:35,767][1648981] Avg episode reward: [(0, '372.980')] [2024-06-15 15:06:38,354][1651669] Updated weights for policy 0, policy_version 297936 (0.0014) [2024-06-15 15:06:39,643][1651669] Updated weights for policy 0, policy_version 297987 (0.0044) [2024-06-15 15:06:40,786][1648981] Fps is (10 sec: 49054.8, 60 sec: 47498.1, 300 sec: 48537.8). Total num frames: 610369536. Throughput: 0: 11726.8. Samples: 152653824. 
Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:40,787][1648981] Avg episode reward: [(0, '356.430')] [2024-06-15 15:06:40,937][1651669] Updated weights for policy 0, policy_version 298048 (0.0013) [2024-06-15 15:06:42,204][1651669] Updated weights for policy 0, policy_version 298096 (0.0010) [2024-06-15 15:06:43,623][1651669] Updated weights for policy 0, policy_version 298148 (0.0012) [2024-06-15 15:06:45,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 46967.5, 300 sec: 48763.2). Total num frames: 610664448. Throughput: 0: 11708.1. Samples: 152716800. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:45,767][1648981] Avg episode reward: [(0, '331.290')] [2024-06-15 15:06:49,590][1651669] Updated weights for policy 0, policy_version 298193 (0.0014) [2024-06-15 15:06:50,766][1648981] Fps is (10 sec: 42683.4, 60 sec: 45876.0, 300 sec: 48430.0). Total num frames: 610795520. Throughput: 0: 11821.5. Samples: 152798720. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:50,767][1648981] Avg episode reward: [(0, '325.930')] [2024-06-15 15:06:50,928][1651669] Updated weights for policy 0, policy_version 298256 (0.0017) [2024-06-15 15:06:52,150][1651669] Updated weights for policy 0, policy_version 298305 (0.0012) [2024-06-15 15:06:53,565][1651669] Updated weights for policy 0, policy_version 298368 (0.0011) [2024-06-15 15:06:55,041][1651669] Updated weights for policy 0, policy_version 298423 (0.0015) [2024-06-15 15:06:55,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 48063.9, 300 sec: 48874.2). Total num frames: 611188736. Throughput: 0: 11866.9. Samples: 152826880. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:06:55,767][1648981] Avg episode reward: [(0, '337.790')] [2024-06-15 15:06:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000298432_611188736.pth... 
[2024-06-15 15:06:55,818][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000292768_599588864.pth [2024-06-15 15:07:00,658][1651669] Updated weights for policy 0, policy_version 298464 (0.0014) [2024-06-15 15:07:00,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 44784.2, 300 sec: 48208.5). Total num frames: 611254272. Throughput: 0: 11867.1. Samples: 152905728. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:07:00,767][1648981] Avg episode reward: [(0, '335.160')] [2024-06-15 15:07:02,276][1651669] Updated weights for policy 0, policy_version 298529 (0.0013) [2024-06-15 15:07:03,556][1651669] Updated weights for policy 0, policy_version 298595 (0.0013) [2024-06-15 15:07:04,268][1651274] Signal inference workers to stop experience collection... (15650 times) [2024-06-15 15:07:04,319][1651669] InferenceWorker_p0-w0: stopping experience collection (15650 times) [2024-06-15 15:07:04,490][1651274] Signal inference workers to resume experience collection... (15650 times) [2024-06-15 15:07:04,491][1651669] InferenceWorker_p0-w0: resuming experience collection (15650 times) [2024-06-15 15:07:05,517][1651669] Updated weights for policy 0, policy_version 298678 (0.0013) [2024-06-15 15:07:05,775][1648981] Fps is (10 sec: 52388.3, 60 sec: 50237.5, 300 sec: 48873.0). Total num frames: 611713024. Throughput: 0: 11864.9. Samples: 152965120. Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:07:05,775][1648981] Avg episode reward: [(0, '355.590')] [2024-06-15 15:07:10,768][1648981] Fps is (10 sec: 45865.4, 60 sec: 44781.4, 300 sec: 47985.4). Total num frames: 611713024. Throughput: 0: 12105.4. Samples: 153012224. 
Policy #0 lag: (min: 2.0, avg: 80.0, max: 258.0) [2024-06-15 15:07:10,769][1648981] Avg episode reward: [(0, '371.780')] [2024-06-15 15:07:11,171][1651669] Updated weights for policy 0, policy_version 298690 (0.0032) [2024-06-15 15:07:12,559][1651669] Updated weights for policy 0, policy_version 298745 (0.0108) [2024-06-15 15:07:13,579][1651669] Updated weights for policy 0, policy_version 298793 (0.0012) [2024-06-15 15:07:15,205][1651669] Updated weights for policy 0, policy_version 298864 (0.0012) [2024-06-15 15:07:15,766][1648981] Fps is (10 sec: 39353.0, 60 sec: 50244.2, 300 sec: 48541.7). Total num frames: 612106240. Throughput: 0: 12094.5. Samples: 153086464. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:15,767][1648981] Avg episode reward: [(0, '372.450')] [2024-06-15 15:07:16,834][1651669] Updated weights for policy 0, policy_version 298933 (0.0119) [2024-06-15 15:07:20,766][1648981] Fps is (10 sec: 52439.3, 60 sec: 45875.1, 300 sec: 47985.7). Total num frames: 612237312. Throughput: 0: 12356.3. Samples: 153166848. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:20,767][1648981] Avg episode reward: [(0, '352.920')] [2024-06-15 15:07:22,307][1651669] Updated weights for policy 0, policy_version 298965 (0.0012) [2024-06-15 15:07:24,450][1651669] Updated weights for policy 0, policy_version 299058 (0.0012) [2024-06-15 15:07:25,702][1651669] Updated weights for policy 0, policy_version 299121 (0.0024) [2024-06-15 15:07:25,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49698.2, 300 sec: 48652.1). Total num frames: 612597760. Throughput: 0: 12202.3. Samples: 153202688. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:25,767][1648981] Avg episode reward: [(0, '354.280')] [2024-06-15 15:07:27,128][1651669] Updated weights for policy 0, policy_version 299194 (0.0012) [2024-06-15 15:07:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 612761600. 
Throughput: 0: 12333.6. Samples: 153271808. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:30,767][1648981] Avg episode reward: [(0, '348.180')] [2024-06-15 15:07:33,260][1651669] Updated weights for policy 0, policy_version 299232 (0.0036) [2024-06-15 15:07:34,851][1651669] Updated weights for policy 0, policy_version 299296 (0.0011) [2024-06-15 15:07:35,774][1648981] Fps is (10 sec: 42565.5, 60 sec: 48053.5, 300 sec: 48434.3). Total num frames: 613023744. Throughput: 0: 12160.7. Samples: 153346048. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:35,775][1648981] Avg episode reward: [(0, '362.560')] [2024-06-15 15:07:36,157][1651669] Updated weights for policy 0, policy_version 299345 (0.0019) [2024-06-15 15:07:37,289][1651669] Updated weights for policy 0, policy_version 299408 (0.0045) [2024-06-15 15:07:40,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 48621.9, 300 sec: 48318.9). Total num frames: 613285888. Throughput: 0: 12276.7. Samples: 153379328. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:40,767][1648981] Avg episode reward: [(0, '336.980')] [2024-06-15 15:07:43,023][1651669] Updated weights for policy 0, policy_version 299457 (0.0014) [2024-06-15 15:07:43,739][1651274] Signal inference workers to stop experience collection... (15700 times) [2024-06-15 15:07:43,790][1651669] InferenceWorker_p0-w0: stopping experience collection (15700 times) [2024-06-15 15:07:43,949][1651274] Signal inference workers to resume experience collection... (15700 times) [2024-06-15 15:07:43,950][1651669] InferenceWorker_p0-w0: resuming experience collection (15700 times) [2024-06-15 15:07:43,952][1651669] Updated weights for policy 0, policy_version 299504 (0.0030) [2024-06-15 15:07:45,753][1651669] Updated weights for policy 0, policy_version 299569 (0.0113) [2024-06-15 15:07:45,766][1648981] Fps is (10 sec: 49190.1, 60 sec: 47513.7, 300 sec: 48318.9). Total num frames: 613515264. Throughput: 0: 12310.7. 
Samples: 153459712. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:45,767][1648981] Avg episode reward: [(0, '349.340')] [2024-06-15 15:07:47,329][1651669] Updated weights for policy 0, policy_version 299638 (0.0013) [2024-06-15 15:07:50,776][1648981] Fps is (10 sec: 52378.8, 60 sec: 50236.1, 300 sec: 48317.4). Total num frames: 613810176. Throughput: 0: 12344.5. Samples: 153520640. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:50,777][1648981] Avg episode reward: [(0, '332.800')] [2024-06-15 15:07:54,894][1651669] Updated weights for policy 0, policy_version 299719 (0.0015) [2024-06-15 15:07:55,766][1648981] Fps is (10 sec: 36044.9, 60 sec: 44783.2, 300 sec: 47874.6). Total num frames: 613875712. Throughput: 0: 12254.4. Samples: 153563648. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:07:55,767][1648981] Avg episode reward: [(0, '326.090')] [2024-06-15 15:07:56,458][1651669] Updated weights for policy 0, policy_version 299777 (0.0014) [2024-06-15 15:07:58,235][1651669] Updated weights for policy 0, policy_version 299847 (0.0020) [2024-06-15 15:07:59,999][1651669] Updated weights for policy 0, policy_version 299924 (0.0013) [2024-06-15 15:08:00,767][1648981] Fps is (10 sec: 49198.7, 60 sec: 50790.2, 300 sec: 48318.9). Total num frames: 614301696. Throughput: 0: 11844.3. Samples: 153619456. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:08:00,768][1648981] Avg episode reward: [(0, '341.130')] [2024-06-15 15:08:05,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 43696.6, 300 sec: 47985.7). Total num frames: 614334464. Throughput: 0: 11741.9. Samples: 153695232. 
Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:08:05,767][1648981] Avg episode reward: [(0, '356.010')] [2024-06-15 15:08:06,953][1651669] Updated weights for policy 0, policy_version 299984 (0.0015) [2024-06-15 15:08:08,758][1651669] Updated weights for policy 0, policy_version 300048 (0.0013) [2024-06-15 15:08:10,766][1648981] Fps is (10 sec: 32768.4, 60 sec: 48607.5, 300 sec: 47985.7). Total num frames: 614629376. Throughput: 0: 11650.9. Samples: 153726976. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:08:10,767][1648981] Avg episode reward: [(0, '342.770')] [2024-06-15 15:08:11,084][1651669] Updated weights for policy 0, policy_version 300132 (0.0013) [2024-06-15 15:08:12,973][1651669] Updated weights for policy 0, policy_version 300216 (0.0013) [2024-06-15 15:08:15,785][1648981] Fps is (10 sec: 52331.9, 60 sec: 45861.1, 300 sec: 47982.7). Total num frames: 614858752. Throughput: 0: 11293.5. Samples: 153780224. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:08:15,786][1648981] Avg episode reward: [(0, '334.870')] [2024-06-15 15:08:19,480][1651669] Updated weights for policy 0, policy_version 300272 (0.0093) [2024-06-15 15:08:20,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 615055360. Throughput: 0: 11311.5. Samples: 153854976. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:08:20,767][1648981] Avg episode reward: [(0, '350.550')] [2024-06-15 15:08:21,092][1651669] Updated weights for policy 0, policy_version 300336 (0.0011) [2024-06-15 15:08:21,197][1651274] Signal inference workers to stop experience collection... (15750 times) [2024-06-15 15:08:21,253][1651669] InferenceWorker_p0-w0: stopping experience collection (15750 times) [2024-06-15 15:08:21,481][1651274] Signal inference workers to resume experience collection... 
(15750 times) [2024-06-15 15:08:21,482][1651669] InferenceWorker_p0-w0: resuming experience collection (15750 times) [2024-06-15 15:08:23,376][1651669] Updated weights for policy 0, policy_version 300421 (0.0137) [2024-06-15 15:08:24,375][1651669] Updated weights for policy 0, policy_version 300474 (0.0029) [2024-06-15 15:08:25,767][1648981] Fps is (10 sec: 52522.6, 60 sec: 46420.8, 300 sec: 47985.6). Total num frames: 615383040. Throughput: 0: 11138.7. Samples: 153880576. Policy #0 lag: (min: 15.0, avg: 66.9, max: 271.0) [2024-06-15 15:08:25,769][1648981] Avg episode reward: [(0, '330.500')] [2024-06-15 15:08:30,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 44782.9, 300 sec: 47541.4). Total num frames: 615448576. Throughput: 0: 11298.1. Samples: 153968128. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:08:30,767][1648981] Avg episode reward: [(0, '358.010')] [2024-06-15 15:08:30,874][1651669] Updated weights for policy 0, policy_version 300514 (0.0016) [2024-06-15 15:08:32,596][1651669] Updated weights for policy 0, policy_version 300577 (0.0011) [2024-06-15 15:08:34,292][1651669] Updated weights for policy 0, policy_version 300646 (0.0013) [2024-06-15 15:08:35,777][1648981] Fps is (10 sec: 45827.5, 60 sec: 46964.9, 300 sec: 47761.7). Total num frames: 615841792. Throughput: 0: 11058.8. Samples: 154018304. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:08:35,778][1648981] Avg episode reward: [(0, '354.160')] [2024-06-15 15:08:36,123][1651669] Updated weights for policy 0, policy_version 300736 (0.0015) [2024-06-15 15:08:40,790][1648981] Fps is (10 sec: 45768.4, 60 sec: 43673.7, 300 sec: 47426.5). Total num frames: 615907328. Throughput: 0: 10917.0. Samples: 154055168. 
Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:08:40,790][1648981] Avg episode reward: [(0, '358.350')] [2024-06-15 15:08:42,659][1651669] Updated weights for policy 0, policy_version 300800 (0.0012) [2024-06-15 15:08:44,236][1651669] Updated weights for policy 0, policy_version 300851 (0.0014) [2024-06-15 15:08:45,770][1648981] Fps is (10 sec: 39350.0, 60 sec: 45326.1, 300 sec: 47429.7). Total num frames: 616235008. Throughput: 0: 11251.7. Samples: 154125824. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:08:45,771][1648981] Avg episode reward: [(0, '351.420')] [2024-06-15 15:08:46,299][1651669] Updated weights for policy 0, policy_version 300928 (0.0025) [2024-06-15 15:08:50,766][1648981] Fps is (10 sec: 52551.2, 60 sec: 43697.7, 300 sec: 47430.3). Total num frames: 616431616. Throughput: 0: 10979.6. Samples: 154189312. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:08:50,767][1648981] Avg episode reward: [(0, '348.250')] [2024-06-15 15:08:54,272][1651669] Updated weights for policy 0, policy_version 301024 (0.0081) [2024-06-15 15:08:55,767][1648981] Fps is (10 sec: 36058.0, 60 sec: 45328.9, 300 sec: 47209.7). Total num frames: 616595456. Throughput: 0: 11195.7. Samples: 154230784. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:08:55,767][1648981] Avg episode reward: [(0, '349.100')] [2024-06-15 15:08:56,403][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000301104_616660992.pth... [2024-06-15 15:08:56,514][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000295616_605421568.pth [2024-06-15 15:08:56,620][1651669] Updated weights for policy 0, policy_version 301107 (0.0012) [2024-06-15 15:08:58,247][1651669] Updated weights for policy 0, policy_version 301171 (0.0012) [2024-06-15 15:08:58,544][1651274] Signal inference workers to stop experience collection... 
(15800 times) [2024-06-15 15:08:58,625][1651669] InferenceWorker_p0-w0: stopping experience collection (15800 times) [2024-06-15 15:08:58,828][1651274] Signal inference workers to resume experience collection... (15800 times) [2024-06-15 15:08:58,829][1651669] InferenceWorker_p0-w0: resuming experience collection (15800 times) [2024-06-15 15:08:59,691][1651669] Updated weights for policy 0, policy_version 301248 (0.0013) [2024-06-15 15:09:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 44236.9, 300 sec: 47541.4). Total num frames: 616955904. Throughput: 0: 11166.2. Samples: 154282496. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:00,767][1648981] Avg episode reward: [(0, '342.050')] [2024-06-15 15:09:05,766][1648981] Fps is (10 sec: 39322.5, 60 sec: 44236.8, 300 sec: 46763.8). Total num frames: 616988672. Throughput: 0: 11343.6. Samples: 154365440. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:05,767][1648981] Avg episode reward: [(0, '358.090')] [2024-06-15 15:09:06,743][1651669] Updated weights for policy 0, policy_version 301312 (0.0013) [2024-06-15 15:09:08,162][1651669] Updated weights for policy 0, policy_version 301363 (0.0032) [2024-06-15 15:09:09,894][1651669] Updated weights for policy 0, policy_version 301432 (0.0016) [2024-06-15 15:09:10,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 617381888. Throughput: 0: 11332.4. Samples: 154390528. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:10,767][1648981] Avg episode reward: [(0, '348.590')] [2024-06-15 15:09:11,248][1651669] Updated weights for policy 0, policy_version 301488 (0.0014) [2024-06-15 15:09:15,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 43704.2, 300 sec: 46875.9). Total num frames: 617480192. Throughput: 0: 11059.2. Samples: 154465792. 
Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:15,767][1648981] Avg episode reward: [(0, '361.320')] [2024-06-15 15:09:17,017][1651669] Updated weights for policy 0, policy_version 301552 (0.0112) [2024-06-15 15:09:18,545][1651669] Updated weights for policy 0, policy_version 301616 (0.0012) [2024-06-15 15:09:20,250][1651669] Updated weights for policy 0, policy_version 301680 (0.0012) [2024-06-15 15:09:20,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 47541.3). Total num frames: 617873408. Throughput: 0: 11221.2. Samples: 154523136. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:20,767][1648981] Avg episode reward: [(0, '350.970')] [2024-06-15 15:09:21,375][1651669] Updated weights for policy 0, policy_version 301732 (0.0011) [2024-06-15 15:09:25,768][1648981] Fps is (10 sec: 52419.0, 60 sec: 43689.8, 300 sec: 47096.8). Total num frames: 618004480. Throughput: 0: 11394.6. Samples: 154567680. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:25,769][1648981] Avg episode reward: [(0, '364.590')] [2024-06-15 15:09:27,388][1651669] Updated weights for policy 0, policy_version 301761 (0.0013) [2024-06-15 15:09:29,429][1651669] Updated weights for policy 0, policy_version 301842 (0.0013) [2024-06-15 15:09:30,774][1648981] Fps is (10 sec: 39291.2, 60 sec: 46961.4, 300 sec: 47095.8). Total num frames: 618266624. Throughput: 0: 11433.7. Samples: 154640384. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:30,774][1648981] Avg episode reward: [(0, '340.360')] [2024-06-15 15:09:30,963][1651669] Updated weights for policy 0, policy_version 301906 (0.0014) [2024-06-15 15:09:32,679][1651669] Updated weights for policy 0, policy_version 301984 (0.0042) [2024-06-15 15:09:35,766][1648981] Fps is (10 sec: 52438.7, 60 sec: 44791.2, 300 sec: 47097.1). Total num frames: 618528768. Throughput: 0: 11685.0. Samples: 154715136. 
Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:35,767][1648981] Avg episode reward: [(0, '323.720')] [2024-06-15 15:09:38,058][1651669] Updated weights for policy 0, policy_version 302018 (0.0031) [2024-06-15 15:09:38,779][1651274] Signal inference workers to stop experience collection... (15850 times) [2024-06-15 15:09:38,812][1651669] InferenceWorker_p0-w0: stopping experience collection (15850 times) [2024-06-15 15:09:39,061][1651274] Signal inference workers to resume experience collection... (15850 times) [2024-06-15 15:09:39,071][1651669] InferenceWorker_p0-w0: resuming experience collection (15850 times) [2024-06-15 15:09:40,351][1651669] Updated weights for policy 0, policy_version 302112 (0.0013) [2024-06-15 15:09:40,766][1648981] Fps is (10 sec: 45910.7, 60 sec: 46985.7, 300 sec: 46874.9). Total num frames: 618725376. Throughput: 0: 11605.4. Samples: 154753024. Policy #0 lag: (min: 2.0, avg: 47.0, max: 258.0) [2024-06-15 15:09:40,767][1648981] Avg episode reward: [(0, '326.040')] [2024-06-15 15:09:41,996][1651669] Updated weights for policy 0, policy_version 302162 (0.0050) [2024-06-15 15:09:43,539][1651669] Updated weights for policy 0, policy_version 302224 (0.0044) [2024-06-15 15:09:44,506][1651669] Updated weights for policy 0, policy_version 302267 (0.0014) [2024-06-15 15:09:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 46970.5, 300 sec: 47097.1). Total num frames: 619053056. Throughput: 0: 11559.8. Samples: 154802688. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:09:45,767][1648981] Avg episode reward: [(0, '338.620')] [2024-06-15 15:09:50,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 45329.1, 300 sec: 46653.1). Total num frames: 619151360. Throughput: 0: 11559.8. Samples: 154885632. 
Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:09:50,767][1648981] Avg episode reward: [(0, '362.840')] [2024-06-15 15:09:51,029][1651669] Updated weights for policy 0, policy_version 302337 (0.0015) [2024-06-15 15:09:52,387][1651669] Updated weights for policy 0, policy_version 302389 (0.0022) [2024-06-15 15:09:54,011][1651669] Updated weights for policy 0, policy_version 302455 (0.0011) [2024-06-15 15:09:55,649][1651669] Updated weights for policy 0, policy_version 302512 (0.0013) [2024-06-15 15:09:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49152.1, 300 sec: 47321.3). Total num frames: 619544576. Throughput: 0: 11616.7. Samples: 154913280. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:09:55,767][1648981] Avg episode reward: [(0, '356.760')] [2024-06-15 15:10:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 44236.8, 300 sec: 46541.7). Total num frames: 619610112. Throughput: 0: 11764.6. Samples: 154995200. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:00,767][1648981] Avg episode reward: [(0, '369.150')] [2024-06-15 15:10:01,045][1651669] Updated weights for policy 0, policy_version 302565 (0.0015) [2024-06-15 15:10:02,621][1651669] Updated weights for policy 0, policy_version 302640 (0.0014) [2024-06-15 15:10:04,220][1651669] Updated weights for policy 0, policy_version 302704 (0.0013) [2024-06-15 15:10:05,675][1651669] Updated weights for policy 0, policy_version 302738 (0.0013) [2024-06-15 15:10:05,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 50244.3, 300 sec: 47097.1). Total num frames: 620003328. Throughput: 0: 11924.0. Samples: 155059712. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:05,767][1648981] Avg episode reward: [(0, '366.620')] [2024-06-15 15:10:06,649][1651669] Updated weights for policy 0, policy_version 302781 (0.0013) [2024-06-15 15:10:10,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 45329.0, 300 sec: 46654.7). Total num frames: 620101632. 
Throughput: 0: 11844.7. Samples: 155100672. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:10,767][1648981] Avg episode reward: [(0, '359.830')] [2024-06-15 15:10:12,326][1651669] Updated weights for policy 0, policy_version 302848 (0.0017) [2024-06-15 15:10:13,773][1651669] Updated weights for policy 0, policy_version 302901 (0.0123) [2024-06-15 15:10:14,412][1651274] Signal inference workers to stop experience collection... (15900 times) [2024-06-15 15:10:14,446][1651669] InferenceWorker_p0-w0: stopping experience collection (15900 times) [2024-06-15 15:10:14,645][1651274] Signal inference workers to resume experience collection... (15900 times) [2024-06-15 15:10:14,650][1651669] InferenceWorker_p0-w0: resuming experience collection (15900 times) [2024-06-15 15:10:15,766][1648981] Fps is (10 sec: 49151.0, 60 sec: 50244.2, 300 sec: 47097.0). Total num frames: 620494848. Throughput: 0: 11846.3. Samples: 155173376. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:15,767][1648981] Avg episode reward: [(0, '381.890')] [2024-06-15 15:10:16,381][1651669] Updated weights for policy 0, policy_version 302992 (0.0014) [2024-06-15 15:10:17,635][1651669] Updated weights for policy 0, policy_version 303034 (0.0010) [2024-06-15 15:10:20,775][1648981] Fps is (10 sec: 52383.4, 60 sec: 45868.5, 300 sec: 46651.3). Total num frames: 620625920. Throughput: 0: 11750.9. Samples: 155244032. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:20,776][1648981] Avg episode reward: [(0, '372.510')] [2024-06-15 15:10:23,949][1651669] Updated weights for policy 0, policy_version 303105 (0.0013) [2024-06-15 15:10:25,794][1648981] Fps is (10 sec: 42481.0, 60 sec: 48584.9, 300 sec: 47092.6). Total num frames: 620920832. Throughput: 0: 11757.4. Samples: 155282432. 
Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:25,795][1648981] Avg episode reward: [(0, '376.560')] [2024-06-15 15:10:25,818][1651669] Updated weights for policy 0, policy_version 303187 (0.0014) [2024-06-15 15:10:28,333][1651669] Updated weights for policy 0, policy_version 303264 (0.0013) [2024-06-15 15:10:30,766][1648981] Fps is (10 sec: 52475.2, 60 sec: 48066.0, 300 sec: 46652.8). Total num frames: 621150208. Throughput: 0: 11980.8. Samples: 155341824. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:30,767][1648981] Avg episode reward: [(0, '378.000')] [2024-06-15 15:10:33,892][1651669] Updated weights for policy 0, policy_version 303313 (0.0012) [2024-06-15 15:10:35,666][1651669] Updated weights for policy 0, policy_version 303392 (0.0160) [2024-06-15 15:10:35,766][1648981] Fps is (10 sec: 42717.0, 60 sec: 46967.5, 300 sec: 46875.0). Total num frames: 621346816. Throughput: 0: 11844.3. Samples: 155418624. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:35,767][1648981] Avg episode reward: [(0, '377.440')] [2024-06-15 15:10:37,552][1651669] Updated weights for policy 0, policy_version 303472 (0.0146) [2024-06-15 15:10:39,945][1651669] Updated weights for policy 0, policy_version 303525 (0.0014) [2024-06-15 15:10:40,786][1648981] Fps is (10 sec: 52324.6, 60 sec: 49135.7, 300 sec: 46871.8). Total num frames: 621674496. Throughput: 0: 11884.5. Samples: 155448320. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:40,787][1648981] Avg episode reward: [(0, '374.170')] [2024-06-15 15:10:44,755][1651669] Updated weights for policy 0, policy_version 303568 (0.0045) [2024-06-15 15:10:45,787][1648981] Fps is (10 sec: 42509.1, 60 sec: 45313.2, 300 sec: 46538.5). Total num frames: 621772800. Throughput: 0: 11975.2. Samples: 155534336. 
Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:45,788][1648981] Avg episode reward: [(0, '383.540')] [2024-06-15 15:10:46,723][1651669] Updated weights for policy 0, policy_version 303648 (0.0014) [2024-06-15 15:10:48,638][1651669] Updated weights for policy 0, policy_version 303728 (0.0013) [2024-06-15 15:10:50,081][1651669] Updated weights for policy 0, policy_version 303746 (0.0013) [2024-06-15 15:10:50,768][1648981] Fps is (10 sec: 45957.0, 60 sec: 49696.4, 300 sec: 46875.5). Total num frames: 622133248. Throughput: 0: 11832.3. Samples: 155592192. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:50,769][1648981] Avg episode reward: [(0, '362.370')] [2024-06-15 15:10:51,236][1651669] Updated weights for policy 0, policy_version 303807 (0.0013) [2024-06-15 15:10:55,770][1648981] Fps is (10 sec: 45954.1, 60 sec: 44780.1, 300 sec: 46319.2). Total num frames: 622231552. Throughput: 0: 11797.8. Samples: 155631616. Policy #0 lag: (min: 43.0, avg: 205.2, max: 319.0) [2024-06-15 15:10:55,771][1648981] Avg episode reward: [(0, '365.430')] [2024-06-15 15:10:56,238][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000303856_622297088.pth... [2024-06-15 15:10:56,407][1651274] Signal inference workers to stop experience collection... (15950 times) [2024-06-15 15:10:56,448][1651669] InferenceWorker_p0-w0: stopping experience collection (15950 times) [2024-06-15 15:10:56,471][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000298432_611188736.pth [2024-06-15 15:10:56,476][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000303856_622297088.pth [2024-06-15 15:10:56,715][1651274] Signal inference workers to resume experience collection... 
(15950 times) [2024-06-15 15:10:56,716][1651669] InferenceWorker_p0-w0: resuming experience collection (15950 times) [2024-06-15 15:10:56,870][1651669] Updated weights for policy 0, policy_version 303873 (0.0012) [2024-06-15 15:10:59,089][1651669] Updated weights for policy 0, policy_version 303954 (0.0169) [2024-06-15 15:11:00,111][1651669] Updated weights for policy 0, policy_version 304000 (0.0042) [2024-06-15 15:11:00,767][1648981] Fps is (10 sec: 45883.9, 60 sec: 49698.0, 300 sec: 47097.0). Total num frames: 622592000. Throughput: 0: 11548.4. Samples: 155693056. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:00,767][1648981] Avg episode reward: [(0, '363.540')] [2024-06-15 15:11:05,766][1648981] Fps is (10 sec: 49171.0, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 622723072. Throughput: 0: 11732.8. Samples: 155771904. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:05,767][1648981] Avg episode reward: [(0, '356.560')] [2024-06-15 15:11:07,348][1651669] Updated weights for policy 0, policy_version 304081 (0.0012) [2024-06-15 15:11:08,635][1651669] Updated weights for policy 0, policy_version 304131 (0.0028) [2024-06-15 15:11:10,121][1651669] Updated weights for policy 0, policy_version 304193 (0.0013) [2024-06-15 15:11:10,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 49151.8, 300 sec: 47319.2). Total num frames: 623050752. Throughput: 0: 11612.4. Samples: 155804672. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:10,767][1648981] Avg episode reward: [(0, '342.600')] [2024-06-15 15:11:13,358][1651669] Updated weights for policy 0, policy_version 304288 (0.0015) [2024-06-15 15:11:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 623247360. Throughput: 0: 11787.4. Samples: 155872256. 
Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:15,767][1648981] Avg episode reward: [(0, '344.830')] [2024-06-15 15:11:18,174][1651669] Updated weights for policy 0, policy_version 304352 (0.0011) [2024-06-15 15:11:19,637][1651669] Updated weights for policy 0, policy_version 304404 (0.0076) [2024-06-15 15:11:20,761][1651669] Updated weights for policy 0, policy_version 304451 (0.0013) [2024-06-15 15:11:20,766][1648981] Fps is (10 sec: 45876.9, 60 sec: 48066.8, 300 sec: 47097.1). Total num frames: 623509504. Throughput: 0: 11673.6. Samples: 155943936. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:20,767][1648981] Avg episode reward: [(0, '338.100')] [2024-06-15 15:11:22,059][1651669] Updated weights for policy 0, policy_version 304512 (0.0116) [2024-06-15 15:11:24,528][1651669] Updated weights for policy 0, policy_version 304573 (0.0012) [2024-06-15 15:11:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 47535.6, 300 sec: 47097.1). Total num frames: 623771648. Throughput: 0: 11826.7. Samples: 155980288. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:25,767][1648981] Avg episode reward: [(0, '311.370')] [2024-06-15 15:11:30,578][1651669] Updated weights for policy 0, policy_version 304640 (0.0094) [2024-06-15 15:11:30,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 623902720. Throughput: 0: 11690.4. Samples: 156060160. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:30,767][1648981] Avg episode reward: [(0, '289.440')] [2024-06-15 15:11:32,632][1651669] Updated weights for policy 0, policy_version 304720 (0.0013) [2024-06-15 15:11:35,211][1651274] Signal inference workers to stop experience collection... (16000 times) [2024-06-15 15:11:35,286][1651669] InferenceWorker_p0-w0: stopping experience collection (16000 times) [2024-06-15 15:11:35,409][1651274] Signal inference workers to resume experience collection... 
(16000 times) [2024-06-15 15:11:35,413][1651669] InferenceWorker_p0-w0: resuming experience collection (16000 times) [2024-06-15 15:11:35,415][1651669] Updated weights for policy 0, policy_version 304784 (0.0030) [2024-06-15 15:11:35,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 46989.1). Total num frames: 624230400. Throughput: 0: 11651.4. Samples: 156116480. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:35,767][1648981] Avg episode reward: [(0, '299.460')] [2024-06-15 15:11:36,135][1651669] Updated weights for policy 0, policy_version 304826 (0.0017) [2024-06-15 15:11:40,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 44251.5, 300 sec: 46319.5). Total num frames: 624328704. Throughput: 0: 11708.7. Samples: 156158464. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:40,767][1648981] Avg episode reward: [(0, '315.880')] [2024-06-15 15:11:42,040][1651669] Updated weights for policy 0, policy_version 304896 (0.0013) [2024-06-15 15:11:44,355][1651669] Updated weights for policy 0, policy_version 304980 (0.0101) [2024-06-15 15:11:45,778][1648981] Fps is (10 sec: 45820.9, 60 sec: 48613.2, 300 sec: 47095.1). Total num frames: 624689152. Throughput: 0: 11625.1. Samples: 156216320. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:45,779][1648981] Avg episode reward: [(0, '315.350')] [2024-06-15 15:11:46,901][1651669] Updated weights for policy 0, policy_version 305027 (0.0011) [2024-06-15 15:11:48,011][1651669] Updated weights for policy 0, policy_version 305088 (0.0012) [2024-06-15 15:11:50,781][1648981] Fps is (10 sec: 49080.0, 60 sec: 44773.5, 300 sec: 46206.2). Total num frames: 624820224. Throughput: 0: 11612.9. Samples: 156294656. 
Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:50,782][1648981] Avg episode reward: [(0, '314.550')] [2024-06-15 15:11:53,740][1651669] Updated weights for policy 0, policy_version 305152 (0.0080) [2024-06-15 15:11:55,458][1651669] Updated weights for policy 0, policy_version 305219 (0.0117) [2024-06-15 15:11:55,766][1648981] Fps is (10 sec: 42649.3, 60 sec: 48062.8, 300 sec: 46986.0). Total num frames: 625115136. Throughput: 0: 11605.4. Samples: 156326912. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:11:55,767][1648981] Avg episode reward: [(0, '332.570')] [2024-06-15 15:11:56,748][1651669] Updated weights for policy 0, policy_version 305279 (0.0079) [2024-06-15 15:11:59,368][1651669] Updated weights for policy 0, policy_version 305333 (0.0093) [2024-06-15 15:12:00,766][1648981] Fps is (10 sec: 52505.7, 60 sec: 45875.3, 300 sec: 46209.7). Total num frames: 625344512. Throughput: 0: 11548.4. Samples: 156391936. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:12:00,767][1648981] Avg episode reward: [(0, '338.430')] [2024-06-15 15:12:04,778][1651669] Updated weights for policy 0, policy_version 305413 (0.0113) [2024-06-15 15:12:05,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46967.5, 300 sec: 46875.2). Total num frames: 625541120. Throughput: 0: 11491.6. Samples: 156461056. Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:12:05,767][1648981] Avg episode reward: [(0, '349.950')] [2024-06-15 15:12:06,074][1651669] Updated weights for policy 0, policy_version 305463 (0.0021) [2024-06-15 15:12:07,556][1651669] Updated weights for policy 0, policy_version 305520 (0.0096) [2024-06-15 15:12:09,661][1651669] Updated weights for policy 0, policy_version 305568 (0.0012) [2024-06-15 15:12:10,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 46967.7, 300 sec: 46652.8). Total num frames: 625868800. Throughput: 0: 11559.8. Samples: 156500480. 
Policy #0 lag: (min: 94.0, avg: 169.4, max: 367.0) [2024-06-15 15:12:10,767][1648981] Avg episode reward: [(0, '350.740')] [2024-06-15 15:12:14,248][1651669] Updated weights for policy 0, policy_version 305616 (0.0015) [2024-06-15 15:12:15,767][1648981] Fps is (10 sec: 49149.4, 60 sec: 46420.9, 300 sec: 46763.8). Total num frames: 626032640. Throughput: 0: 11673.5. Samples: 156585472. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:15,768][1648981] Avg episode reward: [(0, '352.910')] [2024-06-15 15:12:16,201][1651274] Signal inference workers to stop experience collection... (16050 times) [2024-06-15 15:12:16,244][1651669] InferenceWorker_p0-w0: stopping experience collection (16050 times) [2024-06-15 15:12:16,478][1651274] Signal inference workers to resume experience collection... (16050 times) [2024-06-15 15:12:16,479][1651669] InferenceWorker_p0-w0: resuming experience collection (16050 times) [2024-06-15 15:12:16,482][1651669] Updated weights for policy 0, policy_version 305712 (0.0192) [2024-06-15 15:12:17,789][1651669] Updated weights for policy 0, policy_version 305760 (0.0014) [2024-06-15 15:12:18,532][1651669] Updated weights for policy 0, policy_version 305790 (0.0012) [2024-06-15 15:12:20,776][1648981] Fps is (10 sec: 52379.0, 60 sec: 48052.1, 300 sec: 46762.3). Total num frames: 626393088. Throughput: 0: 11750.8. Samples: 156645376. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:20,777][1648981] Avg episode reward: [(0, '381.230')] [2024-06-15 15:12:25,232][1651669] Updated weights for policy 0, policy_version 305858 (0.0014) [2024-06-15 15:12:25,766][1648981] Fps is (10 sec: 42600.2, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 626458624. Throughput: 0: 11696.3. Samples: 156684800. 
Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:25,767][1648981] Avg episode reward: [(0, '368.360')] [2024-06-15 15:12:26,226][1651669] Updated weights for policy 0, policy_version 305907 (0.0017) [2024-06-15 15:12:27,625][1651669] Updated weights for policy 0, policy_version 305968 (0.0013) [2024-06-15 15:12:28,960][1651669] Updated weights for policy 0, policy_version 306017 (0.0032) [2024-06-15 15:12:30,777][1648981] Fps is (10 sec: 39315.4, 60 sec: 48050.8, 300 sec: 46652.2). Total num frames: 626786304. Throughput: 0: 11935.5. Samples: 156753408. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:30,780][1648981] Avg episode reward: [(0, '374.580')] [2024-06-15 15:12:31,031][1651669] Updated weights for policy 0, policy_version 306065 (0.0012) [2024-06-15 15:12:32,069][1651669] Updated weights for policy 0, policy_version 306107 (0.0010) [2024-06-15 15:12:35,782][1648981] Fps is (10 sec: 45803.3, 60 sec: 44771.2, 300 sec: 46206.0). Total num frames: 626917376. Throughput: 0: 11912.3. Samples: 156830720. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:35,783][1648981] Avg episode reward: [(0, '360.310')] [2024-06-15 15:12:37,433][1651669] Updated weights for policy 0, policy_version 306180 (0.0014) [2024-06-15 15:12:39,268][1651669] Updated weights for policy 0, policy_version 306256 (0.0035) [2024-06-15 15:12:40,368][1651669] Updated weights for policy 0, policy_version 306303 (0.0015) [2024-06-15 15:12:40,766][1648981] Fps is (10 sec: 52486.7, 60 sec: 49698.1, 300 sec: 46763.8). Total num frames: 627310592. Throughput: 0: 11832.9. Samples: 156859392. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:40,767][1648981] Avg episode reward: [(0, '366.310')] [2024-06-15 15:12:43,131][1651669] Updated weights for policy 0, policy_version 306361 (0.0013) [2024-06-15 15:12:45,802][1648981] Fps is (10 sec: 52325.4, 60 sec: 45857.2, 300 sec: 46204.4). Total num frames: 627441664. 
Throughput: 0: 12107.8. Samples: 156937216. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:45,802][1648981] Avg episode reward: [(0, '356.620')] [2024-06-15 15:12:48,140][1651669] Updated weights for policy 0, policy_version 306421 (0.0013) [2024-06-15 15:12:49,148][1651669] Updated weights for policy 0, policy_version 306470 (0.0020) [2024-06-15 15:12:50,549][1651669] Updated weights for policy 0, policy_version 306534 (0.0014) [2024-06-15 15:12:50,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49710.3, 300 sec: 47208.1). Total num frames: 627802112. Throughput: 0: 12128.7. Samples: 157006848. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:50,767][1648981] Avg episode reward: [(0, '367.370')] [2024-06-15 15:12:52,504][1651669] Updated weights for policy 0, policy_version 306576 (0.0013) [2024-06-15 15:12:55,767][1648981] Fps is (10 sec: 52614.6, 60 sec: 47513.4, 300 sec: 46319.5). Total num frames: 627965952. Throughput: 0: 12117.3. Samples: 157045760. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:12:55,767][1648981] Avg episode reward: [(0, '353.130')] [2024-06-15 15:12:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000306624_627965952.pth... [2024-06-15 15:12:55,852][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000301104_616660992.pth [2024-06-15 15:12:56,818][1651669] Updated weights for policy 0, policy_version 306626 (0.0109) [2024-06-15 15:12:57,574][1651274] Signal inference workers to stop experience collection... (16100 times) [2024-06-15 15:12:57,666][1651669] InferenceWorker_p0-w0: stopping experience collection (16100 times) [2024-06-15 15:12:57,782][1651274] Signal inference workers to resume experience collection... 
(16100 times) [2024-06-15 15:12:57,786][1651669] InferenceWorker_p0-w0: resuming experience collection (16100 times) [2024-06-15 15:12:58,504][1651669] Updated weights for policy 0, policy_version 306704 (0.0138) [2024-06-15 15:13:00,636][1651669] Updated weights for policy 0, policy_version 306792 (0.0013) [2024-06-15 15:13:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 49698.3, 300 sec: 47430.3). Total num frames: 628326400. Throughput: 0: 11969.6. Samples: 157124096. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:13:00,767][1648981] Avg episode reward: [(0, '364.640')] [2024-06-15 15:13:03,459][1651669] Updated weights for policy 0, policy_version 306837 (0.0011) [2024-06-15 15:13:04,284][1651669] Updated weights for policy 0, policy_version 306875 (0.0014) [2024-06-15 15:13:05,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 49152.0, 300 sec: 46986.0). Total num frames: 628490240. Throughput: 0: 12415.8. Samples: 157203968. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:13:05,767][1648981] Avg episode reward: [(0, '351.810')] [2024-06-15 15:13:07,741][1651669] Updated weights for policy 0, policy_version 306916 (0.0012) [2024-06-15 15:13:09,265][1651669] Updated weights for policy 0, policy_version 306977 (0.0013) [2024-06-15 15:13:10,776][1648981] Fps is (10 sec: 45832.0, 60 sec: 48598.3, 300 sec: 47209.6). Total num frames: 628785152. Throughput: 0: 12410.6. Samples: 157243392. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:13:10,776][1648981] Avg episode reward: [(0, '361.080')] [2024-06-15 15:13:11,182][1651669] Updated weights for policy 0, policy_version 307040 (0.0013) [2024-06-15 15:13:14,578][1651669] Updated weights for policy 0, policy_version 307111 (0.0016) [2024-06-15 15:13:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.5, 300 sec: 47319.2). Total num frames: 629014528. Throughput: 0: 12336.5. Samples: 157308416. 
Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:13:15,767][1648981] Avg episode reward: [(0, '370.580')] [2024-06-15 15:13:17,936][1651669] Updated weights for policy 0, policy_version 307169 (0.0027) [2024-06-15 15:13:19,622][1651669] Updated weights for policy 0, policy_version 307235 (0.0014) [2024-06-15 15:13:20,766][1648981] Fps is (10 sec: 49198.2, 60 sec: 48067.4, 300 sec: 47097.2). Total num frames: 629276672. Throughput: 0: 12315.1. Samples: 157384704. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:13:20,767][1648981] Avg episode reward: [(0, '352.880')] [2024-06-15 15:13:22,964][1651669] Updated weights for policy 0, policy_version 307297 (0.0014) [2024-06-15 15:13:24,990][1651669] Updated weights for policy 0, policy_version 307361 (0.0013) [2024-06-15 15:13:25,767][1648981] Fps is (10 sec: 52426.7, 60 sec: 51336.2, 300 sec: 47763.5). Total num frames: 629538816. Throughput: 0: 12435.8. Samples: 157419008. Policy #0 lag: (min: 6.0, avg: 72.3, max: 262.0) [2024-06-15 15:13:25,768][1648981] Avg episode reward: [(0, '345.640')] [2024-06-15 15:13:28,409][1651669] Updated weights for policy 0, policy_version 307424 (0.0016) [2024-06-15 15:13:29,856][1651669] Updated weights for policy 0, policy_version 307475 (0.0012) [2024-06-15 15:13:30,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 49707.3, 300 sec: 47209.9). Total num frames: 629768192. Throughput: 0: 12514.0. Samples: 157499904. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:13:30,767][1648981] Avg episode reward: [(0, '352.260')] [2024-06-15 15:13:30,768][1651669] Updated weights for policy 0, policy_version 307519 (0.0015) [2024-06-15 15:13:33,795][1651669] Updated weights for policy 0, policy_version 307577 (0.0028) [2024-06-15 15:13:35,362][1651669] Updated weights for policy 0, policy_version 307632 (0.0031) [2024-06-15 15:13:35,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 52442.5, 300 sec: 47989.5). Total num frames: 630063104. 
Throughput: 0: 12424.5. Samples: 157565952. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:13:35,767][1648981] Avg episode reward: [(0, '363.280')] [2024-06-15 15:13:38,071][1651274] Signal inference workers to stop experience collection... (16150 times) [2024-06-15 15:13:38,093][1651669] InferenceWorker_p0-w0: stopping experience collection (16150 times) [2024-06-15 15:13:38,275][1651274] Signal inference workers to resume experience collection... (16150 times) [2024-06-15 15:13:38,275][1651669] InferenceWorker_p0-w0: resuming experience collection (16150 times) [2024-06-15 15:13:38,855][1651669] Updated weights for policy 0, policy_version 307687 (0.0102) [2024-06-15 15:13:40,709][1651669] Updated weights for policy 0, policy_version 307730 (0.0010) [2024-06-15 15:13:40,777][1648981] Fps is (10 sec: 45828.7, 60 sec: 48597.7, 300 sec: 47429.3). Total num frames: 630226944. Throughput: 0: 12478.7. Samples: 157607424. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:13:40,777][1648981] Avg episode reward: [(0, '359.620')] [2024-06-15 15:13:41,636][1651669] Updated weights for policy 0, policy_version 307771 (0.0025) [2024-06-15 15:13:44,138][1651669] Updated weights for policy 0, policy_version 307840 (0.0016) [2024-06-15 15:13:45,774][1648981] Fps is (10 sec: 45839.2, 60 sec: 51360.2, 300 sec: 47762.3). Total num frames: 630521856. Throughput: 0: 12411.0. Samples: 157682688. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:13:45,775][1648981] Avg episode reward: [(0, '357.100')] [2024-06-15 15:13:46,323][1651669] Updated weights for policy 0, policy_version 307901 (0.0012) [2024-06-15 15:13:48,916][1651669] Updated weights for policy 0, policy_version 307952 (0.0013) [2024-06-15 15:13:50,767][1648981] Fps is (10 sec: 49199.4, 60 sec: 48605.4, 300 sec: 47874.6). Total num frames: 630718464. Throughput: 0: 12356.1. Samples: 157760000. 
Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:13:50,767][1648981] Avg episode reward: [(0, '337.660')] [2024-06-15 15:13:51,697][1651669] Updated weights for policy 0, policy_version 308002 (0.0012) [2024-06-15 15:13:54,471][1651669] Updated weights for policy 0, policy_version 308064 (0.0013) [2024-06-15 15:13:55,766][1648981] Fps is (10 sec: 49190.7, 60 sec: 50790.5, 300 sec: 47652.4). Total num frames: 631013376. Throughput: 0: 12472.6. Samples: 157804544. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:13:55,767][1648981] Avg episode reward: [(0, '342.750')] [2024-06-15 15:13:56,518][1651669] Updated weights for policy 0, policy_version 308154 (0.0014) [2024-06-15 15:13:59,462][1651669] Updated weights for policy 0, policy_version 308196 (0.0013) [2024-06-15 15:14:00,766][1648981] Fps is (10 sec: 52431.8, 60 sec: 48605.8, 300 sec: 48318.9). Total num frames: 631242752. Throughput: 0: 12435.9. Samples: 157868032. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:00,767][1648981] Avg episode reward: [(0, '325.790')] [2024-06-15 15:14:01,796][1651669] Updated weights for policy 0, policy_version 308241 (0.0024) [2024-06-15 15:14:04,830][1651669] Updated weights for policy 0, policy_version 308304 (0.0013) [2024-06-15 15:14:05,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 49698.1, 300 sec: 47763.5). Total num frames: 631472128. Throughput: 0: 12435.9. Samples: 157944320. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:05,767][1648981] Avg episode reward: [(0, '330.100')] [2024-06-15 15:14:05,933][1651669] Updated weights for policy 0, policy_version 308352 (0.0012) [2024-06-15 15:14:07,467][1651669] Updated weights for policy 0, policy_version 308410 (0.0171) [2024-06-15 15:14:10,440][1651669] Updated weights for policy 0, policy_version 308464 (0.0015) [2024-06-15 15:14:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49705.8, 300 sec: 48430.0). Total num frames: 631767040. 
Throughput: 0: 12379.1. Samples: 157976064. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:10,767][1648981] Avg episode reward: [(0, '333.070')] [2024-06-15 15:14:12,991][1651669] Updated weights for policy 0, policy_version 308515 (0.0012) [2024-06-15 15:14:15,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 631898112. Throughput: 0: 12288.0. Samples: 158052864. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:15,767][1648981] Avg episode reward: [(0, '345.210')] [2024-06-15 15:14:16,276][1651669] Updated weights for policy 0, policy_version 308563 (0.0013) [2024-06-15 15:14:18,359][1651274] Signal inference workers to stop experience collection... (16200 times) [2024-06-15 15:14:18,409][1651669] InferenceWorker_p0-w0: stopping experience collection (16200 times) [2024-06-15 15:14:18,424][1651669] Updated weights for policy 0, policy_version 308646 (0.0015) [2024-06-15 15:14:18,506][1651274] Signal inference workers to resume experience collection... (16200 times) [2024-06-15 15:14:18,512][1651669] InferenceWorker_p0-w0: resuming experience collection (16200 times) [2024-06-15 15:14:19,966][1651669] Updated weights for policy 0, policy_version 308673 (0.0012) [2024-06-15 15:14:20,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 48208.1). Total num frames: 632225792. Throughput: 0: 12379.0. Samples: 158123008. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:20,767][1648981] Avg episode reward: [(0, '338.730')] [2024-06-15 15:14:21,172][1651669] Updated weights for policy 0, policy_version 308728 (0.0012) [2024-06-15 15:14:24,152][1651669] Updated weights for policy 0, policy_version 308773 (0.0013) [2024-06-15 15:14:25,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 48059.9, 300 sec: 47986.9). Total num frames: 632422400. Throughput: 0: 12370.4. Samples: 158163968. 
Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:25,768][1648981] Avg episode reward: [(0, '350.290')] [2024-06-15 15:14:26,711][1651669] Updated weights for policy 0, policy_version 308804 (0.0027) [2024-06-15 15:14:28,187][1651669] Updated weights for policy 0, policy_version 308864 (0.0016) [2024-06-15 15:14:29,537][1651669] Updated weights for policy 0, policy_version 308927 (0.0017) [2024-06-15 15:14:30,767][1648981] Fps is (10 sec: 49149.2, 60 sec: 49151.6, 300 sec: 48096.7). Total num frames: 632717312. Throughput: 0: 12199.0. Samples: 158231552. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:30,767][1648981] Avg episode reward: [(0, '339.630')] [2024-06-15 15:14:31,709][1651669] Updated weights for policy 0, policy_version 308987 (0.0012) [2024-06-15 15:14:35,499][1651669] Updated weights for policy 0, policy_version 309056 (0.0014) [2024-06-15 15:14:35,778][1648981] Fps is (10 sec: 52367.7, 60 sec: 48050.3, 300 sec: 48205.9). Total num frames: 632946688. Throughput: 0: 12171.2. Samples: 158307840. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:35,779][1648981] Avg episode reward: [(0, '337.790')] [2024-06-15 15:14:39,599][1651669] Updated weights for policy 0, policy_version 309137 (0.0036) [2024-06-15 15:14:40,433][1651669] Updated weights for policy 0, policy_version 309184 (0.0013) [2024-06-15 15:14:40,766][1648981] Fps is (10 sec: 49154.6, 60 sec: 49706.6, 300 sec: 47985.7). Total num frames: 633208832. Throughput: 0: 12014.9. Samples: 158345216. Policy #0 lag: (min: 26.0, avg: 122.4, max: 282.0) [2024-06-15 15:14:40,767][1648981] Avg episode reward: [(0, '329.000')] [2024-06-15 15:14:42,767][1651669] Updated weights for policy 0, policy_version 309235 (0.0012) [2024-06-15 15:14:45,766][1648981] Fps is (10 sec: 45929.8, 60 sec: 48066.1, 300 sec: 48318.9). Total num frames: 633405440. Throughput: 0: 12208.4. Samples: 158417408. 
Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:14:45,767][1648981] Avg episode reward: [(0, '337.420')] [2024-06-15 15:14:45,896][1651669] Updated weights for policy 0, policy_version 309282 (0.0013) [2024-06-15 15:14:49,488][1651669] Updated weights for policy 0, policy_version 309345 (0.0038) [2024-06-15 15:14:50,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48606.3, 300 sec: 47763.5). Total num frames: 633634816. Throughput: 0: 11958.1. Samples: 158482432. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:14:50,767][1648981] Avg episode reward: [(0, '353.800')] [2024-06-15 15:14:51,240][1651669] Updated weights for policy 0, policy_version 309424 (0.0014) [2024-06-15 15:14:53,214][1651669] Updated weights for policy 0, policy_version 309475 (0.0014) [2024-06-15 15:14:53,762][1651669] Updated weights for policy 0, policy_version 309502 (0.0011) [2024-06-15 15:14:55,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 633864192. Throughput: 0: 11992.2. Samples: 158515712. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:14:55,767][1648981] Avg episode reward: [(0, '366.010')] [2024-06-15 15:14:56,204][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000309536_633929728.pth... [2024-06-15 15:14:56,351][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000303856_622297088.pth [2024-06-15 15:14:56,810][1651669] Updated weights for policy 0, policy_version 309562 (0.0012) [2024-06-15 15:14:59,957][1651669] Updated weights for policy 0, policy_version 309601 (0.0011) [2024-06-15 15:15:00,673][1651274] Signal inference workers to stop experience collection... (16250 times) [2024-06-15 15:15:00,764][1651669] InferenceWorker_p0-w0: stopping experience collection (16250 times) [2024-06-15 15:15:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 634126336. 
Throughput: 0: 12288.0. Samples: 158605824. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:00,767][1648981] Avg episode reward: [(0, '375.740')] [2024-06-15 15:15:00,935][1651274] Signal inference workers to resume experience collection... (16250 times) [2024-06-15 15:15:00,936][1651669] InferenceWorker_p0-w0: resuming experience collection (16250 times) [2024-06-15 15:15:01,695][1651669] Updated weights for policy 0, policy_version 309684 (0.0012) [2024-06-15 15:15:03,981][1651669] Updated weights for policy 0, policy_version 309745 (0.0118) [2024-06-15 15:15:05,768][1648981] Fps is (10 sec: 52429.1, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 634388480. Throughput: 0: 12128.7. Samples: 158668800. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:05,770][1648981] Avg episode reward: [(0, '378.210')] [2024-06-15 15:15:07,215][1651669] Updated weights for policy 0, policy_version 309808 (0.0011) [2024-06-15 15:15:10,501][1651669] Updated weights for policy 0, policy_version 309843 (0.0011) [2024-06-15 15:15:10,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46967.6, 300 sec: 47763.6). Total num frames: 634585088. Throughput: 0: 11992.3. Samples: 158703616. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:10,767][1648981] Avg episode reward: [(0, '379.040')] [2024-06-15 15:15:12,094][1651669] Updated weights for policy 0, policy_version 309920 (0.0063) [2024-06-15 15:15:13,746][1651669] Updated weights for policy 0, policy_version 309968 (0.0014) [2024-06-15 15:15:14,909][1651669] Updated weights for policy 0, policy_version 310010 (0.0013) [2024-06-15 15:15:15,770][1648981] Fps is (10 sec: 52409.0, 60 sec: 50241.1, 300 sec: 48430.8). Total num frames: 634912768. Throughput: 0: 12070.9. Samples: 158774784. 
Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:15,771][1648981] Avg episode reward: [(0, '378.000')] [2024-06-15 15:15:18,386][1651669] Updated weights for policy 0, policy_version 310080 (0.0081) [2024-06-15 15:15:20,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 46967.4, 300 sec: 47879.1). Total num frames: 635043840. Throughput: 0: 12086.4. Samples: 158851584. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:20,767][1648981] Avg episode reward: [(0, '385.600')] [2024-06-15 15:15:22,841][1651669] Updated weights for policy 0, policy_version 310145 (0.0015) [2024-06-15 15:15:25,098][1651669] Updated weights for policy 0, policy_version 310240 (0.0013) [2024-06-15 15:15:25,766][1648981] Fps is (10 sec: 49170.7, 60 sec: 49698.3, 300 sec: 48318.9). Total num frames: 635404288. Throughput: 0: 12026.3. Samples: 158886400. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:25,767][1648981] Avg episode reward: [(0, '382.480')] [2024-06-15 15:15:28,025][1651669] Updated weights for policy 0, policy_version 310288 (0.0013) [2024-06-15 15:15:29,005][1651669] Updated weights for policy 0, policy_version 310336 (0.0023) [2024-06-15 15:15:30,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 47514.0, 300 sec: 48207.8). Total num frames: 635568128. Throughput: 0: 11980.8. Samples: 158956544. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:30,767][1648981] Avg episode reward: [(0, '409.130')] [2024-06-15 15:15:33,582][1651669] Updated weights for policy 0, policy_version 310394 (0.0013) [2024-06-15 15:15:34,650][1651669] Updated weights for policy 0, policy_version 310440 (0.0019) [2024-06-15 15:15:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48069.2, 300 sec: 47988.9). Total num frames: 635830272. Throughput: 0: 12162.8. Samples: 159029760. 
Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:35,767][1648981] Avg episode reward: [(0, '418.860')] [2024-06-15 15:15:36,196][1651669] Updated weights for policy 0, policy_version 310496 (0.0014) [2024-06-15 15:15:39,261][1651669] Updated weights for policy 0, policy_version 310560 (0.0014) [2024-06-15 15:15:40,818][1648981] Fps is (10 sec: 52158.0, 60 sec: 48018.2, 300 sec: 48536.0). Total num frames: 636092416. Throughput: 0: 12239.8. Samples: 159067136. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:40,819][1648981] Avg episode reward: [(0, '406.720')] [2024-06-15 15:15:43,674][1651669] Updated weights for policy 0, policy_version 310624 (0.0015) [2024-06-15 15:15:43,775][1651274] Signal inference workers to stop experience collection... (16300 times) [2024-06-15 15:15:43,822][1651669] InferenceWorker_p0-w0: stopping experience collection (16300 times) [2024-06-15 15:15:44,126][1651274] Signal inference workers to resume experience collection... (16300 times) [2024-06-15 15:15:44,127][1651669] InferenceWorker_p0-w0: resuming experience collection (16300 times) [2024-06-15 15:15:45,265][1651669] Updated weights for policy 0, policy_version 310657 (0.0012) [2024-06-15 15:15:45,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 47874.9). Total num frames: 636256256. Throughput: 0: 11719.1. Samples: 159133184. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:45,767][1648981] Avg episode reward: [(0, '398.150')] [2024-06-15 15:15:46,239][1651669] Updated weights for policy 0, policy_version 310712 (0.0013) [2024-06-15 15:15:47,322][1651669] Updated weights for policy 0, policy_version 310755 (0.0012) [2024-06-15 15:15:49,700][1651669] Updated weights for policy 0, policy_version 310800 (0.0108) [2024-06-15 15:15:50,782][1648981] Fps is (10 sec: 52619.0, 60 sec: 49685.1, 300 sec: 48761.3). Total num frames: 636616704. Throughput: 0: 11931.1. Samples: 159205888. 
Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:50,783][1648981] Avg episode reward: [(0, '392.550')] [2024-06-15 15:15:53,355][1651669] Updated weights for policy 0, policy_version 310851 (0.0012) [2024-06-15 15:15:54,536][1651669] Updated weights for policy 0, policy_version 310912 (0.0014) [2024-06-15 15:15:55,787][1648981] Fps is (10 sec: 49051.5, 60 sec: 48043.4, 300 sec: 47982.4). Total num frames: 636747776. Throughput: 0: 12111.8. Samples: 159248896. Policy #0 lag: (min: 47.0, avg: 188.6, max: 303.0) [2024-06-15 15:15:55,787][1648981] Avg episode reward: [(0, '382.030')] [2024-06-15 15:15:57,980][1651669] Updated weights for policy 0, policy_version 310979 (0.0015) [2024-06-15 15:15:59,136][1651669] Updated weights for policy 0, policy_version 311031 (0.0013) [2024-06-15 15:16:00,623][1651669] Updated weights for policy 0, policy_version 311063 (0.0012) [2024-06-15 15:16:00,766][1648981] Fps is (10 sec: 42665.6, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 637042688. Throughput: 0: 12095.6. Samples: 159319040. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:00,767][1648981] Avg episode reward: [(0, '372.760')] [2024-06-15 15:16:01,476][1651669] Updated weights for policy 0, policy_version 311104 (0.0013) [2024-06-15 15:16:04,967][1651669] Updated weights for policy 0, policy_version 311158 (0.0013) [2024-06-15 15:16:05,766][1648981] Fps is (10 sec: 52536.6, 60 sec: 48059.8, 300 sec: 48207.9). Total num frames: 637272064. Throughput: 0: 12015.0. Samples: 159392256. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:05,767][1648981] Avg episode reward: [(0, '384.700')] [2024-06-15 15:16:08,551][1651669] Updated weights for policy 0, policy_version 311218 (0.0015) [2024-06-15 15:16:09,711][1651669] Updated weights for policy 0, policy_version 311265 (0.0028) [2024-06-15 15:16:10,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 49151.8, 300 sec: 48430.0). Total num frames: 637534208. 
Throughput: 0: 12049.0. Samples: 159428608. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:10,767][1648981] Avg episode reward: [(0, '374.980')] [2024-06-15 15:16:11,344][1651669] Updated weights for policy 0, policy_version 311315 (0.0016) [2024-06-15 15:16:15,770][1648981] Fps is (10 sec: 42582.2, 60 sec: 46421.4, 300 sec: 48096.1). Total num frames: 637698048. Throughput: 0: 12127.7. Samples: 159502336. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:15,771][1648981] Avg episode reward: [(0, '371.670')] [2024-06-15 15:16:16,455][1651669] Updated weights for policy 0, policy_version 311408 (0.0014) [2024-06-15 15:16:18,470][1651669] Updated weights for policy 0, policy_version 311440 (0.0034) [2024-06-15 15:16:20,782][1648981] Fps is (10 sec: 45803.3, 60 sec: 49139.1, 300 sec: 48205.3). Total num frames: 637992960. Throughput: 0: 11874.2. Samples: 159564288. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:20,783][1648981] Avg episode reward: [(0, '357.820')] [2024-06-15 15:16:20,903][1651669] Updated weights for policy 0, policy_version 311536 (0.0014) [2024-06-15 15:16:22,005][1651669] Updated weights for policy 0, policy_version 311573 (0.0013) [2024-06-15 15:16:25,766][1648981] Fps is (10 sec: 49170.2, 60 sec: 46421.3, 300 sec: 48430.0). Total num frames: 638189568. Throughput: 0: 11778.2. Samples: 159596544. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:25,767][1648981] Avg episode reward: [(0, '345.490')] [2024-06-15 15:16:27,409][1651274] Signal inference workers to stop experience collection... (16350 times) [2024-06-15 15:16:27,496][1651669] InferenceWorker_p0-w0: stopping experience collection (16350 times) [2024-06-15 15:16:27,710][1651274] Signal inference workers to resume experience collection... 
(16350 times) [2024-06-15 15:16:27,711][1651669] InferenceWorker_p0-w0: resuming experience collection (16350 times) [2024-06-15 15:16:27,878][1651669] Updated weights for policy 0, policy_version 311637 (0.0017) [2024-06-15 15:16:30,201][1651669] Updated weights for policy 0, policy_version 311687 (0.0015) [2024-06-15 15:16:30,770][1648981] Fps is (10 sec: 39370.0, 60 sec: 46964.7, 300 sec: 47985.1). Total num frames: 638386176. Throughput: 0: 12025.4. Samples: 159674368. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:30,771][1648981] Avg episode reward: [(0, '330.010')] [2024-06-15 15:16:31,809][1651669] Updated weights for policy 0, policy_version 311748 (0.0012) [2024-06-15 15:16:33,591][1651669] Updated weights for policy 0, policy_version 311840 (0.0012) [2024-06-15 15:16:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 48763.2). Total num frames: 638713856. Throughput: 0: 11757.4. Samples: 159734784. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:35,767][1648981] Avg episode reward: [(0, '338.590')] [2024-06-15 15:16:39,142][1651669] Updated weights for policy 0, policy_version 311888 (0.0031) [2024-06-15 15:16:40,766][1648981] Fps is (10 sec: 45891.4, 60 sec: 45914.9, 300 sec: 47987.6). Total num frames: 638844928. Throughput: 0: 11747.2. Samples: 159777280. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:40,767][1648981] Avg episode reward: [(0, '343.980')] [2024-06-15 15:16:42,087][1651669] Updated weights for policy 0, policy_version 311938 (0.0013) [2024-06-15 15:16:43,006][1651669] Updated weights for policy 0, policy_version 311989 (0.0011) [2024-06-15 15:16:44,552][1651669] Updated weights for policy 0, policy_version 312064 (0.0011) [2024-06-15 15:16:45,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49152.0, 300 sec: 48765.7). Total num frames: 639205376. Throughput: 0: 11685.0. Samples: 159844864. 
Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:45,767][1648981] Avg episode reward: [(0, '336.370')] [2024-06-15 15:16:45,964][1651669] Updated weights for policy 0, policy_version 312125 (0.0026) [2024-06-15 15:16:50,769][1648981] Fps is (10 sec: 45864.9, 60 sec: 44793.0, 300 sec: 48096.4). Total num frames: 639303680. Throughput: 0: 11673.0. Samples: 159917568. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:50,770][1648981] Avg episode reward: [(0, '331.060')] [2024-06-15 15:16:51,206][1651669] Updated weights for policy 0, policy_version 312182 (0.0016) [2024-06-15 15:16:53,801][1651669] Updated weights for policy 0, policy_version 312240 (0.0018) [2024-06-15 15:16:55,235][1651669] Updated weights for policy 0, policy_version 312305 (0.0012) [2024-06-15 15:16:55,773][1648981] Fps is (10 sec: 42569.4, 60 sec: 48070.7, 300 sec: 48428.9). Total num frames: 639631360. Throughput: 0: 11569.5. Samples: 159949312. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:16:55,774][1648981] Avg episode reward: [(0, '322.270')] [2024-06-15 15:16:56,421][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000312352_639696896.pth... [2024-06-15 15:16:56,575][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000306624_627965952.pth [2024-06-15 15:16:56,991][1651669] Updated weights for policy 0, policy_version 312376 (0.0012) [2024-06-15 15:17:00,766][1648981] Fps is (10 sec: 45885.7, 60 sec: 45329.1, 300 sec: 48207.8). Total num frames: 639762432. Throughput: 0: 11572.2. Samples: 160023040. 
Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:17:00,767][1648981] Avg episode reward: [(0, '330.360')] [2024-06-15 15:17:02,043][1651669] Updated weights for policy 0, policy_version 312416 (0.0011) [2024-06-15 15:17:04,594][1651669] Updated weights for policy 0, policy_version 312480 (0.0014) [2024-06-15 15:17:05,763][1651274] Signal inference workers to stop experience collection... (16400 times) [2024-06-15 15:17:05,766][1648981] Fps is (10 sec: 42627.5, 60 sec: 46421.3, 300 sec: 48096.8). Total num frames: 640057344. Throughput: 0: 11666.3. Samples: 160089088. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:17:05,767][1648981] Avg episode reward: [(0, '312.290')] [2024-06-15 15:17:05,796][1651669] InferenceWorker_p0-w0: stopping experience collection (16400 times) [2024-06-15 15:17:06,001][1651274] Signal inference workers to resume experience collection... (16400 times) [2024-06-15 15:17:06,018][1651669] InferenceWorker_p0-w0: resuming experience collection (16400 times) [2024-06-15 15:17:06,162][1651669] Updated weights for policy 0, policy_version 312548 (0.0011) [2024-06-15 15:17:07,143][1651669] Updated weights for policy 0, policy_version 312592 (0.0013) [2024-06-15 15:17:08,331][1651669] Updated weights for policy 0, policy_version 312640 (0.0011) [2024-06-15 15:17:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.3, 300 sec: 48319.0). Total num frames: 640286720. Throughput: 0: 11548.5. Samples: 160116224. Policy #0 lag: (min: 15.0, avg: 106.5, max: 271.0) [2024-06-15 15:17:10,767][1648981] Avg episode reward: [(0, '332.770')] [2024-06-15 15:17:13,406][1651669] Updated weights for policy 0, policy_version 312697 (0.0013) [2024-06-15 15:17:15,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 45878.1, 300 sec: 47654.0). Total num frames: 640450560. Throughput: 0: 11663.1. Samples: 160199168. 
Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:15,767][1648981] Avg episode reward: [(0, '331.370')] [2024-06-15 15:17:16,369][1651669] Updated weights for policy 0, policy_version 312752 (0.0014) [2024-06-15 15:17:18,462][1651669] Updated weights for policy 0, policy_version 312835 (0.0016) [2024-06-15 15:17:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46979.9, 300 sec: 48652.2). Total num frames: 640811008. Throughput: 0: 11548.5. Samples: 160254464. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:20,767][1648981] Avg episode reward: [(0, '331.110')] [2024-06-15 15:17:23,841][1651669] Updated weights for policy 0, policy_version 312897 (0.0033) [2024-06-15 15:17:25,207][1651669] Updated weights for policy 0, policy_version 312951 (0.0024) [2024-06-15 15:17:25,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 47987.5). Total num frames: 640942080. Throughput: 0: 11594.0. Samples: 160299008. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:25,767][1648981] Avg episode reward: [(0, '332.580')] [2024-06-15 15:17:27,141][1651669] Updated weights for policy 0, policy_version 312981 (0.0012) [2024-06-15 15:17:29,030][1651669] Updated weights for policy 0, policy_version 313057 (0.0080) [2024-06-15 15:17:29,965][1651669] Updated weights for policy 0, policy_version 313104 (0.0074) [2024-06-15 15:17:30,774][1648981] Fps is (10 sec: 49113.5, 60 sec: 48602.4, 300 sec: 48764.5). Total num frames: 641302528. Throughput: 0: 11398.5. Samples: 160357888. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:30,775][1648981] Avg episode reward: [(0, '315.710')] [2024-06-15 15:17:30,904][1651669] Updated weights for policy 0, policy_version 313150 (0.0015) [2024-06-15 15:17:35,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 44782.8, 300 sec: 47763.5). Total num frames: 641400832. Throughput: 0: 11651.4. Samples: 160441856. 
Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:35,767][1648981] Avg episode reward: [(0, '310.550')] [2024-06-15 15:17:36,181][1651669] Updated weights for policy 0, policy_version 313216 (0.0014) [2024-06-15 15:17:39,691][1651669] Updated weights for policy 0, policy_version 313296 (0.0013) [2024-06-15 15:17:40,767][1648981] Fps is (10 sec: 42628.9, 60 sec: 48059.2, 300 sec: 48435.7). Total num frames: 641728512. Throughput: 0: 11618.3. Samples: 160472064. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:40,768][1648981] Avg episode reward: [(0, '309.700')] [2024-06-15 15:17:41,650][1651669] Updated weights for policy 0, policy_version 313376 (0.0111) [2024-06-15 15:17:45,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 44236.6, 300 sec: 47652.4). Total num frames: 641859584. Throughput: 0: 11446.0. Samples: 160538112. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:45,767][1648981] Avg episode reward: [(0, '307.360')] [2024-06-15 15:17:46,564][1651669] Updated weights for policy 0, policy_version 313424 (0.0035) [2024-06-15 15:17:47,684][1651669] Updated weights for policy 0, policy_version 313469 (0.0014) [2024-06-15 15:17:49,099][1651274] Signal inference workers to stop experience collection... (16450 times) [2024-06-15 15:17:49,193][1651669] InferenceWorker_p0-w0: stopping experience collection (16450 times) [2024-06-15 15:17:49,330][1651274] Signal inference workers to resume experience collection... (16450 times) [2024-06-15 15:17:49,331][1651669] InferenceWorker_p0-w0: resuming experience collection (16450 times) [2024-06-15 15:17:50,217][1651669] Updated weights for policy 0, policy_version 313521 (0.0127) [2024-06-15 15:17:50,766][1648981] Fps is (10 sec: 39324.1, 60 sec: 46969.2, 300 sec: 47985.7). Total num frames: 642121728. Throughput: 0: 11628.1. Samples: 160612352. 
Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:50,767][1648981] Avg episode reward: [(0, '301.300')] [2024-06-15 15:17:51,497][1651669] Updated weights for policy 0, policy_version 313584 (0.0015) [2024-06-15 15:17:52,410][1651669] Updated weights for policy 0, policy_version 313620 (0.0011) [2024-06-15 15:17:55,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 45880.4, 300 sec: 47652.4). Total num frames: 642383872. Throughput: 0: 11741.9. Samples: 160644608. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:17:55,767][1648981] Avg episode reward: [(0, '307.140')] [2024-06-15 15:17:57,404][1651669] Updated weights for policy 0, policy_version 313665 (0.0172) [2024-06-15 15:17:58,656][1651669] Updated weights for policy 0, policy_version 313724 (0.0014) [2024-06-15 15:18:00,200][1651669] Updated weights for policy 0, policy_version 313763 (0.0021) [2024-06-15 15:18:00,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 47513.5, 300 sec: 47874.6). Total num frames: 642613248. Throughput: 0: 11741.8. Samples: 160727552. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:18:00,767][1648981] Avg episode reward: [(0, '298.230')] [2024-06-15 15:18:01,822][1651669] Updated weights for policy 0, policy_version 313827 (0.0013) [2024-06-15 15:18:03,452][1651669] Updated weights for policy 0, policy_version 313892 (0.0115) [2024-06-15 15:18:05,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 47513.6, 300 sec: 47876.1). Total num frames: 642908160. Throughput: 0: 11958.0. Samples: 160792576. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:18:05,767][1648981] Avg episode reward: [(0, '301.320')] [2024-06-15 15:18:08,399][1651669] Updated weights for policy 0, policy_version 313928 (0.0013) [2024-06-15 15:18:10,365][1651669] Updated weights for policy 0, policy_version 314000 (0.0014) [2024-06-15 15:18:10,784][1648981] Fps is (10 sec: 45795.7, 60 sec: 46407.9, 300 sec: 47649.6). Total num frames: 643072000. 
Throughput: 0: 11896.6. Samples: 160834560. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:18:10,784][1648981] Avg episode reward: [(0, '314.090')] [2024-06-15 15:18:11,541][1651669] Updated weights for policy 0, policy_version 314046 (0.0009) [2024-06-15 15:18:12,854][1651669] Updated weights for policy 0, policy_version 314096 (0.0012) [2024-06-15 15:18:14,769][1651669] Updated weights for policy 0, policy_version 314148 (0.0034) [2024-06-15 15:18:15,786][1648981] Fps is (10 sec: 52324.9, 60 sec: 49681.7, 300 sec: 47982.5). Total num frames: 643432448. Throughput: 0: 11989.0. Samples: 160897536. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:18:15,787][1648981] Avg episode reward: [(0, '314.370')] [2024-06-15 15:18:19,720][1651669] Updated weights for policy 0, policy_version 314192 (0.0028) [2024-06-15 15:18:20,766][1648981] Fps is (10 sec: 45955.2, 60 sec: 45329.0, 300 sec: 47430.4). Total num frames: 643530752. Throughput: 0: 11924.0. Samples: 160978432. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:18:20,767][1648981] Avg episode reward: [(0, '324.010')] [2024-06-15 15:18:21,573][1651669] Updated weights for policy 0, policy_version 314272 (0.0013) [2024-06-15 15:18:22,297][1651669] Updated weights for policy 0, policy_version 314300 (0.0012) [2024-06-15 15:18:24,073][1651669] Updated weights for policy 0, policy_version 314352 (0.0012) [2024-06-15 15:18:25,333][1651669] Updated weights for policy 0, policy_version 314416 (0.0132) [2024-06-15 15:18:25,766][1648981] Fps is (10 sec: 52532.9, 60 sec: 50244.3, 300 sec: 48096.8). Total num frames: 643956736. Throughput: 0: 11924.1. Samples: 161008640. Policy #0 lag: (min: 13.0, avg: 93.3, max: 269.0) [2024-06-15 15:18:25,767][1648981] Avg episode reward: [(0, '319.030')] [2024-06-15 15:18:30,310][1651274] Signal inference workers to stop experience collection... 
(16500 times) [2024-06-15 15:18:30,400][1651669] InferenceWorker_p0-w0: stopping experience collection (16500 times) [2024-06-15 15:18:30,522][1651274] Signal inference workers to resume experience collection... (16500 times) [2024-06-15 15:18:30,523][1651669] InferenceWorker_p0-w0: resuming experience collection (16500 times) [2024-06-15 15:18:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 44788.8, 300 sec: 47208.1). Total num frames: 643989504. Throughput: 0: 12197.0. Samples: 161086976. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:18:30,767][1648981] Avg episode reward: [(0, '318.810')] [2024-06-15 15:18:31,206][1651669] Updated weights for policy 0, policy_version 314480 (0.0015) [2024-06-15 15:18:32,203][1651669] Updated weights for policy 0, policy_version 314517 (0.0011) [2024-06-15 15:18:33,168][1651669] Updated weights for policy 0, policy_version 314557 (0.0014) [2024-06-15 15:18:34,830][1651669] Updated weights for policy 0, policy_version 314608 (0.0012) [2024-06-15 15:18:35,778][1648981] Fps is (10 sec: 42547.9, 60 sec: 49688.5, 300 sec: 47985.4). Total num frames: 644382720. Throughput: 0: 12114.1. Samples: 161157632. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:18:35,779][1648981] Avg episode reward: [(0, '346.550')] [2024-06-15 15:18:36,100][1651669] Updated weights for policy 0, policy_version 314672 (0.0012) [2024-06-15 15:18:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46421.8, 300 sec: 47431.6). Total num frames: 644513792. Throughput: 0: 12219.7. Samples: 161194496. 
Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:18:40,767][1648981] Avg episode reward: [(0, '357.490')] [2024-06-15 15:18:41,710][1651669] Updated weights for policy 0, policy_version 314736 (0.0013) [2024-06-15 15:18:43,763][1651669] Updated weights for policy 0, policy_version 314773 (0.0013) [2024-06-15 15:18:45,489][1651669] Updated weights for policy 0, policy_version 314848 (0.0011) [2024-06-15 15:18:45,770][1648981] Fps is (10 sec: 42632.8, 60 sec: 49149.1, 300 sec: 47763.0). Total num frames: 644808704. Throughput: 0: 12025.3. Samples: 161268736. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:18:45,771][1648981] Avg episode reward: [(0, '342.800')] [2024-06-15 15:18:47,024][1651669] Updated weights for policy 0, policy_version 314912 (0.0143) [2024-06-15 15:18:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 645005312. Throughput: 0: 12071.8. Samples: 161335808. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:18:50,767][1648981] Avg episode reward: [(0, '360.020')] [2024-06-15 15:18:52,044][1651669] Updated weights for policy 0, policy_version 314976 (0.0013) [2024-06-15 15:18:55,176][1651669] Updated weights for policy 0, policy_version 315024 (0.0019) [2024-06-15 15:18:55,766][1648981] Fps is (10 sec: 39336.1, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 645201920. Throughput: 0: 11860.2. Samples: 161368064. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:18:55,767][1648981] Avg episode reward: [(0, '349.690')] [2024-06-15 15:18:56,392][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000315072_645267456.pth... 
[2024-06-15 15:18:56,572][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000309536_633929728.pth [2024-06-15 15:18:57,432][1651669] Updated weights for policy 0, policy_version 315105 (0.0153) [2024-06-15 15:18:58,789][1651669] Updated weights for policy 0, policy_version 315174 (0.0073) [2024-06-15 15:19:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 645529600. Throughput: 0: 11883.6. Samples: 161432064. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:00,767][1648981] Avg episode reward: [(0, '367.490')] [2024-06-15 15:19:03,114][1651669] Updated weights for policy 0, policy_version 315219 (0.0015) [2024-06-15 15:19:05,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 645660672. Throughput: 0: 11855.6. Samples: 161511936. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:05,767][1648981] Avg episode reward: [(0, '348.790')] [2024-06-15 15:19:06,852][1651669] Updated weights for policy 0, policy_version 315280 (0.0012) [2024-06-15 15:19:08,469][1651669] Updated weights for policy 0, policy_version 315330 (0.0022) [2024-06-15 15:19:09,288][1651274] Signal inference workers to stop experience collection... (16550 times) [2024-06-15 15:19:09,399][1651669] InferenceWorker_p0-w0: stopping experience collection (16550 times) [2024-06-15 15:19:09,549][1651274] Signal inference workers to resume experience collection... (16550 times) [2024-06-15 15:19:09,554][1651669] InferenceWorker_p0-w0: resuming experience collection (16550 times) [2024-06-15 15:19:10,309][1651669] Updated weights for policy 0, policy_version 315409 (0.0092) [2024-06-15 15:19:10,778][1648981] Fps is (10 sec: 45821.0, 60 sec: 48610.4, 300 sec: 47761.6). Total num frames: 645988352. Throughput: 0: 11875.3. Samples: 161543168. 
Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:10,779][1648981] Avg episode reward: [(0, '354.280')] [2024-06-15 15:19:14,384][1651669] Updated weights for policy 0, policy_version 315488 (0.0012) [2024-06-15 15:19:15,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 45890.4, 300 sec: 47319.2). Total num frames: 646184960. Throughput: 0: 11628.1. Samples: 161610240. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:15,767][1648981] Avg episode reward: [(0, '355.140')] [2024-06-15 15:19:18,266][1651669] Updated weights for policy 0, policy_version 315524 (0.0014) [2024-06-15 15:19:19,913][1651669] Updated weights for policy 0, policy_version 315586 (0.0012) [2024-06-15 15:19:20,766][1648981] Fps is (10 sec: 39367.9, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 646381568. Throughput: 0: 11676.7. Samples: 161682944. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:20,767][1648981] Avg episode reward: [(0, '364.520')] [2024-06-15 15:19:21,192][1651669] Updated weights for policy 0, policy_version 315651 (0.0108) [2024-06-15 15:19:22,060][1651669] Updated weights for policy 0, policy_version 315706 (0.0014) [2024-06-15 15:19:25,107][1651669] Updated weights for policy 0, policy_version 315760 (0.0018) [2024-06-15 15:19:25,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 45875.1, 300 sec: 47430.3). Total num frames: 646709248. Throughput: 0: 11821.5. Samples: 161726464. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:25,767][1648981] Avg episode reward: [(0, '362.750')] [2024-06-15 15:19:28,682][1651669] Updated weights for policy 0, policy_version 315808 (0.0107) [2024-06-15 15:19:30,361][1651669] Updated weights for policy 0, policy_version 315872 (0.0013) [2024-06-15 15:19:30,769][1648981] Fps is (10 sec: 52413.7, 60 sec: 48603.5, 300 sec: 47320.6). Total num frames: 646905856. Throughput: 0: 11787.6. Samples: 161799168. 
Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:30,770][1648981] Avg episode reward: [(0, '349.890')] [2024-06-15 15:19:32,165][1651669] Updated weights for policy 0, policy_version 315940 (0.0012) [2024-06-15 15:19:35,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 45884.3, 300 sec: 47208.1). Total num frames: 647135232. Throughput: 0: 11946.7. Samples: 161873408. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:35,767][1648981] Avg episode reward: [(0, '363.940')] [2024-06-15 15:19:36,405][1651669] Updated weights for policy 0, policy_version 316016 (0.0113) [2024-06-15 15:19:39,630][1651669] Updated weights for policy 0, policy_version 316064 (0.0012) [2024-06-15 15:19:40,766][1648981] Fps is (10 sec: 45888.7, 60 sec: 47513.6, 300 sec: 47319.2). Total num frames: 647364608. Throughput: 0: 11980.8. Samples: 161907200. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:40,767][1648981] Avg episode reward: [(0, '370.330')] [2024-06-15 15:19:41,905][1651669] Updated weights for policy 0, policy_version 316147 (0.0130) [2024-06-15 15:19:43,237][1651669] Updated weights for policy 0, policy_version 316212 (0.0013) [2024-06-15 15:19:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 46970.4, 300 sec: 47430.3). Total num frames: 647626752. Throughput: 0: 11821.5. Samples: 161964032. Policy #0 lag: (min: 0.0, avg: 63.2, max: 255.0) [2024-06-15 15:19:45,767][1648981] Avg episode reward: [(0, '354.180')] [2024-06-15 15:19:48,097][1651669] Updated weights for policy 0, policy_version 316276 (0.0015) [2024-06-15 15:19:50,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 45875.0, 300 sec: 47097.0). Total num frames: 647757824. Throughput: 0: 11798.7. Samples: 162042880. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:19:50,767][1648981] Avg episode reward: [(0, '358.680')] [2024-06-15 15:19:50,942][1651274] Signal inference workers to stop experience collection... 
(16600 times) [2024-06-15 15:19:51,010][1651669] InferenceWorker_p0-w0: stopping experience collection (16600 times) [2024-06-15 15:19:51,189][1651274] Signal inference workers to resume experience collection... (16600 times) [2024-06-15 15:19:51,190][1651669] InferenceWorker_p0-w0: resuming experience collection (16600 times) [2024-06-15 15:19:51,632][1651669] Updated weights for policy 0, policy_version 316321 (0.0021) [2024-06-15 15:19:53,527][1651669] Updated weights for policy 0, policy_version 316400 (0.0126) [2024-06-15 15:19:55,157][1651669] Updated weights for policy 0, policy_version 316476 (0.0142) [2024-06-15 15:19:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 47541.4). Total num frames: 648151040. Throughput: 0: 11710.8. Samples: 162070016. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:19:55,767][1648981] Avg episode reward: [(0, '347.140')] [2024-06-15 15:19:58,589][1651669] Updated weights for policy 0, policy_version 316536 (0.0013) [2024-06-15 15:20:00,767][1648981] Fps is (10 sec: 52429.7, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 648282112. Throughput: 0: 11798.7. Samples: 162141184. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:00,767][1648981] Avg episode reward: [(0, '378.400')] [2024-06-15 15:20:02,987][1651669] Updated weights for policy 0, policy_version 316576 (0.0013) [2024-06-15 15:20:04,587][1651669] Updated weights for policy 0, policy_version 316640 (0.0012) [2024-06-15 15:20:05,767][1648981] Fps is (10 sec: 42593.5, 60 sec: 48604.9, 300 sec: 47430.1). Total num frames: 648577024. Throughput: 0: 11764.3. Samples: 162212352. 
Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:05,786][1648981] Avg episode reward: [(0, '378.670')] [2024-06-15 15:20:06,239][1651669] Updated weights for policy 0, policy_version 316720 (0.0071) [2024-06-15 15:20:08,789][1651669] Updated weights for policy 0, policy_version 316768 (0.0033) [2024-06-15 15:20:10,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 46976.7, 300 sec: 47097.7). Total num frames: 648806400. Throughput: 0: 11628.1. Samples: 162249728. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:10,767][1648981] Avg episode reward: [(0, '388.200')] [2024-06-15 15:20:13,296][1651669] Updated weights for policy 0, policy_version 316816 (0.0013) [2024-06-15 15:20:14,662][1651669] Updated weights for policy 0, policy_version 316864 (0.0013) [2024-06-15 15:20:15,766][1648981] Fps is (10 sec: 42603.6, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 649003008. Throughput: 0: 11560.6. Samples: 162319360. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:15,767][1648981] Avg episode reward: [(0, '383.970')] [2024-06-15 15:20:16,699][1651669] Updated weights for policy 0, policy_version 316944 (0.0013) [2024-06-15 15:20:20,450][1651669] Updated weights for policy 0, policy_version 316995 (0.0139) [2024-06-15 15:20:20,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 46874.9). Total num frames: 649232384. Throughput: 0: 11366.4. Samples: 162384896. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:20,767][1648981] Avg episode reward: [(0, '386.240')] [2024-06-15 15:20:21,597][1651669] Updated weights for policy 0, policy_version 317044 (0.0012) [2024-06-15 15:20:24,632][1651669] Updated weights for policy 0, policy_version 317072 (0.0012) [2024-06-15 15:20:25,687][1651669] Updated weights for policy 0, policy_version 317110 (0.0010) [2024-06-15 15:20:25,767][1648981] Fps is (10 sec: 42596.0, 60 sec: 45328.8, 300 sec: 46985.9). Total num frames: 649428992. 
Throughput: 0: 11480.1. Samples: 162423808. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:25,767][1648981] Avg episode reward: [(0, '375.150')] [2024-06-15 15:20:27,082][1651669] Updated weights for policy 0, policy_version 317171 (0.0044) [2024-06-15 15:20:27,338][1651274] Signal inference workers to stop experience collection... (16650 times) [2024-06-15 15:20:27,376][1651669] InferenceWorker_p0-w0: stopping experience collection (16650 times) [2024-06-15 15:20:27,519][1651274] Signal inference workers to resume experience collection... (16650 times) [2024-06-15 15:20:27,520][1651669] InferenceWorker_p0-w0: resuming experience collection (16650 times) [2024-06-15 15:20:28,533][1651669] Updated weights for policy 0, policy_version 317244 (0.0015) [2024-06-15 15:20:30,766][1648981] Fps is (10 sec: 55705.3, 60 sec: 48062.0, 300 sec: 47319.2). Total num frames: 649789440. Throughput: 0: 11867.0. Samples: 162498048. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:30,767][1648981] Avg episode reward: [(0, '362.920')] [2024-06-15 15:20:30,812][1651669] Updated weights for policy 0, policy_version 317285 (0.0013) [2024-06-15 15:20:35,427][1651669] Updated weights for policy 0, policy_version 317330 (0.0019) [2024-06-15 15:20:35,779][1648981] Fps is (10 sec: 49094.6, 60 sec: 46411.9, 300 sec: 46881.2). Total num frames: 649920512. Throughput: 0: 11943.5. Samples: 162580480. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:35,779][1648981] Avg episode reward: [(0, '365.190')] [2024-06-15 15:20:36,923][1651669] Updated weights for policy 0, policy_version 317393 (0.0013) [2024-06-15 15:20:38,957][1651669] Updated weights for policy 0, policy_version 317476 (0.0013) [2024-06-15 15:20:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 650248192. Throughput: 0: 11901.1. Samples: 162605568. 
Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:40,767][1648981] Avg episode reward: [(0, '369.180')] [2024-06-15 15:20:41,378][1651669] Updated weights for policy 0, policy_version 317536 (0.0013) [2024-06-15 15:20:45,680][1651669] Updated weights for policy 0, policy_version 317570 (0.0012) [2024-06-15 15:20:45,766][1648981] Fps is (10 sec: 45931.0, 60 sec: 45875.2, 300 sec: 46655.2). Total num frames: 650379264. Throughput: 0: 12094.6. Samples: 162685440. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:45,767][1648981] Avg episode reward: [(0, '352.820')] [2024-06-15 15:20:47,062][1651669] Updated weights for policy 0, policy_version 317619 (0.0012) [2024-06-15 15:20:48,408][1651669] Updated weights for policy 0, policy_version 317680 (0.0015) [2024-06-15 15:20:50,160][1651669] Updated weights for policy 0, policy_version 317753 (0.0014) [2024-06-15 15:20:50,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.5, 300 sec: 47544.7). Total num frames: 650772480. Throughput: 0: 12015.2. Samples: 162753024. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:50,767][1648981] Avg episode reward: [(0, '352.410')] [2024-06-15 15:20:52,357][1651669] Updated weights for policy 0, policy_version 317808 (0.0015) [2024-06-15 15:20:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45875.1, 300 sec: 46986.0). Total num frames: 650903552. Throughput: 0: 12003.5. Samples: 162789888. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:20:55,767][1648981] Avg episode reward: [(0, '364.260')] [2024-06-15 15:20:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000317824_650903552.pth... 
[2024-06-15 15:20:55,817][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000312352_639696896.pth [2024-06-15 15:20:57,171][1651669] Updated weights for policy 0, policy_version 317856 (0.0013) [2024-06-15 15:20:58,830][1651669] Updated weights for policy 0, policy_version 317909 (0.0113) [2024-06-15 15:21:00,527][1651669] Updated weights for policy 0, policy_version 317969 (0.0013) [2024-06-15 15:21:00,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 49152.2, 300 sec: 47319.2). Total num frames: 651231232. Throughput: 0: 12014.9. Samples: 162860032. Policy #0 lag: (min: 9.0, avg: 108.6, max: 265.0) [2024-06-15 15:21:00,767][1648981] Avg episode reward: [(0, '367.260')] [2024-06-15 15:21:03,382][1651669] Updated weights for policy 0, policy_version 318035 (0.0013) [2024-06-15 15:21:05,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 47514.3, 300 sec: 47097.0). Total num frames: 651427840. Throughput: 0: 12140.0. Samples: 162931200. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:05,767][1648981] Avg episode reward: [(0, '360.240')] [2024-06-15 15:21:08,187][1651669] Updated weights for policy 0, policy_version 318082 (0.0012) [2024-06-15 15:21:09,372][1651274] Signal inference workers to stop experience collection... (16700 times) [2024-06-15 15:21:09,483][1651669] InferenceWorker_p0-w0: stopping experience collection (16700 times) [2024-06-15 15:21:09,683][1651274] Signal inference workers to resume experience collection... (16700 times) [2024-06-15 15:21:09,684][1651669] InferenceWorker_p0-w0: resuming experience collection (16700 times) [2024-06-15 15:21:09,916][1651669] Updated weights for policy 0, policy_version 318145 (0.0047) [2024-06-15 15:21:10,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 46967.5, 300 sec: 47208.7). Total num frames: 651624448. Throughput: 0: 12219.9. Samples: 162973696. 
Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:10,767][1648981] Avg episode reward: [(0, '357.510')] [2024-06-15 15:21:11,310][1651669] Updated weights for policy 0, policy_version 318208 (0.0013) [2024-06-15 15:21:14,147][1651669] Updated weights for policy 0, policy_version 318274 (0.0085) [2024-06-15 15:21:15,240][1651669] Updated weights for policy 0, policy_version 318328 (0.0010) [2024-06-15 15:21:15,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 49152.0, 300 sec: 47321.8). Total num frames: 651952128. Throughput: 0: 11935.3. Samples: 163035136. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:15,767][1648981] Avg episode reward: [(0, '367.560')] [2024-06-15 15:21:19,546][1651669] Updated weights for policy 0, policy_version 318356 (0.0013) [2024-06-15 15:21:20,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 652083200. Throughput: 0: 11847.5. Samples: 163113472. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:20,767][1648981] Avg episode reward: [(0, '377.700')] [2024-06-15 15:21:21,566][1651669] Updated weights for policy 0, policy_version 318432 (0.0185) [2024-06-15 15:21:23,190][1651669] Updated weights for policy 0, policy_version 318512 (0.0013) [2024-06-15 15:21:25,058][1651669] Updated weights for policy 0, policy_version 318562 (0.0014) [2024-06-15 15:21:25,592][1651669] Updated weights for policy 0, policy_version 318591 (0.0013) [2024-06-15 15:21:25,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 50790.7, 300 sec: 47764.1). Total num frames: 652476416. Throughput: 0: 11912.5. Samples: 163141632. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:25,767][1648981] Avg episode reward: [(0, '377.360')] [2024-06-15 15:21:30,541][1651669] Updated weights for policy 0, policy_version 318642 (0.0012) [2024-06-15 15:21:30,790][1648981] Fps is (10 sec: 49035.3, 60 sec: 46403.0, 300 sec: 46982.2). Total num frames: 652574720. 
Throughput: 0: 12122.3. Samples: 163231232. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:30,791][1648981] Avg episode reward: [(0, '394.960')] [2024-06-15 15:21:32,111][1651669] Updated weights for policy 0, policy_version 318713 (0.0017) [2024-06-15 15:21:33,460][1651669] Updated weights for policy 0, policy_version 318759 (0.0015) [2024-06-15 15:21:35,209][1651669] Updated weights for policy 0, policy_version 318801 (0.0012) [2024-06-15 15:21:35,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 50800.7, 300 sec: 47874.6). Total num frames: 652967936. Throughput: 0: 12083.2. Samples: 163296768. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:35,767][1648981] Avg episode reward: [(0, '401.050')] [2024-06-15 15:21:35,895][1651669] Updated weights for policy 0, policy_version 318841 (0.0012) [2024-06-15 15:21:40,445][1651669] Updated weights for policy 0, policy_version 318896 (0.0018) [2024-06-15 15:21:40,766][1648981] Fps is (10 sec: 52553.3, 60 sec: 47513.6, 300 sec: 47097.0). Total num frames: 653099008. Throughput: 0: 12344.9. Samples: 163345408. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:40,767][1648981] Avg episode reward: [(0, '395.780')] [2024-06-15 15:21:42,273][1651669] Updated weights for policy 0, policy_version 318975 (0.0013) [2024-06-15 15:21:44,360][1651669] Updated weights for policy 0, policy_version 319029 (0.0012) [2024-06-15 15:21:45,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 50790.4, 300 sec: 47875.0). Total num frames: 653426688. Throughput: 0: 12140.1. Samples: 163406336. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:45,767][1648981] Avg episode reward: [(0, '389.310')] [2024-06-15 15:21:46,114][1651274] Signal inference workers to stop experience collection... 
(16750 times) [2024-06-15 15:21:46,147][1651669] InferenceWorker_p0-w0: stopping experience collection (16750 times) [2024-06-15 15:21:46,377][1651274] Signal inference workers to resume experience collection... (16750 times) [2024-06-15 15:21:46,377][1651669] InferenceWorker_p0-w0: resuming experience collection (16750 times) [2024-06-15 15:21:50,300][1651669] Updated weights for policy 0, policy_version 319105 (0.0042) [2024-06-15 15:21:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46421.4, 300 sec: 47209.2). Total num frames: 653557760. Throughput: 0: 12470.1. Samples: 163492352. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:50,767][1648981] Avg episode reward: [(0, '397.710')] [2024-06-15 15:21:51,537][1651669] Updated weights for policy 0, policy_version 319155 (0.0014) [2024-06-15 15:21:53,048][1651669] Updated weights for policy 0, policy_version 319232 (0.0133) [2024-06-15 15:21:55,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 653918208. Throughput: 0: 12117.3. Samples: 163518976. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:21:55,767][1648981] Avg episode reward: [(0, '404.510')] [2024-06-15 15:21:57,441][1651669] Updated weights for policy 0, policy_version 319344 (0.0148) [2024-06-15 15:22:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 47430.3). Total num frames: 654049280. Throughput: 0: 12288.0. Samples: 163588096. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:22:00,767][1648981] Avg episode reward: [(0, '396.610')] [2024-06-15 15:22:01,556][1651669] Updated weights for policy 0, policy_version 319376 (0.0013) [2024-06-15 15:22:03,341][1651669] Updated weights for policy 0, policy_version 319445 (0.0015) [2024-06-15 15:22:05,289][1651669] Updated weights for policy 0, policy_version 319504 (0.0012) [2024-06-15 15:22:05,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49152.2, 300 sec: 47763.5). Total num frames: 654376960. 
Throughput: 0: 12196.9. Samples: 163662336. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:22:05,767][1648981] Avg episode reward: [(0, '386.540')] [2024-06-15 15:22:08,045][1651669] Updated weights for policy 0, policy_version 319572 (0.0014) [2024-06-15 15:22:10,770][1648981] Fps is (10 sec: 52408.6, 60 sec: 49148.8, 300 sec: 47874.0). Total num frames: 654573568. Throughput: 0: 12275.6. Samples: 163694080. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:22:10,771][1648981] Avg episode reward: [(0, '387.790')] [2024-06-15 15:22:12,062][1651669] Updated weights for policy 0, policy_version 319632 (0.0013) [2024-06-15 15:22:13,247][1651669] Updated weights for policy 0, policy_version 319675 (0.0016) [2024-06-15 15:22:14,768][1651669] Updated weights for policy 0, policy_version 319734 (0.0015) [2024-06-15 15:22:15,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 654868480. Throughput: 0: 11987.1. Samples: 163770368. Policy #0 lag: (min: 0.0, avg: 142.2, max: 256.0) [2024-06-15 15:22:15,767][1648981] Avg episode reward: [(0, '392.210')] [2024-06-15 15:22:15,806][1651669] Updated weights for policy 0, policy_version 319769 (0.0015) [2024-06-15 15:22:18,425][1651669] Updated weights for policy 0, policy_version 319824 (0.0013) [2024-06-15 15:22:19,692][1651669] Updated weights for policy 0, policy_version 319871 (0.0016) [2024-06-15 15:22:20,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 655097856. Throughput: 0: 12071.8. Samples: 163840000. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:20,767][1648981] Avg episode reward: [(0, '394.250')] [2024-06-15 15:22:24,927][1651669] Updated weights for policy 0, policy_version 319943 (0.0013) [2024-06-15 15:22:25,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46967.6, 300 sec: 47431.6). Total num frames: 655294464. Throughput: 0: 11946.7. Samples: 163883008. 
Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:25,767][1648981] Avg episode reward: [(0, '402.420')] [2024-06-15 15:22:26,132][1651669] Updated weights for policy 0, policy_version 320000 (0.0120) [2024-06-15 15:22:27,527][1651669] Updated weights for policy 0, policy_version 320054 (0.0101) [2024-06-15 15:22:29,773][1651274] Signal inference workers to stop experience collection... (16800 times) [2024-06-15 15:22:29,862][1651669] InferenceWorker_p0-w0: stopping experience collection (16800 times) [2024-06-15 15:22:29,863][1651669] Updated weights for policy 0, policy_version 320084 (0.0012) [2024-06-15 15:22:30,047][1651274] Signal inference workers to resume experience collection... (16800 times) [2024-06-15 15:22:30,048][1651669] InferenceWorker_p0-w0: resuming experience collection (16800 times) [2024-06-15 15:22:30,798][1648981] Fps is (10 sec: 48996.2, 60 sec: 50237.5, 300 sec: 48091.6). Total num frames: 655589376. Throughput: 0: 12086.0. Samples: 163950592. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:30,799][1648981] Avg episode reward: [(0, '412.110')] [2024-06-15 15:22:33,610][1651669] Updated weights for policy 0, policy_version 320129 (0.0014) [2024-06-15 15:22:34,793][1651669] Updated weights for policy 0, policy_version 320190 (0.0180) [2024-06-15 15:22:35,800][1648981] Fps is (10 sec: 45721.0, 60 sec: 46395.2, 300 sec: 47536.0). Total num frames: 655753216. Throughput: 0: 11824.0. Samples: 164024832. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:35,801][1648981] Avg episode reward: [(0, '391.700')] [2024-06-15 15:22:37,118][1651669] Updated weights for policy 0, policy_version 320256 (0.0013) [2024-06-15 15:22:40,130][1651669] Updated weights for policy 0, policy_version 320323 (0.0095) [2024-06-15 15:22:40,774][1648981] Fps is (10 sec: 45987.4, 60 sec: 49145.9, 300 sec: 48095.6). Total num frames: 656048128. Throughput: 0: 11876.5. Samples: 164053504. 
Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:40,774][1648981] Avg episode reward: [(0, '400.960')] [2024-06-15 15:22:41,616][1651669] Updated weights for policy 0, policy_version 320372 (0.0012) [2024-06-15 15:22:45,601][1651669] Updated weights for policy 0, policy_version 320416 (0.0012) [2024-06-15 15:22:45,766][1648981] Fps is (10 sec: 46030.5, 60 sec: 46421.4, 300 sec: 47763.5). Total num frames: 656211968. Throughput: 0: 12094.6. Samples: 164132352. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:45,767][1648981] Avg episode reward: [(0, '403.570')] [2024-06-15 15:22:46,911][1651669] Updated weights for policy 0, policy_version 320453 (0.0012) [2024-06-15 15:22:48,129][1651669] Updated weights for policy 0, policy_version 320512 (0.0017) [2024-06-15 15:22:49,619][1651669] Updated weights for policy 0, policy_version 320571 (0.0013) [2024-06-15 15:22:50,766][1648981] Fps is (10 sec: 49188.7, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 656539648. Throughput: 0: 11958.1. Samples: 164200448. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:50,767][1648981] Avg episode reward: [(0, '398.680')] [2024-06-15 15:22:52,025][1651669] Updated weights for policy 0, policy_version 320630 (0.0015) [2024-06-15 15:22:55,769][1648981] Fps is (10 sec: 45863.9, 60 sec: 45873.5, 300 sec: 47652.1). Total num frames: 656670720. Throughput: 0: 12106.4. Samples: 164238848. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:22:55,769][1648981] Avg episode reward: [(0, '399.250')] [2024-06-15 15:22:55,777][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000320640_656670720.pth... 
[2024-06-15 15:22:56,029][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000315072_645267456.pth [2024-06-15 15:22:56,547][1651669] Updated weights for policy 0, policy_version 320660 (0.0011) [2024-06-15 15:22:57,757][1651669] Updated weights for policy 0, policy_version 320706 (0.0024) [2024-06-15 15:22:59,440][1651669] Updated weights for policy 0, policy_version 320784 (0.0013) [2024-06-15 15:23:00,328][1651669] Updated weights for policy 0, policy_version 320829 (0.0039) [2024-06-15 15:23:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 657063936. Throughput: 0: 11992.2. Samples: 164310016. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:23:00,767][1648981] Avg episode reward: [(0, '403.480')] [2024-06-15 15:23:02,233][1651669] Updated weights for policy 0, policy_version 320880 (0.0012) [2024-06-15 15:23:05,775][1648981] Fps is (10 sec: 52398.3, 60 sec: 46961.1, 300 sec: 47876.1). Total num frames: 657195008. Throughput: 0: 12308.5. Samples: 164393984. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:23:05,775][1648981] Avg episode reward: [(0, '398.920')] [2024-06-15 15:23:07,294][1651669] Updated weights for policy 0, policy_version 320952 (0.0012) [2024-06-15 15:23:08,952][1651669] Updated weights for policy 0, policy_version 320996 (0.0035) [2024-06-15 15:23:10,552][1651669] Updated weights for policy 0, policy_version 321059 (0.0013) [2024-06-15 15:23:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49155.2, 300 sec: 47766.7). Total num frames: 657522688. Throughput: 0: 12037.7. Samples: 164424704. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:23:10,767][1648981] Avg episode reward: [(0, '380.780')] [2024-06-15 15:23:11,269][1651669] Updated weights for policy 0, policy_version 321088 (0.0012) [2024-06-15 15:23:12,728][1651274] Signal inference workers to stop experience collection... 
(16850 times) [2024-06-15 15:23:12,769][1651669] InferenceWorker_p0-w0: stopping experience collection (16850 times) [2024-06-15 15:23:12,997][1651274] Signal inference workers to resume experience collection... (16850 times) [2024-06-15 15:23:12,998][1651669] InferenceWorker_p0-w0: resuming experience collection (16850 times) [2024-06-15 15:23:13,295][1651669] Updated weights for policy 0, policy_version 321152 (0.0015) [2024-06-15 15:23:15,767][1648981] Fps is (10 sec: 52470.4, 60 sec: 47513.4, 300 sec: 48096.7). Total num frames: 657719296. Throughput: 0: 12114.5. Samples: 164495360. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:23:15,768][1648981] Avg episode reward: [(0, '362.090')] [2024-06-15 15:23:18,223][1651669] Updated weights for policy 0, policy_version 321212 (0.0099) [2024-06-15 15:23:20,409][1651669] Updated weights for policy 0, policy_version 321280 (0.0015) [2024-06-15 15:23:20,774][1648981] Fps is (10 sec: 45839.6, 60 sec: 48053.6, 300 sec: 47540.1). Total num frames: 657981440. Throughput: 0: 12090.2. Samples: 164568576. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:23:20,775][1648981] Avg episode reward: [(0, '369.220')] [2024-06-15 15:23:21,837][1651669] Updated weights for policy 0, policy_version 321335 (0.0013) [2024-06-15 15:23:23,361][1651669] Updated weights for policy 0, policy_version 321377 (0.0012) [2024-06-15 15:23:25,794][1648981] Fps is (10 sec: 52287.1, 60 sec: 49129.5, 300 sec: 48314.4). Total num frames: 658243584. Throughput: 0: 12168.8. Samples: 164601344. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:23:25,794][1648981] Avg episode reward: [(0, '362.370')] [2024-06-15 15:23:28,805][1651669] Updated weights for policy 0, policy_version 321443 (0.0011) [2024-06-15 15:23:30,197][1651669] Updated weights for policy 0, policy_version 321508 (0.0042) [2024-06-15 15:23:30,786][1648981] Fps is (10 sec: 52367.2, 60 sec: 48615.9, 300 sec: 47873.4). 
Total num frames: 658505728. Throughput: 0: 12316.8. Samples: 164686848. Policy #0 lag: (min: 15.0, avg: 146.3, max: 271.0) [2024-06-15 15:23:30,786][1648981] Avg episode reward: [(0, '368.190')] [2024-06-15 15:23:31,346][1651669] Updated weights for policy 0, policy_version 321558 (0.0073) [2024-06-15 15:23:32,165][1651669] Updated weights for policy 0, policy_version 321600 (0.0012) [2024-06-15 15:23:34,165][1651669] Updated weights for policy 0, policy_version 321656 (0.0013) [2024-06-15 15:23:35,767][1648981] Fps is (10 sec: 52568.1, 60 sec: 50271.7, 300 sec: 48318.8). Total num frames: 658767872. Throughput: 0: 12481.2. Samples: 164762112. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:23:35,768][1648981] Avg episode reward: [(0, '351.170')] [2024-06-15 15:23:39,020][1651669] Updated weights for policy 0, policy_version 321696 (0.0012) [2024-06-15 15:23:40,697][1651669] Updated weights for policy 0, policy_version 321776 (0.0016) [2024-06-15 15:23:40,771][1648981] Fps is (10 sec: 49226.4, 60 sec: 49154.5, 300 sec: 48096.7). Total num frames: 658997248. Throughput: 0: 12594.6. Samples: 164805632. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:23:40,771][1648981] Avg episode reward: [(0, '342.410')] [2024-06-15 15:23:42,322][1651669] Updated weights for policy 0, policy_version 321851 (0.0015) [2024-06-15 15:23:44,849][1651669] Updated weights for policy 0, policy_version 321914 (0.0014) [2024-06-15 15:23:45,766][1648981] Fps is (10 sec: 52433.4, 60 sec: 51336.5, 300 sec: 48430.0). Total num frames: 659292160. Throughput: 0: 12276.6. Samples: 164862464. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:23:45,767][1648981] Avg episode reward: [(0, '343.980')] [2024-06-15 15:23:50,392][1651669] Updated weights for policy 0, policy_version 321956 (0.0013) [2024-06-15 15:23:50,766][1648981] Fps is (10 sec: 39338.7, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 659390464. Throughput: 0: 12324.4. 
Samples: 164948480. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:23:50,767][1648981] Avg episode reward: [(0, '333.760')] [2024-06-15 15:23:52,003][1651669] Updated weights for policy 0, policy_version 322032 (0.0013) [2024-06-15 15:23:53,008][1651274] Signal inference workers to stop experience collection... (16900 times) [2024-06-15 15:23:53,078][1651669] InferenceWorker_p0-w0: stopping experience collection (16900 times) [2024-06-15 15:23:53,354][1651274] Signal inference workers to resume experience collection... (16900 times) [2024-06-15 15:23:53,355][1651669] InferenceWorker_p0-w0: resuming experience collection (16900 times) [2024-06-15 15:23:53,618][1651669] Updated weights for policy 0, policy_version 322112 (0.0013) [2024-06-15 15:23:55,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 51338.6, 300 sec: 48207.8). Total num frames: 659750912. Throughput: 0: 12197.0. Samples: 164973568. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:23:55,767][1648981] Avg episode reward: [(0, '348.060')] [2024-06-15 15:23:56,253][1651669] Updated weights for policy 0, policy_version 322167 (0.0012) [2024-06-15 15:24:00,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 659816448. Throughput: 0: 12436.0. Samples: 165054976. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:00,767][1648981] Avg episode reward: [(0, '366.570')] [2024-06-15 15:24:01,851][1651669] Updated weights for policy 0, policy_version 322224 (0.0013) [2024-06-15 15:24:02,949][1651669] Updated weights for policy 0, policy_version 322272 (0.0012) [2024-06-15 15:24:04,941][1651669] Updated weights for policy 0, policy_version 322367 (0.0094) [2024-06-15 15:24:05,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 50251.2, 300 sec: 48209.8). Total num frames: 660209664. Throughput: 0: 12142.2. Samples: 165114880. 
Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:05,767][1648981] Avg episode reward: [(0, '363.710')] [2024-06-15 15:24:07,675][1651669] Updated weights for policy 0, policy_version 322419 (0.0012) [2024-06-15 15:24:10,767][1648981] Fps is (10 sec: 52423.8, 60 sec: 46966.7, 300 sec: 47985.5). Total num frames: 660340736. Throughput: 0: 12170.0. Samples: 165148672. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:10,768][1648981] Avg episode reward: [(0, '363.810')] [2024-06-15 15:24:12,479][1651669] Updated weights for policy 0, policy_version 322451 (0.0025) [2024-06-15 15:24:13,891][1651669] Updated weights for policy 0, policy_version 322513 (0.0012) [2024-06-15 15:24:15,316][1651669] Updated weights for policy 0, policy_version 322576 (0.0013) [2024-06-15 15:24:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.3, 300 sec: 48430.0). Total num frames: 660668416. Throughput: 0: 11963.2. Samples: 165224960. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:15,767][1648981] Avg episode reward: [(0, '367.140')] [2024-06-15 15:24:16,458][1651669] Updated weights for policy 0, policy_version 322624 (0.0015) [2024-06-15 15:24:18,810][1651669] Updated weights for policy 0, policy_version 322672 (0.0087) [2024-06-15 15:24:20,766][1648981] Fps is (10 sec: 52433.5, 60 sec: 48065.9, 300 sec: 47985.7). Total num frames: 660865024. Throughput: 0: 11776.2. Samples: 165292032. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:20,767][1648981] Avg episode reward: [(0, '369.230')] [2024-06-15 15:24:24,209][1651669] Updated weights for policy 0, policy_version 322738 (0.0021) [2024-06-15 15:24:25,767][1648981] Fps is (10 sec: 42597.9, 60 sec: 47535.2, 300 sec: 48097.2). Total num frames: 661094400. Throughput: 0: 11811.3. Samples: 165337088. 
Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:25,767][1648981] Avg episode reward: [(0, '373.020')] [2024-06-15 15:24:25,784][1651669] Updated weights for policy 0, policy_version 322802 (0.0012) [2024-06-15 15:24:27,301][1651669] Updated weights for policy 0, policy_version 322864 (0.0146) [2024-06-15 15:24:29,639][1651669] Updated weights for policy 0, policy_version 322916 (0.0014) [2024-06-15 15:24:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48075.4, 300 sec: 48318.9). Total num frames: 661389312. Throughput: 0: 11741.9. Samples: 165390848. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:30,767][1648981] Avg episode reward: [(0, '378.810')] [2024-06-15 15:24:35,711][1651669] Updated weights for policy 0, policy_version 322997 (0.0014) [2024-06-15 15:24:35,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 45329.8, 300 sec: 47874.6). Total num frames: 661487616. Throughput: 0: 11673.6. Samples: 165473792. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:35,767][1648981] Avg episode reward: [(0, '379.970')] [2024-06-15 15:24:36,051][1651274] Signal inference workers to stop experience collection... (16950 times) [2024-06-15 15:24:36,117][1651669] InferenceWorker_p0-w0: stopping experience collection (16950 times) [2024-06-15 15:24:36,239][1651274] Signal inference workers to resume experience collection... (16950 times) [2024-06-15 15:24:36,240][1651669] InferenceWorker_p0-w0: resuming experience collection (16950 times) [2024-06-15 15:24:37,956][1651669] Updated weights for policy 0, policy_version 323088 (0.0109) [2024-06-15 15:24:40,122][1651669] Updated weights for policy 0, policy_version 323155 (0.0015) [2024-06-15 15:24:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48063.3, 300 sec: 48318.9). Total num frames: 661880832. Throughput: 0: 11594.0. Samples: 165495296. 
Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:40,767][1648981] Avg episode reward: [(0, '375.370')] [2024-06-15 15:24:45,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 47985.7). Total num frames: 661913600. Throughput: 0: 11662.2. Samples: 165579776. Policy #0 lag: (min: 29.0, avg: 184.4, max: 285.0) [2024-06-15 15:24:45,767][1648981] Avg episode reward: [(0, '376.100')] [2024-06-15 15:24:45,897][1651669] Updated weights for policy 0, policy_version 323216 (0.0012) [2024-06-15 15:24:47,712][1651669] Updated weights for policy 0, policy_version 323280 (0.0012) [2024-06-15 15:24:49,200][1651669] Updated weights for policy 0, policy_version 323344 (0.0016) [2024-06-15 15:24:50,520][1651669] Updated weights for policy 0, policy_version 323392 (0.0012) [2024-06-15 15:24:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 662306816. Throughput: 0: 11616.7. Samples: 165637632. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:24:50,767][1648981] Avg episode reward: [(0, '377.900')] [2024-06-15 15:24:51,700][1651669] Updated weights for policy 0, policy_version 323451 (0.0103) [2024-06-15 15:24:55,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 44782.7, 300 sec: 47985.7). Total num frames: 662437888. Throughput: 0: 11742.0. Samples: 165677056. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:24:55,767][1648981] Avg episode reward: [(0, '368.910')] [2024-06-15 15:24:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000323456_662437888.pth... 
[2024-06-15 15:24:55,817][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000317824_650903552.pth [2024-06-15 15:24:58,356][1651669] Updated weights for policy 0, policy_version 323492 (0.0136) [2024-06-15 15:24:59,642][1651669] Updated weights for policy 0, policy_version 323552 (0.0013) [2024-06-15 15:25:00,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48059.7, 300 sec: 47874.8). Total num frames: 662700032. Throughput: 0: 11719.1. Samples: 165752320. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:00,767][1648981] Avg episode reward: [(0, '365.850')] [2024-06-15 15:25:01,248][1651669] Updated weights for policy 0, policy_version 323616 (0.0026) [2024-06-15 15:25:02,619][1651669] Updated weights for policy 0, policy_version 323669 (0.0036) [2024-06-15 15:25:05,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 662962176. Throughput: 0: 11821.5. Samples: 165824000. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:05,767][1648981] Avg episode reward: [(0, '351.030')] [2024-06-15 15:25:08,523][1651669] Updated weights for policy 0, policy_version 323728 (0.0013) [2024-06-15 15:25:10,363][1651669] Updated weights for policy 0, policy_version 323793 (0.0012) [2024-06-15 15:25:10,807][1648981] Fps is (10 sec: 45692.0, 60 sec: 46936.8, 300 sec: 47979.1). Total num frames: 663158784. Throughput: 0: 11776.9. Samples: 165867520. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:10,807][1648981] Avg episode reward: [(0, '349.990')] [2024-06-15 15:25:11,993][1651669] Updated weights for policy 0, policy_version 323872 (0.0012) [2024-06-15 15:25:12,098][1651274] Signal inference workers to stop experience collection... (17000 times) [2024-06-15 15:25:12,123][1651669] InferenceWorker_p0-w0: stopping experience collection (17000 times) [2024-06-15 15:25:12,457][1651274] Signal inference workers to resume experience collection... 
(17000 times) [2024-06-15 15:25:12,457][1651669] InferenceWorker_p0-w0: resuming experience collection (17000 times) [2024-06-15 15:25:14,143][1651669] Updated weights for policy 0, policy_version 323964 (0.0013) [2024-06-15 15:25:15,770][1648981] Fps is (10 sec: 52409.2, 60 sec: 46964.5, 300 sec: 48318.3). Total num frames: 663486464. Throughput: 0: 11797.8. Samples: 165921792. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:15,771][1648981] Avg episode reward: [(0, '337.400')] [2024-06-15 15:25:20,766][1648981] Fps is (10 sec: 42770.3, 60 sec: 45329.1, 300 sec: 47985.8). Total num frames: 663584768. Throughput: 0: 11707.7. Samples: 166000640. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:20,767][1648981] Avg episode reward: [(0, '325.750')] [2024-06-15 15:25:20,803][1651669] Updated weights for policy 0, policy_version 324021 (0.0113) [2024-06-15 15:25:22,097][1651669] Updated weights for policy 0, policy_version 324064 (0.0012) [2024-06-15 15:25:24,014][1651669] Updated weights for policy 0, policy_version 324129 (0.0012) [2024-06-15 15:25:25,260][1651669] Updated weights for policy 0, policy_version 324192 (0.0020) [2024-06-15 15:25:25,766][1648981] Fps is (10 sec: 49170.0, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 663977984. Throughput: 0: 11810.1. Samples: 166026752. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:25,767][1648981] Avg episode reward: [(0, '318.970')] [2024-06-15 15:25:30,278][1651669] Updated weights for policy 0, policy_version 324227 (0.0014) [2024-06-15 15:25:30,774][1648981] Fps is (10 sec: 45839.2, 60 sec: 44231.0, 300 sec: 47875.3). Total num frames: 664043520. Throughput: 0: 11762.6. Samples: 166109184. 
Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:30,775][1648981] Avg episode reward: [(0, '314.750')] [2024-06-15 15:25:32,469][1651669] Updated weights for policy 0, policy_version 324326 (0.0030) [2024-06-15 15:25:34,779][1651669] Updated weights for policy 0, policy_version 324401 (0.0041) [2024-06-15 15:25:35,768][1648981] Fps is (10 sec: 49143.7, 60 sec: 49696.6, 300 sec: 48207.6). Total num frames: 664469504. Throughput: 0: 11684.5. Samples: 166163456. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:35,769][1648981] Avg episode reward: [(0, '325.750')] [2024-06-15 15:25:35,920][1651669] Updated weights for policy 0, policy_version 324454 (0.0019) [2024-06-15 15:25:40,766][1648981] Fps is (10 sec: 49190.6, 60 sec: 44236.8, 300 sec: 47985.7). Total num frames: 664535040. Throughput: 0: 11696.4. Samples: 166203392. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:40,767][1648981] Avg episode reward: [(0, '326.100')] [2024-06-15 15:25:42,470][1651669] Updated weights for policy 0, policy_version 324500 (0.0014) [2024-06-15 15:25:44,025][1651669] Updated weights for policy 0, policy_version 324562 (0.0013) [2024-06-15 15:25:45,361][1651669] Updated weights for policy 0, policy_version 324612 (0.0014) [2024-06-15 15:25:45,766][1648981] Fps is (10 sec: 36051.2, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 664829952. Throughput: 0: 11696.4. Samples: 166278656. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:45,767][1648981] Avg episode reward: [(0, '331.810')] [2024-06-15 15:25:46,840][1651669] Updated weights for policy 0, policy_version 324676 (0.0013) [2024-06-15 15:25:50,778][1648981] Fps is (10 sec: 52366.9, 60 sec: 45866.2, 300 sec: 47983.8). Total num frames: 665059328. Throughput: 0: 11636.4. Samples: 166347776. 
Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:50,779][1648981] Avg episode reward: [(0, '320.480')] [2024-06-15 15:25:52,976][1651669] Updated weights for policy 0, policy_version 324739 (0.0015) [2024-06-15 15:25:53,315][1651274] Signal inference workers to stop experience collection... (17050 times) [2024-06-15 15:25:53,387][1651669] InferenceWorker_p0-w0: stopping experience collection (17050 times) [2024-06-15 15:25:53,645][1651274] Signal inference workers to resume experience collection... (17050 times) [2024-06-15 15:25:53,648][1651669] InferenceWorker_p0-w0: resuming experience collection (17050 times) [2024-06-15 15:25:54,477][1651669] Updated weights for policy 0, policy_version 324800 (0.0066) [2024-06-15 15:25:55,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47513.8, 300 sec: 47652.4). Total num frames: 665288704. Throughput: 0: 11752.3. Samples: 166395904. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:25:55,767][1648981] Avg episode reward: [(0, '318.620')] [2024-06-15 15:25:56,476][1651669] Updated weights for policy 0, policy_version 324880 (0.0014) [2024-06-15 15:25:58,747][1651669] Updated weights for policy 0, policy_version 324961 (0.0091) [2024-06-15 15:26:00,767][1648981] Fps is (10 sec: 52489.4, 60 sec: 48059.6, 300 sec: 47985.7). Total num frames: 665583616. Throughput: 0: 11651.8. Samples: 166446080. Policy #0 lag: (min: 125.0, avg: 178.1, max: 383.0) [2024-06-15 15:26:00,767][1648981] Avg episode reward: [(0, '318.360')] [2024-06-15 15:26:03,815][1651669] Updated weights for policy 0, policy_version 324993 (0.0013) [2024-06-15 15:26:05,127][1651669] Updated weights for policy 0, policy_version 325044 (0.0011) [2024-06-15 15:26:05,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 46421.4, 300 sec: 47874.6). Total num frames: 665747456. Throughput: 0: 11787.4. Samples: 166531072. 
Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:05,767][1648981] Avg episode reward: [(0, '313.860')] [2024-06-15 15:26:07,069][1651669] Updated weights for policy 0, policy_version 325120 (0.0129) [2024-06-15 15:26:08,919][1651669] Updated weights for policy 0, policy_version 325187 (0.0014) [2024-06-15 15:26:10,767][1648981] Fps is (10 sec: 52428.9, 60 sec: 49184.7, 300 sec: 47985.6). Total num frames: 666107904. Throughput: 0: 11684.9. Samples: 166552576. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:10,767][1648981] Avg episode reward: [(0, '320.510')] [2024-06-15 15:26:15,125][1651669] Updated weights for policy 0, policy_version 325254 (0.0012) [2024-06-15 15:26:15,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 44785.7, 300 sec: 47763.5). Total num frames: 666173440. Throughput: 0: 11687.0. Samples: 166635008. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:15,768][1648981] Avg episode reward: [(0, '320.060')] [2024-06-15 15:26:16,494][1651669] Updated weights for policy 0, policy_version 325305 (0.0014) [2024-06-15 15:26:17,783][1651669] Updated weights for policy 0, policy_version 325360 (0.0014) [2024-06-15 15:26:19,794][1651669] Updated weights for policy 0, policy_version 325440 (0.0257) [2024-06-15 15:26:20,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 666533888. Throughput: 0: 11833.4. Samples: 166695936. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:20,767][1648981] Avg episode reward: [(0, '332.580')] [2024-06-15 15:26:21,537][1651669] Updated weights for policy 0, policy_version 325499 (0.0012) [2024-06-15 15:26:25,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 44236.9, 300 sec: 47656.3). Total num frames: 666632192. Throughput: 0: 11764.6. Samples: 166732800. 
Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:25,767][1648981] Avg episode reward: [(0, '314.430')] [2024-06-15 15:26:26,994][1651669] Updated weights for policy 0, policy_version 325562 (0.0018) [2024-06-15 15:26:28,824][1651274] Signal inference workers to stop experience collection... (17100 times) [2024-06-15 15:26:28,854][1651669] Updated weights for policy 0, policy_version 325617 (0.0013) [2024-06-15 15:26:28,868][1651669] InferenceWorker_p0-w0: stopping experience collection (17100 times) [2024-06-15 15:26:29,031][1651274] Signal inference workers to resume experience collection... (17100 times) [2024-06-15 15:26:29,032][1651669] InferenceWorker_p0-w0: resuming experience collection (17100 times) [2024-06-15 15:26:30,375][1651669] Updated weights for policy 0, policy_version 325691 (0.0012) [2024-06-15 15:26:30,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49704.6, 300 sec: 47652.4). Total num frames: 667025408. Throughput: 0: 11650.8. Samples: 166802944. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:30,767][1648981] Avg episode reward: [(0, '316.170')] [2024-06-15 15:26:32,048][1651669] Updated weights for policy 0, policy_version 325744 (0.0015) [2024-06-15 15:26:35,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 44784.2, 300 sec: 47652.4). Total num frames: 667156480. Throughput: 0: 11653.9. Samples: 166872064. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:35,768][1648981] Avg episode reward: [(0, '315.590')] [2024-06-15 15:26:38,023][1651669] Updated weights for policy 0, policy_version 325783 (0.0053) [2024-06-15 15:26:39,383][1651669] Updated weights for policy 0, policy_version 325840 (0.0015) [2024-06-15 15:26:40,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 667418624. Throughput: 0: 11605.3. Samples: 166918144. 
Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:40,767][1648981] Avg episode reward: [(0, '327.120')] [2024-06-15 15:26:41,484][1651669] Updated weights for policy 0, policy_version 325920 (0.0014) [2024-06-15 15:26:43,077][1651669] Updated weights for policy 0, policy_version 325973 (0.0042) [2024-06-15 15:26:45,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 667680768. Throughput: 0: 11594.0. Samples: 166967808. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:45,767][1648981] Avg episode reward: [(0, '335.520')] [2024-06-15 15:26:49,494][1651669] Updated weights for policy 0, policy_version 326034 (0.0013) [2024-06-15 15:26:50,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 45884.2, 300 sec: 47097.1). Total num frames: 667811840. Throughput: 0: 11548.4. Samples: 167050752. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:50,767][1648981] Avg episode reward: [(0, '337.600')] [2024-06-15 15:26:51,022][1651669] Updated weights for policy 0, policy_version 326096 (0.0031) [2024-06-15 15:26:52,579][1651669] Updated weights for policy 0, policy_version 326162 (0.0013) [2024-06-15 15:26:53,512][1651669] Updated weights for policy 0, policy_version 326208 (0.0018) [2024-06-15 15:26:55,777][1648981] Fps is (10 sec: 52374.2, 60 sec: 48597.5, 300 sec: 47984.0). Total num frames: 668205056. Throughput: 0: 11636.8. Samples: 167076352. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:26:55,777][1648981] Avg episode reward: [(0, '327.900')] [2024-06-15 15:26:55,785][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000326272_668205056.pth... 
[2024-06-15 15:26:55,864][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000320640_656670720.pth [2024-06-15 15:26:59,593][1651669] Updated weights for policy 0, policy_version 326274 (0.0014) [2024-06-15 15:27:00,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45329.2, 300 sec: 47208.1). Total num frames: 668303360. Throughput: 0: 11616.7. Samples: 167157760. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:27:00,767][1648981] Avg episode reward: [(0, '327.740')] [2024-06-15 15:27:01,550][1651669] Updated weights for policy 0, policy_version 326352 (0.0074) [2024-06-15 15:27:03,778][1651669] Updated weights for policy 0, policy_version 326432 (0.0014) [2024-06-15 15:27:04,696][1651669] Updated weights for policy 0, policy_version 326467 (0.0092) [2024-06-15 15:27:05,768][1648981] Fps is (10 sec: 45915.3, 60 sec: 48604.4, 300 sec: 47763.9). Total num frames: 668663808. Throughput: 0: 11502.5. Samples: 167213568. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:27:05,769][1648981] Avg episode reward: [(0, '335.630')] [2024-06-15 15:27:06,238][1651669] Updated weights for policy 0, policy_version 326528 (0.0014) [2024-06-15 15:27:10,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 43690.8, 300 sec: 46986.0). Total num frames: 668729344. Throughput: 0: 11514.3. Samples: 167250944. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:27:10,767][1648981] Avg episode reward: [(0, '323.650')] [2024-06-15 15:27:11,017][1651274] Signal inference workers to stop experience collection... (17150 times) [2024-06-15 15:27:11,068][1651669] InferenceWorker_p0-w0: stopping experience collection (17150 times) [2024-06-15 15:27:11,295][1651274] Signal inference workers to resume experience collection... 
(17150 times) [2024-06-15 15:27:11,295][1651669] InferenceWorker_p0-w0: resuming experience collection (17150 times) [2024-06-15 15:27:11,795][1651669] Updated weights for policy 0, policy_version 326585 (0.0014) [2024-06-15 15:27:13,427][1651669] Updated weights for policy 0, policy_version 326640 (0.0021) [2024-06-15 15:27:13,858][1651669] Updated weights for policy 0, policy_version 326656 (0.0010) [2024-06-15 15:27:15,766][1648981] Fps is (10 sec: 45883.1, 60 sec: 49152.1, 300 sec: 47541.4). Total num frames: 669122560. Throughput: 0: 11878.4. Samples: 167337472. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:27:15,767][1648981] Avg episode reward: [(0, '329.480')] [2024-06-15 15:27:15,797][1651669] Updated weights for policy 0, policy_version 326723 (0.0014) [2024-06-15 15:27:17,187][1651669] Updated weights for policy 0, policy_version 326780 (0.0013) [2024-06-15 15:27:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 47319.2). Total num frames: 669253632. Throughput: 0: 11889.8. Samples: 167407104. Policy #0 lag: (min: 31.0, avg: 99.3, max: 287.0) [2024-06-15 15:27:20,767][1648981] Avg episode reward: [(0, '325.390')] [2024-06-15 15:27:21,774][1651669] Updated weights for policy 0, policy_version 326840 (0.0090) [2024-06-15 15:27:23,237][1651669] Updated weights for policy 0, policy_version 326880 (0.0065) [2024-06-15 15:27:23,933][1651669] Updated weights for policy 0, policy_version 326912 (0.0020) [2024-06-15 15:27:25,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48605.8, 300 sec: 47324.3). Total num frames: 669548544. Throughput: 0: 11650.9. Samples: 167442432. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:27:25,767][1648981] Avg episode reward: [(0, '338.770')] [2024-06-15 15:27:27,355][1651669] Updated weights for policy 0, policy_version 327008 (0.0116) [2024-06-15 15:27:30,771][1648981] Fps is (10 sec: 52406.4, 60 sec: 45872.0, 300 sec: 47546.1). Total num frames: 669777920. 
Throughput: 0: 12059.3. Samples: 167510528. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:27:30,772][1648981] Avg episode reward: [(0, '323.000')] [2024-06-15 15:27:32,465][1651669] Updated weights for policy 0, policy_version 327072 (0.0013) [2024-06-15 15:27:33,164][1651669] Updated weights for policy 0, policy_version 327101 (0.0011) [2024-06-15 15:27:34,903][1651669] Updated weights for policy 0, policy_version 327161 (0.0012) [2024-06-15 15:27:35,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48059.8, 300 sec: 47431.5). Total num frames: 670040064. Throughput: 0: 11912.5. Samples: 167586816. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:27:35,767][1648981] Avg episode reward: [(0, '318.340')] [2024-06-15 15:27:37,139][1651669] Updated weights for policy 0, policy_version 327201 (0.0105) [2024-06-15 15:27:38,798][1651669] Updated weights for policy 0, policy_version 327288 (0.0014) [2024-06-15 15:27:40,766][1648981] Fps is (10 sec: 52450.9, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 670302208. Throughput: 0: 12063.2. Samples: 167619072. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:27:40,767][1648981] Avg episode reward: [(0, '328.710')] [2024-06-15 15:27:42,838][1651669] Updated weights for policy 0, policy_version 327332 (0.0012) [2024-06-15 15:27:44,453][1651669] Updated weights for policy 0, policy_version 327376 (0.0014) [2024-06-15 15:27:45,596][1651669] Updated weights for policy 0, policy_version 327424 (0.0013) [2024-06-15 15:27:45,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 670564352. Throughput: 0: 11958.1. Samples: 167695872. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:27:45,767][1648981] Avg episode reward: [(0, '335.950')] [2024-06-15 15:27:49,242][1651669] Updated weights for policy 0, policy_version 327504 (0.0053) [2024-06-15 15:27:49,840][1651274] Signal inference workers to stop experience collection... 
(17200 times) [2024-06-15 15:27:49,910][1651669] InferenceWorker_p0-w0: stopping experience collection (17200 times) [2024-06-15 15:27:50,080][1651274] Signal inference workers to resume experience collection... (17200 times) [2024-06-15 15:27:50,081][1651669] InferenceWorker_p0-w0: resuming experience collection (17200 times) [2024-06-15 15:27:50,768][1648981] Fps is (10 sec: 52422.9, 60 sec: 50243.3, 300 sec: 47985.9). Total num frames: 670826496. Throughput: 0: 12174.4. Samples: 167761408. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:27:50,768][1648981] Avg episode reward: [(0, '342.040')] [2024-06-15 15:27:53,087][1651669] Updated weights for policy 0, policy_version 327561 (0.0013) [2024-06-15 15:27:55,647][1651669] Updated weights for policy 0, policy_version 327617 (0.0015) [2024-06-15 15:27:55,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 45883.2, 300 sec: 47097.1). Total num frames: 670957568. Throughput: 0: 12219.7. Samples: 167800832. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:27:55,767][1648981] Avg episode reward: [(0, '334.170')] [2024-06-15 15:27:56,944][1651669] Updated weights for policy 0, policy_version 327674 (0.0012) [2024-06-15 15:28:00,290][1651669] Updated weights for policy 0, policy_version 327728 (0.0012) [2024-06-15 15:28:00,767][1648981] Fps is (10 sec: 39325.3, 60 sec: 48605.7, 300 sec: 47542.7). Total num frames: 671219712. Throughput: 0: 11912.5. Samples: 167873536. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:00,769][1648981] Avg episode reward: [(0, '334.460')] [2024-06-15 15:28:01,869][1651669] Updated weights for policy 0, policy_version 327798 (0.0014) [2024-06-15 15:28:04,484][1651669] Updated weights for policy 0, policy_version 327844 (0.0014) [2024-06-15 15:28:05,098][1651669] Updated weights for policy 0, policy_version 327872 (0.0037) [2024-06-15 15:28:05,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 46968.9, 300 sec: 47319.2). 
Total num frames: 671481856. Throughput: 0: 11832.9. Samples: 167939584. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:05,767][1648981] Avg episode reward: [(0, '348.530')] [2024-06-15 15:28:07,688][1651669] Updated weights for policy 0, policy_version 327928 (0.0117) [2024-06-15 15:28:10,767][1648981] Fps is (10 sec: 42596.1, 60 sec: 48605.2, 300 sec: 47208.1). Total num frames: 671645696. Throughput: 0: 11821.3. Samples: 167974400. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:10,768][1648981] Avg episode reward: [(0, '350.900')] [2024-06-15 15:28:11,151][1651669] Updated weights for policy 0, policy_version 327972 (0.0013) [2024-06-15 15:28:12,509][1651669] Updated weights for policy 0, policy_version 328020 (0.0015) [2024-06-15 15:28:14,881][1651669] Updated weights for policy 0, policy_version 328068 (0.0015) [2024-06-15 15:28:15,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 47513.6, 300 sec: 47431.5). Total num frames: 671973376. Throughput: 0: 11936.4. Samples: 168047616. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:15,767][1648981] Avg episode reward: [(0, '369.450')] [2024-06-15 15:28:15,932][1651669] Updated weights for policy 0, policy_version 328121 (0.0012) [2024-06-15 15:28:18,552][1651669] Updated weights for policy 0, policy_version 328176 (0.0022) [2024-06-15 15:28:20,767][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.2, 300 sec: 47101.3). Total num frames: 672137216. Throughput: 0: 11901.0. Samples: 168122368. 
Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:20,768][1648981] Avg episode reward: [(0, '375.830')] [2024-06-15 15:28:21,238][1651669] Updated weights for policy 0, policy_version 328208 (0.0012) [2024-06-15 15:28:22,651][1651669] Updated weights for policy 0, policy_version 328272 (0.0027) [2024-06-15 15:28:23,813][1651669] Updated weights for policy 0, policy_version 328318 (0.0014) [2024-06-15 15:28:25,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48605.9, 300 sec: 47322.3). Total num frames: 672464896. Throughput: 0: 11889.8. Samples: 168154112. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:25,767][1648981] Avg episode reward: [(0, '382.500')] [2024-06-15 15:28:26,189][1651669] Updated weights for policy 0, policy_version 328369 (0.0155) [2024-06-15 15:28:29,165][1651669] Updated weights for policy 0, policy_version 328416 (0.0013) [2024-06-15 15:28:30,766][1648981] Fps is (10 sec: 52432.5, 60 sec: 48063.2, 300 sec: 47097.2). Total num frames: 672661504. Throughput: 0: 11912.5. Samples: 168231936. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:30,767][1648981] Avg episode reward: [(0, '396.300')] [2024-06-15 15:28:31,196][1651669] Updated weights for policy 0, policy_version 328449 (0.0012) [2024-06-15 15:28:32,466][1651669] Updated weights for policy 0, policy_version 328512 (0.0096) [2024-06-15 15:28:33,423][1651274] Signal inference workers to stop experience collection... (17250 times) [2024-06-15 15:28:33,467][1651669] InferenceWorker_p0-w0: stopping experience collection (17250 times) [2024-06-15 15:28:33,622][1651274] Signal inference workers to resume experience collection... (17250 times) [2024-06-15 15:28:33,623][1651669] InferenceWorker_p0-w0: resuming experience collection (17250 times) [2024-06-15 15:28:34,203][1651669] Updated weights for policy 0, policy_version 328573 (0.0012) [2024-06-15 15:28:35,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48605.8, 300 sec: 47319.9). 
Total num frames: 672956416. Throughput: 0: 12140.4. Samples: 168307712. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 15:28:35,767][1648981] Avg episode reward: [(0, '379.530')] [2024-06-15 15:28:36,511][1651669] Updated weights for policy 0, policy_version 328625 (0.0013) [2024-06-15 15:28:39,616][1651669] Updated weights for policy 0, policy_version 328656 (0.0024) [2024-06-15 15:28:40,735][1651669] Updated weights for policy 0, policy_version 328701 (0.0012) [2024-06-15 15:28:40,782][1648981] Fps is (10 sec: 49074.6, 60 sec: 47501.2, 300 sec: 46983.5). Total num frames: 673153024. Throughput: 0: 11999.4. Samples: 168340992. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:28:40,783][1648981] Avg episode reward: [(0, '390.980')] [2024-06-15 15:28:43,390][1651669] Updated weights for policy 0, policy_version 328762 (0.0012) [2024-06-15 15:28:44,955][1651669] Updated weights for policy 0, policy_version 328803 (0.0095) [2024-06-15 15:28:45,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 48059.5, 300 sec: 47652.4). Total num frames: 673447936. Throughput: 0: 12037.7. Samples: 168415232. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:28:45,767][1648981] Avg episode reward: [(0, '402.540')] [2024-06-15 15:28:46,847][1651669] Updated weights for policy 0, policy_version 328849 (0.0013) [2024-06-15 15:28:50,414][1651669] Updated weights for policy 0, policy_version 328899 (0.0013) [2024-06-15 15:28:50,766][1648981] Fps is (10 sec: 45947.6, 60 sec: 46422.2, 300 sec: 46986.0). Total num frames: 673611776. Throughput: 0: 12117.3. Samples: 168484864. 
Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:28:50,767][1648981] Avg episode reward: [(0, '407.940')] [2024-06-15 15:28:51,684][1651669] Updated weights for policy 0, policy_version 328960 (0.0137) [2024-06-15 15:28:55,615][1651669] Updated weights for policy 0, policy_version 329026 (0.0116) [2024-06-15 15:28:55,767][1648981] Fps is (10 sec: 39321.5, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 673841152. Throughput: 0: 12185.8. Samples: 168522752. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:28:55,767][1648981] Avg episode reward: [(0, '400.270')] [2024-06-15 15:28:56,358][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000329056_673906688.pth... [2024-06-15 15:28:56,486][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000323456_662437888.pth [2024-06-15 15:28:57,324][1651669] Updated weights for policy 0, policy_version 329088 (0.0012) [2024-06-15 15:28:58,844][1651669] Updated weights for policy 0, policy_version 329144 (0.0012) [2024-06-15 15:29:00,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 674103296. Throughput: 0: 12003.6. Samples: 168587776. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:00,767][1648981] Avg episode reward: [(0, '395.820')] [2024-06-15 15:29:01,852][1651669] Updated weights for policy 0, policy_version 329200 (0.0016) [2024-06-15 15:29:05,464][1651669] Updated weights for policy 0, policy_version 329252 (0.0012) [2024-06-15 15:29:05,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 47513.5, 300 sec: 47430.4). Total num frames: 674332672. Throughput: 0: 12003.7. Samples: 168662528. 
Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:05,767][1648981] Avg episode reward: [(0, '387.730')] [2024-06-15 15:29:07,057][1651669] Updated weights for policy 0, policy_version 329297 (0.0070) [2024-06-15 15:29:08,056][1651669] Updated weights for policy 0, policy_version 329339 (0.0011) [2024-06-15 15:29:09,527][1651669] Updated weights for policy 0, policy_version 329408 (0.0013) [2024-06-15 15:29:10,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 49698.7, 300 sec: 47319.2). Total num frames: 674627584. Throughput: 0: 12060.4. Samples: 168696832. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:10,767][1648981] Avg episode reward: [(0, '378.610')] [2024-06-15 15:29:13,366][1651669] Updated weights for policy 0, policy_version 329464 (0.0092) [2024-06-15 15:29:15,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 47097.1). Total num frames: 674758656. Throughput: 0: 11923.9. Samples: 168768512. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:15,767][1648981] Avg episode reward: [(0, '375.390')] [2024-06-15 15:29:17,025][1651669] Updated weights for policy 0, policy_version 329520 (0.0066) [2024-06-15 15:29:17,705][1651669] Updated weights for policy 0, policy_version 329541 (0.0012) [2024-06-15 15:29:18,040][1651274] Signal inference workers to stop experience collection... (17300 times) [2024-06-15 15:29:18,212][1651669] InferenceWorker_p0-w0: stopping experience collection (17300 times) [2024-06-15 15:29:18,403][1651274] Signal inference workers to resume experience collection... (17300 times) [2024-06-15 15:29:18,404][1651669] InferenceWorker_p0-w0: resuming experience collection (17300 times) [2024-06-15 15:29:19,252][1651669] Updated weights for policy 0, policy_version 329600 (0.0012) [2024-06-15 15:29:20,647][1651669] Updated weights for policy 0, policy_version 329662 (0.0011) [2024-06-15 15:29:20,794][1648981] Fps is (10 sec: 52283.9, 60 sec: 50221.6, 300 sec: 47648.0). 
Total num frames: 675151872. Throughput: 0: 11814.2. Samples: 168839680. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:20,795][1648981] Avg episode reward: [(0, '377.510')] [2024-06-15 15:29:24,259][1651669] Updated weights for policy 0, policy_version 329716 (0.0012) [2024-06-15 15:29:25,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 46967.3, 300 sec: 47097.0). Total num frames: 675282944. Throughput: 0: 11871.1. Samples: 168875008. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:25,767][1648981] Avg episode reward: [(0, '372.980')] [2024-06-15 15:29:27,263][1651669] Updated weights for policy 0, policy_version 329744 (0.0050) [2024-06-15 15:29:29,076][1651669] Updated weights for policy 0, policy_version 329808 (0.0018) [2024-06-15 15:29:30,393][1651669] Updated weights for policy 0, policy_version 329860 (0.0025) [2024-06-15 15:29:30,766][1648981] Fps is (10 sec: 42717.3, 60 sec: 48605.9, 300 sec: 47763.5). Total num frames: 675577856. Throughput: 0: 11764.7. Samples: 168944640. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:30,767][1648981] Avg episode reward: [(0, '368.660')] [2024-06-15 15:29:31,653][1651669] Updated weights for policy 0, policy_version 329917 (0.0020) [2024-06-15 15:29:35,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 46421.4, 300 sec: 46986.0). Total num frames: 675741696. Throughput: 0: 11798.8. Samples: 169015808. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:35,767][1648981] Avg episode reward: [(0, '378.690')] [2024-06-15 15:29:36,332][1651669] Updated weights for policy 0, policy_version 329974 (0.0013) [2024-06-15 15:29:39,739][1651669] Updated weights for policy 0, policy_version 330045 (0.0011) [2024-06-15 15:29:40,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46979.8, 300 sec: 47652.4). Total num frames: 675971072. Throughput: 0: 11764.7. Samples: 169052160. 
Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:40,767][1648981] Avg episode reward: [(0, '377.020')] [2024-06-15 15:29:41,257][1651669] Updated weights for policy 0, policy_version 330084 (0.0014) [2024-06-15 15:29:42,872][1651669] Updated weights for policy 0, policy_version 330160 (0.0012) [2024-06-15 15:29:45,708][1651669] Updated weights for policy 0, policy_version 330208 (0.0037) [2024-06-15 15:29:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.6, 300 sec: 47319.2). Total num frames: 676265984. Throughput: 0: 11901.2. Samples: 169123328. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:45,767][1648981] Avg episode reward: [(0, '376.130')] [2024-06-15 15:29:46,390][1651669] Updated weights for policy 0, policy_version 330240 (0.0014) [2024-06-15 15:29:49,999][1651669] Updated weights for policy 0, policy_version 330304 (0.0013) [2024-06-15 15:29:50,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 47513.4, 300 sec: 47541.4). Total num frames: 676462592. Throughput: 0: 11935.2. Samples: 169199616. Policy #0 lag: (min: 47.0, avg: 179.1, max: 303.0) [2024-06-15 15:29:50,768][1648981] Avg episode reward: [(0, '375.250')] [2024-06-15 15:29:52,229][1651669] Updated weights for policy 0, policy_version 330358 (0.0013) [2024-06-15 15:29:53,829][1651669] Updated weights for policy 0, policy_version 330422 (0.0017) [2024-06-15 15:29:55,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 676724736. Throughput: 0: 11946.7. Samples: 169234432. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:29:55,767][1648981] Avg episode reward: [(0, '373.830')] [2024-06-15 15:29:56,735][1651669] Updated weights for policy 0, policy_version 330469 (0.0026) [2024-06-15 15:29:57,292][1651669] Updated weights for policy 0, policy_version 330496 (0.0021) [2024-06-15 15:30:00,618][1651274] Signal inference workers to stop experience collection... 
(17350 times) [2024-06-15 15:30:00,648][1651669] InferenceWorker_p0-w0: stopping experience collection (17350 times) [2024-06-15 15:30:00,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 676921344. Throughput: 0: 11935.2. Samples: 169305600. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:00,767][1648981] Avg episode reward: [(0, '357.100')] [2024-06-15 15:30:00,825][1651274] Signal inference workers to resume experience collection... (17350 times) [2024-06-15 15:30:00,844][1651669] InferenceWorker_p0-w0: resuming experience collection (17350 times) [2024-06-15 15:30:02,157][1651669] Updated weights for policy 0, policy_version 330577 (0.0012) [2024-06-15 15:30:03,467][1651669] Updated weights for policy 0, policy_version 330645 (0.0011) [2024-06-15 15:30:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.9, 300 sec: 47770.0). Total num frames: 677249024. Throughput: 0: 12056.5. Samples: 169381888. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:05,767][1648981] Avg episode reward: [(0, '360.120')] [2024-06-15 15:30:07,330][1651669] Updated weights for policy 0, policy_version 330736 (0.0015) [2024-06-15 15:30:10,767][1648981] Fps is (10 sec: 45873.2, 60 sec: 45874.9, 300 sec: 47097.6). Total num frames: 677380096. Throughput: 0: 12049.0. Samples: 169417216. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:10,768][1648981] Avg episode reward: [(0, '364.820')] [2024-06-15 15:30:11,631][1651669] Updated weights for policy 0, policy_version 330778 (0.0013) [2024-06-15 15:30:13,067][1651669] Updated weights for policy 0, policy_version 330851 (0.0012) [2024-06-15 15:30:14,579][1651669] Updated weights for policy 0, policy_version 330940 (0.0014) [2024-06-15 15:30:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.2, 300 sec: 48096.8). Total num frames: 677773312. Throughput: 0: 12083.2. Samples: 169488384. 
Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:15,767][1648981] Avg episode reward: [(0, '360.120')] [2024-06-15 15:30:17,673][1651669] Updated weights for policy 0, policy_version 330992 (0.0013) [2024-06-15 15:30:20,766][1648981] Fps is (10 sec: 52431.8, 60 sec: 45896.5, 300 sec: 47208.2). Total num frames: 677904384. Throughput: 0: 12401.8. Samples: 169573888. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:20,767][1648981] Avg episode reward: [(0, '352.240')] [2024-06-15 15:30:22,198][1651669] Updated weights for policy 0, policy_version 331042 (0.0014) [2024-06-15 15:30:23,521][1651669] Updated weights for policy 0, policy_version 331107 (0.0099) [2024-06-15 15:30:24,899][1651669] Updated weights for policy 0, policy_version 331184 (0.0012) [2024-06-15 15:30:25,785][1648981] Fps is (10 sec: 52329.1, 60 sec: 50228.5, 300 sec: 48317.1). Total num frames: 678297600. Throughput: 0: 12294.2. Samples: 169605632. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:25,786][1648981] Avg episode reward: [(0, '364.890')] [2024-06-15 15:30:27,857][1651669] Updated weights for policy 0, policy_version 331234 (0.0019) [2024-06-15 15:30:30,794][1648981] Fps is (10 sec: 52283.0, 60 sec: 47491.5, 300 sec: 47315.0). Total num frames: 678428672. Throughput: 0: 12269.0. Samples: 169675776. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:30,797][1648981] Avg episode reward: [(0, '357.220')] [2024-06-15 15:30:32,498][1651669] Updated weights for policy 0, policy_version 331266 (0.0013) [2024-06-15 15:30:34,078][1651669] Updated weights for policy 0, policy_version 331328 (0.0011) [2024-06-15 15:30:35,271][1651669] Updated weights for policy 0, policy_version 331392 (0.0013) [2024-06-15 15:30:35,766][1648981] Fps is (10 sec: 42679.7, 60 sec: 49698.1, 300 sec: 48096.8). Total num frames: 678723584. Throughput: 0: 12242.6. Samples: 169750528. 
Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:35,767][1648981] Avg episode reward: [(0, '355.800')] [2024-06-15 15:30:36,295][1651669] Updated weights for policy 0, policy_version 331450 (0.0042) [2024-06-15 15:30:37,120][1651274] Signal inference workers to stop experience collection... (17400 times) [2024-06-15 15:30:37,147][1651669] InferenceWorker_p0-w0: stopping experience collection (17400 times) [2024-06-15 15:30:37,311][1651274] Signal inference workers to resume experience collection... (17400 times) [2024-06-15 15:30:37,312][1651669] InferenceWorker_p0-w0: resuming experience collection (17400 times) [2024-06-15 15:30:40,766][1648981] Fps is (10 sec: 52575.2, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 678952960. Throughput: 0: 12208.3. Samples: 169783808. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:40,767][1648981] Avg episode reward: [(0, '357.820')] [2024-06-15 15:30:42,926][1651669] Updated weights for policy 0, policy_version 331521 (0.0013) [2024-06-15 15:30:44,907][1651669] Updated weights for policy 0, policy_version 331601 (0.0013) [2024-06-15 15:30:45,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 48605.8, 300 sec: 47876.5). Total num frames: 679182336. Throughput: 0: 12310.8. Samples: 169859584. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:45,767][1648981] Avg episode reward: [(0, '354.060')] [2024-06-15 15:30:46,376][1651669] Updated weights for policy 0, policy_version 331650 (0.0012) [2024-06-15 15:30:47,237][1651669] Updated weights for policy 0, policy_version 331705 (0.0022) [2024-06-15 15:30:49,881][1651669] Updated weights for policy 0, policy_version 331776 (0.0014) [2024-06-15 15:30:50,768][1648981] Fps is (10 sec: 52428.2, 60 sec: 50244.3, 300 sec: 48096.7). Total num frames: 679477248. Throughput: 0: 12105.9. Samples: 169926656. 
Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:50,768][1648981] Avg episode reward: [(0, '368.820')] [2024-06-15 15:30:55,748][1651669] Updated weights for policy 0, policy_version 331858 (0.0109) [2024-06-15 15:30:55,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48605.8, 300 sec: 47652.5). Total num frames: 679641088. Throughput: 0: 12265.4. Samples: 169969152. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:30:55,767][1648981] Avg episode reward: [(0, '367.130')] [2024-06-15 15:30:56,270][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000331888_679706624.pth... [2024-06-15 15:30:56,321][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000326272_668205056.pth [2024-06-15 15:30:56,326][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000331888_679706624.pth [2024-06-15 15:30:56,595][1651669] Updated weights for policy 0, policy_version 331901 (0.0012) [2024-06-15 15:30:57,943][1651669] Updated weights for policy 0, policy_version 331955 (0.0014) [2024-06-15 15:30:59,749][1651669] Updated weights for policy 0, policy_version 331986 (0.0012) [2024-06-15 15:31:00,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 51336.7, 300 sec: 48318.9). Total num frames: 680001536. Throughput: 0: 12310.8. Samples: 170042368. Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:31:00,767][1648981] Avg episode reward: [(0, '372.300')] [2024-06-15 15:31:05,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 46421.3, 300 sec: 47208.2). Total num frames: 680034304. Throughput: 0: 12094.6. Samples: 170118144. 
Policy #0 lag: (min: 63.0, avg: 157.9, max: 303.0) [2024-06-15 15:31:05,767][1648981] Avg episode reward: [(0, '378.520')] [2024-06-15 15:31:06,178][1651669] Updated weights for policy 0, policy_version 332080 (0.0017) [2024-06-15 15:31:08,036][1651669] Updated weights for policy 0, policy_version 332160 (0.0012) [2024-06-15 15:31:10,774][1648981] Fps is (10 sec: 39290.7, 60 sec: 50238.2, 300 sec: 48206.6). Total num frames: 680394752. Throughput: 0: 11915.5. Samples: 170141696. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:10,775][1648981] Avg episode reward: [(0, '380.260')] [2024-06-15 15:31:11,094][1651669] Updated weights for policy 0, policy_version 332228 (0.0024) [2024-06-15 15:31:15,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 45875.1, 300 sec: 47430.3). Total num frames: 680525824. Throughput: 0: 11965.4. Samples: 170213888. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:15,767][1648981] Avg episode reward: [(0, '363.940')] [2024-06-15 15:31:16,715][1651669] Updated weights for policy 0, policy_version 332304 (0.0023) [2024-06-15 15:31:18,613][1651669] Updated weights for policy 0, policy_version 332384 (0.0126) [2024-06-15 15:31:19,953][1651274] Signal inference workers to stop experience collection... (17450 times) [2024-06-15 15:31:20,000][1651669] InferenceWorker_p0-w0: stopping experience collection (17450 times) [2024-06-15 15:31:20,304][1651274] Signal inference workers to resume experience collection... (17450 times) [2024-06-15 15:31:20,305][1651669] InferenceWorker_p0-w0: resuming experience collection (17450 times) [2024-06-15 15:31:20,778][1648981] Fps is (10 sec: 42581.8, 60 sec: 48596.4, 300 sec: 48094.8). Total num frames: 680820736. Throughput: 0: 11670.5. Samples: 170275840. 
Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:20,779][1648981] Avg episode reward: [(0, '364.700')] [2024-06-15 15:31:21,370][1651669] Updated weights for policy 0, policy_version 332477 (0.0011) [2024-06-15 15:31:23,446][1651669] Updated weights for policy 0, policy_version 332534 (0.0023) [2024-06-15 15:31:25,786][1648981] Fps is (10 sec: 52328.4, 60 sec: 45875.0, 300 sec: 47538.3). Total num frames: 681050112. Throughput: 0: 11611.7. Samples: 170306560. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:25,786][1648981] Avg episode reward: [(0, '355.310')] [2024-06-15 15:31:28,227][1651669] Updated weights for policy 0, policy_version 332581 (0.0010) [2024-06-15 15:31:29,279][1651669] Updated weights for policy 0, policy_version 332624 (0.0021) [2024-06-15 15:31:30,384][1651669] Updated weights for policy 0, policy_version 332672 (0.0011) [2024-06-15 15:31:30,774][1648981] Fps is (10 sec: 52449.4, 60 sec: 48622.1, 300 sec: 48095.5). Total num frames: 681345024. Throughput: 0: 11819.5. Samples: 170391552. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:30,775][1648981] Avg episode reward: [(0, '357.020')] [2024-06-15 15:31:31,799][1651669] Updated weights for policy 0, policy_version 332726 (0.0121) [2024-06-15 15:31:32,681][1651669] Updated weights for policy 0, policy_version 332752 (0.0010) [2024-06-15 15:31:35,769][1648981] Fps is (10 sec: 52518.3, 60 sec: 47511.8, 300 sec: 47985.3). Total num frames: 681574400. Throughput: 0: 11821.0. Samples: 170458624. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:35,769][1648981] Avg episode reward: [(0, '365.770')] [2024-06-15 15:31:38,453][1651669] Updated weights for policy 0, policy_version 332816 (0.0012) [2024-06-15 15:31:40,728][1651669] Updated weights for policy 0, policy_version 332880 (0.0012) [2024-06-15 15:31:40,774][1648981] Fps is (10 sec: 39321.4, 60 sec: 46415.3, 300 sec: 47651.2). Total num frames: 681738240. 
Throughput: 0: 11796.7. Samples: 170500096. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:40,775][1648981] Avg episode reward: [(0, '374.200')] [2024-06-15 15:31:42,232][1651669] Updated weights for policy 0, policy_version 332947 (0.0014) [2024-06-15 15:31:44,370][1651669] Updated weights for policy 0, policy_version 333026 (0.0111) [2024-06-15 15:31:45,766][1648981] Fps is (10 sec: 52440.7, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 682098688. Throughput: 0: 11491.5. Samples: 170559488. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:45,767][1648981] Avg episode reward: [(0, '372.690')] [2024-06-15 15:31:49,625][1651669] Updated weights for policy 0, policy_version 333088 (0.0012) [2024-06-15 15:31:50,766][1648981] Fps is (10 sec: 49190.7, 60 sec: 45875.3, 300 sec: 47543.0). Total num frames: 682229760. Throughput: 0: 11776.0. Samples: 170648064. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:50,767][1648981] Avg episode reward: [(0, '353.360')] [2024-06-15 15:31:51,172][1651669] Updated weights for policy 0, policy_version 333152 (0.0031) [2024-06-15 15:31:53,060][1651669] Updated weights for policy 0, policy_version 333202 (0.0013) [2024-06-15 15:31:54,832][1651669] Updated weights for policy 0, policy_version 333280 (0.0014) [2024-06-15 15:31:55,778][1648981] Fps is (10 sec: 52366.7, 60 sec: 49688.4, 300 sec: 48539.1). Total num frames: 682622976. Throughput: 0: 12013.9. Samples: 170682368. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:31:55,779][1648981] Avg episode reward: [(0, '362.630')] [2024-06-15 15:31:59,441][1651669] Updated weights for policy 0, policy_version 333313 (0.0012) [2024-06-15 15:32:00,497][1651669] Updated weights for policy 0, policy_version 333369 (0.0013) [2024-06-15 15:32:00,762][1651274] Signal inference workers to stop experience collection... 
(17500 times) [2024-06-15 15:32:00,767][1648981] Fps is (10 sec: 52425.9, 60 sec: 45874.7, 300 sec: 47763.7). Total num frames: 682754048. Throughput: 0: 12253.7. Samples: 170765312. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:32:00,768][1648981] Avg episode reward: [(0, '345.110')] [2024-06-15 15:32:00,812][1651669] InferenceWorker_p0-w0: stopping experience collection (17500 times) [2024-06-15 15:32:01,002][1651274] Signal inference workers to resume experience collection... (17500 times) [2024-06-15 15:32:01,003][1651669] InferenceWorker_p0-w0: resuming experience collection (17500 times) [2024-06-15 15:32:01,923][1651669] Updated weights for policy 0, policy_version 333440 (0.0017) [2024-06-15 15:32:04,190][1651669] Updated weights for policy 0, policy_version 333507 (0.0012) [2024-06-15 15:32:05,647][1651669] Updated weights for policy 0, policy_version 333568 (0.0013) [2024-06-15 15:32:05,766][1648981] Fps is (10 sec: 52491.2, 60 sec: 51882.7, 300 sec: 48874.3). Total num frames: 683147264. Throughput: 0: 12279.8. Samples: 170828288. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:32:05,767][1648981] Avg episode reward: [(0, '344.850')] [2024-06-15 15:32:10,782][1648981] Fps is (10 sec: 39261.0, 60 sec: 45869.0, 300 sec: 47538.8). Total num frames: 683147264. Throughput: 0: 12459.6. Samples: 170867200. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:32:10,783][1648981] Avg episode reward: [(0, '360.220')] [2024-06-15 15:32:11,617][1651669] Updated weights for policy 0, policy_version 333627 (0.0012) [2024-06-15 15:32:13,885][1651669] Updated weights for policy 0, policy_version 333712 (0.0013) [2024-06-15 15:32:15,786][1648981] Fps is (10 sec: 42514.3, 60 sec: 50773.8, 300 sec: 48537.8). Total num frames: 683573248. Throughput: 0: 12136.9. Samples: 170937856. 
Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:32:15,787][1648981] Avg episode reward: [(0, '352.640')] [2024-06-15 15:32:15,830][1651669] Updated weights for policy 0, policy_version 333781 (0.0011) [2024-06-15 15:32:20,766][1648981] Fps is (10 sec: 52513.1, 60 sec: 47522.9, 300 sec: 47874.6). Total num frames: 683671552. Throughput: 0: 12220.4. Samples: 171008512. Policy #0 lag: (min: 143.0, avg: 198.4, max: 351.0) [2024-06-15 15:32:20,767][1648981] Avg episode reward: [(0, '347.150')] [2024-06-15 15:32:22,494][1651669] Updated weights for policy 0, policy_version 333858 (0.0017) [2024-06-15 15:32:23,952][1651669] Updated weights for policy 0, policy_version 333936 (0.0013) [2024-06-15 15:32:25,387][1651669] Updated weights for policy 0, policy_version 333969 (0.0013) [2024-06-15 15:32:25,766][1648981] Fps is (10 sec: 42682.7, 60 sec: 49167.8, 300 sec: 48208.5). Total num frames: 683999232. Throughput: 0: 12199.1. Samples: 171048960. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:32:25,767][1648981] Avg episode reward: [(0, '341.260')] [2024-06-15 15:32:27,866][1651669] Updated weights for policy 0, policy_version 334069 (0.0013) [2024-06-15 15:32:30,777][1648981] Fps is (10 sec: 52371.8, 60 sec: 47511.2, 300 sec: 47983.9). Total num frames: 684195840. Throughput: 0: 12273.7. Samples: 171111936. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:32:30,778][1648981] Avg episode reward: [(0, '340.530')] [2024-06-15 15:32:33,361][1651669] Updated weights for policy 0, policy_version 334101 (0.0013) [2024-06-15 15:32:34,870][1651669] Updated weights for policy 0, policy_version 334176 (0.0012) [2024-06-15 15:32:35,802][1648981] Fps is (10 sec: 45711.3, 60 sec: 48032.9, 300 sec: 47979.9). Total num frames: 684457984. Throughput: 0: 12050.9. Samples: 171190784. 
Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:32:35,803][1648981] Avg episode reward: [(0, '344.950')] [2024-06-15 15:32:36,618][1651669] Updated weights for policy 0, policy_version 334243 (0.0021) [2024-06-15 15:32:38,656][1651669] Updated weights for policy 0, policy_version 334325 (0.0013) [2024-06-15 15:32:40,766][1648981] Fps is (10 sec: 52485.4, 60 sec: 49704.6, 300 sec: 47985.7). Total num frames: 684720128. Throughput: 0: 11790.5. Samples: 171212800. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:32:40,767][1648981] Avg episode reward: [(0, '339.440')] [2024-06-15 15:32:44,045][1651274] Signal inference workers to stop experience collection... (17550 times) [2024-06-15 15:32:44,086][1651669] InferenceWorker_p0-w0: stopping experience collection (17550 times) [2024-06-15 15:32:44,257][1651274] Signal inference workers to resume experience collection... (17550 times) [2024-06-15 15:32:44,258][1651669] InferenceWorker_p0-w0: resuming experience collection (17550 times) [2024-06-15 15:32:45,208][1651669] Updated weights for policy 0, policy_version 334394 (0.0012) [2024-06-15 15:32:45,772][1648981] Fps is (10 sec: 39438.8, 60 sec: 45870.5, 300 sec: 47540.6). Total num frames: 684851200. Throughput: 0: 11774.5. Samples: 171295232. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:32:45,773][1648981] Avg episode reward: [(0, '335.460')] [2024-06-15 15:32:47,004][1651669] Updated weights for policy 0, policy_version 334448 (0.0014) [2024-06-15 15:32:49,450][1651669] Updated weights for policy 0, policy_version 334545 (0.0123) [2024-06-15 15:32:50,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 50244.1, 300 sec: 48429.9). Total num frames: 685244416. Throughput: 0: 11548.4. Samples: 171347968. 
Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:32:50,767][1648981] Avg episode reward: [(0, '335.810')] [2024-06-15 15:32:55,640][1651669] Updated weights for policy 0, policy_version 334597 (0.0020) [2024-06-15 15:32:55,774][1648981] Fps is (10 sec: 39317.4, 60 sec: 43694.0, 300 sec: 47540.2). Total num frames: 685244416. Throughput: 0: 11698.6. Samples: 171393536. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:32:55,774][1648981] Avg episode reward: [(0, '362.670')] [2024-06-15 15:32:56,144][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000334624_685309952.pth... [2024-06-15 15:32:56,249][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000329056_673906688.pth [2024-06-15 15:32:56,786][1651669] Updated weights for policy 0, policy_version 334656 (0.0011) [2024-06-15 15:32:58,218][1651669] Updated weights for policy 0, policy_version 334704 (0.0011) [2024-06-15 15:32:59,311][1651669] Updated weights for policy 0, policy_version 334752 (0.0027) [2024-06-15 15:33:00,767][1648981] Fps is (10 sec: 42599.1, 60 sec: 48606.2, 300 sec: 48096.7). Total num frames: 685670400. Throughput: 0: 11781.1. Samples: 171467776. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:00,767][1648981] Avg episode reward: [(0, '357.590')] [2024-06-15 15:33:01,024][1651669] Updated weights for policy 0, policy_version 334817 (0.0103) [2024-06-15 15:33:05,766][1648981] Fps is (10 sec: 52466.6, 60 sec: 43690.6, 300 sec: 47874.7). Total num frames: 685768704. Throughput: 0: 11810.1. Samples: 171539968. 
Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:05,767][1648981] Avg episode reward: [(0, '357.510')] [2024-06-15 15:33:07,359][1651669] Updated weights for policy 0, policy_version 334880 (0.0014) [2024-06-15 15:33:09,255][1651669] Updated weights for policy 0, policy_version 334947 (0.0102) [2024-06-15 15:33:10,648][1651669] Updated weights for policy 0, policy_version 334993 (0.0016) [2024-06-15 15:33:10,772][1648981] Fps is (10 sec: 39299.0, 60 sec: 48614.1, 300 sec: 47762.6). Total num frames: 686063616. Throughput: 0: 11592.4. Samples: 171570688. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:10,773][1648981] Avg episode reward: [(0, '341.700')] [2024-06-15 15:33:12,537][1651669] Updated weights for policy 0, policy_version 335072 (0.0010) [2024-06-15 15:33:15,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 45343.9, 300 sec: 47985.8). Total num frames: 686292992. Throughput: 0: 11505.7. Samples: 171629568. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:15,767][1648981] Avg episode reward: [(0, '355.670')] [2024-06-15 15:33:19,429][1651669] Updated weights for policy 0, policy_version 335120 (0.0012) [2024-06-15 15:33:20,770][1648981] Fps is (10 sec: 36052.4, 60 sec: 45872.3, 300 sec: 47318.6). Total num frames: 686424064. Throughput: 0: 11465.6. Samples: 171706368. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:20,771][1648981] Avg episode reward: [(0, '343.110')] [2024-06-15 15:33:20,847][1651669] Updated weights for policy 0, policy_version 335170 (0.0011) [2024-06-15 15:33:22,398][1651274] Signal inference workers to stop experience collection... (17600 times) [2024-06-15 15:33:22,464][1651669] InferenceWorker_p0-w0: stopping experience collection (17600 times) [2024-06-15 15:33:22,636][1651274] Signal inference workers to resume experience collection... 
(17600 times) [2024-06-15 15:33:22,637][1651669] InferenceWorker_p0-w0: resuming experience collection (17600 times) [2024-06-15 15:33:23,239][1651669] Updated weights for policy 0, policy_version 335264 (0.0077) [2024-06-15 15:33:24,844][1651669] Updated weights for policy 0, policy_version 335329 (0.0026) [2024-06-15 15:33:25,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 686817280. Throughput: 0: 11434.7. Samples: 171727360. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:25,767][1648981] Avg episode reward: [(0, '365.520')] [2024-06-15 15:33:30,766][1648981] Fps is (10 sec: 39336.3, 60 sec: 43698.5, 300 sec: 46986.0). Total num frames: 686817280. Throughput: 0: 11276.9. Samples: 171802624. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:30,767][1648981] Avg episode reward: [(0, '360.500')] [2024-06-15 15:33:31,751][1651669] Updated weights for policy 0, policy_version 335381 (0.0012) [2024-06-15 15:33:33,429][1651669] Updated weights for policy 0, policy_version 335442 (0.0011) [2024-06-15 15:33:35,177][1651669] Updated weights for policy 0, policy_version 335505 (0.0012) [2024-06-15 15:33:35,766][1648981] Fps is (10 sec: 36045.0, 60 sec: 45356.2, 300 sec: 47543.9). Total num frames: 687177728. Throughput: 0: 11343.7. Samples: 171858432. Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:35,767][1648981] Avg episode reward: [(0, '371.990')] [2024-06-15 15:33:37,002][1651669] Updated weights for policy 0, policy_version 335587 (0.0012) [2024-06-15 15:33:40,774][1648981] Fps is (10 sec: 52388.1, 60 sec: 43685.0, 300 sec: 47095.8). Total num frames: 687341568. Throughput: 0: 11070.4. Samples: 171891712. 
Policy #0 lag: (min: 12.0, avg: 140.7, max: 268.0) [2024-06-15 15:33:40,775][1648981] Avg episode reward: [(0, '376.260')] [2024-06-15 15:33:43,429][1651669] Updated weights for policy 0, policy_version 335617 (0.0011) [2024-06-15 15:33:45,476][1651669] Updated weights for policy 0, policy_version 335696 (0.0152) [2024-06-15 15:33:45,766][1648981] Fps is (10 sec: 32768.1, 60 sec: 44241.4, 300 sec: 47097.1). Total num frames: 687505408. Throughput: 0: 11218.5. Samples: 171972608. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:33:45,767][1648981] Avg episode reward: [(0, '378.340')] [2024-06-15 15:33:47,068][1651669] Updated weights for policy 0, policy_version 335762 (0.0011) [2024-06-15 15:33:48,619][1651669] Updated weights for policy 0, policy_version 335825 (0.0026) [2024-06-15 15:33:50,766][1648981] Fps is (10 sec: 52469.6, 60 sec: 43690.9, 300 sec: 47541.4). Total num frames: 687865856. Throughput: 0: 10843.0. Samples: 172027904. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:33:50,767][1648981] Avg episode reward: [(0, '386.690')] [2024-06-15 15:33:55,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 44788.3, 300 sec: 46874.9). Total num frames: 687931392. Throughput: 0: 11037.9. Samples: 172067328. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:33:55,767][1648981] Avg episode reward: [(0, '370.220')] [2024-06-15 15:33:56,103][1651669] Updated weights for policy 0, policy_version 335920 (0.0013) [2024-06-15 15:33:58,934][1651669] Updated weights for policy 0, policy_version 336029 (0.0196) [2024-06-15 15:34:00,297][1651274] Signal inference workers to stop experience collection... (17650 times) [2024-06-15 15:34:00,331][1651669] InferenceWorker_p0-w0: stopping experience collection (17650 times) [2024-06-15 15:34:00,334][1651669] Updated weights for policy 0, policy_version 336082 (0.0010) [2024-06-15 15:34:00,636][1651274] Signal inference workers to resume experience collection... 
(17650 times) [2024-06-15 15:34:00,638][1651669] InferenceWorker_p0-w0: resuming experience collection (17650 times) [2024-06-15 15:34:00,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 44236.8, 300 sec: 47430.3). Total num frames: 688324608. Throughput: 0: 10865.8. Samples: 172118528. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:00,767][1648981] Avg episode reward: [(0, '376.920')] [2024-06-15 15:34:01,284][1651669] Updated weights for policy 0, policy_version 336128 (0.0013) [2024-06-15 15:34:05,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 43687.9, 300 sec: 46652.2). Total num frames: 688390144. Throughput: 0: 10831.6. Samples: 172193792. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:05,771][1648981] Avg episode reward: [(0, '388.660')] [2024-06-15 15:34:08,955][1651669] Updated weights for policy 0, policy_version 336193 (0.0013) [2024-06-15 15:34:10,390][1651669] Updated weights for policy 0, policy_version 336256 (0.0012) [2024-06-15 15:34:10,766][1648981] Fps is (10 sec: 32768.5, 60 sec: 43148.8, 300 sec: 47097.1). Total num frames: 688652288. Throughput: 0: 11161.6. Samples: 172229632. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:10,767][1648981] Avg episode reward: [(0, '386.570')] [2024-06-15 15:34:11,704][1651669] Updated weights for policy 0, policy_version 336305 (0.0021) [2024-06-15 15:34:13,344][1651669] Updated weights for policy 0, policy_version 336382 (0.0013) [2024-06-15 15:34:15,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 43690.8, 300 sec: 46657.1). Total num frames: 688914432. Throughput: 0: 10831.7. Samples: 172290048. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:15,767][1648981] Avg episode reward: [(0, '383.770')] [2024-06-15 15:34:19,959][1651669] Updated weights for policy 0, policy_version 336448 (0.0020) [2024-06-15 15:34:20,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 44785.8, 300 sec: 46874.9). Total num frames: 689111040. 
Throughput: 0: 11241.3. Samples: 172364288. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:20,767][1648981] Avg episode reward: [(0, '380.630')] [2024-06-15 15:34:21,222][1651669] Updated weights for policy 0, policy_version 336499 (0.0012) [2024-06-15 15:34:23,579][1651669] Updated weights for policy 0, policy_version 336594 (0.0010) [2024-06-15 15:34:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 43690.7, 300 sec: 46986.0). Total num frames: 689438720. Throughput: 0: 11038.3. Samples: 172388352. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:25,767][1648981] Avg episode reward: [(0, '386.450')] [2024-06-15 15:34:30,145][1651669] Updated weights for policy 0, policy_version 336656 (0.0012) [2024-06-15 15:34:30,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 45329.1, 300 sec: 46763.8). Total num frames: 689537024. Throughput: 0: 11070.6. Samples: 172470784. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:30,767][1648981] Avg episode reward: [(0, '372.780')] [2024-06-15 15:34:31,392][1651669] Updated weights for policy 0, policy_version 336720 (0.0015) [2024-06-15 15:34:33,480][1651669] Updated weights for policy 0, policy_version 336800 (0.0218) [2024-06-15 15:34:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 689930240. Throughput: 0: 11002.3. Samples: 172523008. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:35,767][1648981] Avg episode reward: [(0, '371.580')] [2024-06-15 15:34:35,804][1651669] Updated weights for policy 0, policy_version 336888 (0.0229) [2024-06-15 15:34:40,795][1648981] Fps is (10 sec: 42476.8, 60 sec: 43675.5, 300 sec: 46426.1). Total num frames: 689963008. Throughput: 0: 10949.8. Samples: 172560384. 
Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:40,796][1648981] Avg episode reward: [(0, '349.090')] [2024-06-15 15:34:42,195][1651669] Updated weights for policy 0, policy_version 336929 (0.0013) [2024-06-15 15:34:42,945][1651274] Signal inference workers to stop experience collection... (17700 times) [2024-06-15 15:34:42,984][1651669] InferenceWorker_p0-w0: stopping experience collection (17700 times) [2024-06-15 15:34:43,188][1651274] Signal inference workers to resume experience collection... (17700 times) [2024-06-15 15:34:43,189][1651669] InferenceWorker_p0-w0: resuming experience collection (17700 times) [2024-06-15 15:34:43,574][1651669] Updated weights for policy 0, policy_version 336992 (0.0100) [2024-06-15 15:34:45,628][1651669] Updated weights for policy 0, policy_version 337057 (0.0012) [2024-06-15 15:34:45,766][1648981] Fps is (10 sec: 36044.7, 60 sec: 46421.3, 300 sec: 46874.9). Total num frames: 690290688. Throughput: 0: 11571.2. Samples: 172639232. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:45,767][1648981] Avg episode reward: [(0, '347.780')] [2024-06-15 15:34:47,689][1651669] Updated weights for policy 0, policy_version 337139 (0.0012) [2024-06-15 15:34:50,766][1648981] Fps is (10 sec: 52579.1, 60 sec: 43690.7, 300 sec: 46652.7). Total num frames: 690487296. Throughput: 0: 11299.1. Samples: 172702208. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:50,767][1648981] Avg episode reward: [(0, '338.770')] [2024-06-15 15:34:54,238][1651669] Updated weights for policy 0, policy_version 337201 (0.0012) [2024-06-15 15:34:55,767][1648981] Fps is (10 sec: 42596.8, 60 sec: 46421.0, 300 sec: 46763.8). Total num frames: 690716672. Throughput: 0: 11423.2. Samples: 172743680. 
Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:34:55,767][1648981] Avg episode reward: [(0, '339.790')] [2024-06-15 15:34:55,845][1651669] Updated weights for policy 0, policy_version 337280 (0.0012) [2024-06-15 15:34:56,246][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000337296_690782208.pth... [2024-06-15 15:34:56,392][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000331888_679706624.pth [2024-06-15 15:34:58,130][1651669] Updated weights for policy 0, policy_version 337361 (0.0013) [2024-06-15 15:35:00,767][1648981] Fps is (10 sec: 52424.1, 60 sec: 44782.3, 300 sec: 46652.6). Total num frames: 691011584. Throughput: 0: 11229.6. Samples: 172795392. Policy #0 lag: (min: 15.0, avg: 68.6, max: 271.0) [2024-06-15 15:35:00,768][1648981] Avg episode reward: [(0, '340.480')] [2024-06-15 15:35:05,143][1651669] Updated weights for policy 0, policy_version 337410 (0.0013) [2024-06-15 15:35:05,766][1648981] Fps is (10 sec: 36046.2, 60 sec: 44785.8, 300 sec: 46430.7). Total num frames: 691077120. Throughput: 0: 11514.3. Samples: 172882432. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:05,767][1648981] Avg episode reward: [(0, '334.180')] [2024-06-15 15:35:06,291][1651669] Updated weights for policy 0, policy_version 337472 (0.0013) [2024-06-15 15:35:07,764][1651669] Updated weights for policy 0, policy_version 337536 (0.0012) [2024-06-15 15:35:09,647][1651669] Updated weights for policy 0, policy_version 337603 (0.0014) [2024-06-15 15:35:10,766][1648981] Fps is (10 sec: 49156.2, 60 sec: 47513.5, 300 sec: 46541.7). Total num frames: 691503104. Throughput: 0: 11650.8. Samples: 172912640. 
Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:10,767][1648981] Avg episode reward: [(0, '328.890')] [2024-06-15 15:35:10,931][1651669] Updated weights for policy 0, policy_version 337664 (0.0013) [2024-06-15 15:35:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 46208.4). Total num frames: 691535872. Throughput: 0: 11446.0. Samples: 172985856. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:15,767][1648981] Avg episode reward: [(0, '326.030')] [2024-06-15 15:35:16,991][1651669] Updated weights for policy 0, policy_version 337728 (0.0012) [2024-06-15 15:35:18,461][1651669] Updated weights for policy 0, policy_version 337792 (0.0013) [2024-06-15 15:35:18,966][1651274] Signal inference workers to stop experience collection... (17750 times) [2024-06-15 15:35:19,008][1651669] InferenceWorker_p0-w0: stopping experience collection (17750 times) [2024-06-15 15:35:19,185][1651274] Signal inference workers to resume experience collection... (17750 times) [2024-06-15 15:35:19,186][1651669] InferenceWorker_p0-w0: resuming experience collection (17750 times) [2024-06-15 15:35:20,162][1651669] Updated weights for policy 0, policy_version 337858 (0.0020) [2024-06-15 15:35:20,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 46322.5). Total num frames: 691961856. Throughput: 0: 11639.5. Samples: 173046784. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:20,767][1648981] Avg episode reward: [(0, '319.060')] [2024-06-15 15:35:25,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 43690.6, 300 sec: 46212.8). Total num frames: 692060160. Throughput: 0: 11646.8. Samples: 173084160. 
Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:25,767][1648981] Avg episode reward: [(0, '337.240')] [2024-06-15 15:35:26,941][1651669] Updated weights for policy 0, policy_version 337936 (0.0013) [2024-06-15 15:35:28,351][1651669] Updated weights for policy 0, policy_version 337986 (0.0087) [2024-06-15 15:35:29,799][1651669] Updated weights for policy 0, policy_version 338064 (0.0012) [2024-06-15 15:35:30,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 46430.6). Total num frames: 692420608. Throughput: 0: 11559.8. Samples: 173159424. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:30,767][1648981] Avg episode reward: [(0, '340.700')] [2024-06-15 15:35:31,368][1651669] Updated weights for policy 0, policy_version 338128 (0.0013) [2024-06-15 15:35:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 44236.7, 300 sec: 46208.4). Total num frames: 692584448. Throughput: 0: 11673.6. Samples: 173227520. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:35,767][1648981] Avg episode reward: [(0, '351.220')] [2024-06-15 15:35:38,261][1651669] Updated weights for policy 0, policy_version 338177 (0.0012) [2024-06-15 15:35:39,641][1651669] Updated weights for policy 0, policy_version 338227 (0.0038) [2024-06-15 15:35:40,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 47536.3, 300 sec: 46208.4). Total num frames: 692813824. Throughput: 0: 11855.7. Samples: 173277184. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:40,767][1648981] Avg episode reward: [(0, '361.080')] [2024-06-15 15:35:41,048][1651669] Updated weights for policy 0, policy_version 338304 (0.0011) [2024-06-15 15:35:42,374][1651669] Updated weights for policy 0, policy_version 338368 (0.0012) [2024-06-15 15:35:43,663][1651669] Updated weights for policy 0, policy_version 338430 (0.0012) [2024-06-15 15:35:45,767][1648981] Fps is (10 sec: 52425.3, 60 sec: 46966.9, 300 sec: 46208.3). Total num frames: 693108736. 
Throughput: 0: 12071.9. Samples: 173338624. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:45,768][1648981] Avg episode reward: [(0, '359.850')] [2024-06-15 15:35:49,604][1651669] Updated weights for policy 0, policy_version 338468 (0.0014) [2024-06-15 15:35:50,773][1648981] Fps is (10 sec: 42571.0, 60 sec: 45870.3, 300 sec: 46096.4). Total num frames: 693239808. Throughput: 0: 11967.7. Samples: 173421056. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:50,773][1648981] Avg episode reward: [(0, '353.210')] [2024-06-15 15:35:51,758][1651669] Updated weights for policy 0, policy_version 338544 (0.0044) [2024-06-15 15:35:52,922][1651669] Updated weights for policy 0, policy_version 338594 (0.0106) [2024-06-15 15:35:54,387][1651274] Signal inference workers to stop experience collection... (17800 times) [2024-06-15 15:35:54,430][1651669] InferenceWorker_p0-w0: stopping experience collection (17800 times) [2024-06-15 15:35:54,581][1651274] Signal inference workers to resume experience collection... (17800 times) [2024-06-15 15:35:54,582][1651669] InferenceWorker_p0-w0: resuming experience collection (17800 times) [2024-06-15 15:35:54,623][1651669] Updated weights for policy 0, policy_version 338673 (0.0146) [2024-06-15 15:35:55,798][1648981] Fps is (10 sec: 52266.1, 60 sec: 48580.4, 300 sec: 46203.4). Total num frames: 693633024. Throughput: 0: 11835.9. Samples: 173445632. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:35:55,799][1648981] Avg episode reward: [(0, '353.610')] [2024-06-15 15:35:59,812][1651669] Updated weights for policy 0, policy_version 338697 (0.0011) [2024-06-15 15:36:00,766][1648981] Fps is (10 sec: 49183.7, 60 sec: 45329.8, 300 sec: 46430.6). Total num frames: 693731328. Throughput: 0: 12071.8. Samples: 173529088. 
Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:36:00,767][1648981] Avg episode reward: [(0, '352.710')] [2024-06-15 15:36:01,065][1651669] Updated weights for policy 0, policy_version 338753 (0.0010) [2024-06-15 15:36:02,889][1651669] Updated weights for policy 0, policy_version 338818 (0.0012) [2024-06-15 15:36:04,301][1651669] Updated weights for policy 0, policy_version 338880 (0.0046) [2024-06-15 15:36:05,767][1648981] Fps is (10 sec: 49308.3, 60 sec: 50790.3, 300 sec: 46542.9). Total num frames: 694124544. Throughput: 0: 11878.4. Samples: 173581312. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:36:05,767][1648981] Avg episode reward: [(0, '352.650')] [2024-06-15 15:36:05,772][1651669] Updated weights for policy 0, policy_version 338941 (0.0014) [2024-06-15 15:36:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 46319.5). Total num frames: 694190080. Throughput: 0: 12060.5. Samples: 173626880. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:36:10,767][1648981] Avg episode reward: [(0, '359.780')] [2024-06-15 15:36:11,367][1651669] Updated weights for policy 0, policy_version 338995 (0.0012) [2024-06-15 15:36:13,118][1651669] Updated weights for policy 0, policy_version 339072 (0.0034) [2024-06-15 15:36:15,199][1651669] Updated weights for policy 0, policy_version 339138 (0.0014) [2024-06-15 15:36:15,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 50790.4, 300 sec: 46654.6). Total num frames: 694583296. Throughput: 0: 11844.3. Samples: 173692416. Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:36:15,767][1648981] Avg episode reward: [(0, '350.320')] [2024-06-15 15:36:20,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 45329.1, 300 sec: 46211.5). Total num frames: 694681600. Throughput: 0: 11935.3. Samples: 173764608. 
Policy #0 lag: (min: 15.0, avg: 57.1, max: 234.0) [2024-06-15 15:36:20,767][1648981] Avg episode reward: [(0, '366.120')] [2024-06-15 15:36:22,064][1651669] Updated weights for policy 0, policy_version 339201 (0.0014) [2024-06-15 15:36:23,824][1651669] Updated weights for policy 0, policy_version 339265 (0.0011) [2024-06-15 15:36:25,332][1651669] Updated weights for policy 0, policy_version 339332 (0.0012) [2024-06-15 15:36:25,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 48606.0, 300 sec: 46209.7). Total num frames: 694976512. Throughput: 0: 11719.1. Samples: 173804544. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:36:25,767][1648981] Avg episode reward: [(0, '350.150')] [2024-06-15 15:36:26,841][1651669] Updated weights for policy 0, policy_version 339392 (0.0039) [2024-06-15 15:36:28,368][1651669] Updated weights for policy 0, policy_version 339450 (0.0013) [2024-06-15 15:36:30,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 46421.1, 300 sec: 46208.7). Total num frames: 695205888. Throughput: 0: 11707.8. Samples: 173865472. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:36:30,767][1648981] Avg episode reward: [(0, '354.100')] [2024-06-15 15:36:33,986][1651669] Updated weights for policy 0, policy_version 339481 (0.0013) [2024-06-15 15:36:35,263][1651274] Signal inference workers to stop experience collection... (17850 times) [2024-06-15 15:36:35,304][1651669] InferenceWorker_p0-w0: stopping experience collection (17850 times) [2024-06-15 15:36:35,305][1651669] Updated weights for policy 0, policy_version 339540 (0.0012) [2024-06-15 15:36:35,469][1651274] Signal inference workers to resume experience collection... (17850 times) [2024-06-15 15:36:35,470][1651669] InferenceWorker_p0-w0: resuming experience collection (17850 times) [2024-06-15 15:36:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 47513.7, 300 sec: 46431.8). Total num frames: 695435264. Throughput: 0: 11629.8. Samples: 173944320. 
Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:36:35,767][1648981] Avg episode reward: [(0, '328.440')] [2024-06-15 15:36:36,984][1651669] Updated weights for policy 0, policy_version 339616 (0.0088) [2024-06-15 15:36:38,970][1651669] Updated weights for policy 0, policy_version 339685 (0.0018) [2024-06-15 15:36:40,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 48605.8, 300 sec: 46208.4). Total num frames: 695730176. Throughput: 0: 11647.7. Samples: 173969408. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:36:40,767][1648981] Avg episode reward: [(0, '348.860')] [2024-06-15 15:36:45,161][1651669] Updated weights for policy 0, policy_version 339744 (0.0023) [2024-06-15 15:36:45,766][1648981] Fps is (10 sec: 39321.0, 60 sec: 45329.6, 300 sec: 46097.4). Total num frames: 695828480. Throughput: 0: 11582.6. Samples: 174050304. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:36:45,767][1648981] Avg episode reward: [(0, '355.900')] [2024-06-15 15:36:46,557][1651669] Updated weights for policy 0, policy_version 339808 (0.0011) [2024-06-15 15:36:48,102][1651669] Updated weights for policy 0, policy_version 339874 (0.0012) [2024-06-15 15:36:49,575][1651669] Updated weights for policy 0, policy_version 339936 (0.0013) [2024-06-15 15:36:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50249.6, 300 sec: 46210.3). Total num frames: 696254464. Throughput: 0: 11707.8. Samples: 174108160. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:36:50,767][1648981] Avg episode reward: [(0, '339.580')] [2024-06-15 15:36:55,767][1648981] Fps is (10 sec: 42598.2, 60 sec: 43713.8, 300 sec: 45764.2). Total num frames: 696254464. Throughput: 0: 11593.9. Samples: 174148608. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:36:55,767][1648981] Avg episode reward: [(0, '338.270')] [2024-06-15 15:36:56,047][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000339984_696287232.pth... 
[2024-06-15 15:36:56,164][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000334624_685309952.pth [2024-06-15 15:36:56,340][1651669] Updated weights for policy 0, policy_version 340000 (0.0013) [2024-06-15 15:36:57,888][1651669] Updated weights for policy 0, policy_version 340064 (0.0104) [2024-06-15 15:36:59,555][1651669] Updated weights for policy 0, policy_version 340115 (0.0017) [2024-06-15 15:37:00,766][1648981] Fps is (10 sec: 39322.3, 60 sec: 48606.0, 300 sec: 45764.1). Total num frames: 696647680. Throughput: 0: 11685.0. Samples: 174218240. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:00,767][1648981] Avg episode reward: [(0, '341.050')] [2024-06-15 15:37:01,817][1651669] Updated weights for policy 0, policy_version 340208 (0.0037) [2024-06-15 15:37:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 46210.9). Total num frames: 696778752. Throughput: 0: 11616.7. Samples: 174287360. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:05,767][1648981] Avg episode reward: [(0, '344.960')] [2024-06-15 15:37:07,795][1651669] Updated weights for policy 0, policy_version 340272 (0.0014) [2024-06-15 15:37:08,896][1651669] Updated weights for policy 0, policy_version 340323 (0.0014) [2024-06-15 15:37:09,577][1651669] Updated weights for policy 0, policy_version 340352 (0.0011) [2024-06-15 15:37:10,767][1648981] Fps is (10 sec: 39320.2, 60 sec: 47513.5, 300 sec: 45656.1). Total num frames: 697040896. Throughput: 0: 11480.1. Samples: 174321152. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:10,767][1648981] Avg episode reward: [(0, '338.600')] [2024-06-15 15:37:12,248][1651274] Signal inference workers to stop experience collection... 
(17900 times) [2024-06-15 15:37:12,285][1651669] Updated weights for policy 0, policy_version 340418 (0.0014) [2024-06-15 15:37:12,309][1651669] InferenceWorker_p0-w0: stopping experience collection (17900 times) [2024-06-15 15:37:12,489][1651274] Signal inference workers to resume experience collection... (17900 times) [2024-06-15 15:37:12,489][1651669] InferenceWorker_p0-w0: resuming experience collection (17900 times) [2024-06-15 15:37:13,701][1651669] Updated weights for policy 0, policy_version 340475 (0.0021) [2024-06-15 15:37:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45329.0, 300 sec: 46208.4). Total num frames: 697303040. Throughput: 0: 11628.2. Samples: 174388736. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:15,767][1648981] Avg episode reward: [(0, '330.330')] [2024-06-15 15:37:19,071][1651669] Updated weights for policy 0, policy_version 340544 (0.0073) [2024-06-15 15:37:20,774][1648981] Fps is (10 sec: 52388.8, 60 sec: 48053.4, 300 sec: 45985.1). Total num frames: 697565184. Throughput: 0: 11387.1. Samples: 174456832. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:20,775][1648981] Avg episode reward: [(0, '320.780')] [2024-06-15 15:37:22,324][1651669] Updated weights for policy 0, policy_version 340624 (0.0152) [2024-06-15 15:37:23,895][1651669] Updated weights for policy 0, policy_version 340688 (0.0095) [2024-06-15 15:37:25,165][1651669] Updated weights for policy 0, policy_version 340733 (0.0010) [2024-06-15 15:37:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47513.5, 300 sec: 46210.1). Total num frames: 697827328. Throughput: 0: 11594.0. Samples: 174491136. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:25,767][1648981] Avg episode reward: [(0, '313.060')] [2024-06-15 15:37:29,982][1651669] Updated weights for policy 0, policy_version 340768 (0.0015) [2024-06-15 15:37:30,766][1648981] Fps is (10 sec: 39352.5, 60 sec: 45875.5, 300 sec: 45769.7). Total num frames: 697958400. 
Throughput: 0: 11457.4. Samples: 174565888. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:30,767][1648981] Avg episode reward: [(0, '317.920')] [2024-06-15 15:37:30,993][1651669] Updated weights for policy 0, policy_version 340819 (0.0016) [2024-06-15 15:37:32,009][1651669] Updated weights for policy 0, policy_version 340864 (0.0012) [2024-06-15 15:37:35,152][1651669] Updated weights for policy 0, policy_version 340960 (0.0100) [2024-06-15 15:37:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 46097.4). Total num frames: 698318848. Throughput: 0: 11525.7. Samples: 174626816. Policy #0 lag: (min: 15.0, avg: 68.2, max: 271.0) [2024-06-15 15:37:35,767][1648981] Avg episode reward: [(0, '304.930')] [2024-06-15 15:37:40,671][1651669] Updated weights for policy 0, policy_version 341008 (0.0013) [2024-06-15 15:37:40,774][1648981] Fps is (10 sec: 42565.3, 60 sec: 44231.1, 300 sec: 45874.9). Total num frames: 698384384. Throughput: 0: 11637.5. Samples: 174672384. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:37:40,775][1648981] Avg episode reward: [(0, '319.090')] [2024-06-15 15:37:42,951][1651669] Updated weights for policy 0, policy_version 341108 (0.0012) [2024-06-15 15:37:44,595][1651669] Updated weights for policy 0, policy_version 341139 (0.0011) [2024-06-15 15:37:45,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48059.8, 300 sec: 45653.1). Total num frames: 698712064. Throughput: 0: 11582.5. Samples: 174739456. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:37:45,767][1648981] Avg episode reward: [(0, '332.370')] [2024-06-15 15:37:46,179][1651669] Updated weights for policy 0, policy_version 341205 (0.0011) [2024-06-15 15:37:50,766][1648981] Fps is (10 sec: 49190.5, 60 sec: 43690.7, 300 sec: 46209.6). Total num frames: 698875904. Throughput: 0: 11980.8. Samples: 174826496. 
Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:37:50,767][1648981] Avg episode reward: [(0, '337.410')] [2024-06-15 15:37:50,909][1651669] Updated weights for policy 0, policy_version 341264 (0.0012) [2024-06-15 15:37:52,934][1651669] Updated weights for policy 0, policy_version 341329 (0.0012) [2024-06-15 15:37:53,330][1651274] Signal inference workers to stop experience collection... (17950 times) [2024-06-15 15:37:53,426][1651669] InferenceWorker_p0-w0: stopping experience collection (17950 times) [2024-06-15 15:37:53,602][1651274] Signal inference workers to resume experience collection... (17950 times) [2024-06-15 15:37:53,603][1651669] InferenceWorker_p0-w0: resuming experience collection (17950 times) [2024-06-15 15:37:54,123][1651669] Updated weights for policy 0, policy_version 341385 (0.0012) [2024-06-15 15:37:55,766][1648981] Fps is (10 sec: 58982.5, 60 sec: 50790.5, 300 sec: 46208.5). Total num frames: 699301888. Throughput: 0: 11969.5. Samples: 174859776. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:37:55,767][1648981] Avg episode reward: [(0, '347.190')] [2024-06-15 15:37:55,826][1651669] Updated weights for policy 0, policy_version 341457 (0.0153) [2024-06-15 15:37:56,808][1651669] Updated weights for policy 0, policy_version 341503 (0.0011) [2024-06-15 15:38:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45875.1, 300 sec: 46208.4). Total num frames: 699400192. Throughput: 0: 11969.4. Samples: 174927360. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:00,767][1648981] Avg episode reward: [(0, '347.230')] [2024-06-15 15:38:02,405][1651669] Updated weights for policy 0, policy_version 341541 (0.0011) [2024-06-15 15:38:03,933][1651669] Updated weights for policy 0, policy_version 341603 (0.0015) [2024-06-15 15:38:05,480][1651669] Updated weights for policy 0, policy_version 341668 (0.0039) [2024-06-15 15:38:05,768][1648981] Fps is (10 sec: 45869.5, 60 sec: 49697.2, 300 sec: 46431.3). 
Total num frames: 699760640. Throughput: 0: 12130.5. Samples: 175002624. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:05,768][1648981] Avg episode reward: [(0, '335.710')] [2024-06-15 15:38:07,171][1651669] Updated weights for policy 0, policy_version 341733 (0.0036) [2024-06-15 15:38:10,768][1648981] Fps is (10 sec: 52418.9, 60 sec: 48058.4, 300 sec: 46208.2). Total num frames: 699924480. Throughput: 0: 11991.7. Samples: 175030784. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:10,769][1648981] Avg episode reward: [(0, '350.880')] [2024-06-15 15:38:12,691][1651669] Updated weights for policy 0, policy_version 341780 (0.0014) [2024-06-15 15:38:14,460][1651669] Updated weights for policy 0, policy_version 341857 (0.0012) [2024-06-15 15:38:15,147][1651669] Updated weights for policy 0, policy_version 341888 (0.0030) [2024-06-15 15:38:15,766][1648981] Fps is (10 sec: 42603.4, 60 sec: 48059.7, 300 sec: 46653.3). Total num frames: 700186624. Throughput: 0: 12185.6. Samples: 175114240. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:15,767][1648981] Avg episode reward: [(0, '364.160')] [2024-06-15 15:38:17,427][1651669] Updated weights for policy 0, policy_version 341968 (0.0017) [2024-06-15 15:38:18,332][1651669] Updated weights for policy 0, policy_version 342009 (0.0011) [2024-06-15 15:38:20,767][1648981] Fps is (10 sec: 52438.2, 60 sec: 48065.9, 300 sec: 46208.4). Total num frames: 700448768. Throughput: 0: 12299.3. Samples: 175180288. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:20,767][1648981] Avg episode reward: [(0, '366.270')] [2024-06-15 15:38:24,304][1651669] Updated weights for policy 0, policy_version 342064 (0.0012) [2024-06-15 15:38:25,768][1648981] Fps is (10 sec: 45866.3, 60 sec: 46965.9, 300 sec: 46874.6). Total num frames: 700645376. Throughput: 0: 12369.2. Samples: 175228928. 
Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:25,769][1648981] Avg episode reward: [(0, '362.930')] [2024-06-15 15:38:25,861][1651669] Updated weights for policy 0, policy_version 342128 (0.0013) [2024-06-15 15:38:27,501][1651669] Updated weights for policy 0, policy_version 342200 (0.0217) [2024-06-15 15:38:28,363][1651274] Signal inference workers to stop experience collection... (18000 times) [2024-06-15 15:38:28,426][1651669] InferenceWorker_p0-w0: stopping experience collection (18000 times) [2024-06-15 15:38:28,456][1651669] Updated weights for policy 0, policy_version 342228 (0.0011) [2024-06-15 15:38:28,685][1651274] Signal inference workers to resume experience collection... (18000 times) [2024-06-15 15:38:28,686][1651669] InferenceWorker_p0-w0: resuming experience collection (18000 times) [2024-06-15 15:38:30,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.1, 300 sec: 46763.8). Total num frames: 700973056. Throughput: 0: 12174.2. Samples: 175287296. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:30,768][1648981] Avg episode reward: [(0, '372.040')] [2024-06-15 15:38:34,844][1651669] Updated weights for policy 0, policy_version 342291 (0.0137) [2024-06-15 15:38:35,766][1648981] Fps is (10 sec: 42606.6, 60 sec: 45875.1, 300 sec: 46542.9). Total num frames: 701071360. Throughput: 0: 11969.4. Samples: 175365120. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:35,767][1648981] Avg episode reward: [(0, '375.710')] [2024-06-15 15:38:36,168][1651669] Updated weights for policy 0, policy_version 342338 (0.0026) [2024-06-15 15:38:37,516][1651669] Updated weights for policy 0, policy_version 342404 (0.0012) [2024-06-15 15:38:38,869][1651669] Updated weights for policy 0, policy_version 342463 (0.0027) [2024-06-15 15:38:40,510][1651669] Updated weights for policy 0, policy_version 342524 (0.0151) [2024-06-15 15:38:40,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 51889.3, 300 sec: 47430.3). 
Total num frames: 701497344. Throughput: 0: 11810.1. Samples: 175391232. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:40,767][1648981] Avg episode reward: [(0, '373.000')] [2024-06-15 15:38:45,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 46208.4). Total num frames: 701497344. Throughput: 0: 11958.0. Samples: 175465472. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:45,767][1648981] Avg episode reward: [(0, '359.980')] [2024-06-15 15:38:46,940][1651669] Updated weights for policy 0, policy_version 342579 (0.0014) [2024-06-15 15:38:48,479][1651669] Updated weights for policy 0, policy_version 342629 (0.0012) [2024-06-15 15:38:50,741][1651669] Updated weights for policy 0, policy_version 342707 (0.0011) [2024-06-15 15:38:50,782][1648981] Fps is (10 sec: 35988.5, 60 sec: 49685.0, 300 sec: 47205.6). Total num frames: 701857792. Throughput: 0: 11692.6. Samples: 175528960. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:50,783][1648981] Avg episode reward: [(0, '351.530')] [2024-06-15 15:38:52,123][1651669] Updated weights for policy 0, policy_version 342775 (0.0012) [2024-06-15 15:38:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 45329.0, 300 sec: 46430.6). Total num frames: 702021632. Throughput: 0: 11856.1. Samples: 175564288. Policy #0 lag: (min: 6.0, avg: 77.2, max: 262.0) [2024-06-15 15:38:55,767][1648981] Avg episode reward: [(0, '354.020')] [2024-06-15 15:38:55,830][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000342784_702021632.pth... 
[2024-06-15 15:38:55,880][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000337296_690782208.pth [2024-06-15 15:38:57,761][1651669] Updated weights for policy 0, policy_version 342816 (0.0095) [2024-06-15 15:38:58,968][1651669] Updated weights for policy 0, policy_version 342849 (0.0012) [2024-06-15 15:39:00,163][1651669] Updated weights for policy 0, policy_version 342907 (0.0018) [2024-06-15 15:39:00,767][1648981] Fps is (10 sec: 45947.1, 60 sec: 48605.8, 300 sec: 47208.7). Total num frames: 702316544. Throughput: 0: 11696.3. Samples: 175640576. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:00,767][1648981] Avg episode reward: [(0, '365.080')] [2024-06-15 15:39:02,291][1651669] Updated weights for policy 0, policy_version 342992 (0.0012) [2024-06-15 15:39:05,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46422.3, 300 sec: 47097.0). Total num frames: 702545920. Throughput: 0: 11844.3. Samples: 175713280. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:05,767][1648981] Avg episode reward: [(0, '363.190')] [2024-06-15 15:39:07,757][1651669] Updated weights for policy 0, policy_version 343041 (0.0011) [2024-06-15 15:39:08,843][1651669] Updated weights for policy 0, policy_version 343102 (0.0095) [2024-06-15 15:39:10,770][1648981] Fps is (10 sec: 45858.0, 60 sec: 47512.0, 300 sec: 46985.4). Total num frames: 702775296. Throughput: 0: 11570.7. Samples: 175749632. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:10,771][1648981] Avg episode reward: [(0, '355.810')] [2024-06-15 15:39:10,854][1651669] Updated weights for policy 0, policy_version 343168 (0.0118) [2024-06-15 15:39:11,655][1651274] Signal inference workers to stop experience collection... (18050 times) [2024-06-15 15:39:11,715][1651669] InferenceWorker_p0-w0: stopping experience collection (18050 times) [2024-06-15 15:39:11,917][1651274] Signal inference workers to resume experience collection... 
(18050 times) [2024-06-15 15:39:11,919][1651669] InferenceWorker_p0-w0: resuming experience collection (18050 times) [2024-06-15 15:39:13,532][1651669] Updated weights for policy 0, policy_version 343248 (0.0048) [2024-06-15 15:39:15,773][1648981] Fps is (10 sec: 52395.2, 60 sec: 48054.6, 300 sec: 47318.2). Total num frames: 703070208. Throughput: 0: 11512.7. Samples: 175805440. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:15,774][1648981] Avg episode reward: [(0, '356.810')] [2024-06-15 15:39:18,892][1651669] Updated weights for policy 0, policy_version 343297 (0.0012) [2024-06-15 15:39:20,312][1651669] Updated weights for policy 0, policy_version 343357 (0.0011) [2024-06-15 15:39:20,766][1648981] Fps is (10 sec: 45893.2, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 703234048. Throughput: 0: 11662.2. Samples: 175889920. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:20,767][1648981] Avg episode reward: [(0, '365.840')] [2024-06-15 15:39:21,858][1651669] Updated weights for policy 0, policy_version 343424 (0.0011) [2024-06-15 15:39:25,126][1651669] Updated weights for policy 0, policy_version 343509 (0.0240) [2024-06-15 15:39:25,766][1648981] Fps is (10 sec: 49183.5, 60 sec: 48607.4, 300 sec: 47541.4). Total num frames: 703561728. Throughput: 0: 11832.9. Samples: 175923712. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:25,767][1648981] Avg episode reward: [(0, '378.480')] [2024-06-15 15:39:29,183][1651669] Updated weights for policy 0, policy_version 343576 (0.0138) [2024-06-15 15:39:29,961][1651669] Updated weights for policy 0, policy_version 343615 (0.0012) [2024-06-15 15:39:30,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 45875.4, 300 sec: 46763.8). Total num frames: 703725568. Throughput: 0: 11787.4. Samples: 175995904. 
Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:30,767][1648981] Avg episode reward: [(0, '372.060')] [2024-06-15 15:39:32,876][1651669] Updated weights for policy 0, policy_version 343675 (0.0013) [2024-06-15 15:39:35,452][1651669] Updated weights for policy 0, policy_version 343744 (0.0012) [2024-06-15 15:39:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49152.0, 300 sec: 47657.1). Total num frames: 704020480. Throughput: 0: 11916.7. Samples: 176065024. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:35,767][1648981] Avg episode reward: [(0, '376.990')] [2024-06-15 15:39:36,961][1651669] Updated weights for policy 0, policy_version 343800 (0.0012) [2024-06-15 15:39:39,791][1651669] Updated weights for policy 0, policy_version 343825 (0.0010) [2024-06-15 15:39:40,774][1648981] Fps is (10 sec: 52388.0, 60 sec: 45869.4, 300 sec: 47318.0). Total num frames: 704249856. Throughput: 0: 11990.2. Samples: 176103936. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:40,776][1648981] Avg episode reward: [(0, '369.590')] [2024-06-15 15:39:42,900][1651669] Updated weights for policy 0, policy_version 343889 (0.0048) [2024-06-15 15:39:45,766][1648981] Fps is (10 sec: 36045.0, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 704380928. Throughput: 0: 11855.7. Samples: 176174080. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:45,767][1648981] Avg episode reward: [(0, '363.880')] [2024-06-15 15:39:46,113][1651669] Updated weights for policy 0, policy_version 343968 (0.0013) [2024-06-15 15:39:47,337][1651669] Updated weights for policy 0, policy_version 344016 (0.0012) [2024-06-15 15:39:50,766][1648981] Fps is (10 sec: 45911.0, 60 sec: 47526.2, 300 sec: 47430.4). Total num frames: 704708608. Throughput: 0: 11719.1. Samples: 176240640. 
Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:50,767][1648981] Avg episode reward: [(0, '367.180')] [2024-06-15 15:39:51,012][1651669] Updated weights for policy 0, policy_version 344123 (0.0081) [2024-06-15 15:39:54,149][1651274] Signal inference workers to stop experience collection... (18100 times) [2024-06-15 15:39:54,200][1651669] InferenceWorker_p0-w0: stopping experience collection (18100 times) [2024-06-15 15:39:54,452][1651274] Signal inference workers to resume experience collection... (18100 times) [2024-06-15 15:39:54,453][1651669] InferenceWorker_p0-w0: resuming experience collection (18100 times) [2024-06-15 15:39:55,095][1651669] Updated weights for policy 0, policy_version 344163 (0.0012) [2024-06-15 15:39:55,770][1648981] Fps is (10 sec: 52408.7, 60 sec: 48056.8, 300 sec: 47096.6). Total num frames: 704905216. Throughput: 0: 11878.4. Samples: 176284160. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:39:55,771][1648981] Avg episode reward: [(0, '359.660')] [2024-06-15 15:39:57,097][1651669] Updated weights for policy 0, policy_version 344208 (0.0155) [2024-06-15 15:39:58,273][1651669] Updated weights for policy 0, policy_version 344259 (0.0017) [2024-06-15 15:40:00,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 47513.7, 300 sec: 47763.5). Total num frames: 705167360. Throughput: 0: 12005.3. Samples: 176345600. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:40:00,767][1648981] Avg episode reward: [(0, '359.970')] [2024-06-15 15:40:01,182][1651669] Updated weights for policy 0, policy_version 344336 (0.0072) [2024-06-15 15:40:05,272][1651669] Updated weights for policy 0, policy_version 344400 (0.0012) [2024-06-15 15:40:05,770][1648981] Fps is (10 sec: 45875.6, 60 sec: 46964.6, 300 sec: 46985.4). Total num frames: 705363968. Throughput: 0: 11979.8. Samples: 176429056. 
Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:40:05,771][1648981] Avg episode reward: [(0, '363.820')] [2024-06-15 15:40:08,293][1651669] Updated weights for policy 0, policy_version 344465 (0.0012) [2024-06-15 15:40:09,790][1651669] Updated weights for policy 0, policy_version 344528 (0.0012) [2024-06-15 15:40:10,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48062.8, 300 sec: 47874.6). Total num frames: 705658880. Throughput: 0: 11844.3. Samples: 176456704. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:40:10,767][1648981] Avg episode reward: [(0, '356.040')] [2024-06-15 15:40:12,480][1651669] Updated weights for policy 0, policy_version 344577 (0.0015) [2024-06-15 15:40:13,756][1651669] Updated weights for policy 0, policy_version 344635 (0.0012) [2024-06-15 15:40:15,767][1648981] Fps is (10 sec: 45891.2, 60 sec: 45879.9, 300 sec: 46985.9). Total num frames: 705822720. Throughput: 0: 11662.1. Samples: 176520704. Policy #0 lag: (min: 15.0, avg: 86.3, max: 271.0) [2024-06-15 15:40:15,767][1648981] Avg episode reward: [(0, '361.500')] [2024-06-15 15:40:18,059][1651669] Updated weights for policy 0, policy_version 344703 (0.0014) [2024-06-15 15:40:20,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 46967.4, 300 sec: 47430.3). Total num frames: 706052096. Throughput: 0: 11741.9. Samples: 176593408. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:20,767][1648981] Avg episode reward: [(0, '366.110')] [2024-06-15 15:40:21,039][1651669] Updated weights for policy 0, policy_version 344773 (0.0014) [2024-06-15 15:40:22,104][1651669] Updated weights for policy 0, policy_version 344821 (0.0012) [2024-06-15 15:40:23,844][1651669] Updated weights for policy 0, policy_version 344864 (0.0011) [2024-06-15 15:40:25,768][1648981] Fps is (10 sec: 52420.3, 60 sec: 46419.9, 300 sec: 47207.9). Total num frames: 706347008. Throughput: 0: 11629.6. Samples: 176627200. 
Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:25,769][1648981] Avg episode reward: [(0, '352.740')] [2024-06-15 15:40:27,731][1651669] Updated weights for policy 0, policy_version 344903 (0.0012) [2024-06-15 15:40:30,231][1651669] Updated weights for policy 0, policy_version 344961 (0.0012) [2024-06-15 15:40:30,786][1648981] Fps is (10 sec: 45784.2, 60 sec: 46405.8, 300 sec: 47205.0). Total num frames: 706510848. Throughput: 0: 11827.6. Samples: 176706560. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:30,787][1648981] Avg episode reward: [(0, '348.400')] [2024-06-15 15:40:31,483][1651669] Updated weights for policy 0, policy_version 345024 (0.0012) [2024-06-15 15:40:32,436][1651669] Updated weights for policy 0, policy_version 345060 (0.0013) [2024-06-15 15:40:33,869][1651274] Signal inference workers to stop experience collection... (18150 times) [2024-06-15 15:40:33,946][1651669] InferenceWorker_p0-w0: stopping experience collection (18150 times) [2024-06-15 15:40:34,224][1651274] Signal inference workers to resume experience collection... (18150 times) [2024-06-15 15:40:34,225][1651669] InferenceWorker_p0-w0: resuming experience collection (18150 times) [2024-06-15 15:40:34,819][1651669] Updated weights for policy 0, policy_version 345140 (0.0015) [2024-06-15 15:40:35,766][1648981] Fps is (10 sec: 52438.8, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 706871296. Throughput: 0: 11764.6. Samples: 176770048. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:35,767][1648981] Avg episode reward: [(0, '343.900')] [2024-06-15 15:40:38,751][1651669] Updated weights for policy 0, policy_version 345168 (0.0011) [2024-06-15 15:40:40,044][1651669] Updated weights for policy 0, policy_version 345214 (0.0011) [2024-06-15 15:40:40,766][1648981] Fps is (10 sec: 49250.2, 60 sec: 45881.1, 300 sec: 47097.2). Total num frames: 707002368. Throughput: 0: 11811.1. Samples: 176815616. 
Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:40,767][1648981] Avg episode reward: [(0, '340.880')] [2024-06-15 15:40:42,476][1651669] Updated weights for policy 0, policy_version 345283 (0.0015) [2024-06-15 15:40:43,686][1651669] Updated weights for policy 0, policy_version 345342 (0.0013) [2024-06-15 15:40:44,950][1651669] Updated weights for policy 0, policy_version 345392 (0.0014) [2024-06-15 15:40:45,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 50244.2, 300 sec: 47986.7). Total num frames: 707395584. Throughput: 0: 11935.3. Samples: 176882688. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:45,767][1648981] Avg episode reward: [(0, '346.490')] [2024-06-15 15:40:49,299][1651669] Updated weights for policy 0, policy_version 345409 (0.0013) [2024-06-15 15:40:50,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.3, 300 sec: 47102.1). Total num frames: 707526656. Throughput: 0: 11777.0. Samples: 176958976. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:50,767][1648981] Avg episode reward: [(0, '348.870')] [2024-06-15 15:40:52,426][1651669] Updated weights for policy 0, policy_version 345488 (0.0012) [2024-06-15 15:40:53,821][1651669] Updated weights for policy 0, policy_version 345552 (0.0013) [2024-06-15 15:40:55,733][1651669] Updated weights for policy 0, policy_version 345618 (0.0012) [2024-06-15 15:40:55,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 48608.7, 300 sec: 47763.5). Total num frames: 707821568. Throughput: 0: 12003.5. Samples: 176996864. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:40:55,767][1648981] Avg episode reward: [(0, '354.670')] [2024-06-15 15:40:56,423][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000345648_707887104.pth... 
[2024-06-15 15:40:56,461][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000339984_696287232.pth [2024-06-15 15:41:00,784][1648981] Fps is (10 sec: 39253.9, 60 sec: 45862.0, 300 sec: 46761.1). Total num frames: 707919872. Throughput: 0: 12078.6. Samples: 177064448. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:41:00,784][1648981] Avg episode reward: [(0, '354.390')] [2024-06-15 15:41:01,019][1651669] Updated weights for policy 0, policy_version 345680 (0.0012) [2024-06-15 15:41:02,114][1651669] Updated weights for policy 0, policy_version 345725 (0.0014) [2024-06-15 15:41:04,255][1651669] Updated weights for policy 0, policy_version 345781 (0.0080) [2024-06-15 15:41:05,628][1651669] Updated weights for policy 0, policy_version 345856 (0.0053) [2024-06-15 15:41:05,766][1648981] Fps is (10 sec: 49153.6, 60 sec: 49155.0, 300 sec: 47874.6). Total num frames: 708313088. Throughput: 0: 11878.4. Samples: 177127936. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:41:05,767][1648981] Avg episode reward: [(0, '356.780')] [2024-06-15 15:41:07,617][1651669] Updated weights for policy 0, policy_version 345908 (0.0016) [2024-06-15 15:41:10,768][1648981] Fps is (10 sec: 52510.2, 60 sec: 46420.0, 300 sec: 46985.7). Total num frames: 708444160. Throughput: 0: 11935.3. Samples: 177164288. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:41:10,769][1648981] Avg episode reward: [(0, '357.630')] [2024-06-15 15:41:12,447][1651669] Updated weights for policy 0, policy_version 345957 (0.0012) [2024-06-15 15:41:14,869][1651669] Updated weights for policy 0, policy_version 346004 (0.0013) [2024-06-15 15:41:15,787][1648981] Fps is (10 sec: 39238.9, 60 sec: 48043.0, 300 sec: 47538.0). Total num frames: 708706304. Throughput: 0: 12003.2. Samples: 177246720. 
Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:41:15,788][1648981] Avg episode reward: [(0, '363.120')] [2024-06-15 15:41:16,144][1651274] Signal inference workers to stop experience collection... (18200 times) [2024-06-15 15:41:16,187][1651669] InferenceWorker_p0-w0: stopping experience collection (18200 times) [2024-06-15 15:41:16,442][1651274] Signal inference workers to resume experience collection... (18200 times) [2024-06-15 15:41:16,451][1651669] InferenceWorker_p0-w0: resuming experience collection (18200 times) [2024-06-15 15:41:16,568][1651669] Updated weights for policy 0, policy_version 346081 (0.0012) [2024-06-15 15:41:18,091][1651669] Updated weights for policy 0, policy_version 346144 (0.0014) [2024-06-15 15:41:20,782][1648981] Fps is (10 sec: 52355.6, 60 sec: 48593.2, 300 sec: 47427.8). Total num frames: 708968448. Throughput: 0: 12090.3. Samples: 177314304. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:41:20,783][1648981] Avg episode reward: [(0, '368.590')] [2024-06-15 15:41:22,354][1651669] Updated weights for policy 0, policy_version 346192 (0.0021) [2024-06-15 15:41:23,434][1651669] Updated weights for policy 0, policy_version 346240 (0.0011) [2024-06-15 15:41:25,774][1648981] Fps is (10 sec: 42654.6, 60 sec: 46416.6, 300 sec: 47206.9). Total num frames: 709132288. Throughput: 0: 11910.4. Samples: 177351680. Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:41:25,775][1648981] Avg episode reward: [(0, '368.790')] [2024-06-15 15:41:27,691][1651669] Updated weights for policy 0, policy_version 346336 (0.0016) [2024-06-15 15:41:29,566][1651669] Updated weights for policy 0, policy_version 346425 (0.0017) [2024-06-15 15:41:30,767][1648981] Fps is (10 sec: 52509.7, 60 sec: 49714.3, 300 sec: 47652.4). Total num frames: 709492736. Throughput: 0: 11775.9. Samples: 177412608. 
Policy #0 lag: (min: 15.0, avg: 104.3, max: 271.0) [2024-06-15 15:41:30,768][1648981] Avg episode reward: [(0, '393.670')] [2024-06-15 15:41:33,355][1651669] Updated weights for policy 0, policy_version 346464 (0.0011) [2024-06-15 15:41:34,064][1651669] Updated weights for policy 0, policy_version 346496 (0.0013) [2024-06-15 15:41:35,782][1648981] Fps is (10 sec: 49113.1, 60 sec: 45863.1, 300 sec: 47094.5). Total num frames: 709623808. Throughput: 0: 12090.3. Samples: 177503232. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:41:35,783][1648981] Avg episode reward: [(0, '403.550')] [2024-06-15 15:41:38,421][1651669] Updated weights for policy 0, policy_version 346576 (0.0013) [2024-06-15 15:41:39,644][1651669] Updated weights for policy 0, policy_version 346625 (0.0014) [2024-06-15 15:41:40,818][1648981] Fps is (10 sec: 52159.9, 60 sec: 50200.8, 300 sec: 48088.3). Total num frames: 710017024. Throughput: 0: 11864.8. Samples: 177531392. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:41:40,819][1648981] Avg episode reward: [(0, '391.760')] [2024-06-15 15:41:43,652][1651669] Updated weights for policy 0, policy_version 346704 (0.0014) [2024-06-15 15:41:45,766][1648981] Fps is (10 sec: 52511.5, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 710148096. Throughput: 0: 11928.5. Samples: 177601024. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:41:45,767][1648981] Avg episode reward: [(0, '390.720')] [2024-06-15 15:41:47,424][1651669] Updated weights for policy 0, policy_version 346757 (0.0012) [2024-06-15 15:41:49,002][1651669] Updated weights for policy 0, policy_version 346816 (0.0015) [2024-06-15 15:41:50,770][1648981] Fps is (10 sec: 39510.7, 60 sec: 48056.5, 300 sec: 47985.0). Total num frames: 710410240. Throughput: 0: 12093.5. Samples: 177672192. 
Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:41:50,771][1648981] Avg episode reward: [(0, '393.120')] [2024-06-15 15:41:50,927][1651669] Updated weights for policy 0, policy_version 346881 (0.0015) [2024-06-15 15:41:52,134][1651669] Updated weights for policy 0, policy_version 346944 (0.0013) [2024-06-15 15:41:55,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 46421.6, 300 sec: 47319.2). Total num frames: 710606848. Throughput: 0: 12060.9. Samples: 177707008. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:41:55,767][1648981] Avg episode reward: [(0, '393.610')] [2024-06-15 15:41:55,967][1651669] Updated weights for policy 0, policy_version 346997 (0.0011) [2024-06-15 15:41:58,412][1651274] Signal inference workers to stop experience collection... (18250 times) [2024-06-15 15:41:58,528][1651669] InferenceWorker_p0-w0: stopping experience collection (18250 times) [2024-06-15 15:41:58,750][1651274] Signal inference workers to resume experience collection... (18250 times) [2024-06-15 15:41:58,751][1651669] InferenceWorker_p0-w0: resuming experience collection (18250 times) [2024-06-15 15:41:59,455][1651669] Updated weights for policy 0, policy_version 347042 (0.0011) [2024-06-15 15:42:00,796][1648981] Fps is (10 sec: 42490.3, 60 sec: 48596.0, 300 sec: 47647.7). Total num frames: 710836224. Throughput: 0: 12035.4. Samples: 177788416. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:00,796][1648981] Avg episode reward: [(0, '394.130')] [2024-06-15 15:42:01,441][1651669] Updated weights for policy 0, policy_version 347107 (0.0011) [2024-06-15 15:42:03,140][1651669] Updated weights for policy 0, policy_version 347185 (0.0147) [2024-06-15 15:42:05,779][1648981] Fps is (10 sec: 45816.5, 60 sec: 45865.4, 300 sec: 47539.3). Total num frames: 711065600. Throughput: 0: 11879.2. Samples: 177848832. 
Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:05,780][1648981] Avg episode reward: [(0, '400.800')] [2024-06-15 15:42:06,482][1651669] Updated weights for policy 0, policy_version 347248 (0.0011) [2024-06-15 15:42:09,819][1651669] Updated weights for policy 0, policy_version 347281 (0.0011) [2024-06-15 15:42:10,777][1648981] Fps is (10 sec: 45962.0, 60 sec: 47506.6, 300 sec: 47428.6). Total num frames: 711294976. Throughput: 0: 12025.6. Samples: 177892864. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:10,778][1648981] Avg episode reward: [(0, '380.020')] [2024-06-15 15:42:11,603][1651669] Updated weights for policy 0, policy_version 347344 (0.0111) [2024-06-15 15:42:13,627][1651669] Updated weights for policy 0, policy_version 347408 (0.0014) [2024-06-15 15:42:15,770][1648981] Fps is (10 sec: 52475.0, 60 sec: 48073.4, 300 sec: 47542.0). Total num frames: 711589888. Throughput: 0: 11922.9. Samples: 177949184. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:15,771][1648981] Avg episode reward: [(0, '398.580')] [2024-06-15 15:42:16,897][1651669] Updated weights for policy 0, policy_version 347457 (0.0017) [2024-06-15 15:42:17,722][1651669] Updated weights for policy 0, policy_version 347510 (0.0088) [2024-06-15 15:42:20,766][1648981] Fps is (10 sec: 45924.1, 60 sec: 46433.6, 300 sec: 47208.1). Total num frames: 711753728. Throughput: 0: 11882.6. Samples: 178037760. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:20,767][1648981] Avg episode reward: [(0, '380.530')] [2024-06-15 15:42:21,266][1651669] Updated weights for policy 0, policy_version 347553 (0.0011) [2024-06-15 15:42:22,951][1651669] Updated weights for policy 0, policy_version 347619 (0.0013) [2024-06-15 15:42:24,623][1651669] Updated weights for policy 0, policy_version 347680 (0.0015) [2024-06-15 15:42:25,768][1648981] Fps is (10 sec: 52440.9, 60 sec: 49703.3, 300 sec: 47985.4). Total num frames: 712114176. 
Throughput: 0: 11868.9. Samples: 178064896. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:25,769][1648981] Avg episode reward: [(0, '369.260')] [2024-06-15 15:42:28,214][1651669] Updated weights for policy 0, policy_version 347744 (0.0013) [2024-06-15 15:42:30,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 45875.4, 300 sec: 47208.1). Total num frames: 712245248. Throughput: 0: 11867.0. Samples: 178135040. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:30,767][1648981] Avg episode reward: [(0, '379.580')] [2024-06-15 15:42:32,446][1651669] Updated weights for policy 0, policy_version 347794 (0.0013) [2024-06-15 15:42:34,655][1651669] Updated weights for policy 0, policy_version 347875 (0.0012) [2024-06-15 15:42:35,479][1651274] Signal inference workers to stop experience collection... (18300 times) [2024-06-15 15:42:35,525][1651669] InferenceWorker_p0-w0: stopping experience collection (18300 times) [2024-06-15 15:42:35,766][1648981] Fps is (10 sec: 39327.9, 60 sec: 48072.4, 300 sec: 47875.9). Total num frames: 712507392. Throughput: 0: 11890.8. Samples: 178207232. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:35,767][1648981] Avg episode reward: [(0, '377.670')] [2024-06-15 15:42:35,845][1651274] Signal inference workers to resume experience collection... (18300 times) [2024-06-15 15:42:35,855][1651669] InferenceWorker_p0-w0: resuming experience collection (18300 times) [2024-06-15 15:42:36,347][1651669] Updated weights for policy 0, policy_version 347940 (0.0013) [2024-06-15 15:42:38,834][1651669] Updated weights for policy 0, policy_version 348000 (0.0010) [2024-06-15 15:42:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45914.8, 300 sec: 47652.4). Total num frames: 712769536. Throughput: 0: 11958.0. Samples: 178245120. 
Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:40,767][1648981] Avg episode reward: [(0, '373.410')] [2024-06-15 15:42:43,748][1651669] Updated weights for policy 0, policy_version 348053 (0.0016) [2024-06-15 15:42:45,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 712966144. Throughput: 0: 11829.2. Samples: 178320384. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:45,769][1648981] Avg episode reward: [(0, '376.040')] [2024-06-15 15:42:45,985][1651669] Updated weights for policy 0, policy_version 348132 (0.0141) [2024-06-15 15:42:48,064][1651669] Updated weights for policy 0, policy_version 348214 (0.0013) [2024-06-15 15:42:50,576][1651669] Updated weights for policy 0, policy_version 348257 (0.0011) [2024-06-15 15:42:50,770][1648981] Fps is (10 sec: 45858.6, 60 sec: 46967.7, 300 sec: 47207.5). Total num frames: 713228288. Throughput: 0: 11755.6. Samples: 178377728. Policy #0 lag: (min: 47.0, avg: 147.5, max: 303.0) [2024-06-15 15:42:50,771][1648981] Avg episode reward: [(0, '359.350')] [2024-06-15 15:42:55,230][1651669] Updated weights for policy 0, policy_version 348320 (0.0016) [2024-06-15 15:42:55,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46421.2, 300 sec: 47430.3). Total num frames: 713392128. Throughput: 0: 11676.3. Samples: 178418176. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:42:55,767][1648981] Avg episode reward: [(0, '356.670')] [2024-06-15 15:42:56,362][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000348368_713457664.pth... 
[2024-06-15 15:42:56,480][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000342784_702021632.pth [2024-06-15 15:42:56,553][1651669] Updated weights for policy 0, policy_version 348369 (0.0012) [2024-06-15 15:42:58,251][1651669] Updated weights for policy 0, policy_version 348433 (0.0012) [2024-06-15 15:42:59,411][1651669] Updated weights for policy 0, policy_version 348477 (0.0011) [2024-06-15 15:43:00,766][1648981] Fps is (10 sec: 45892.3, 60 sec: 47536.9, 300 sec: 47208.3). Total num frames: 713687040. Throughput: 0: 11697.4. Samples: 178475520. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:00,767][1648981] Avg episode reward: [(0, '352.860')] [2024-06-15 15:43:02,511][1651669] Updated weights for policy 0, policy_version 348532 (0.0011) [2024-06-15 15:43:05,777][1648981] Fps is (10 sec: 42553.0, 60 sec: 45876.7, 300 sec: 47095.6). Total num frames: 713818112. Throughput: 0: 11625.3. Samples: 178561024. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:05,778][1648981] Avg episode reward: [(0, '351.220')] [2024-06-15 15:43:06,660][1651669] Updated weights for policy 0, policy_version 348564 (0.0015) [2024-06-15 15:43:08,482][1651669] Updated weights for policy 0, policy_version 348644 (0.0204) [2024-06-15 15:43:10,291][1651669] Updated weights for policy 0, policy_version 348720 (0.0012) [2024-06-15 15:43:10,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 48614.2, 300 sec: 47541.3). Total num frames: 714211328. Throughput: 0: 11662.6. Samples: 178589696. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:10,767][1648981] Avg episode reward: [(0, '365.310')] [2024-06-15 15:43:12,992][1651669] Updated weights for policy 0, policy_version 348770 (0.0014) [2024-06-15 15:43:15,767][1648981] Fps is (10 sec: 52482.2, 60 sec: 45877.8, 300 sec: 47097.0). Total num frames: 714342400. Throughput: 0: 11684.9. Samples: 178660864. 
Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:15,768][1648981] Avg episode reward: [(0, '358.180')] [2024-06-15 15:43:17,899][1651274] Signal inference workers to stop experience collection... (18350 times) [2024-06-15 15:43:17,938][1651669] InferenceWorker_p0-w0: stopping experience collection (18350 times) [2024-06-15 15:43:18,091][1651274] Signal inference workers to resume experience collection... (18350 times) [2024-06-15 15:43:18,092][1651669] InferenceWorker_p0-w0: resuming experience collection (18350 times) [2024-06-15 15:43:18,528][1651669] Updated weights for policy 0, policy_version 348866 (0.0017) [2024-06-15 15:43:20,313][1651669] Updated weights for policy 0, policy_version 348945 (0.0014) [2024-06-15 15:43:20,774][1648981] Fps is (10 sec: 45840.0, 60 sec: 48599.3, 300 sec: 47540.4). Total num frames: 714670080. Throughput: 0: 11637.4. Samples: 178731008. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:20,775][1648981] Avg episode reward: [(0, '355.370')] [2024-06-15 15:43:23,253][1651669] Updated weights for policy 0, policy_version 349008 (0.0013) [2024-06-15 15:43:24,473][1651669] Updated weights for policy 0, policy_version 349049 (0.0017) [2024-06-15 15:43:25,766][1648981] Fps is (10 sec: 52431.7, 60 sec: 45876.4, 300 sec: 47097.1). Total num frames: 714866688. Throughput: 0: 11764.6. Samples: 178774528. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:25,767][1648981] Avg episode reward: [(0, '354.960')] [2024-06-15 15:43:28,401][1651669] Updated weights for policy 0, policy_version 349104 (0.0014) [2024-06-15 15:43:29,489][1651669] Updated weights for policy 0, policy_version 349156 (0.0013) [2024-06-15 15:43:30,533][1651669] Updated weights for policy 0, policy_version 349203 (0.0063) [2024-06-15 15:43:30,766][1648981] Fps is (10 sec: 52471.2, 60 sec: 49152.2, 300 sec: 47874.6). Total num frames: 715194368. Throughput: 0: 11753.3. Samples: 178849280. 
Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:30,767][1648981] Avg episode reward: [(0, '376.700')] [2024-06-15 15:43:33,996][1651669] Updated weights for policy 0, policy_version 349280 (0.0141) [2024-06-15 15:43:35,782][1648981] Fps is (10 sec: 52346.7, 60 sec: 48047.2, 300 sec: 47094.6). Total num frames: 715390976. Throughput: 0: 12239.2. Samples: 178928640. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:35,783][1648981] Avg episode reward: [(0, '389.020')] [2024-06-15 15:43:38,171][1651669] Updated weights for policy 0, policy_version 349319 (0.0028) [2024-06-15 15:43:39,758][1651669] Updated weights for policy 0, policy_version 349379 (0.0015) [2024-06-15 15:43:40,767][1648981] Fps is (10 sec: 42597.3, 60 sec: 47513.5, 300 sec: 47874.6). Total num frames: 715620352. Throughput: 0: 12265.2. Samples: 178970112. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:40,769][1648981] Avg episode reward: [(0, '387.320')] [2024-06-15 15:43:41,186][1651669] Updated weights for policy 0, policy_version 349442 (0.0014) [2024-06-15 15:43:42,503][1651669] Updated weights for policy 0, policy_version 349499 (0.0019) [2024-06-15 15:43:44,700][1651669] Updated weights for policy 0, policy_version 349560 (0.0012) [2024-06-15 15:43:45,766][1648981] Fps is (10 sec: 52511.6, 60 sec: 49152.2, 300 sec: 47655.0). Total num frames: 715915264. Throughput: 0: 12344.9. Samples: 179031040. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:45,767][1648981] Avg episode reward: [(0, '383.640')] [2024-06-15 15:43:49,641][1651669] Updated weights for policy 0, policy_version 349618 (0.0012) [2024-06-15 15:43:50,767][1648981] Fps is (10 sec: 49152.6, 60 sec: 48062.7, 300 sec: 47763.5). Total num frames: 716111872. Throughput: 0: 12199.9. Samples: 179109888. 
Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:50,767][1648981] Avg episode reward: [(0, '387.900')] [2024-06-15 15:43:50,920][1651669] Updated weights for policy 0, policy_version 349680 (0.0103) [2024-06-15 15:43:52,116][1651274] Signal inference workers to stop experience collection... (18400 times) [2024-06-15 15:43:52,162][1651669] InferenceWorker_p0-w0: stopping experience collection (18400 times) [2024-06-15 15:43:52,347][1651274] Signal inference workers to resume experience collection... (18400 times) [2024-06-15 15:43:52,348][1651669] InferenceWorker_p0-w0: resuming experience collection (18400 times) [2024-06-15 15:43:52,350][1651669] Updated weights for policy 0, policy_version 349744 (0.0040) [2024-06-15 15:43:54,527][1651669] Updated weights for policy 0, policy_version 349797 (0.0011) [2024-06-15 15:43:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50790.5, 300 sec: 47874.6). Total num frames: 716439552. Throughput: 0: 12288.1. Samples: 179142656. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:43:55,767][1648981] Avg episode reward: [(0, '393.990')] [2024-06-15 15:43:58,880][1651669] Updated weights for policy 0, policy_version 349840 (0.0013) [2024-06-15 15:44:00,770][1648981] Fps is (10 sec: 49133.8, 60 sec: 48602.8, 300 sec: 47651.8). Total num frames: 716603392. Throughput: 0: 12685.3. Samples: 179231744. Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:44:00,771][1648981] Avg episode reward: [(0, '403.370')] [2024-06-15 15:44:00,987][1651669] Updated weights for policy 0, policy_version 349912 (0.0013) [2024-06-15 15:44:02,630][1651669] Updated weights for policy 0, policy_version 349988 (0.0013) [2024-06-15 15:44:05,282][1651669] Updated weights for policy 0, policy_version 350038 (0.0012) [2024-06-15 15:44:05,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 51892.1, 300 sec: 47986.3). Total num frames: 716931072. Throughput: 0: 12472.3. Samples: 179292160. 
Policy #0 lag: (min: 39.0, avg: 186.8, max: 295.0) [2024-06-15 15:44:05,767][1648981] Avg episode reward: [(0, '419.360')] [2024-06-15 15:44:10,011][1651669] Updated weights for policy 0, policy_version 350112 (0.0015) [2024-06-15 15:44:10,766][1648981] Fps is (10 sec: 49170.5, 60 sec: 48060.0, 300 sec: 47542.4). Total num frames: 717094912. Throughput: 0: 12367.7. Samples: 179331072. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:10,767][1648981] Avg episode reward: [(0, '425.010')] [2024-06-15 15:44:11,939][1651669] Updated weights for policy 0, policy_version 350163 (0.0011) [2024-06-15 15:44:13,346][1651669] Updated weights for policy 0, policy_version 350224 (0.0012) [2024-06-15 15:44:14,580][1651669] Updated weights for policy 0, policy_version 350272 (0.0045) [2024-06-15 15:44:15,766][1648981] Fps is (10 sec: 42597.7, 60 sec: 50244.8, 300 sec: 47874.6). Total num frames: 717357056. Throughput: 0: 12140.1. Samples: 179395584. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:15,767][1648981] Avg episode reward: [(0, '410.310')] [2024-06-15 15:44:16,863][1651669] Updated weights for policy 0, policy_version 350336 (0.0011) [2024-06-15 15:44:20,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 46973.8, 300 sec: 47208.1). Total num frames: 717488128. Throughput: 0: 12098.8. Samples: 179472896. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:20,767][1648981] Avg episode reward: [(0, '393.640')] [2024-06-15 15:44:21,902][1651669] Updated weights for policy 0, policy_version 350392 (0.0013) [2024-06-15 15:44:23,318][1651669] Updated weights for policy 0, policy_version 350432 (0.0015) [2024-06-15 15:44:24,667][1651669] Updated weights for policy 0, policy_version 350496 (0.0010) [2024-06-15 15:44:25,460][1651669] Updated weights for policy 0, policy_version 350528 (0.0012) [2024-06-15 15:44:25,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.4, 300 sec: 47985.7). Total num frames: 717881344. 
Throughput: 0: 11992.2. Samples: 179509760. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:25,767][1648981] Avg episode reward: [(0, '401.020')] [2024-06-15 15:44:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.4, 300 sec: 47430.3). Total num frames: 718012416. Throughput: 0: 12219.7. Samples: 179580928. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:30,767][1648981] Avg episode reward: [(0, '401.070')] [2024-06-15 15:44:31,209][1651669] Updated weights for policy 0, policy_version 350594 (0.0016) [2024-06-15 15:44:32,518][1651669] Updated weights for policy 0, policy_version 350655 (0.0015) [2024-06-15 15:44:34,777][1651669] Updated weights for policy 0, policy_version 350704 (0.0054) [2024-06-15 15:44:34,852][1651274] Signal inference workers to stop experience collection... (18450 times) [2024-06-15 15:44:34,923][1651669] InferenceWorker_p0-w0: stopping experience collection (18450 times) [2024-06-15 15:44:35,125][1651274] Signal inference workers to resume experience collection... (18450 times) [2024-06-15 15:44:35,126][1651669] InferenceWorker_p0-w0: resuming experience collection (18450 times) [2024-06-15 15:44:35,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48618.6, 300 sec: 47653.7). Total num frames: 718307328. Throughput: 0: 11969.4. Samples: 179648512. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:35,767][1648981] Avg episode reward: [(0, '401.530')] [2024-06-15 15:44:36,431][1651669] Updated weights for policy 0, policy_version 350775 (0.0014) [2024-06-15 15:44:38,116][1651669] Updated weights for policy 0, policy_version 350804 (0.0012) [2024-06-15 15:44:40,769][1648981] Fps is (10 sec: 52416.8, 60 sec: 48604.2, 300 sec: 47985.3). Total num frames: 718536704. Throughput: 0: 11991.6. Samples: 179682304. 
Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:40,769][1648981] Avg episode reward: [(0, '403.940')] [2024-06-15 15:44:43,098][1651669] Updated weights for policy 0, policy_version 350885 (0.0012) [2024-06-15 15:44:44,826][1651669] Updated weights for policy 0, policy_version 350928 (0.0019) [2024-06-15 15:44:45,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 718798848. Throughput: 0: 11731.5. Samples: 179759616. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:45,767][1648981] Avg episode reward: [(0, '395.040')] [2024-06-15 15:44:45,929][1651669] Updated weights for policy 0, policy_version 350981 (0.0014) [2024-06-15 15:44:47,017][1651669] Updated weights for policy 0, policy_version 351035 (0.0012) [2024-06-15 15:44:49,780][1651669] Updated weights for policy 0, policy_version 351102 (0.0011) [2024-06-15 15:44:50,772][1648981] Fps is (10 sec: 52412.7, 60 sec: 49147.7, 300 sec: 47985.4). Total num frames: 719060992. Throughput: 0: 11968.0. Samples: 179830784. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:50,772][1648981] Avg episode reward: [(0, '390.370')] [2024-06-15 15:44:53,971][1651669] Updated weights for policy 0, policy_version 351166 (0.0013) [2024-06-15 15:44:55,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 719257600. Throughput: 0: 11992.2. Samples: 179870720. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:44:55,767][1648981] Avg episode reward: [(0, '345.760')] [2024-06-15 15:44:56,085][1651669] Updated weights for policy 0, policy_version 351222 (0.0013) [2024-06-15 15:44:56,316][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000351232_719323136.pth... 
[2024-06-15 15:44:56,458][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000345648_707887104.pth [2024-06-15 15:44:57,445][1651669] Updated weights for policy 0, policy_version 351280 (0.0021) [2024-06-15 15:45:00,080][1651669] Updated weights for policy 0, policy_version 351328 (0.0011) [2024-06-15 15:45:00,766][1648981] Fps is (10 sec: 52456.5, 60 sec: 49701.2, 300 sec: 48208.4). Total num frames: 719585280. Throughput: 0: 12071.8. Samples: 179938816. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:45:00,767][1648981] Avg episode reward: [(0, '341.050')] [2024-06-15 15:45:03,544][1651669] Updated weights for policy 0, policy_version 351378 (0.0014) [2024-06-15 15:45:04,655][1651669] Updated weights for policy 0, policy_version 351424 (0.0030) [2024-06-15 15:45:05,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46421.2, 300 sec: 47652.4). Total num frames: 719716352. Throughput: 0: 12117.3. Samples: 180018176. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:45:05,767][1648981] Avg episode reward: [(0, '336.940')] [2024-06-15 15:45:06,636][1651669] Updated weights for policy 0, policy_version 351477 (0.0093) [2024-06-15 15:45:08,445][1651669] Updated weights for policy 0, policy_version 351550 (0.0018) [2024-06-15 15:45:10,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49152.1, 300 sec: 48207.9). Total num frames: 720044032. Throughput: 0: 12026.3. Samples: 180050944. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:45:10,767][1648981] Avg episode reward: [(0, '341.780')] [2024-06-15 15:45:11,015][1651669] Updated weights for policy 0, policy_version 351606 (0.0013) [2024-06-15 15:45:14,462][1651669] Updated weights for policy 0, policy_version 351634 (0.0011) [2024-06-15 15:45:15,545][1651669] Updated weights for policy 0, policy_version 351680 (0.0021) [2024-06-15 15:45:15,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 720240640. 
Throughput: 0: 12231.1. Samples: 180131328. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:45:15,767][1648981] Avg episode reward: [(0, '322.880')] [2024-06-15 15:45:17,051][1651274] Signal inference workers to stop experience collection... (18500 times) [2024-06-15 15:45:17,079][1651669] InferenceWorker_p0-w0: stopping experience collection (18500 times) [2024-06-15 15:45:17,349][1651274] Signal inference workers to resume experience collection... (18500 times) [2024-06-15 15:45:17,350][1651669] InferenceWorker_p0-w0: resuming experience collection (18500 times) [2024-06-15 15:45:18,046][1651669] Updated weights for policy 0, policy_version 351760 (0.0025) [2024-06-15 15:45:18,968][1651669] Updated weights for policy 0, policy_version 351805 (0.0070) [2024-06-15 15:45:20,791][1648981] Fps is (10 sec: 49031.5, 60 sec: 50769.6, 300 sec: 48093.1). Total num frames: 720535552. Throughput: 0: 12201.7. Samples: 180197888. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:45:20,791][1648981] Avg episode reward: [(0, '334.460')] [2024-06-15 15:45:21,769][1651669] Updated weights for policy 0, policy_version 351869 (0.0012) [2024-06-15 15:45:25,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 46421.3, 300 sec: 47988.9). Total num frames: 720666624. Throughput: 0: 12231.7. Samples: 180232704. Policy #0 lag: (min: 15.0, avg: 98.7, max: 271.0) [2024-06-15 15:45:25,767][1648981] Avg episode reward: [(0, '344.670')] [2024-06-15 15:45:26,553][1651669] Updated weights for policy 0, policy_version 351923 (0.0011) [2024-06-15 15:45:28,149][1651669] Updated weights for policy 0, policy_version 351955 (0.0008) [2024-06-15 15:45:29,822][1651669] Updated weights for policy 0, policy_version 352032 (0.0011) [2024-06-15 15:45:30,766][1648981] Fps is (10 sec: 49272.8, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 721027072. Throughput: 0: 12094.6. Samples: 180303872. 
Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:45:30,767][1648981] Avg episode reward: [(0, '344.830')] [2024-06-15 15:45:32,712][1651669] Updated weights for policy 0, policy_version 352121 (0.0014) [2024-06-15 15:45:35,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 721158144. Throughput: 0: 12175.7. Samples: 180378624. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:45:35,767][1648981] Avg episode reward: [(0, '336.880')] [2024-06-15 15:45:36,942][1651669] Updated weights for policy 0, policy_version 352160 (0.0012) [2024-06-15 15:45:39,202][1651669] Updated weights for policy 0, policy_version 352197 (0.0011) [2024-06-15 15:45:40,649][1651669] Updated weights for policy 0, policy_version 352257 (0.0011) [2024-06-15 15:45:40,767][1648981] Fps is (10 sec: 39321.0, 60 sec: 48061.4, 300 sec: 47541.3). Total num frames: 721420288. Throughput: 0: 12083.2. Samples: 180414464. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:45:40,767][1648981] Avg episode reward: [(0, '347.930')] [2024-06-15 15:45:42,069][1651669] Updated weights for policy 0, policy_version 352320 (0.0011) [2024-06-15 15:45:43,814][1651669] Updated weights for policy 0, policy_version 352368 (0.0013) [2024-06-15 15:45:45,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 48059.6, 300 sec: 47985.6). Total num frames: 721682432. Throughput: 0: 12026.3. Samples: 180480000. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:45:45,769][1648981] Avg episode reward: [(0, '355.000')] [2024-06-15 15:45:48,188][1651669] Updated weights for policy 0, policy_version 352417 (0.0013) [2024-06-15 15:45:50,462][1651669] Updated weights for policy 0, policy_version 352482 (0.0015) [2024-06-15 15:45:50,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 47517.9, 300 sec: 47763.6). Total num frames: 721911808. Throughput: 0: 11969.5. Samples: 180556800. 
Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:45:50,767][1648981] Avg episode reward: [(0, '367.920')] [2024-06-15 15:45:52,206][1651669] Updated weights for policy 0, policy_version 352560 (0.0151) [2024-06-15 15:45:54,056][1651669] Updated weights for policy 0, policy_version 352592 (0.0013) [2024-06-15 15:45:55,007][1651669] Updated weights for policy 0, policy_version 352636 (0.0042) [2024-06-15 15:45:55,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 49152.0, 300 sec: 48432.8). Total num frames: 722206720. Throughput: 0: 11923.9. Samples: 180587520. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:45:55,767][1648981] Avg episode reward: [(0, '381.300')] [2024-06-15 15:45:59,495][1651669] Updated weights for policy 0, policy_version 352699 (0.0013) [2024-06-15 15:46:00,689][1651274] Signal inference workers to stop experience collection... (18550 times) [2024-06-15 15:46:00,734][1651669] InferenceWorker_p0-w0: stopping experience collection (18550 times) [2024-06-15 15:46:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 722337792. Throughput: 0: 11776.0. Samples: 180661248. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:00,767][1648981] Avg episode reward: [(0, '389.210')] [2024-06-15 15:46:00,930][1651274] Signal inference workers to resume experience collection... (18550 times) [2024-06-15 15:46:00,930][1651669] InferenceWorker_p0-w0: resuming experience collection (18550 times) [2024-06-15 15:46:01,625][1651669] Updated weights for policy 0, policy_version 352752 (0.0010) [2024-06-15 15:46:03,593][1651669] Updated weights for policy 0, policy_version 352826 (0.0191) [2024-06-15 15:46:05,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 48208.1). Total num frames: 722665472. Throughput: 0: 11816.6. Samples: 180729344. 
Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:05,767][1648981] Avg episode reward: [(0, '400.300')] [2024-06-15 15:46:05,947][1651669] Updated weights for policy 0, policy_version 352867 (0.0015) [2024-06-15 15:46:10,116][1651669] Updated weights for policy 0, policy_version 352928 (0.0013) [2024-06-15 15:46:10,792][1648981] Fps is (10 sec: 52292.1, 60 sec: 46947.0, 300 sec: 47984.9). Total num frames: 722862080. Throughput: 0: 11882.9. Samples: 180767744. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:10,793][1648981] Avg episode reward: [(0, '392.180')] [2024-06-15 15:46:12,476][1651669] Updated weights for policy 0, policy_version 352992 (0.0015) [2024-06-15 15:46:13,883][1651669] Updated weights for policy 0, policy_version 353056 (0.0009) [2024-06-15 15:46:14,682][1651669] Updated weights for policy 0, policy_version 353088 (0.0013) [2024-06-15 15:46:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48059.7, 300 sec: 47988.2). Total num frames: 723124224. Throughput: 0: 11867.0. Samples: 180837888. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:15,767][1648981] Avg episode reward: [(0, '392.560')] [2024-06-15 15:46:17,144][1651669] Updated weights for policy 0, policy_version 353148 (0.0033) [2024-06-15 15:46:20,766][1648981] Fps is (10 sec: 45995.4, 60 sec: 46440.3, 300 sec: 48098.1). Total num frames: 723320832. Throughput: 0: 11958.1. Samples: 180916736. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:20,767][1648981] Avg episode reward: [(0, '391.620')] [2024-06-15 15:46:22,719][1651669] Updated weights for policy 0, policy_version 353223 (0.0018) [2024-06-15 15:46:23,991][1651669] Updated weights for policy 0, policy_version 353280 (0.0011) [2024-06-15 15:46:25,655][1651669] Updated weights for policy 0, policy_version 353344 (0.0015) [2024-06-15 15:46:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 723648512. 
Throughput: 0: 11923.9. Samples: 180951040. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:25,767][1648981] Avg episode reward: [(0, '379.830')] [2024-06-15 15:46:30,779][1648981] Fps is (10 sec: 45815.7, 60 sec: 45865.3, 300 sec: 47986.1). Total num frames: 723779584. Throughput: 0: 11761.3. Samples: 181009408. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:30,780][1648981] Avg episode reward: [(0, '401.900')] [2024-06-15 15:46:31,772][1651669] Updated weights for policy 0, policy_version 353412 (0.0013) [2024-06-15 15:46:32,938][1651669] Updated weights for policy 0, policy_version 353470 (0.0011) [2024-06-15 15:46:35,401][1651669] Updated weights for policy 0, policy_version 353522 (0.0014) [2024-06-15 15:46:35,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 48059.7, 300 sec: 47549.7). Total num frames: 724041728. Throughput: 0: 11764.6. Samples: 181086208. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:35,767][1648981] Avg episode reward: [(0, '409.040')] [2024-06-15 15:46:37,018][1651669] Updated weights for policy 0, policy_version 353591 (0.0011) [2024-06-15 15:46:39,819][1651669] Updated weights for policy 0, policy_version 353648 (0.0015) [2024-06-15 15:46:40,766][1648981] Fps is (10 sec: 52497.0, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 724303872. Throughput: 0: 11832.9. Samples: 181120000. Policy #0 lag: (min: 61.0, avg: 147.2, max: 317.0) [2024-06-15 15:46:40,767][1648981] Avg episode reward: [(0, '409.720')] [2024-06-15 15:46:43,149][1651274] Signal inference workers to stop experience collection... (18600 times) [2024-06-15 15:46:43,189][1651669] InferenceWorker_p0-w0: stopping experience collection (18600 times) [2024-06-15 15:46:43,356][1651274] Signal inference workers to resume experience collection... 
(18600 times) [2024-06-15 15:46:43,374][1651669] InferenceWorker_p0-w0: resuming experience collection (18600 times) [2024-06-15 15:46:43,529][1651669] Updated weights for policy 0, policy_version 353699 (0.0013) [2024-06-15 15:46:45,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46967.6, 300 sec: 47764.2). Total num frames: 724500480. Throughput: 0: 11912.5. Samples: 181197312. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:46:45,767][1648981] Avg episode reward: [(0, '409.550')] [2024-06-15 15:46:46,116][1651669] Updated weights for policy 0, policy_version 353784 (0.0012) [2024-06-15 15:46:47,254][1651669] Updated weights for policy 0, policy_version 353840 (0.0024) [2024-06-15 15:46:50,224][1651669] Updated weights for policy 0, policy_version 353889 (0.0013) [2024-06-15 15:46:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 48207.8). Total num frames: 724828160. Throughput: 0: 11958.1. Samples: 181267456. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:46:50,767][1648981] Avg episode reward: [(0, '398.540')] [2024-06-15 15:46:53,627][1651669] Updated weights for policy 0, policy_version 353954 (0.0057) [2024-06-15 15:46:55,767][1648981] Fps is (10 sec: 52426.7, 60 sec: 46967.1, 300 sec: 48101.5). Total num frames: 725024768. Throughput: 0: 12033.2. Samples: 181308928. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:46:55,767][1648981] Avg episode reward: [(0, '399.330')] [2024-06-15 15:46:55,908][1651669] Updated weights for policy 0, policy_version 354022 (0.0012) [2024-06-15 15:46:56,099][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000354032_725057536.pth... 
[2024-06-15 15:46:56,271][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000348368_713457664.pth [2024-06-15 15:46:57,519][1651669] Updated weights for policy 0, policy_version 354096 (0.0013) [2024-06-15 15:47:00,771][1648981] Fps is (10 sec: 45856.9, 60 sec: 49148.7, 300 sec: 48209.3). Total num frames: 725286912. Throughput: 0: 12116.3. Samples: 181383168. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:00,774][1648981] Avg episode reward: [(0, '390.940')] [2024-06-15 15:47:00,954][1651669] Updated weights for policy 0, policy_version 354148 (0.0040) [2024-06-15 15:47:03,895][1651669] Updated weights for policy 0, policy_version 354194 (0.0012) [2024-06-15 15:47:05,019][1651669] Updated weights for policy 0, policy_version 354236 (0.0012) [2024-06-15 15:47:05,766][1648981] Fps is (10 sec: 45877.5, 60 sec: 46967.5, 300 sec: 48098.5). Total num frames: 725483520. Throughput: 0: 11889.8. Samples: 181451776. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:05,767][1648981] Avg episode reward: [(0, '377.720')] [2024-06-15 15:47:07,155][1651669] Updated weights for policy 0, policy_version 354300 (0.0075) [2024-06-15 15:47:08,379][1651669] Updated weights for policy 0, policy_version 354368 (0.0012) [2024-06-15 15:47:10,801][1648981] Fps is (10 sec: 45734.2, 60 sec: 48052.8, 300 sec: 47980.7). Total num frames: 725745664. Throughput: 0: 11835.1. Samples: 181484032. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:10,802][1648981] Avg episode reward: [(0, '379.840')] [2024-06-15 15:47:14,951][1651669] Updated weights for policy 0, policy_version 354449 (0.0015) [2024-06-15 15:47:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 725975040. Throughput: 0: 12234.7. Samples: 181559808. 
Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:15,767][1648981] Avg episode reward: [(0, '376.940')] [2024-06-15 15:47:15,914][1651669] Updated weights for policy 0, policy_version 354495 (0.0011) [2024-06-15 15:47:18,063][1651669] Updated weights for policy 0, policy_version 354550 (0.0014) [2024-06-15 15:47:19,209][1651669] Updated weights for policy 0, policy_version 354580 (0.0012) [2024-06-15 15:47:20,766][1648981] Fps is (10 sec: 52611.8, 60 sec: 49152.0, 300 sec: 47985.9). Total num frames: 726269952. Throughput: 0: 11958.0. Samples: 181624320. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:20,767][1648981] Avg episode reward: [(0, '383.190')] [2024-06-15 15:47:23,274][1651669] Updated weights for policy 0, policy_version 354660 (0.0130) [2024-06-15 15:47:25,676][1651274] Signal inference workers to stop experience collection... (18650 times) [2024-06-15 15:47:25,725][1651669] InferenceWorker_p0-w0: stopping experience collection (18650 times) [2024-06-15 15:47:25,786][1648981] Fps is (10 sec: 42513.6, 60 sec: 45860.1, 300 sec: 47982.5). Total num frames: 726401024. Throughput: 0: 12009.6. Samples: 181660672. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:25,787][1648981] Avg episode reward: [(0, '368.260')] [2024-06-15 15:47:25,999][1651274] Signal inference workers to resume experience collection... (18650 times) [2024-06-15 15:47:26,000][1651669] InferenceWorker_p0-w0: resuming experience collection (18650 times) [2024-06-15 15:47:26,001][1651669] Updated weights for policy 0, policy_version 354704 (0.0011) [2024-06-15 15:47:29,430][1651669] Updated weights for policy 0, policy_version 354784 (0.0012) [2024-06-15 15:47:30,496][1651669] Updated weights for policy 0, policy_version 354816 (0.0011) [2024-06-15 15:47:30,767][1648981] Fps is (10 sec: 39320.4, 60 sec: 48069.9, 300 sec: 47985.6). Total num frames: 726663168. Throughput: 0: 11878.3. Samples: 181731840. 
Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:30,769][1648981] Avg episode reward: [(0, '384.230')] [2024-06-15 15:47:31,934][1651669] Updated weights for policy 0, policy_version 354880 (0.0029) [2024-06-15 15:47:34,868][1651669] Updated weights for policy 0, policy_version 354928 (0.0016) [2024-06-15 15:47:35,378][1651669] Updated weights for policy 0, policy_version 354944 (0.0012) [2024-06-15 15:47:35,771][1648981] Fps is (10 sec: 52509.1, 60 sec: 48056.0, 300 sec: 47985.0). Total num frames: 726925312. Throughput: 0: 11911.3. Samples: 181803520. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:35,772][1648981] Avg episode reward: [(0, '386.530')] [2024-06-15 15:47:37,704][1651669] Updated weights for policy 0, policy_version 355006 (0.0015) [2024-06-15 15:47:40,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 727121920. Throughput: 0: 11798.9. Samples: 181839872. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:40,767][1648981] Avg episode reward: [(0, '370.830')] [2024-06-15 15:47:40,936][1651669] Updated weights for policy 0, policy_version 355061 (0.0097) [2024-06-15 15:47:42,308][1651669] Updated weights for policy 0, policy_version 355106 (0.0013) [2024-06-15 15:47:45,191][1651669] Updated weights for policy 0, policy_version 355152 (0.0012) [2024-06-15 15:47:45,766][1648981] Fps is (10 sec: 45896.4, 60 sec: 48059.8, 300 sec: 47986.3). Total num frames: 727384064. Throughput: 0: 11754.3. Samples: 181912064. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:45,767][1648981] Avg episode reward: [(0, '366.920')] [2024-06-15 15:47:47,291][1651669] Updated weights for policy 0, policy_version 355202 (0.0015) [2024-06-15 15:47:48,401][1651669] Updated weights for policy 0, policy_version 355263 (0.0113) [2024-06-15 15:47:50,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45875.2, 300 sec: 48096.8). Total num frames: 727580672. 
Throughput: 0: 11753.2. Samples: 181980672. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:50,767][1648981] Avg episode reward: [(0, '368.540')] [2024-06-15 15:47:52,576][1651669] Updated weights for policy 0, policy_version 355328 (0.0014) [2024-06-15 15:47:53,846][1651669] Updated weights for policy 0, policy_version 355388 (0.0012) [2024-06-15 15:47:55,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46967.8, 300 sec: 47985.7). Total num frames: 727842816. Throughput: 0: 11830.6. Samples: 182016000. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:47:55,767][1648981] Avg episode reward: [(0, '358.880')] [2024-06-15 15:47:56,777][1651669] Updated weights for policy 0, policy_version 355451 (0.0012) [2024-06-15 15:47:58,781][1651669] Updated weights for policy 0, policy_version 355512 (0.0012) [2024-06-15 15:48:00,791][1648981] Fps is (10 sec: 52301.1, 60 sec: 46951.4, 300 sec: 48427.7). Total num frames: 728104960. Throughput: 0: 11792.3. Samples: 182090752. Policy #0 lag: (min: 58.0, avg: 153.6, max: 314.0) [2024-06-15 15:48:00,791][1648981] Avg episode reward: [(0, '386.880')] [2024-06-15 15:48:02,479][1651669] Updated weights for policy 0, policy_version 355556 (0.0012) [2024-06-15 15:48:03,249][1651669] Updated weights for policy 0, policy_version 355587 (0.0012) [2024-06-15 15:48:04,639][1651669] Updated weights for policy 0, policy_version 355646 (0.0024) [2024-06-15 15:48:05,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47985.8). Total num frames: 728367104. Throughput: 0: 11992.2. Samples: 182163968. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:05,767][1648981] Avg episode reward: [(0, '384.210')] [2024-06-15 15:48:07,014][1651274] Signal inference workers to stop experience collection... 
(18700 times) [2024-06-15 15:48:07,059][1651669] InferenceWorker_p0-w0: stopping experience collection (18700 times) [2024-06-15 15:48:07,239][1651274] Signal inference workers to resume experience collection... (18700 times) [2024-06-15 15:48:07,239][1651669] InferenceWorker_p0-w0: resuming experience collection (18700 times) [2024-06-15 15:48:08,452][1651669] Updated weights for policy 0, policy_version 355713 (0.0012) [2024-06-15 15:48:09,775][1651669] Updated weights for policy 0, policy_version 355772 (0.0023) [2024-06-15 15:48:10,767][1648981] Fps is (10 sec: 52555.3, 60 sec: 48087.3, 300 sec: 48430.0). Total num frames: 728629248. Throughput: 0: 12088.4. Samples: 182204416. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:10,768][1648981] Avg episode reward: [(0, '380.470')] [2024-06-15 15:48:13,704][1651669] Updated weights for policy 0, policy_version 355824 (0.0011) [2024-06-15 15:48:15,736][1651669] Updated weights for policy 0, policy_version 355898 (0.0012) [2024-06-15 15:48:15,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 48059.7, 300 sec: 48098.1). Total num frames: 728858624. Throughput: 0: 11912.6. Samples: 182267904. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:15,767][1648981] Avg episode reward: [(0, '375.150')] [2024-06-15 15:48:19,140][1651669] Updated weights for policy 0, policy_version 355965 (0.0013) [2024-06-15 15:48:20,766][1648981] Fps is (10 sec: 45876.8, 60 sec: 46967.4, 300 sec: 48207.8). Total num frames: 729088000. Throughput: 0: 11936.5. Samples: 182340608. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:20,767][1648981] Avg episode reward: [(0, '373.240')] [2024-06-15 15:48:21,154][1651669] Updated weights for policy 0, policy_version 356028 (0.0017) [2024-06-15 15:48:24,704][1651669] Updated weights for policy 0, policy_version 356072 (0.0012) [2024-06-15 15:48:25,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 48621.9, 300 sec: 47874.6). Total num frames: 729317376. 
Throughput: 0: 12014.9. Samples: 182380544. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:25,767][1648981] Avg episode reward: [(0, '382.840')] [2024-06-15 15:48:26,281][1651669] Updated weights for policy 0, policy_version 356134 (0.0012) [2024-06-15 15:48:26,799][1651669] Updated weights for policy 0, policy_version 356160 (0.0011) [2024-06-15 15:48:30,769][1648981] Fps is (10 sec: 45863.5, 60 sec: 48057.9, 300 sec: 47987.8). Total num frames: 729546752. Throughput: 0: 12002.9. Samples: 182452224. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:30,769][1648981] Avg episode reward: [(0, '380.560')] [2024-06-15 15:48:31,323][1651669] Updated weights for policy 0, policy_version 356240 (0.0013) [2024-06-15 15:48:32,435][1651669] Updated weights for policy 0, policy_version 356288 (0.0013) [2024-06-15 15:48:35,767][1648981] Fps is (10 sec: 42595.2, 60 sec: 46970.4, 300 sec: 47874.5). Total num frames: 729743360. Throughput: 0: 12026.1. Samples: 182521856. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:35,768][1648981] Avg episode reward: [(0, '379.910')] [2024-06-15 15:48:36,380][1651669] Updated weights for policy 0, policy_version 356356 (0.0014) [2024-06-15 15:48:37,606][1651669] Updated weights for policy 0, policy_version 356407 (0.0012) [2024-06-15 15:48:40,455][1651669] Updated weights for policy 0, policy_version 356437 (0.0011) [2024-06-15 15:48:40,766][1648981] Fps is (10 sec: 45887.5, 60 sec: 48059.8, 300 sec: 47763.5). Total num frames: 730005504. Throughput: 0: 11901.2. Samples: 182551552. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:40,767][1648981] Avg episode reward: [(0, '375.870')] [2024-06-15 15:48:42,402][1651669] Updated weights for policy 0, policy_version 356501 (0.0012) [2024-06-15 15:48:43,258][1651669] Updated weights for policy 0, policy_version 356544 (0.0012) [2024-06-15 15:48:45,770][1648981] Fps is (10 sec: 45862.2, 60 sec: 46964.5, 300 sec: 47762.9). 
Total num frames: 730202112. Throughput: 0: 11861.1. Samples: 182624256. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:45,771][1648981] Avg episode reward: [(0, '382.900')] [2024-06-15 15:48:47,611][1651669] Updated weights for policy 0, policy_version 356594 (0.0011) [2024-06-15 15:48:48,542][1651274] Signal inference workers to stop experience collection... (18750 times) [2024-06-15 15:48:48,604][1651669] InferenceWorker_p0-w0: stopping experience collection (18750 times) [2024-06-15 15:48:48,610][1651669] Updated weights for policy 0, policy_version 356645 (0.0011) [2024-06-15 15:48:48,772][1651274] Signal inference workers to resume experience collection... (18750 times) [2024-06-15 15:48:48,773][1651669] InferenceWorker_p0-w0: resuming experience collection (18750 times) [2024-06-15 15:48:50,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 730464256. Throughput: 0: 11821.5. Samples: 182695936. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:50,767][1648981] Avg episode reward: [(0, '381.000')] [2024-06-15 15:48:51,714][1651669] Updated weights for policy 0, policy_version 356689 (0.0013) [2024-06-15 15:48:52,737][1651669] Updated weights for policy 0, policy_version 356731 (0.0013) [2024-06-15 15:48:53,651][1651669] Updated weights for policy 0, policy_version 356770 (0.0011) [2024-06-15 15:48:55,772][1648981] Fps is (10 sec: 52420.7, 60 sec: 48055.6, 300 sec: 47874.4). Total num frames: 730726400. Throughput: 0: 11752.0. Samples: 182733312. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:48:55,772][1648981] Avg episode reward: [(0, '383.140')] [2024-06-15 15:48:55,781][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000356800_730726400.pth... 
[2024-06-15 15:48:55,839][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000351232_719323136.pth [2024-06-15 15:48:58,064][1651669] Updated weights for policy 0, policy_version 356849 (0.0012) [2024-06-15 15:48:59,720][1651669] Updated weights for policy 0, policy_version 356923 (0.0013) [2024-06-15 15:49:00,774][1648981] Fps is (10 sec: 52387.2, 60 sec: 48072.9, 300 sec: 47651.1). Total num frames: 730988544. Throughput: 0: 11773.9. Samples: 182797824. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:49:00,775][1648981] Avg episode reward: [(0, '386.310')] [2024-06-15 15:49:02,851][1651669] Updated weights for policy 0, policy_version 356992 (0.0011) [2024-06-15 15:49:04,984][1651669] Updated weights for policy 0, policy_version 357056 (0.0102) [2024-06-15 15:49:05,772][1648981] Fps is (10 sec: 52425.9, 60 sec: 48055.0, 300 sec: 47984.7). Total num frames: 731250688. Throughput: 0: 11922.4. Samples: 182877184. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:49:05,773][1648981] Avg episode reward: [(0, '386.350')] [2024-06-15 15:49:09,550][1651669] Updated weights for policy 0, policy_version 357123 (0.0011) [2024-06-15 15:49:10,765][1651669] Updated weights for policy 0, policy_version 357183 (0.0013) [2024-06-15 15:49:10,789][1648981] Fps is (10 sec: 52352.5, 60 sec: 48042.0, 300 sec: 47982.0). Total num frames: 731512832. Throughput: 0: 11952.1. Samples: 182918656. Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:49:10,790][1648981] Avg episode reward: [(0, '374.330')] [2024-06-15 15:49:13,233][1651669] Updated weights for policy 0, policy_version 357232 (0.0011) [2024-06-15 15:49:15,055][1651669] Updated weights for policy 0, policy_version 357280 (0.0012) [2024-06-15 15:49:15,766][1648981] Fps is (10 sec: 52459.4, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 731774976. Throughput: 0: 11970.1. Samples: 182990848. 
Policy #0 lag: (min: 8.0, avg: 103.1, max: 264.0) [2024-06-15 15:49:15,767][1648981] Avg episode reward: [(0, '361.130')] [2024-06-15 15:49:18,020][1651669] Updated weights for policy 0, policy_version 357314 (0.0012) [2024-06-15 15:49:19,591][1651669] Updated weights for policy 0, policy_version 357382 (0.0011) [2024-06-15 15:49:20,766][1648981] Fps is (10 sec: 49263.2, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 732004352. Throughput: 0: 11958.3. Samples: 183059968. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:20,767][1648981] Avg episode reward: [(0, '374.410')] [2024-06-15 15:49:23,688][1651669] Updated weights for policy 0, policy_version 357443 (0.0014) [2024-06-15 15:49:24,729][1651669] Updated weights for policy 0, policy_version 357502 (0.0015) [2024-06-15 15:49:25,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.9, 300 sec: 48096.8). Total num frames: 732200960. Throughput: 0: 12140.1. Samples: 183097856. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:25,767][1648981] Avg episode reward: [(0, '369.830')] [2024-06-15 15:49:26,740][1651669] Updated weights for policy 0, policy_version 357564 (0.0012) [2024-06-15 15:49:30,281][1651669] Updated weights for policy 0, policy_version 357627 (0.0012) [2024-06-15 15:49:30,601][1651274] Signal inference workers to stop experience collection... (18800 times) [2024-06-15 15:49:30,715][1651669] InferenceWorker_p0-w0: stopping experience collection (18800 times) [2024-06-15 15:49:30,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48061.8, 300 sec: 47874.6). Total num frames: 732430336. Throughput: 0: 12163.9. Samples: 183171584. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:30,767][1648981] Avg episode reward: [(0, '364.440')] [2024-06-15 15:49:30,904][1651274] Signal inference workers to resume experience collection... 
(18800 times) [2024-06-15 15:49:30,905][1651669] InferenceWorker_p0-w0: resuming experience collection (18800 times) [2024-06-15 15:49:31,849][1651669] Updated weights for policy 0, policy_version 357691 (0.0013) [2024-06-15 15:49:35,498][1651669] Updated weights for policy 0, policy_version 357744 (0.0014) [2024-06-15 15:49:35,770][1648981] Fps is (10 sec: 45858.0, 60 sec: 48603.6, 300 sec: 47874.4). Total num frames: 732659712. Throughput: 0: 12116.3. Samples: 183241216. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:35,771][1648981] Avg episode reward: [(0, '369.520')] [2024-06-15 15:49:36,424][1651669] Updated weights for policy 0, policy_version 357761 (0.0011) [2024-06-15 15:49:37,513][1651669] Updated weights for policy 0, policy_version 357809 (0.0012) [2024-06-15 15:49:39,574][1651669] Updated weights for policy 0, policy_version 357840 (0.0012) [2024-06-15 15:49:40,770][1648981] Fps is (10 sec: 49133.5, 60 sec: 48602.8, 300 sec: 47874.0). Total num frames: 732921856. Throughput: 0: 12106.4. Samples: 183278080. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:40,771][1648981] Avg episode reward: [(0, '385.750')] [2024-06-15 15:49:40,798][1651669] Updated weights for policy 0, policy_version 357887 (0.0016) [2024-06-15 15:49:42,656][1651669] Updated weights for policy 0, policy_version 357936 (0.0012) [2024-06-15 15:49:45,766][1648981] Fps is (10 sec: 45892.3, 60 sec: 48608.9, 300 sec: 47653.3). Total num frames: 733118464. Throughput: 0: 12301.6. Samples: 183351296. 
Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:45,767][1648981] Avg episode reward: [(0, '379.160')] [2024-06-15 15:49:46,033][1651669] Updated weights for policy 0, policy_version 357984 (0.0010) [2024-06-15 15:49:47,531][1651669] Updated weights for policy 0, policy_version 358018 (0.0014) [2024-06-15 15:49:49,950][1651669] Updated weights for policy 0, policy_version 358085 (0.0014) [2024-06-15 15:49:50,766][1648981] Fps is (10 sec: 49170.6, 60 sec: 49152.1, 300 sec: 47985.7). Total num frames: 733413376. Throughput: 0: 12255.5. Samples: 183428608. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:50,767][1648981] Avg episode reward: [(0, '382.890')] [2024-06-15 15:49:51,135][1651669] Updated weights for policy 0, policy_version 358144 (0.0012) [2024-06-15 15:49:52,815][1651669] Updated weights for policy 0, policy_version 358204 (0.0011) [2024-06-15 15:49:55,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48063.9, 300 sec: 47541.4). Total num frames: 733609984. Throughput: 0: 12021.0. Samples: 183459328. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:49:55,767][1648981] Avg episode reward: [(0, '373.580')] [2024-06-15 15:49:56,482][1651669] Updated weights for policy 0, policy_version 358248 (0.0013) [2024-06-15 15:49:57,725][1651669] Updated weights for policy 0, policy_version 358292 (0.0012) [2024-06-15 15:50:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48066.2, 300 sec: 47985.7). Total num frames: 733872128. Throughput: 0: 12185.6. Samples: 183539200. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:00,767][1648981] Avg episode reward: [(0, '366.650')] [2024-06-15 15:50:01,369][1651669] Updated weights for policy 0, policy_version 358353 (0.0017) [2024-06-15 15:50:03,185][1651669] Updated weights for policy 0, policy_version 358418 (0.0012) [2024-06-15 15:50:05,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48064.4, 300 sec: 47763.5). Total num frames: 734134272. 
Throughput: 0: 12162.8. Samples: 183607296. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:05,767][1648981] Avg episode reward: [(0, '368.760')] [2024-06-15 15:50:06,537][1651669] Updated weights for policy 0, policy_version 358467 (0.0014) [2024-06-15 15:50:07,540][1651669] Updated weights for policy 0, policy_version 358518 (0.0090) [2024-06-15 15:50:09,030][1651669] Updated weights for policy 0, policy_version 358560 (0.0012) [2024-06-15 15:50:09,794][1651669] Updated weights for policy 0, policy_version 358587 (0.0030) [2024-06-15 15:50:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48077.8, 300 sec: 47985.7). Total num frames: 734396416. Throughput: 0: 12322.1. Samples: 183652352. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:10,767][1648981] Avg episode reward: [(0, '370.300')] [2024-06-15 15:50:13,187][1651274] Signal inference workers to stop experience collection... (18850 times) [2024-06-15 15:50:13,261][1651669] InferenceWorker_p0-w0: stopping experience collection (18850 times) [2024-06-15 15:50:13,274][1651669] Updated weights for policy 0, policy_version 358660 (0.0023) [2024-06-15 15:50:13,442][1651274] Signal inference workers to resume experience collection... (18850 times) [2024-06-15 15:50:13,450][1651669] InferenceWorker_p0-w0: resuming experience collection (18850 times) [2024-06-15 15:50:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47878.6). Total num frames: 734658560. Throughput: 0: 12037.7. Samples: 183713280. 
Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:15,767][1648981] Avg episode reward: [(0, '383.140')] [2024-06-15 15:50:17,746][1651669] Updated weights for policy 0, policy_version 358736 (0.0120) [2024-06-15 15:50:19,574][1651669] Updated weights for policy 0, policy_version 358790 (0.0138) [2024-06-15 15:50:20,681][1651669] Updated weights for policy 0, policy_version 358845 (0.0020) [2024-06-15 15:50:20,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 48605.6, 300 sec: 48318.9). Total num frames: 734920704. Throughput: 0: 12277.6. Samples: 183793664. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:20,767][1648981] Avg episode reward: [(0, '394.350')] [2024-06-15 15:50:24,063][1651669] Updated weights for policy 0, policy_version 358900 (0.0015) [2024-06-15 15:50:25,725][1651669] Updated weights for policy 0, policy_version 358968 (0.0012) [2024-06-15 15:50:25,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49151.9, 300 sec: 47874.6). Total num frames: 735150080. Throughput: 0: 12300.4. Samples: 183831552. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:25,767][1648981] Avg episode reward: [(0, '394.290')] [2024-06-15 15:50:28,746][1651669] Updated weights for policy 0, policy_version 358998 (0.0013) [2024-06-15 15:50:30,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 48605.8, 300 sec: 48096.8). Total num frames: 735346688. Throughput: 0: 12185.6. Samples: 183899648. Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:30,767][1648981] Avg episode reward: [(0, '411.270')] [2024-06-15 15:50:30,804][1651669] Updated weights for policy 0, policy_version 359056 (0.0126) [2024-06-15 15:50:34,730][1651669] Updated weights for policy 0, policy_version 359136 (0.0013) [2024-06-15 15:50:35,782][1648981] Fps is (10 sec: 42531.2, 60 sec: 48596.1, 300 sec: 47983.1). Total num frames: 735576064. Throughput: 0: 11976.6. Samples: 183967744. 
Policy #0 lag: (min: 63.0, avg: 159.2, max: 319.0) [2024-06-15 15:50:35,783][1648981] Avg episode reward: [(0, '430.420')] [2024-06-15 15:50:36,802][1651669] Updated weights for policy 0, policy_version 359216 (0.0021) [2024-06-15 15:50:39,170][1651669] Updated weights for policy 0, policy_version 359236 (0.0015) [2024-06-15 15:50:40,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 48608.8, 300 sec: 47985.7). Total num frames: 735838208. Throughput: 0: 12151.4. Samples: 184006144. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:50:40,767][1648981] Avg episode reward: [(0, '430.510')] [2024-06-15 15:50:41,239][1651669] Updated weights for policy 0, policy_version 359299 (0.0011) [2024-06-15 15:50:42,642][1651669] Updated weights for policy 0, policy_version 359357 (0.0010) [2024-06-15 15:50:45,766][1648981] Fps is (10 sec: 45948.3, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 736034816. Throughput: 0: 11946.7. Samples: 184076800. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:50:45,767][1648981] Avg episode reward: [(0, '408.580')] [2024-06-15 15:50:45,991][1651669] Updated weights for policy 0, policy_version 359408 (0.0033) [2024-06-15 15:50:46,745][1651669] Updated weights for policy 0, policy_version 359440 (0.0014) [2024-06-15 15:50:50,156][1651669] Updated weights for policy 0, policy_version 359506 (0.0153) [2024-06-15 15:50:50,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 48605.8, 300 sec: 47874.6). Total num frames: 736329728. Throughput: 0: 12128.7. Samples: 184153088. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:50:50,767][1648981] Avg episode reward: [(0, '402.720')] [2024-06-15 15:50:50,935][1651669] Updated weights for policy 0, policy_version 359550 (0.0011) [2024-06-15 15:50:52,699][1651669] Updated weights for policy 0, policy_version 359601 (0.0058) [2024-06-15 15:50:55,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 49151.8, 300 sec: 48207.8). Total num frames: 736559104. 
Throughput: 0: 11992.1. Samples: 184192000. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:50:55,767][1648981] Avg episode reward: [(0, '403.960')] [2024-06-15 15:50:55,813][1651274] Signal inference workers to stop experience collection... (18900 times) [2024-06-15 15:50:55,859][1651669] InferenceWorker_p0-w0: stopping experience collection (18900 times) [2024-06-15 15:50:55,888][1651669] Updated weights for policy 0, policy_version 359649 (0.0016) [2024-06-15 15:50:56,169][1651274] Signal inference workers to resume experience collection... (18900 times) [2024-06-15 15:50:56,169][1651669] InferenceWorker_p0-w0: resuming experience collection (18900 times) [2024-06-15 15:50:56,171][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000359664_736591872.pth... [2024-06-15 15:50:56,210][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000354032_725057536.pth [2024-06-15 15:50:56,213][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000359664_736591872.pth [2024-06-15 15:50:57,547][1651669] Updated weights for policy 0, policy_version 359715 (0.0019) [2024-06-15 15:51:00,445][1651669] Updated weights for policy 0, policy_version 359776 (0.0013) [2024-06-15 15:51:00,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 736821248. Throughput: 0: 12299.4. Samples: 184266752. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:00,767][1648981] Avg episode reward: [(0, '402.640')] [2024-06-15 15:51:02,772][1651669] Updated weights for policy 0, policy_version 359814 (0.0011) [2024-06-15 15:51:05,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 48059.3, 300 sec: 47989.8). Total num frames: 737017856. Throughput: 0: 12140.0. Samples: 184339968. 
Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:05,768][1648981] Avg episode reward: [(0, '395.410')] [2024-06-15 15:51:06,409][1651669] Updated weights for policy 0, policy_version 359873 (0.0012) [2024-06-15 15:51:08,481][1651669] Updated weights for policy 0, policy_version 359952 (0.0055) [2024-06-15 15:51:10,780][1648981] Fps is (10 sec: 45813.4, 60 sec: 48049.0, 300 sec: 47983.5). Total num frames: 737280000. Throughput: 0: 11977.2. Samples: 184370688. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:10,780][1648981] Avg episode reward: [(0, '386.150')] [2024-06-15 15:51:10,955][1651669] Updated weights for policy 0, policy_version 360016 (0.0011) [2024-06-15 15:51:12,032][1651669] Updated weights for policy 0, policy_version 360063 (0.0011) [2024-06-15 15:51:14,964][1651669] Updated weights for policy 0, policy_version 360126 (0.0109) [2024-06-15 15:51:15,766][1648981] Fps is (10 sec: 52431.5, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 737542144. Throughput: 0: 11958.1. Samples: 184437760. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:15,767][1648981] Avg episode reward: [(0, '372.890')] [2024-06-15 15:51:19,840][1651669] Updated weights for policy 0, policy_version 360208 (0.0091) [2024-06-15 15:51:20,770][1648981] Fps is (10 sec: 49199.5, 60 sec: 47510.9, 300 sec: 47874.0). Total num frames: 737771520. Throughput: 0: 11961.2. Samples: 184505856. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:20,771][1648981] Avg episode reward: [(0, '369.930')] [2024-06-15 15:51:20,822][1651669] Updated weights for policy 0, policy_version 360256 (0.0012) [2024-06-15 15:51:23,663][1651669] Updated weights for policy 0, policy_version 360312 (0.0013) [2024-06-15 15:51:25,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 48059.6, 300 sec: 48321.0). Total num frames: 738033664. Throughput: 0: 12014.9. Samples: 184546816. 
Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:25,768][1648981] Avg episode reward: [(0, '377.250')] [2024-06-15 15:51:25,855][1651669] Updated weights for policy 0, policy_version 360381 (0.0013) [2024-06-15 15:51:30,719][1651669] Updated weights for policy 0, policy_version 360432 (0.0011) [2024-06-15 15:51:30,766][1648981] Fps is (10 sec: 39336.8, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 738164736. Throughput: 0: 12151.5. Samples: 184623616. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:30,767][1648981] Avg episode reward: [(0, '383.060')] [2024-06-15 15:51:32,521][1651669] Updated weights for policy 0, policy_version 360508 (0.0012) [2024-06-15 15:51:34,396][1651669] Updated weights for policy 0, policy_version 360560 (0.0012) [2024-06-15 15:51:35,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 48072.5, 300 sec: 47985.7). Total num frames: 738459648. Throughput: 0: 11889.8. Samples: 184688128. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:35,767][1648981] Avg episode reward: [(0, '370.340')] [2024-06-15 15:51:36,407][1651669] Updated weights for policy 0, policy_version 360613 (0.0011) [2024-06-15 15:51:40,222][1651274] Signal inference workers to stop experience collection... (18950 times) [2024-06-15 15:51:40,264][1651669] InferenceWorker_p0-w0: stopping experience collection (18950 times) [2024-06-15 15:51:40,363][1651274] Signal inference workers to resume experience collection... (18950 times) [2024-06-15 15:51:40,364][1651669] InferenceWorker_p0-w0: resuming experience collection (18950 times) [2024-06-15 15:51:40,366][1651669] Updated weights for policy 0, policy_version 360656 (0.0012) [2024-06-15 15:51:40,767][1648981] Fps is (10 sec: 45873.3, 60 sec: 46421.2, 300 sec: 47874.6). Total num frames: 738623488. Throughput: 0: 11832.9. Samples: 184724480. 
Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:40,768][1648981] Avg episode reward: [(0, '350.640')] [2024-06-15 15:51:42,218][1651669] Updated weights for policy 0, policy_version 360720 (0.0012) [2024-06-15 15:51:43,230][1651669] Updated weights for policy 0, policy_version 360767 (0.0015) [2024-06-15 15:51:45,327][1651669] Updated weights for policy 0, policy_version 360823 (0.0012) [2024-06-15 15:51:45,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 49151.9, 300 sec: 47985.7). Total num frames: 738983936. Throughput: 0: 11730.5. Samples: 184794624. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:45,767][1648981] Avg episode reward: [(0, '338.410')] [2024-06-15 15:51:47,629][1651669] Updated weights for policy 0, policy_version 360868 (0.0014) [2024-06-15 15:51:50,766][1648981] Fps is (10 sec: 49154.0, 60 sec: 46421.4, 300 sec: 47763.6). Total num frames: 739115008. Throughput: 0: 11776.1. Samples: 184869888. Policy #0 lag: (min: 102.0, avg: 207.8, max: 376.0) [2024-06-15 15:51:50,767][1648981] Avg episode reward: [(0, '358.320')] [2024-06-15 15:51:52,346][1651669] Updated weights for policy 0, policy_version 360928 (0.0100) [2024-06-15 15:51:53,739][1651669] Updated weights for policy 0, policy_version 360980 (0.0010) [2024-06-15 15:51:54,753][1651669] Updated weights for policy 0, policy_version 361023 (0.0010) [2024-06-15 15:51:55,773][1648981] Fps is (10 sec: 42568.7, 60 sec: 47508.3, 300 sec: 47874.1). Total num frames: 739409920. Throughput: 0: 11698.0. Samples: 184897024. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:51:55,774][1648981] Avg episode reward: [(0, '377.220')] [2024-06-15 15:51:56,519][1651669] Updated weights for policy 0, policy_version 361088 (0.0052) [2024-06-15 15:51:58,850][1651669] Updated weights for policy 0, policy_version 361141 (0.0014) [2024-06-15 15:52:00,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 739639296. 
Throughput: 0: 11889.8. Samples: 184972800. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:00,767][1648981] Avg episode reward: [(0, '381.840')] [2024-06-15 15:52:03,353][1651669] Updated weights for policy 0, policy_version 361185 (0.0015) [2024-06-15 15:52:04,518][1651669] Updated weights for policy 0, policy_version 361248 (0.0070) [2024-06-15 15:52:05,766][1648981] Fps is (10 sec: 49186.6, 60 sec: 48060.2, 300 sec: 47991.3). Total num frames: 739901440. Throughput: 0: 11947.7. Samples: 185043456. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:05,767][1648981] Avg episode reward: [(0, '396.980')] [2024-06-15 15:52:07,023][1651669] Updated weights for policy 0, policy_version 361312 (0.0139) [2024-06-15 15:52:08,755][1651669] Updated weights for policy 0, policy_version 361360 (0.0016) [2024-06-15 15:52:09,814][1651669] Updated weights for policy 0, policy_version 361408 (0.0011) [2024-06-15 15:52:10,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48070.5, 300 sec: 48096.8). Total num frames: 740163584. Throughput: 0: 11889.8. Samples: 185081856. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:10,767][1648981] Avg episode reward: [(0, '399.190')] [2024-06-15 15:52:14,324][1651669] Updated weights for policy 0, policy_version 361461 (0.0012) [2024-06-15 15:52:15,629][1651669] Updated weights for policy 0, policy_version 361506 (0.0025) [2024-06-15 15:52:15,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 46967.3, 300 sec: 47763.5). Total num frames: 740360192. Throughput: 0: 11878.3. Samples: 185158144. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:15,767][1648981] Avg episode reward: [(0, '386.950')] [2024-06-15 15:52:17,195][1651669] Updated weights for policy 0, policy_version 361556 (0.0012) [2024-06-15 15:52:18,942][1651669] Updated weights for policy 0, policy_version 361601 (0.0013) [2024-06-15 15:52:19,858][1651274] Signal inference workers to stop experience collection... 
(19000 times) [2024-06-15 15:52:19,887][1651669] InferenceWorker_p0-w0: stopping experience collection (19000 times) [2024-06-15 15:52:20,075][1651274] Signal inference workers to resume experience collection... (19000 times) [2024-06-15 15:52:20,076][1651669] InferenceWorker_p0-w0: resuming experience collection (19000 times) [2024-06-15 15:52:20,360][1651669] Updated weights for policy 0, policy_version 361664 (0.0013) [2024-06-15 15:52:20,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48609.0, 300 sec: 48433.3). Total num frames: 740687872. Throughput: 0: 11923.9. Samples: 185224704. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:20,767][1648981] Avg episode reward: [(0, '392.290')] [2024-06-15 15:52:25,585][1651669] Updated weights for policy 0, policy_version 361733 (0.0012) [2024-06-15 15:52:25,766][1648981] Fps is (10 sec: 49152.9, 60 sec: 46967.6, 300 sec: 48096.8). Total num frames: 740851712. Throughput: 0: 12276.7. Samples: 185276928. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:25,767][1648981] Avg episode reward: [(0, '380.280')] [2024-06-15 15:52:27,665][1651669] Updated weights for policy 0, policy_version 361808 (0.0011) [2024-06-15 15:52:30,336][1651669] Updated weights for policy 0, policy_version 361888 (0.0019) [2024-06-15 15:52:30,778][1648981] Fps is (10 sec: 49093.8, 60 sec: 50234.4, 300 sec: 48317.7). Total num frames: 741179392. Throughput: 0: 11920.8. Samples: 185331200. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:30,779][1648981] Avg episode reward: [(0, '371.730')] [2024-06-15 15:52:35,442][1651669] Updated weights for policy 0, policy_version 361922 (0.0012) [2024-06-15 15:52:35,768][1648981] Fps is (10 sec: 39315.7, 60 sec: 46420.1, 300 sec: 47874.4). Total num frames: 741244928. Throughput: 0: 12230.7. Samples: 185420288. 
Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:35,769][1648981] Avg episode reward: [(0, '385.110')] [2024-06-15 15:52:36,667][1651669] Updated weights for policy 0, policy_version 361986 (0.0248) [2024-06-15 15:52:38,843][1651669] Updated weights for policy 0, policy_version 362080 (0.0104) [2024-06-15 15:52:40,766][1648981] Fps is (10 sec: 45929.2, 60 sec: 50244.6, 300 sec: 48318.9). Total num frames: 741638144. Throughput: 0: 12130.6. Samples: 185442816. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:40,767][1648981] Avg episode reward: [(0, '385.690')] [2024-06-15 15:52:40,890][1651669] Updated weights for policy 0, policy_version 362144 (0.0100) [2024-06-15 15:52:45,766][1648981] Fps is (10 sec: 49159.5, 60 sec: 45875.3, 300 sec: 47985.7). Total num frames: 741736448. Throughput: 0: 12288.0. Samples: 185525760. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:45,767][1648981] Avg episode reward: [(0, '380.880')] [2024-06-15 15:52:46,303][1651669] Updated weights for policy 0, policy_version 362178 (0.0010) [2024-06-15 15:52:47,627][1651669] Updated weights for policy 0, policy_version 362241 (0.0011) [2024-06-15 15:52:49,213][1651669] Updated weights for policy 0, policy_version 362305 (0.0012) [2024-06-15 15:52:50,425][1651669] Updated weights for policy 0, policy_version 362358 (0.0011) [2024-06-15 15:52:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 742129664. Throughput: 0: 12344.9. Samples: 185598976. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:50,767][1648981] Avg episode reward: [(0, '377.980')] [2024-06-15 15:52:51,992][1651669] Updated weights for policy 0, policy_version 362426 (0.0015) [2024-06-15 15:52:55,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 47519.0, 300 sec: 47989.6). Total num frames: 742260736. Throughput: 0: 12288.0. Samples: 185634816. 
Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:52:55,767][1648981] Avg episode reward: [(0, '382.110')] [2024-06-15 15:52:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000362432_742260736.pth... [2024-06-15 15:52:55,834][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000356800_730726400.pth [2024-06-15 15:52:58,019][1651669] Updated weights for policy 0, policy_version 362481 (0.0011) [2024-06-15 15:52:59,286][1651274] Signal inference workers to stop experience collection... (19050 times) [2024-06-15 15:52:59,329][1651669] InferenceWorker_p0-w0: stopping experience collection (19050 times) [2024-06-15 15:52:59,510][1651274] Signal inference workers to resume experience collection... (19050 times) [2024-06-15 15:52:59,511][1651669] InferenceWorker_p0-w0: resuming experience collection (19050 times) [2024-06-15 15:52:59,513][1651669] Updated weights for policy 0, policy_version 362544 (0.0015) [2024-06-15 15:53:00,782][1648981] Fps is (10 sec: 45802.9, 60 sec: 49139.1, 300 sec: 48205.2). Total num frames: 742588416. Throughput: 0: 12317.9. Samples: 185712640. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:53:00,783][1648981] Avg episode reward: [(0, '380.950')] [2024-06-15 15:53:01,085][1651669] Updated weights for policy 0, policy_version 362624 (0.0019) [2024-06-15 15:53:02,862][1651669] Updated weights for policy 0, policy_version 362684 (0.0012) [2024-06-15 15:53:05,767][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.6, 300 sec: 47985.7). Total num frames: 742785024. Throughput: 0: 12265.2. Samples: 185776640. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:53:05,767][1648981] Avg episode reward: [(0, '391.650')] [2024-06-15 15:53:10,452][1651669] Updated weights for policy 0, policy_version 362784 (0.0107) [2024-06-15 15:53:10,766][1648981] Fps is (10 sec: 39383.9, 60 sec: 46967.4, 300 sec: 47874.6). Total num frames: 742981632. 
Throughput: 0: 12094.6. Samples: 185821184. Policy #0 lag: (min: 82.0, avg: 186.6, max: 356.0) [2024-06-15 15:53:10,767][1648981] Avg episode reward: [(0, '409.050')] [2024-06-15 15:53:11,913][1651669] Updated weights for policy 0, policy_version 362853 (0.0106) [2024-06-15 15:53:13,440][1651669] Updated weights for policy 0, policy_version 362913 (0.0012) [2024-06-15 15:53:15,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 49152.2, 300 sec: 48207.8). Total num frames: 743309312. Throughput: 0: 11995.3. Samples: 185870848. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:15,767][1648981] Avg episode reward: [(0, '407.740')] [2024-06-15 15:53:20,430][1651669] Updated weights for policy 0, policy_version 362978 (0.0012) [2024-06-15 15:53:20,772][1648981] Fps is (10 sec: 42575.1, 60 sec: 45324.8, 300 sec: 47762.7). Total num frames: 743407616. Throughput: 0: 11888.7. Samples: 185955328. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:20,772][1648981] Avg episode reward: [(0, '393.840')] [2024-06-15 15:53:22,092][1651669] Updated weights for policy 0, policy_version 363056 (0.0014) [2024-06-15 15:53:23,520][1651669] Updated weights for policy 0, policy_version 363111 (0.0013) [2024-06-15 15:53:24,786][1651669] Updated weights for policy 0, policy_version 363168 (0.0014) [2024-06-15 15:53:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 48430.4). Total num frames: 743833600. Throughput: 0: 11923.9. Samples: 185979392. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:25,767][1648981] Avg episode reward: [(0, '422.160')] [2024-06-15 15:53:30,766][1648981] Fps is (10 sec: 42621.9, 60 sec: 44245.5, 300 sec: 47763.7). Total num frames: 743833600. Throughput: 0: 11730.5. Samples: 186053632. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:30,767][1648981] Avg episode reward: [(0, '425.070')] [2024-06-15 15:53:31,115][1651669] Updated weights for policy 0, policy_version 363213 (0.0013) [2024-06-15 15:53:33,167][1651669] Updated weights for policy 0, policy_version 363296 (0.0012) [2024-06-15 15:53:34,891][1651669] Updated weights for policy 0, policy_version 363365 (0.0108) [2024-06-15 15:53:35,450][1651274] Signal inference workers to stop experience collection... (19100 times) [2024-06-15 15:53:35,508][1651669] InferenceWorker_p0-w0: stopping experience collection (19100 times) [2024-06-15 15:53:35,636][1651274] Signal inference workers to resume experience collection... (19100 times) [2024-06-15 15:53:35,638][1651669] InferenceWorker_p0-w0: resuming experience collection (19100 times) [2024-06-15 15:53:35,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 50245.6, 300 sec: 48318.9). Total num frames: 744259584. Throughput: 0: 11423.3. Samples: 186113024. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:35,767][1648981] Avg episode reward: [(0, '426.330')] [2024-06-15 15:53:36,123][1651669] Updated weights for policy 0, policy_version 363428 (0.0014) [2024-06-15 15:53:40,767][1648981] Fps is (10 sec: 52426.2, 60 sec: 45328.7, 300 sec: 47986.2). Total num frames: 744357888. Throughput: 0: 11525.6. Samples: 186153472. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:40,768][1648981] Avg episode reward: [(0, '413.540')] [2024-06-15 15:53:42,337][1651669] Updated weights for policy 0, policy_version 363459 (0.0014) [2024-06-15 15:53:43,474][1651669] Updated weights for policy 0, policy_version 363520 (0.0048) [2024-06-15 15:53:45,296][1651669] Updated weights for policy 0, policy_version 363585 (0.0190) [2024-06-15 15:53:45,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 744652800. Throughput: 0: 11472.8. Samples: 186228736. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:45,767][1648981] Avg episode reward: [(0, '410.910')] [2024-06-15 15:53:46,694][1651669] Updated weights for policy 0, policy_version 363648 (0.0012) [2024-06-15 15:53:48,021][1651669] Updated weights for policy 0, policy_version 363707 (0.0017) [2024-06-15 15:53:50,766][1648981] Fps is (10 sec: 52431.1, 60 sec: 45875.2, 300 sec: 47986.5). Total num frames: 744882176. Throughput: 0: 11491.6. Samples: 186293760. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:50,767][1648981] Avg episode reward: [(0, '412.960')] [2024-06-15 15:53:54,776][1651669] Updated weights for policy 0, policy_version 363760 (0.0051) [2024-06-15 15:53:55,806][1648981] Fps is (10 sec: 42428.8, 60 sec: 46936.3, 300 sec: 47758.4). Total num frames: 745078784. Throughput: 0: 11458.6. Samples: 186337280. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:53:55,807][1648981] Avg episode reward: [(0, '396.280')] [2024-06-15 15:53:56,547][1651669] Updated weights for policy 0, policy_version 363840 (0.0082) [2024-06-15 15:53:59,099][1651669] Updated weights for policy 0, policy_version 363941 (0.0134) [2024-06-15 15:54:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46979.9, 300 sec: 47986.6). Total num frames: 745406464. Throughput: 0: 11446.0. Samples: 186385920. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:54:00,767][1648981] Avg episode reward: [(0, '407.600')] [2024-06-15 15:54:05,118][1651669] Updated weights for policy 0, policy_version 363972 (0.0011) [2024-06-15 15:54:05,774][1648981] Fps is (10 sec: 39448.3, 60 sec: 44777.2, 300 sec: 47321.6). Total num frames: 745472000. Throughput: 0: 11525.1. Samples: 186473984. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:54:05,775][1648981] Avg episode reward: [(0, '414.020')] [2024-06-15 15:54:06,824][1651669] Updated weights for policy 0, policy_version 364064 (0.0011) [2024-06-15 15:54:08,773][1651669] Updated weights for policy 0, policy_version 364144 (0.0016) [2024-06-15 15:54:09,973][1651669] Updated weights for policy 0, policy_version 364192 (0.0011) [2024-06-15 15:54:10,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 745930752. Throughput: 0: 11616.7. Samples: 186502144. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:54:10,767][1648981] Avg episode reward: [(0, '409.630')] [2024-06-15 15:54:15,345][1651669] Updated weights for policy 0, policy_version 364225 (0.0014) [2024-06-15 15:54:15,786][1648981] Fps is (10 sec: 49092.7, 60 sec: 44222.1, 300 sec: 47316.0). Total num frames: 745963520. Throughput: 0: 11804.9. Samples: 186585088. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:54:15,787][1648981] Avg episode reward: [(0, '381.940')] [2024-06-15 15:54:16,614][1651274] Signal inference workers to stop experience collection... (19150 times) [2024-06-15 15:54:16,642][1651669] InferenceWorker_p0-w0: stopping experience collection (19150 times) [2024-06-15 15:54:16,779][1651274] Signal inference workers to resume experience collection... (19150 times) [2024-06-15 15:54:16,780][1651669] InferenceWorker_p0-w0: resuming experience collection (19150 times) [2024-06-15 15:54:17,035][1651669] Updated weights for policy 0, policy_version 364293 (0.0013) [2024-06-15 15:54:18,599][1651669] Updated weights for policy 0, policy_version 364368 (0.0011) [2024-06-15 15:54:20,534][1651669] Updated weights for policy 0, policy_version 364419 (0.0013) [2024-06-15 15:54:20,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 48610.3, 300 sec: 47874.6). Total num frames: 746323968. Throughput: 0: 11923.9. Samples: 186649600. 
Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:54:20,767][1648981] Avg episode reward: [(0, '400.060')] [2024-06-15 15:54:21,832][1651669] Updated weights for policy 0, policy_version 364479 (0.0013) [2024-06-15 15:54:25,766][1648981] Fps is (10 sec: 49250.1, 60 sec: 43690.7, 300 sec: 47541.4). Total num frames: 746455040. Throughput: 0: 11912.7. Samples: 186689536. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:54:25,767][1648981] Avg episode reward: [(0, '401.780')] [2024-06-15 15:54:26,915][1651669] Updated weights for policy 0, policy_version 364542 (0.0013) [2024-06-15 15:54:28,821][1651669] Updated weights for policy 0, policy_version 364624 (0.0012) [2024-06-15 15:54:30,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50244.3, 300 sec: 48097.4). Total num frames: 746848256. Throughput: 0: 11719.1. Samples: 186756096. Policy #0 lag: (min: 63.0, avg: 197.8, max: 335.0) [2024-06-15 15:54:30,767][1648981] Avg episode reward: [(0, '395.010')] [2024-06-15 15:54:31,524][1651669] Updated weights for policy 0, policy_version 364688 (0.0023) [2024-06-15 15:54:32,731][1651669] Updated weights for policy 0, policy_version 364732 (0.0014) [2024-06-15 15:54:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 45329.0, 300 sec: 47653.1). Total num frames: 746979328. Throughput: 0: 12037.7. Samples: 186835456. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:54:35,767][1648981] Avg episode reward: [(0, '392.210')] [2024-06-15 15:54:38,685][1651669] Updated weights for policy 0, policy_version 364801 (0.0014) [2024-06-15 15:54:40,492][1651669] Updated weights for policy 0, policy_version 364880 (0.0012) [2024-06-15 15:54:40,777][1648981] Fps is (10 sec: 42552.6, 60 sec: 48597.6, 300 sec: 47984.0). Total num frames: 747274240. Throughput: 0: 11920.3. Samples: 186873344. 
Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:54:40,778][1648981] Avg episode reward: [(0, '392.860')] [2024-06-15 15:54:42,376][1651669] Updated weights for policy 0, policy_version 364931 (0.0022) [2024-06-15 15:54:43,467][1651669] Updated weights for policy 0, policy_version 364983 (0.0010) [2024-06-15 15:54:45,798][1648981] Fps is (10 sec: 52262.6, 60 sec: 47488.4, 300 sec: 47758.4). Total num frames: 747503616. Throughput: 0: 12324.8. Samples: 186940928. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:54:45,799][1648981] Avg episode reward: [(0, '394.940')] [2024-06-15 15:54:48,514][1651669] Updated weights for policy 0, policy_version 365040 (0.0012) [2024-06-15 15:54:50,003][1651669] Updated weights for policy 0, policy_version 365106 (0.0012) [2024-06-15 15:54:50,766][1648981] Fps is (10 sec: 52484.3, 60 sec: 48605.8, 300 sec: 48096.8). Total num frames: 747798528. Throughput: 0: 12005.6. Samples: 187014144. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:54:50,767][1648981] Avg episode reward: [(0, '411.550')] [2024-06-15 15:54:51,994][1651669] Updated weights for policy 0, policy_version 365182 (0.0013) [2024-06-15 15:54:53,560][1651274] Signal inference workers to stop experience collection... (19200 times) [2024-06-15 15:54:53,609][1651669] InferenceWorker_p0-w0: stopping experience collection (19200 times) [2024-06-15 15:54:53,824][1651274] Signal inference workers to resume experience collection... (19200 times) [2024-06-15 15:54:53,825][1651669] InferenceWorker_p0-w0: resuming experience collection (19200 times) [2024-06-15 15:54:54,394][1651669] Updated weights for policy 0, policy_version 365221 (0.0012) [2024-06-15 15:54:55,771][1648981] Fps is (10 sec: 52570.6, 60 sec: 49180.8, 300 sec: 47984.9). Total num frames: 748027904. Throughput: 0: 12025.0. Samples: 187043328. 
Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:54:55,772][1648981] Avg episode reward: [(0, '402.840')] [2024-06-15 15:54:55,778][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000365248_748027904.pth... [2024-06-15 15:54:55,881][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000359664_736591872.pth [2024-06-15 15:54:59,263][1651669] Updated weights for policy 0, policy_version 365249 (0.0011) [2024-06-15 15:55:00,618][1651669] Updated weights for policy 0, policy_version 365319 (0.0011) [2024-06-15 15:55:00,766][1648981] Fps is (10 sec: 36044.9, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 748158976. Throughput: 0: 12031.6. Samples: 187126272. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:00,767][1648981] Avg episode reward: [(0, '406.270')] [2024-06-15 15:55:02,631][1651669] Updated weights for policy 0, policy_version 365392 (0.0013) [2024-06-15 15:55:05,244][1651669] Updated weights for policy 0, policy_version 365444 (0.0012) [2024-06-15 15:55:05,766][1648981] Fps is (10 sec: 42619.0, 60 sec: 49704.6, 300 sec: 47652.5). Total num frames: 748453888. Throughput: 0: 11889.8. Samples: 187184640. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:05,767][1648981] Avg episode reward: [(0, '393.520')] [2024-06-15 15:55:09,705][1651669] Updated weights for policy 0, policy_version 365505 (0.0071) [2024-06-15 15:55:10,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45329.0, 300 sec: 47430.3). Total num frames: 748650496. Throughput: 0: 11798.7. Samples: 187220480. 
Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:10,767][1648981] Avg episode reward: [(0, '401.840')] [2024-06-15 15:55:11,528][1651669] Updated weights for policy 0, policy_version 365570 (0.0013) [2024-06-15 15:55:13,281][1651669] Updated weights for policy 0, policy_version 365648 (0.0012) [2024-06-15 15:55:14,484][1651669] Updated weights for policy 0, policy_version 365695 (0.0035) [2024-06-15 15:55:15,786][1648981] Fps is (10 sec: 49054.6, 60 sec: 49698.2, 300 sec: 47538.2). Total num frames: 748945408. Throughput: 0: 11793.5. Samples: 187287040. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:15,787][1648981] Avg episode reward: [(0, '378.040')] [2024-06-15 15:55:20,752][1651669] Updated weights for policy 0, policy_version 365761 (0.0012) [2024-06-15 15:55:20,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 749076480. Throughput: 0: 11821.5. Samples: 187367424. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:20,767][1648981] Avg episode reward: [(0, '388.560')] [2024-06-15 15:55:22,149][1651669] Updated weights for policy 0, policy_version 365817 (0.0013) [2024-06-15 15:55:23,255][1651669] Updated weights for policy 0, policy_version 365858 (0.0013) [2024-06-15 15:55:24,584][1651669] Updated weights for policy 0, policy_version 365905 (0.0032) [2024-06-15 15:55:25,692][1651669] Updated weights for policy 0, policy_version 365951 (0.0015) [2024-06-15 15:55:25,766][1648981] Fps is (10 sec: 52532.8, 60 sec: 50244.2, 300 sec: 47874.6). Total num frames: 749469696. Throughput: 0: 11630.8. Samples: 187396608. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:25,767][1648981] Avg episode reward: [(0, '386.850')] [2024-06-15 15:55:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45875.1, 300 sec: 47543.9). Total num frames: 749600768. Throughput: 0: 11636.3. Samples: 187464192. 
Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:30,767][1648981] Avg episode reward: [(0, '396.010')] [2024-06-15 15:55:32,528][1651669] Updated weights for policy 0, policy_version 366018 (0.0123) [2024-06-15 15:55:33,993][1651669] Updated weights for policy 0, policy_version 366080 (0.0010) [2024-06-15 15:55:35,423][1651274] Signal inference workers to stop experience collection... (19250 times) [2024-06-15 15:55:35,490][1651669] InferenceWorker_p0-w0: stopping experience collection (19250 times) [2024-06-15 15:55:35,491][1651669] Updated weights for policy 0, policy_version 366139 (0.0146) [2024-06-15 15:55:35,566][1651274] Signal inference workers to resume experience collection... (19250 times) [2024-06-15 15:55:35,566][1651669] InferenceWorker_p0-w0: resuming experience collection (19250 times) [2024-06-15 15:55:35,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 749862912. Throughput: 0: 11616.7. Samples: 187536896. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:35,767][1648981] Avg episode reward: [(0, '395.270')] [2024-06-15 15:55:36,980][1651669] Updated weights for policy 0, policy_version 366208 (0.0016) [2024-06-15 15:55:39,840][1651669] Updated weights for policy 0, policy_version 366265 (0.0011) [2024-06-15 15:55:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 47522.0, 300 sec: 47763.5). Total num frames: 750125056. Throughput: 0: 11743.1. Samples: 187571712. Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:40,767][1648981] Avg episode reward: [(0, '376.440')] [2024-06-15 15:55:44,204][1651669] Updated weights for policy 0, policy_version 366293 (0.0010) [2024-06-15 15:55:45,767][1648981] Fps is (10 sec: 42598.1, 60 sec: 46445.9, 300 sec: 47319.2). Total num frames: 750288896. Throughput: 0: 11628.1. Samples: 187649536. 
Policy #0 lag: (min: 15.0, avg: 163.8, max: 271.0) [2024-06-15 15:55:45,767][1648981] Avg episode reward: [(0, '380.060')] [2024-06-15 15:55:46,348][1651669] Updated weights for policy 0, policy_version 366369 (0.0011) [2024-06-15 15:55:48,099][1651669] Updated weights for policy 0, policy_version 366436 (0.0011) [2024-06-15 15:55:50,675][1651669] Updated weights for policy 0, policy_version 366498 (0.0122) [2024-06-15 15:55:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 750583808. Throughput: 0: 11616.7. Samples: 187707392. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:55:50,767][1648981] Avg episode reward: [(0, '389.740')] [2024-06-15 15:55:55,774][1648981] Fps is (10 sec: 39290.9, 60 sec: 44234.5, 300 sec: 46984.7). Total num frames: 750682112. Throughput: 0: 11591.9. Samples: 187742208. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:55:55,775][1648981] Avg episode reward: [(0, '381.870')] [2024-06-15 15:55:56,371][1651669] Updated weights for policy 0, policy_version 366561 (0.0013) [2024-06-15 15:55:58,563][1651669] Updated weights for policy 0, policy_version 366643 (0.0012) [2024-06-15 15:56:00,013][1651669] Updated weights for policy 0, policy_version 366695 (0.0015) [2024-06-15 15:56:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.8, 300 sec: 47541.5). Total num frames: 751042560. Throughput: 0: 11496.6. Samples: 187804160. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:00,767][1648981] Avg episode reward: [(0, '396.090')] [2024-06-15 15:56:01,978][1651669] Updated weights for policy 0, policy_version 366726 (0.0010) [2024-06-15 15:56:03,175][1651669] Updated weights for policy 0, policy_version 366773 (0.0014) [2024-06-15 15:56:05,766][1648981] Fps is (10 sec: 49191.1, 60 sec: 45329.1, 300 sec: 47099.2). Total num frames: 751173632. Throughput: 0: 11252.6. Samples: 187873792. 
Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:05,767][1648981] Avg episode reward: [(0, '400.680')] [2024-06-15 15:56:08,360][1651669] Updated weights for policy 0, policy_version 366845 (0.0013) [2024-06-15 15:56:10,289][1651669] Updated weights for policy 0, policy_version 366896 (0.0132) [2024-06-15 15:56:10,767][1648981] Fps is (10 sec: 39321.0, 60 sec: 46421.3, 300 sec: 47097.0). Total num frames: 751435776. Throughput: 0: 11366.4. Samples: 187908096. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:10,767][1648981] Avg episode reward: [(0, '399.530')] [2024-06-15 15:56:11,891][1651669] Updated weights for policy 0, policy_version 366963 (0.0030) [2024-06-15 15:56:14,732][1651669] Updated weights for policy 0, policy_version 367024 (0.0084) [2024-06-15 15:56:15,254][1651669] Updated weights for policy 0, policy_version 367040 (0.0011) [2024-06-15 15:56:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45890.4, 300 sec: 47208.7). Total num frames: 751697920. Throughput: 0: 11320.9. Samples: 187973632. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:15,767][1648981] Avg episode reward: [(0, '424.480')] [2024-06-15 15:56:19,454][1651274] Signal inference workers to stop experience collection... (19300 times) [2024-06-15 15:56:19,511][1651669] InferenceWorker_p0-w0: stopping experience collection (19300 times) [2024-06-15 15:56:19,683][1651274] Signal inference workers to resume experience collection... (19300 times) [2024-06-15 15:56:19,683][1651669] InferenceWorker_p0-w0: resuming experience collection (19300 times) [2024-06-15 15:56:19,898][1651669] Updated weights for policy 0, policy_version 367101 (0.0011) [2024-06-15 15:56:20,768][1648981] Fps is (10 sec: 39316.3, 60 sec: 45874.1, 300 sec: 46763.6). Total num frames: 751828992. Throughput: 0: 11343.3. Samples: 188047360. 
Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:20,768][1648981] Avg episode reward: [(0, '412.190')] [2024-06-15 15:56:21,759][1651669] Updated weights for policy 0, policy_version 367152 (0.0010) [2024-06-15 15:56:23,335][1651669] Updated weights for policy 0, policy_version 367224 (0.0011) [2024-06-15 15:56:25,374][1651669] Updated weights for policy 0, policy_version 367266 (0.0012) [2024-06-15 15:56:25,777][1648981] Fps is (10 sec: 49100.4, 60 sec: 45321.2, 300 sec: 47539.7). Total num frames: 752189440. Throughput: 0: 11227.2. Samples: 188077056. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:25,777][1648981] Avg episode reward: [(0, '414.440')] [2024-06-15 15:56:30,603][1651669] Updated weights for policy 0, policy_version 367329 (0.0014) [2024-06-15 15:56:30,766][1648981] Fps is (10 sec: 45881.8, 60 sec: 44782.9, 300 sec: 46874.9). Total num frames: 752287744. Throughput: 0: 11252.6. Samples: 188155904. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:30,767][1648981] Avg episode reward: [(0, '403.840')] [2024-06-15 15:56:32,585][1651669] Updated weights for policy 0, policy_version 367392 (0.0011) [2024-06-15 15:56:34,866][1651669] Updated weights for policy 0, policy_version 367480 (0.0142) [2024-06-15 15:56:35,766][1648981] Fps is (10 sec: 42643.4, 60 sec: 45875.2, 300 sec: 47430.4). Total num frames: 752615424. Throughput: 0: 11264.0. Samples: 188214272. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:35,767][1648981] Avg episode reward: [(0, '412.070')] [2024-06-15 15:56:36,755][1651669] Updated weights for policy 0, policy_version 367527 (0.0013) [2024-06-15 15:56:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 43690.7, 300 sec: 46652.8). Total num frames: 752746496. Throughput: 0: 11391.2. Samples: 188254720. 
Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:40,767][1648981] Avg episode reward: [(0, '404.260')] [2024-06-15 15:56:41,358][1651669] Updated weights for policy 0, policy_version 367571 (0.0013) [2024-06-15 15:56:42,054][1651669] Updated weights for policy 0, policy_version 367612 (0.0013) [2024-06-15 15:56:44,146][1651669] Updated weights for policy 0, policy_version 367673 (0.0013) [2024-06-15 15:56:45,376][1651669] Updated weights for policy 0, policy_version 367715 (0.0014) [2024-06-15 15:56:45,778][1648981] Fps is (10 sec: 49094.1, 60 sec: 46958.3, 300 sec: 47428.4). Total num frames: 753106944. Throughput: 0: 11636.4. Samples: 188327936. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:45,779][1648981] Avg episode reward: [(0, '389.630')] [2024-06-15 15:56:46,730][1651669] Updated weights for policy 0, policy_version 367761 (0.0011) [2024-06-15 15:56:50,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 44782.9, 300 sec: 46987.1). Total num frames: 753270784. Throughput: 0: 11832.9. Samples: 188406272. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:50,767][1648981] Avg episode reward: [(0, '397.060')] [2024-06-15 15:56:52,101][1651669] Updated weights for policy 0, policy_version 367825 (0.0020) [2024-06-15 15:56:54,130][1651669] Updated weights for policy 0, policy_version 367888 (0.0012) [2024-06-15 15:56:55,291][1651669] Updated weights for policy 0, policy_version 367937 (0.0013) [2024-06-15 15:56:55,786][1648981] Fps is (10 sec: 45838.4, 60 sec: 48050.2, 300 sec: 47205.0). Total num frames: 753565696. Throughput: 0: 11839.1. Samples: 188441088. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:56:55,787][1648981] Avg episode reward: [(0, '401.680')] [2024-06-15 15:56:56,212][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000367984_753631232.pth... 
[2024-06-15 15:56:56,257][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000362432_742260736.pth [2024-06-15 15:56:56,453][1651669] Updated weights for policy 0, policy_version 367995 (0.0015) [2024-06-15 15:56:57,787][1651669] Updated weights for policy 0, policy_version 368048 (0.0013) [2024-06-15 15:57:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 753795072. Throughput: 0: 11958.0. Samples: 188511744. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:57:00,767][1648981] Avg episode reward: [(0, '417.060')] [2024-06-15 15:57:02,370][1651274] Signal inference workers to stop experience collection... (19350 times) [2024-06-15 15:57:02,409][1651669] Updated weights for policy 0, policy_version 368081 (0.0014) [2024-06-15 15:57:02,449][1651669] InferenceWorker_p0-w0: stopping experience collection (19350 times) [2024-06-15 15:57:02,644][1651274] Signal inference workers to resume experience collection... (19350 times) [2024-06-15 15:57:02,666][1651669] InferenceWorker_p0-w0: resuming experience collection (19350 times) [2024-06-15 15:57:03,602][1651669] Updated weights for policy 0, policy_version 368128 (0.0014) [2024-06-15 15:57:05,499][1651669] Updated weights for policy 0, policy_version 368180 (0.0012) [2024-06-15 15:57:05,766][1648981] Fps is (10 sec: 49249.0, 60 sec: 48059.6, 300 sec: 47097.0). Total num frames: 754057216. Throughput: 0: 12151.8. Samples: 188594176. Policy #0 lag: (min: 15.0, avg: 161.2, max: 271.0) [2024-06-15 15:57:05,767][1648981] Avg episode reward: [(0, '406.190')] [2024-06-15 15:57:08,082][1651669] Updated weights for policy 0, policy_version 368298 (0.0129) [2024-06-15 15:57:10,775][1648981] Fps is (10 sec: 52383.7, 60 sec: 48052.9, 300 sec: 47317.9). Total num frames: 754319360. Throughput: 0: 12061.0. Samples: 188619776. 
Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:10,776][1648981] Avg episode reward: [(0, '415.450')] [2024-06-15 15:57:14,150][1651669] Updated weights for policy 0, policy_version 368357 (0.0013) [2024-06-15 15:57:14,667][1651669] Updated weights for policy 0, policy_version 368384 (0.0028) [2024-06-15 15:57:15,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 754450432. Throughput: 0: 11980.8. Samples: 188695040. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:15,767][1648981] Avg episode reward: [(0, '395.730')] [2024-06-15 15:57:16,869][1651669] Updated weights for policy 0, policy_version 368448 (0.0011) [2024-06-15 15:57:18,272][1651669] Updated weights for policy 0, policy_version 368513 (0.0030) [2024-06-15 15:57:20,767][1648981] Fps is (10 sec: 52471.3, 60 sec: 50245.1, 300 sec: 47430.2). Total num frames: 754843648. Throughput: 0: 12447.1. Samples: 188774400. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:20,769][1648981] Avg episode reward: [(0, '404.400')] [2024-06-15 15:57:23,599][1651669] Updated weights for policy 0, policy_version 368608 (0.0013) [2024-06-15 15:57:25,770][1648981] Fps is (10 sec: 52408.2, 60 sec: 46426.5, 300 sec: 46765.1). Total num frames: 754974720. Throughput: 0: 12446.2. Samples: 188814848. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:25,771][1648981] Avg episode reward: [(0, '406.900')] [2024-06-15 15:57:27,318][1651669] Updated weights for policy 0, policy_version 368676 (0.0087) [2024-06-15 15:57:28,906][1651669] Updated weights for policy 0, policy_version 368753 (0.0109) [2024-06-15 15:57:29,896][1651669] Updated weights for policy 0, policy_version 368802 (0.0036) [2024-06-15 15:57:30,767][1648981] Fps is (10 sec: 52430.7, 60 sec: 51336.4, 300 sec: 47874.8). Total num frames: 755367936. Throughput: 0: 12359.5. Samples: 188883968. 
Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:30,767][1648981] Avg episode reward: [(0, '434.590')] [2024-06-15 15:57:34,033][1651669] Updated weights for policy 0, policy_version 368868 (0.0036) [2024-06-15 15:57:35,766][1648981] Fps is (10 sec: 52449.0, 60 sec: 48059.7, 300 sec: 46986.0). Total num frames: 755499008. Throughput: 0: 12344.9. Samples: 188961792. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:35,767][1648981] Avg episode reward: [(0, '431.730')] [2024-06-15 15:57:37,658][1651669] Updated weights for policy 0, policy_version 368913 (0.0012) [2024-06-15 15:57:39,158][1651669] Updated weights for policy 0, policy_version 368992 (0.0099) [2024-06-15 15:57:39,349][1651274] Signal inference workers to stop experience collection... (19400 times) [2024-06-15 15:57:39,387][1651669] InferenceWorker_p0-w0: stopping experience collection (19400 times) [2024-06-15 15:57:39,540][1651274] Signal inference workers to resume experience collection... (19400 times) [2024-06-15 15:57:39,541][1651669] InferenceWorker_p0-w0: resuming experience collection (19400 times) [2024-06-15 15:57:40,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 51336.5, 300 sec: 47763.5). Total num frames: 755826688. Throughput: 0: 12373.1. Samples: 188997632. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:40,767][1648981] Avg episode reward: [(0, '441.490')] [2024-06-15 15:57:41,187][1651669] Updated weights for policy 0, policy_version 369084 (0.0133) [2024-06-15 15:57:45,565][1651669] Updated weights for policy 0, policy_version 369122 (0.0013) [2024-06-15 15:57:45,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47522.9, 300 sec: 46874.9). Total num frames: 755957760. Throughput: 0: 12333.5. Samples: 189066752. 
Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:45,767][1648981] Avg episode reward: [(0, '445.470')] [2024-06-15 15:57:48,446][1651669] Updated weights for policy 0, policy_version 369157 (0.0011) [2024-06-15 15:57:50,020][1651669] Updated weights for policy 0, policy_version 369219 (0.0025) [2024-06-15 15:57:50,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 756219904. Throughput: 0: 12060.5. Samples: 189136896. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:50,767][1648981] Avg episode reward: [(0, '448.560')] [2024-06-15 15:57:51,547][1651669] Updated weights for policy 0, policy_version 369281 (0.0012) [2024-06-15 15:57:52,680][1651669] Updated weights for policy 0, policy_version 369341 (0.0011) [2024-06-15 15:57:55,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47529.2, 300 sec: 46877.4). Total num frames: 756416512. Throughput: 0: 12267.6. Samples: 189171712. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:57:55,767][1648981] Avg episode reward: [(0, '449.100')] [2024-06-15 15:57:57,002][1651669] Updated weights for policy 0, policy_version 369404 (0.0060) [2024-06-15 15:58:00,659][1651669] Updated weights for policy 0, policy_version 369456 (0.0122) [2024-06-15 15:58:00,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 756645888. Throughput: 0: 12265.2. Samples: 189246976. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:58:00,767][1648981] Avg episode reward: [(0, '430.980')] [2024-06-15 15:58:02,047][1651669] Updated weights for policy 0, policy_version 369506 (0.0011) [2024-06-15 15:58:03,318][1651669] Updated weights for policy 0, policy_version 369568 (0.0046) [2024-06-15 15:58:03,970][1651669] Updated weights for policy 0, policy_version 369600 (0.0011) [2024-06-15 15:58:05,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 756940800. 
Throughput: 0: 12049.2. Samples: 189316608. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:58:05,767][1648981] Avg episode reward: [(0, '431.570')] [2024-06-15 15:58:08,139][1651669] Updated weights for policy 0, policy_version 369660 (0.0014) [2024-06-15 15:58:10,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 46428.0, 300 sec: 46763.8). Total num frames: 757104640. Throughput: 0: 11811.2. Samples: 189346304. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:58:10,767][1648981] Avg episode reward: [(0, '440.830')] [2024-06-15 15:58:11,545][1651669] Updated weights for policy 0, policy_version 369726 (0.0012) [2024-06-15 15:58:13,594][1651669] Updated weights for policy 0, policy_version 369792 (0.0010) [2024-06-15 15:58:15,059][1651669] Updated weights for policy 0, policy_version 369849 (0.0099) [2024-06-15 15:58:15,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 50244.3, 300 sec: 47653.3). Total num frames: 757465088. Throughput: 0: 11491.6. Samples: 189401088. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:58:15,767][1648981] Avg episode reward: [(0, '435.060')] [2024-06-15 15:58:20,114][1651669] Updated weights for policy 0, policy_version 369892 (0.0011) [2024-06-15 15:58:20,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45875.6, 300 sec: 46652.7). Total num frames: 757596160. Throughput: 0: 11389.2. Samples: 189474304. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:58:20,767][1648981] Avg episode reward: [(0, '420.740')] [2024-06-15 15:58:23,251][1651274] Signal inference workers to stop experience collection... (19450 times) [2024-06-15 15:58:23,292][1651669] InferenceWorker_p0-w0: stopping experience collection (19450 times) [2024-06-15 15:58:23,469][1651274] Signal inference workers to resume experience collection... 
(19450 times) [2024-06-15 15:58:23,470][1651669] InferenceWorker_p0-w0: resuming experience collection (19450 times) [2024-06-15 15:58:23,548][1651669] Updated weights for policy 0, policy_version 369952 (0.0087) [2024-06-15 15:58:25,498][1651669] Updated weights for policy 0, policy_version 370035 (0.0152) [2024-06-15 15:58:25,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 48062.9, 300 sec: 47541.4). Total num frames: 757858304. Throughput: 0: 11389.2. Samples: 189510144. Policy #0 lag: (min: 84.0, avg: 219.9, max: 308.0) [2024-06-15 15:58:25,767][1648981] Avg episode reward: [(0, '410.180')] [2024-06-15 15:58:27,098][1651669] Updated weights for policy 0, policy_version 370096 (0.0011) [2024-06-15 15:58:30,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 43690.8, 300 sec: 46541.7). Total num frames: 757989376. Throughput: 0: 11161.6. Samples: 189569024. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:58:30,767][1648981] Avg episode reward: [(0, '398.880')] [2024-06-15 15:58:32,537][1651669] Updated weights for policy 0, policy_version 370160 (0.0014) [2024-06-15 15:58:35,752][1651669] Updated weights for policy 0, policy_version 370213 (0.0011) [2024-06-15 15:58:35,786][1648981] Fps is (10 sec: 32702.4, 60 sec: 44768.1, 300 sec: 46871.8). Total num frames: 758185984. Throughput: 0: 11179.4. Samples: 189640192. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:58:35,787][1648981] Avg episode reward: [(0, '386.580')] [2024-06-15 15:58:37,436][1651669] Updated weights for policy 0, policy_version 370288 (0.0011) [2024-06-15 15:58:38,372][1651669] Updated weights for policy 0, policy_version 370336 (0.0023) [2024-06-15 15:58:40,769][1648981] Fps is (10 sec: 52414.4, 60 sec: 44780.9, 300 sec: 46985.5). Total num frames: 758513664. Throughput: 0: 11035.8. Samples: 189668352. 
Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:58:40,770][1648981] Avg episode reward: [(0, '402.910')] [2024-06-15 15:58:43,271][1651669] Updated weights for policy 0, policy_version 370384 (0.0014) [2024-06-15 15:58:45,766][1648981] Fps is (10 sec: 45967.0, 60 sec: 44783.0, 300 sec: 46652.8). Total num frames: 758644736. Throughput: 0: 11002.3. Samples: 189742080. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:58:45,767][1648981] Avg episode reward: [(0, '404.190')] [2024-06-15 15:58:46,588][1651669] Updated weights for policy 0, policy_version 370448 (0.0018) [2024-06-15 15:58:48,334][1651669] Updated weights for policy 0, policy_version 370528 (0.0110) [2024-06-15 15:58:49,962][1651669] Updated weights for policy 0, policy_version 370592 (0.0012) [2024-06-15 15:58:50,766][1648981] Fps is (10 sec: 49165.5, 60 sec: 46421.4, 300 sec: 47214.5). Total num frames: 759005184. Throughput: 0: 10820.3. Samples: 189803520. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:58:50,767][1648981] Avg episode reward: [(0, '415.770')] [2024-06-15 15:58:53,821][1651669] Updated weights for policy 0, policy_version 370625 (0.0013) [2024-06-15 15:58:55,008][1651669] Updated weights for policy 0, policy_version 370687 (0.0012) [2024-06-15 15:58:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 759169024. Throughput: 0: 11241.2. Samples: 189852160. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:58:55,767][1648981] Avg episode reward: [(0, '393.420')] [2024-06-15 15:58:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000370688_759169024.pth... 
[2024-06-15 15:58:55,873][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000365248_748027904.pth [2024-06-15 15:58:59,483][1651669] Updated weights for policy 0, policy_version 370770 (0.0030) [2024-06-15 15:59:00,766][1648981] Fps is (10 sec: 42597.7, 60 sec: 46421.3, 300 sec: 47320.5). Total num frames: 759431168. Throughput: 0: 11423.2. Samples: 189915136. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:00,767][1648981] Avg episode reward: [(0, '398.720')] [2024-06-15 15:59:00,957][1651274] Signal inference workers to stop experience collection... (19500 times) [2024-06-15 15:59:01,093][1651669] InferenceWorker_p0-w0: stopping experience collection (19500 times) [2024-06-15 15:59:01,095][1651669] Updated weights for policy 0, policy_version 370839 (0.0011) [2024-06-15 15:59:01,289][1651274] Signal inference workers to resume experience collection... (19500 times) [2024-06-15 15:59:01,290][1651669] InferenceWorker_p0-w0: resuming experience collection (19500 times) [2024-06-15 15:59:04,840][1651669] Updated weights for policy 0, policy_version 370882 (0.0020) [2024-06-15 15:59:05,663][1651669] Updated weights for policy 0, policy_version 370934 (0.0011) [2024-06-15 15:59:05,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 45329.0, 300 sec: 46541.6). Total num frames: 759660544. Throughput: 0: 11616.7. Samples: 189997056. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:05,767][1648981] Avg episode reward: [(0, '404.370')] [2024-06-15 15:59:09,396][1651669] Updated weights for policy 0, policy_version 370995 (0.0026) [2024-06-15 15:59:10,759][1651669] Updated weights for policy 0, policy_version 371056 (0.0112) [2024-06-15 15:59:10,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46967.4, 300 sec: 47322.4). Total num frames: 759922688. Throughput: 0: 11662.2. Samples: 190034944. 
Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:10,767][1648981] Avg episode reward: [(0, '412.430')] [2024-06-15 15:59:12,432][1651669] Updated weights for policy 0, policy_version 371128 (0.0013) [2024-06-15 15:59:15,770][1648981] Fps is (10 sec: 45858.7, 60 sec: 44234.0, 300 sec: 46763.2). Total num frames: 760119296. Throughput: 0: 11752.3. Samples: 190097920. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:15,771][1648981] Avg episode reward: [(0, '400.040')] [2024-06-15 15:59:16,446][1651669] Updated weights for policy 0, policy_version 371184 (0.0010) [2024-06-15 15:59:19,262][1651669] Updated weights for policy 0, policy_version 371233 (0.0030) [2024-06-15 15:59:20,641][1651669] Updated weights for policy 0, policy_version 371301 (0.0012) [2024-06-15 15:59:20,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 760414208. Throughput: 0: 11849.5. Samples: 190173184. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:20,767][1648981] Avg episode reward: [(0, '394.770')] [2024-06-15 15:59:22,075][1651669] Updated weights for policy 0, policy_version 371363 (0.0011) [2024-06-15 15:59:25,767][1648981] Fps is (10 sec: 49169.4, 60 sec: 45875.0, 300 sec: 46652.7). Total num frames: 760610816. Throughput: 0: 11947.3. Samples: 190205952. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:25,768][1648981] Avg episode reward: [(0, '407.450')] [2024-06-15 15:59:27,134][1651669] Updated weights for policy 0, policy_version 371412 (0.0011) [2024-06-15 15:59:29,854][1651669] Updated weights for policy 0, policy_version 371491 (0.0013) [2024-06-15 15:59:30,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 760905728. Throughput: 0: 12162.9. Samples: 190289408. 
Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:30,767][1648981] Avg episode reward: [(0, '421.290')] [2024-06-15 15:59:30,888][1651669] Updated weights for policy 0, policy_version 371552 (0.0011) [2024-06-15 15:59:32,047][1651669] Updated weights for policy 0, policy_version 371603 (0.0014) [2024-06-15 15:59:35,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 49168.3, 300 sec: 46987.7). Total num frames: 761135104. Throughput: 0: 12401.8. Samples: 190361600. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:35,767][1648981] Avg episode reward: [(0, '413.540')] [2024-06-15 15:59:37,844][1651669] Updated weights for policy 0, policy_version 371666 (0.0015) [2024-06-15 15:59:39,952][1651669] Updated weights for policy 0, policy_version 371728 (0.0012) [2024-06-15 15:59:40,762][1651274] Signal inference workers to stop experience collection... (19550 times) [2024-06-15 15:59:40,782][1648981] Fps is (10 sec: 45802.3, 60 sec: 47503.2, 300 sec: 46988.5). Total num frames: 761364480. Throughput: 0: 12033.5. Samples: 190393856. Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:40,783][1648981] Avg episode reward: [(0, '402.890')] [2024-06-15 15:59:40,841][1651669] InferenceWorker_p0-w0: stopping experience collection (19550 times) [2024-06-15 15:59:40,983][1651274] Signal inference workers to resume experience collection... (19550 times) [2024-06-15 15:59:40,983][1651669] InferenceWorker_p0-w0: resuming experience collection (19550 times) [2024-06-15 15:59:41,541][1651669] Updated weights for policy 0, policy_version 371794 (0.0012) [2024-06-15 15:59:42,792][1651669] Updated weights for policy 0, policy_version 371845 (0.0011) [2024-06-15 15:59:43,935][1651669] Updated weights for policy 0, policy_version 371904 (0.0011) [2024-06-15 15:59:45,790][1648981] Fps is (10 sec: 52304.4, 60 sec: 50224.3, 300 sec: 46982.2). Total num frames: 761659392. Throughput: 0: 12304.3. Samples: 190469120. 
Policy #0 lag: (min: 120.0, avg: 211.3, max: 334.0) [2024-06-15 15:59:45,791][1648981] Avg episode reward: [(0, '400.510')] [2024-06-15 15:59:48,382][1651669] Updated weights for policy 0, policy_version 371957 (0.0012) [2024-06-15 15:59:50,033][1651669] Updated weights for policy 0, policy_version 371990 (0.0011) [2024-06-15 15:59:50,766][1648981] Fps is (10 sec: 52511.8, 60 sec: 48059.7, 300 sec: 46986.7). Total num frames: 761888768. Throughput: 0: 12242.5. Samples: 190547968. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 15:59:50,767][1648981] Avg episode reward: [(0, '415.100')] [2024-06-15 15:59:50,925][1651669] Updated weights for policy 0, policy_version 372031 (0.0011) [2024-06-15 15:59:52,288][1651669] Updated weights for policy 0, policy_version 372094 (0.0011) [2024-06-15 15:59:53,951][1651669] Updated weights for policy 0, policy_version 372160 (0.0027) [2024-06-15 15:59:55,766][1648981] Fps is (10 sec: 52553.5, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 762183680. Throughput: 0: 12333.5. Samples: 190589952. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 15:59:55,767][1648981] Avg episode reward: [(0, '407.410')] [2024-06-15 15:59:58,433][1651669] Updated weights for policy 0, policy_version 372218 (0.0014) [2024-06-15 16:00:00,774][1648981] Fps is (10 sec: 52387.9, 60 sec: 49691.7, 300 sec: 47318.0). Total num frames: 762413056. Throughput: 0: 12753.3. Samples: 190671872. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:00,777][1648981] Avg episode reward: [(0, '401.090')] [2024-06-15 16:00:00,850][1651669] Updated weights for policy 0, policy_version 372276 (0.0012) [2024-06-15 16:00:02,572][1651669] Updated weights for policy 0, policy_version 372352 (0.0128) [2024-06-15 16:00:04,865][1651669] Updated weights for policy 0, policy_version 372412 (0.0013) [2024-06-15 16:00:05,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50790.6, 300 sec: 47652.5). Total num frames: 762707968. 
Throughput: 0: 12515.6. Samples: 190736384. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:05,767][1648981] Avg episode reward: [(0, '398.970')] [2024-06-15 16:00:09,216][1651669] Updated weights for policy 0, policy_version 372473 (0.0013) [2024-06-15 16:00:10,766][1648981] Fps is (10 sec: 42632.0, 60 sec: 48606.0, 300 sec: 47100.2). Total num frames: 762839040. Throughput: 0: 12754.6. Samples: 190779904. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:10,767][1648981] Avg episode reward: [(0, '394.510')] [2024-06-15 16:00:12,022][1651669] Updated weights for policy 0, policy_version 372544 (0.0012) [2024-06-15 16:00:13,587][1651669] Updated weights for policy 0, policy_version 372608 (0.0010) [2024-06-15 16:00:15,717][1651669] Updated weights for policy 0, policy_version 372668 (0.0013) [2024-06-15 16:00:15,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 51339.7, 300 sec: 47874.6). Total num frames: 763199488. Throughput: 0: 12276.6. Samples: 190841856. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:15,767][1648981] Avg episode reward: [(0, '400.570')] [2024-06-15 16:00:19,578][1651669] Updated weights for policy 0, policy_version 372706 (0.0059) [2024-06-15 16:00:20,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49152.0, 300 sec: 47097.1). Total num frames: 763363328. Throughput: 0: 12549.7. Samples: 190926336. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:20,767][1648981] Avg episode reward: [(0, '404.480')] [2024-06-15 16:00:22,069][1651669] Updated weights for policy 0, policy_version 372768 (0.0123) [2024-06-15 16:00:22,177][1651274] Signal inference workers to stop experience collection... (19600 times) [2024-06-15 16:00:22,222][1651669] InferenceWorker_p0-w0: stopping experience collection (19600 times) [2024-06-15 16:00:22,471][1651274] Signal inference workers to resume experience collection... 
(19600 times) [2024-06-15 16:00:22,477][1651669] InferenceWorker_p0-w0: resuming experience collection (19600 times) [2024-06-15 16:00:23,979][1651669] Updated weights for policy 0, policy_version 372834 (0.0090) [2024-06-15 16:00:25,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 50244.3, 300 sec: 47541.3). Total num frames: 763625472. Throughput: 0: 12428.9. Samples: 190952960. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:25,767][1648981] Avg episode reward: [(0, '404.040')] [2024-06-15 16:00:25,972][1651669] Updated weights for policy 0, policy_version 372880 (0.0033) [2024-06-15 16:00:26,828][1651669] Updated weights for policy 0, policy_version 372928 (0.0010) [2024-06-15 16:00:30,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49152.0, 300 sec: 47430.3). Total num frames: 763854848. Throughput: 0: 12579.1. Samples: 191034880. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:30,767][1648981] Avg episode reward: [(0, '401.410')] [2024-06-15 16:00:30,774][1651669] Updated weights for policy 0, policy_version 372981 (0.0013) [2024-06-15 16:00:33,539][1651669] Updated weights for policy 0, policy_version 373040 (0.0013) [2024-06-15 16:00:35,625][1651669] Updated weights for policy 0, policy_version 373113 (0.0278) [2024-06-15 16:00:35,779][1648981] Fps is (10 sec: 52365.6, 60 sec: 50234.1, 300 sec: 47539.4). Total num frames: 764149760. Throughput: 0: 12102.7. Samples: 191092736. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:35,779][1648981] Avg episode reward: [(0, '400.670')] [2024-06-15 16:00:37,770][1651669] Updated weights for policy 0, policy_version 373172 (0.0012) [2024-06-15 16:00:40,770][1648981] Fps is (10 sec: 42582.1, 60 sec: 48615.7, 300 sec: 47429.7). Total num frames: 764280832. Throughput: 0: 11900.2. Samples: 191125504. 
Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:40,771][1648981] Avg episode reward: [(0, '387.120')] [2024-06-15 16:00:42,084][1651669] Updated weights for policy 0, policy_version 373223 (0.0098) [2024-06-15 16:00:43,829][1651669] Updated weights for policy 0, policy_version 373269 (0.0019) [2024-06-15 16:00:45,179][1651669] Updated weights for policy 0, policy_version 373328 (0.0014) [2024-06-15 16:00:45,767][1648981] Fps is (10 sec: 45931.1, 60 sec: 49171.5, 300 sec: 47541.4). Total num frames: 764608512. Throughput: 0: 11903.2. Samples: 191207424. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:45,769][1648981] Avg episode reward: [(0, '377.910')] [2024-06-15 16:00:48,497][1651669] Updated weights for policy 0, policy_version 373410 (0.0013) [2024-06-15 16:00:50,766][1648981] Fps is (10 sec: 52448.5, 60 sec: 48605.9, 300 sec: 47875.9). Total num frames: 764805120. Throughput: 0: 12037.7. Samples: 191278080. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:50,767][1648981] Avg episode reward: [(0, '380.310')] [2024-06-15 16:00:52,491][1651669] Updated weights for policy 0, policy_version 373472 (0.0013) [2024-06-15 16:00:54,360][1651669] Updated weights for policy 0, policy_version 373522 (0.0012) [2024-06-15 16:00:55,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 47541.3). Total num frames: 765067264. Throughput: 0: 11969.4. Samples: 191318528. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:00:55,767][1648981] Avg episode reward: [(0, '349.820')] [2024-06-15 16:00:56,006][1651669] Updated weights for policy 0, policy_version 373586 (0.0013) [2024-06-15 16:00:56,268][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000373600_765132800.pth... 
[2024-06-15 16:00:56,409][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000367984_753631232.pth [2024-06-15 16:00:57,961][1651669] Updated weights for policy 0, policy_version 373639 (0.0014) [2024-06-15 16:00:59,198][1651669] Updated weights for policy 0, policy_version 373689 (0.0013) [2024-06-15 16:01:00,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 48612.1, 300 sec: 47985.7). Total num frames: 765329408. Throughput: 0: 12140.1. Samples: 191388160. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:01:00,767][1648981] Avg episode reward: [(0, '347.210')] [2024-06-15 16:01:03,942][1651669] Updated weights for policy 0, policy_version 373744 (0.0104) [2024-06-15 16:01:05,377][1651669] Updated weights for policy 0, policy_version 373763 (0.0049) [2024-06-15 16:01:05,647][1651274] Signal inference workers to stop experience collection... (19650 times) [2024-06-15 16:01:05,708][1651669] InferenceWorker_p0-w0: stopping experience collection (19650 times) [2024-06-15 16:01:05,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 46421.3, 300 sec: 47652.5). Total num frames: 765493248. Throughput: 0: 11855.7. Samples: 191459840. Policy #0 lag: (min: 15.0, avg: 103.6, max: 271.0) [2024-06-15 16:01:05,767][1648981] Avg episode reward: [(0, '346.000')] [2024-06-15 16:01:05,803][1651274] Signal inference workers to resume experience collection... (19650 times) [2024-06-15 16:01:05,811][1651669] InferenceWorker_p0-w0: resuming experience collection (19650 times) [2024-06-15 16:01:06,331][1651669] Updated weights for policy 0, policy_version 373824 (0.0012) [2024-06-15 16:01:07,664][1651669] Updated weights for policy 0, policy_version 373887 (0.0013) [2024-06-15 16:01:10,230][1651669] Updated weights for policy 0, policy_version 373952 (0.0015) [2024-06-15 16:01:10,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 50244.4, 300 sec: 47985.7). Total num frames: 765853696. Throughput: 0: 12071.9. Samples: 191496192. 
Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:10,767][1648981] Avg episode reward: [(0, '349.740')] [2024-06-15 16:01:14,976][1651669] Updated weights for policy 0, policy_version 374005 (0.0013) [2024-06-15 16:01:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.5, 300 sec: 48097.0). Total num frames: 766017536. Throughput: 0: 11901.1. Samples: 191570432. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:15,767][1648981] Avg episode reward: [(0, '356.870')] [2024-06-15 16:01:16,483][1651669] Updated weights for policy 0, policy_version 374067 (0.0012) [2024-06-15 16:01:17,958][1651669] Updated weights for policy 0, policy_version 374117 (0.0011) [2024-06-15 16:01:20,730][1651669] Updated weights for policy 0, policy_version 374176 (0.0014) [2024-06-15 16:01:20,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 49152.0, 300 sec: 47876.3). Total num frames: 766312448. Throughput: 0: 12177.5. Samples: 191640576. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:20,767][1648981] Avg episode reward: [(0, '363.450')] [2024-06-15 16:01:24,983][1651669] Updated weights for policy 0, policy_version 374224 (0.0010) [2024-06-15 16:01:25,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 766443520. Throughput: 0: 12402.8. Samples: 191683584. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:25,767][1648981] Avg episode reward: [(0, '339.450')] [2024-06-15 16:01:27,147][1651669] Updated weights for policy 0, policy_version 374304 (0.0012) [2024-06-15 16:01:28,620][1651669] Updated weights for policy 0, policy_version 374353 (0.0013) [2024-06-15 16:01:30,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 766771200. Throughput: 0: 11935.3. Samples: 191744512. 
Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:30,767][1648981] Avg episode reward: [(0, '335.720')] [2024-06-15 16:01:31,162][1651669] Updated weights for policy 0, policy_version 374416 (0.0010) [2024-06-15 16:01:32,071][1651669] Updated weights for policy 0, policy_version 374460 (0.0013) [2024-06-15 16:01:35,767][1648981] Fps is (10 sec: 49152.0, 60 sec: 46430.7, 300 sec: 48096.7). Total num frames: 766935040. Throughput: 0: 12344.9. Samples: 191833600. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:35,767][1648981] Avg episode reward: [(0, '347.740')] [2024-06-15 16:01:37,350][1651669] Updated weights for policy 0, policy_version 374545 (0.0011) [2024-06-15 16:01:39,457][1651669] Updated weights for policy 0, policy_version 374593 (0.0013) [2024-06-15 16:01:40,782][1648981] Fps is (10 sec: 49075.2, 60 sec: 49688.3, 300 sec: 47985.1). Total num frames: 767262720. Throughput: 0: 12044.9. Samples: 191860736. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:40,782][1648981] Avg episode reward: [(0, '362.750')] [2024-06-15 16:01:40,856][1651669] Updated weights for policy 0, policy_version 374656 (0.0010) [2024-06-15 16:01:42,645][1651669] Updated weights for policy 0, policy_version 374710 (0.0021) [2024-06-15 16:01:45,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 767426560. Throughput: 0: 12037.7. Samples: 191929856. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:45,767][1648981] Avg episode reward: [(0, '365.750')] [2024-06-15 16:01:48,377][1651274] Signal inference workers to stop experience collection... (19700 times) [2024-06-15 16:01:48,453][1651669] InferenceWorker_p0-w0: stopping experience collection (19700 times) [2024-06-15 16:01:48,651][1651274] Signal inference workers to resume experience collection... 
(19700 times) [2024-06-15 16:01:48,666][1651669] InferenceWorker_p0-w0: resuming experience collection (19700 times) [2024-06-15 16:01:48,668][1651669] Updated weights for policy 0, policy_version 374800 (0.0083) [2024-06-15 16:01:50,439][1651669] Updated weights for policy 0, policy_version 374850 (0.0013) [2024-06-15 16:01:50,766][1648981] Fps is (10 sec: 45947.2, 60 sec: 48605.9, 300 sec: 47988.9). Total num frames: 767721472. Throughput: 0: 12140.1. Samples: 192006144. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:50,767][1648981] Avg episode reward: [(0, '383.700')] [2024-06-15 16:01:52,029][1651669] Updated weights for policy 0, policy_version 374913 (0.0015) [2024-06-15 16:01:53,345][1651669] Updated weights for policy 0, policy_version 374975 (0.0148) [2024-06-15 16:01:55,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 767950848. Throughput: 0: 11889.8. Samples: 192031232. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:01:55,767][1648981] Avg episode reward: [(0, '406.810')] [2024-06-15 16:01:58,936][1651669] Updated weights for policy 0, policy_version 375025 (0.0016) [2024-06-15 16:02:00,493][1651669] Updated weights for policy 0, policy_version 375099 (0.0013) [2024-06-15 16:02:00,767][1648981] Fps is (10 sec: 49150.7, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 768212992. Throughput: 0: 12060.4. Samples: 192113152. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:02:00,768][1648981] Avg episode reward: [(0, '414.660')] [2024-06-15 16:02:02,554][1651669] Updated weights for policy 0, policy_version 375156 (0.0012) [2024-06-15 16:02:04,271][1651669] Updated weights for policy 0, policy_version 375216 (0.0026) [2024-06-15 16:02:05,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49698.1, 300 sec: 47987.1). Total num frames: 768475136. Throughput: 0: 12094.6. Samples: 192184832. 
Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:02:05,767][1648981] Avg episode reward: [(0, '408.620')] [2024-06-15 16:02:08,629][1651669] Updated weights for policy 0, policy_version 375248 (0.0009) [2024-06-15 16:02:10,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 46420.9, 300 sec: 48096.7). Total num frames: 768638976. Throughput: 0: 12128.6. Samples: 192229376. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:02:10,768][1648981] Avg episode reward: [(0, '408.770')] [2024-06-15 16:02:10,798][1651669] Updated weights for policy 0, policy_version 375328 (0.0015) [2024-06-15 16:02:12,377][1651669] Updated weights for policy 0, policy_version 375364 (0.0012) [2024-06-15 16:02:13,654][1651669] Updated weights for policy 0, policy_version 375421 (0.0013) [2024-06-15 16:02:15,712][1651669] Updated weights for policy 0, policy_version 375459 (0.0024) [2024-06-15 16:02:15,786][1648981] Fps is (10 sec: 45784.5, 60 sec: 48589.8, 300 sec: 47760.4). Total num frames: 768933888. Throughput: 0: 12077.9. Samples: 192288256. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:02:15,787][1648981] Avg episode reward: [(0, '422.450')] [2024-06-15 16:02:19,651][1651669] Updated weights for policy 0, policy_version 375520 (0.0012) [2024-06-15 16:02:20,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 47513.6, 300 sec: 48097.4). Total num frames: 769163264. Throughput: 0: 11832.9. Samples: 192366080. Policy #0 lag: (min: 31.0, avg: 135.6, max: 287.0) [2024-06-15 16:02:20,767][1648981] Avg episode reward: [(0, '445.790')] [2024-06-15 16:02:21,110][1651669] Updated weights for policy 0, policy_version 375586 (0.0104) [2024-06-15 16:02:23,303][1651669] Updated weights for policy 0, policy_version 375633 (0.0019) [2024-06-15 16:02:25,020][1651669] Updated weights for policy 0, policy_version 375684 (0.0015) [2024-06-15 16:02:25,766][1648981] Fps is (10 sec: 52533.3, 60 sec: 50244.4, 300 sec: 47763.6). Total num frames: 769458176. 
Throughput: 0: 12121.6. Samples: 192406016. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:02:25,767][1648981] Avg episode reward: [(0, '447.350')] [2024-06-15 16:02:26,340][1651669] Updated weights for policy 0, policy_version 375743 (0.0019) [2024-06-15 16:02:30,486][1651274] Signal inference workers to stop experience collection... (19750 times) [2024-06-15 16:02:30,541][1651669] InferenceWorker_p0-w0: stopping experience collection (19750 times) [2024-06-15 16:02:30,713][1651274] Signal inference workers to resume experience collection... (19750 times) [2024-06-15 16:02:30,714][1651669] InferenceWorker_p0-w0: resuming experience collection (19750 times) [2024-06-15 16:02:30,770][1648981] Fps is (10 sec: 42582.2, 60 sec: 46964.4, 300 sec: 47762.9). Total num frames: 769589248. Throughput: 0: 12252.8. Samples: 192481280. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:02:30,771][1648981] Avg episode reward: [(0, '443.190')] [2024-06-15 16:02:31,255][1651669] Updated weights for policy 0, policy_version 375794 (0.0012) [2024-06-15 16:02:32,459][1651669] Updated weights for policy 0, policy_version 375856 (0.0011) [2024-06-15 16:02:34,941][1651669] Updated weights for policy 0, policy_version 375929 (0.0012) [2024-06-15 16:02:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49698.3, 300 sec: 47763.5). Total num frames: 769916928. Throughput: 0: 12037.7. Samples: 192547840. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:02:35,767][1648981] Avg episode reward: [(0, '437.320')] [2024-06-15 16:02:36,733][1651669] Updated weights for policy 0, policy_version 375969 (0.0012) [2024-06-15 16:02:40,790][1648981] Fps is (10 sec: 49054.2, 60 sec: 46961.1, 300 sec: 47870.7). Total num frames: 770080768. Throughput: 0: 12281.5. Samples: 192584192. 
Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:02:40,791][1648981] Avg episode reward: [(0, '420.150')] [2024-06-15 16:02:41,092][1651669] Updated weights for policy 0, policy_version 376032 (0.0012) [2024-06-15 16:02:43,002][1651669] Updated weights for policy 0, policy_version 376112 (0.0027) [2024-06-15 16:02:45,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 770375680. Throughput: 0: 12049.1. Samples: 192655360. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:02:45,767][1648981] Avg episode reward: [(0, '415.450')] [2024-06-15 16:02:46,519][1651669] Updated weights for policy 0, policy_version 376192 (0.0013) [2024-06-15 16:02:47,987][1651669] Updated weights for policy 0, policy_version 376244 (0.0027) [2024-06-15 16:02:50,774][1648981] Fps is (10 sec: 49230.9, 60 sec: 47507.4, 300 sec: 47984.4). Total num frames: 770572288. Throughput: 0: 12172.1. Samples: 192732672. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:02:50,775][1648981] Avg episode reward: [(0, '412.200')] [2024-06-15 16:02:51,890][1651669] Updated weights for policy 0, policy_version 376290 (0.0012) [2024-06-15 16:02:53,160][1651669] Updated weights for policy 0, policy_version 376352 (0.0015) [2024-06-15 16:02:55,774][1648981] Fps is (10 sec: 45838.9, 60 sec: 48053.4, 300 sec: 48095.5). Total num frames: 770834432. Throughput: 0: 11830.9. Samples: 192761856. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:02:55,775][1648981] Avg episode reward: [(0, '396.180')] [2024-06-15 16:02:55,780][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000376384_770834432.pth... 
[2024-06-15 16:02:55,822][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000370688_759169024.pth [2024-06-15 16:02:57,428][1651669] Updated weights for policy 0, policy_version 376432 (0.0013) [2024-06-15 16:02:58,864][1651669] Updated weights for policy 0, policy_version 376480 (0.0012) [2024-06-15 16:03:00,766][1648981] Fps is (10 sec: 52470.1, 60 sec: 48060.0, 300 sec: 47985.7). Total num frames: 771096576. Throughput: 0: 12179.6. Samples: 192836096. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:00,767][1648981] Avg episode reward: [(0, '394.880')] [2024-06-15 16:03:01,743][1651669] Updated weights for policy 0, policy_version 376528 (0.0014) [2024-06-15 16:03:03,701][1651669] Updated weights for policy 0, policy_version 376608 (0.0110) [2024-06-15 16:03:05,778][1648981] Fps is (10 sec: 52408.3, 60 sec: 48050.3, 300 sec: 48317.0). Total num frames: 771358720. Throughput: 0: 12080.1. Samples: 192909824. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:05,779][1648981] Avg episode reward: [(0, '396.710')] [2024-06-15 16:03:07,832][1651669] Updated weights for policy 0, policy_version 376658 (0.0048) [2024-06-15 16:03:09,011][1651669] Updated weights for policy 0, policy_version 376705 (0.0013) [2024-06-15 16:03:09,631][1651274] Signal inference workers to stop experience collection... (19800 times) [2024-06-15 16:03:09,687][1651669] InferenceWorker_p0-w0: stopping experience collection (19800 times) [2024-06-15 16:03:09,789][1651274] Signal inference workers to resume experience collection... (19800 times) [2024-06-15 16:03:09,791][1651669] InferenceWorker_p0-w0: resuming experience collection (19800 times) [2024-06-15 16:03:09,962][1651669] Updated weights for policy 0, policy_version 376763 (0.0012) [2024-06-15 16:03:10,767][1648981] Fps is (10 sec: 52426.9, 60 sec: 49698.2, 300 sec: 47985.6). Total num frames: 771620864. Throughput: 0: 12060.4. Samples: 192948736. 
Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:10,767][1648981] Avg episode reward: [(0, '393.430')] [2024-06-15 16:03:13,107][1651669] Updated weights for policy 0, policy_version 376816 (0.0010) [2024-06-15 16:03:14,684][1651669] Updated weights for policy 0, policy_version 376885 (0.0011) [2024-06-15 16:03:15,766][1648981] Fps is (10 sec: 52490.9, 60 sec: 49168.3, 300 sec: 48430.0). Total num frames: 771883008. Throughput: 0: 12061.5. Samples: 193024000. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:15,767][1648981] Avg episode reward: [(0, '385.870')] [2024-06-15 16:03:19,771][1651669] Updated weights for policy 0, policy_version 376951 (0.0120) [2024-06-15 16:03:20,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 772046848. Throughput: 0: 12060.4. Samples: 193090560. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:20,767][1648981] Avg episode reward: [(0, '368.590')] [2024-06-15 16:03:21,505][1651669] Updated weights for policy 0, policy_version 377024 (0.0012) [2024-06-15 16:03:24,251][1651669] Updated weights for policy 0, policy_version 377075 (0.0012) [2024-06-15 16:03:25,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 772374528. Throughput: 0: 12169.3. Samples: 193131520. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:25,767][1648981] Avg episode reward: [(0, '370.080')] [2024-06-15 16:03:25,921][1651669] Updated weights for policy 0, policy_version 377143 (0.0012) [2024-06-15 16:03:30,577][1651669] Updated weights for policy 0, policy_version 377184 (0.0012) [2024-06-15 16:03:30,779][1648981] Fps is (10 sec: 42543.7, 60 sec: 48052.5, 300 sec: 48431.2). Total num frames: 772472832. Throughput: 0: 12079.7. Samples: 193199104. 
Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:30,780][1648981] Avg episode reward: [(0, '376.180')] [2024-06-15 16:03:31,771][1651669] Updated weights for policy 0, policy_version 377233 (0.0012) [2024-06-15 16:03:32,681][1651669] Updated weights for policy 0, policy_version 377276 (0.0011) [2024-06-15 16:03:34,929][1651669] Updated weights for policy 0, policy_version 377328 (0.0013) [2024-06-15 16:03:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48605.7, 300 sec: 48541.5). Total num frames: 772833280. Throughput: 0: 11903.2. Samples: 193268224. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:35,767][1648981] Avg episode reward: [(0, '401.060')] [2024-06-15 16:03:36,855][1651669] Updated weights for policy 0, policy_version 377405 (0.0011) [2024-06-15 16:03:40,766][1648981] Fps is (10 sec: 45934.2, 60 sec: 47532.5, 300 sec: 48430.0). Total num frames: 772931584. Throughput: 0: 11982.9. Samples: 193300992. Policy #0 lag: (min: 15.0, avg: 148.0, max: 271.0) [2024-06-15 16:03:40,767][1648981] Avg episode reward: [(0, '404.240')] [2024-06-15 16:03:41,828][1651669] Updated weights for policy 0, policy_version 377456 (0.0012) [2024-06-15 16:03:43,412][1651669] Updated weights for policy 0, policy_version 377520 (0.0012) [2024-06-15 16:03:45,691][1651669] Updated weights for policy 0, policy_version 377568 (0.0016) [2024-06-15 16:03:45,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 773259264. Throughput: 0: 12037.7. Samples: 193377792. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:03:45,767][1648981] Avg episode reward: [(0, '406.310')] [2024-06-15 16:03:47,020][1651669] Updated weights for policy 0, policy_version 377617 (0.0012) [2024-06-15 16:03:48,003][1651669] Updated weights for policy 0, policy_version 377661 (0.0016) [2024-06-15 16:03:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48066.0, 300 sec: 48430.0). Total num frames: 773455872. 
Throughput: 0: 11984.0. Samples: 193448960. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:03:50,767][1648981] Avg episode reward: [(0, '409.350')] [2024-06-15 16:03:52,122][1651274] Signal inference workers to stop experience collection... (19850 times) [2024-06-15 16:03:52,237][1651669] InferenceWorker_p0-w0: stopping experience collection (19850 times) [2024-06-15 16:03:52,347][1651274] Signal inference workers to resume experience collection... (19850 times) [2024-06-15 16:03:52,348][1651669] InferenceWorker_p0-w0: resuming experience collection (19850 times) [2024-06-15 16:03:52,535][1651669] Updated weights for policy 0, policy_version 377697 (0.0029) [2024-06-15 16:03:53,682][1651669] Updated weights for policy 0, policy_version 377760 (0.0020) [2024-06-15 16:03:55,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48066.0, 300 sec: 48430.0). Total num frames: 773718016. Throughput: 0: 12003.6. Samples: 193488896. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:03:55,767][1648981] Avg episode reward: [(0, '402.970')] [2024-06-15 16:03:56,762][1651669] Updated weights for policy 0, policy_version 377809 (0.0011) [2024-06-15 16:03:58,250][1651669] Updated weights for policy 0, policy_version 377872 (0.0015) [2024-06-15 16:04:00,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 48541.1). Total num frames: 773980160. Throughput: 0: 11753.2. Samples: 193552896. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:00,767][1648981] Avg episode reward: [(0, '414.970')] [2024-06-15 16:04:03,093][1651669] Updated weights for policy 0, policy_version 377936 (0.0098) [2024-06-15 16:04:04,960][1651669] Updated weights for policy 0, policy_version 378016 (0.0103) [2024-06-15 16:04:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48069.1, 300 sec: 48541.1). Total num frames: 774242304. Throughput: 0: 11855.6. Samples: 193624064. 
Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:05,767][1648981] Avg episode reward: [(0, '406.240')] [2024-06-15 16:04:07,208][1651669] Updated weights for policy 0, policy_version 378064 (0.0013) [2024-06-15 16:04:09,325][1651669] Updated weights for policy 0, policy_version 378160 (0.0011) [2024-06-15 16:04:10,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.9, 300 sec: 48763.8). Total num frames: 774504448. Throughput: 0: 11776.0. Samples: 193661440. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:10,767][1648981] Avg episode reward: [(0, '410.800')] [2024-06-15 16:04:14,799][1651669] Updated weights for policy 0, policy_version 378208 (0.0021) [2024-06-15 16:04:15,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 45875.2, 300 sec: 48207.8). Total num frames: 774635520. Throughput: 0: 11870.4. Samples: 193733120. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:15,767][1648981] Avg episode reward: [(0, '400.020')] [2024-06-15 16:04:15,979][1651669] Updated weights for policy 0, policy_version 378261 (0.0028) [2024-06-15 16:04:16,805][1651669] Updated weights for policy 0, policy_version 378302 (0.0010) [2024-06-15 16:04:19,634][1651669] Updated weights for policy 0, policy_version 378375 (0.0013) [2024-06-15 16:04:20,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 48763.3). Total num frames: 774995968. Throughput: 0: 11764.6. Samples: 193797632. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:20,767][1648981] Avg episode reward: [(0, '403.880')] [2024-06-15 16:04:20,942][1651669] Updated weights for policy 0, policy_version 378431 (0.0058) [2024-06-15 16:04:25,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 44236.8, 300 sec: 47874.6). Total num frames: 775028736. Throughput: 0: 11867.0. Samples: 193835008. 
Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:25,767][1648981] Avg episode reward: [(0, '396.410')] [2024-06-15 16:04:27,212][1651669] Updated weights for policy 0, policy_version 378512 (0.0016) [2024-06-15 16:04:29,895][1651669] Updated weights for policy 0, policy_version 378576 (0.0120) [2024-06-15 16:04:30,367][1651274] Signal inference workers to stop experience collection... (19900 times) [2024-06-15 16:04:30,469][1651669] InferenceWorker_p0-w0: stopping experience collection (19900 times) [2024-06-15 16:04:30,553][1651274] Signal inference workers to resume experience collection... (19900 times) [2024-06-15 16:04:30,553][1651669] InferenceWorker_p0-w0: resuming experience collection (19900 times) [2024-06-15 16:04:30,749][1651669] Updated weights for policy 0, policy_version 378617 (0.0013) [2024-06-15 16:04:30,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 48616.3, 300 sec: 48318.9). Total num frames: 775389184. Throughput: 0: 11685.0. Samples: 193903616. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:30,767][1648981] Avg episode reward: [(0, '389.840')] [2024-06-15 16:04:31,735][1651669] Updated weights for policy 0, policy_version 378656 (0.0011) [2024-06-15 16:04:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 45329.2, 300 sec: 48099.4). Total num frames: 775553024. Throughput: 0: 11821.5. Samples: 193980928. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:35,767][1648981] Avg episode reward: [(0, '385.210')] [2024-06-15 16:04:37,341][1651669] Updated weights for policy 0, policy_version 378720 (0.0011) [2024-06-15 16:04:38,926][1651669] Updated weights for policy 0, policy_version 378786 (0.0013) [2024-06-15 16:04:40,589][1651669] Updated weights for policy 0, policy_version 378848 (0.0010) [2024-06-15 16:04:40,775][1648981] Fps is (10 sec: 49107.8, 60 sec: 49144.6, 300 sec: 48210.3). Total num frames: 775880704. Throughput: 0: 11682.7. Samples: 194014720. 
Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:40,776][1648981] Avg episode reward: [(0, '378.200')] [2024-06-15 16:04:42,397][1651669] Updated weights for policy 0, policy_version 378928 (0.0012) [2024-06-15 16:04:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 776077312. Throughput: 0: 11855.7. Samples: 194086400. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:45,767][1648981] Avg episode reward: [(0, '379.180')] [2024-06-15 16:04:49,070][1651669] Updated weights for policy 0, policy_version 378992 (0.0019) [2024-06-15 16:04:50,300][1651669] Updated weights for policy 0, policy_version 379043 (0.0012) [2024-06-15 16:04:50,767][1648981] Fps is (10 sec: 42636.1, 60 sec: 47513.4, 300 sec: 47874.6). Total num frames: 776306688. Throughput: 0: 12037.7. Samples: 194165760. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:50,769][1648981] Avg episode reward: [(0, '379.620')] [2024-06-15 16:04:52,434][1651669] Updated weights for policy 0, policy_version 379122 (0.0023) [2024-06-15 16:04:53,768][1651669] Updated weights for policy 0, policy_version 379200 (0.0014) [2024-06-15 16:04:55,774][1648981] Fps is (10 sec: 52387.1, 60 sec: 48053.4, 300 sec: 48096.7). Total num frames: 776601600. Throughput: 0: 11819.4. Samples: 194193408. Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:04:55,775][1648981] Avg episode reward: [(0, '384.490')] [2024-06-15 16:04:55,780][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000379200_776601600.pth... [2024-06-15 16:04:55,843][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000373600_765132800.pth [2024-06-15 16:05:00,766][1648981] Fps is (10 sec: 39322.3, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 776699904. Throughput: 0: 11992.2. Samples: 194272768. 
Policy #0 lag: (min: 13.0, avg: 127.6, max: 269.0) [2024-06-15 16:05:00,767][1648981] Avg episode reward: [(0, '375.680')] [2024-06-15 16:05:01,341][1651669] Updated weights for policy 0, policy_version 379280 (0.0014) [2024-06-15 16:05:03,072][1651669] Updated weights for policy 0, policy_version 379344 (0.0011) [2024-06-15 16:05:04,660][1651669] Updated weights for policy 0, policy_version 379408 (0.0011) [2024-06-15 16:05:05,422][1651669] Updated weights for policy 0, policy_version 379456 (0.0011) [2024-06-15 16:05:05,766][1648981] Fps is (10 sec: 52470.3, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 777125888. Throughput: 0: 11685.0. Samples: 194323456. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:05,767][1648981] Avg episode reward: [(0, '358.350')] [2024-06-15 16:05:10,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 43690.5, 300 sec: 47208.1). Total num frames: 777125888. Throughput: 0: 11810.1. Samples: 194366464. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:10,768][1648981] Avg episode reward: [(0, '357.800')] [2024-06-15 16:05:12,553][1651274] Signal inference workers to stop experience collection... (19950 times) [2024-06-15 16:05:12,629][1651669] InferenceWorker_p0-w0: stopping experience collection (19950 times) [2024-06-15 16:05:12,753][1651274] Signal inference workers to resume experience collection... (19950 times) [2024-06-15 16:05:12,755][1651669] InferenceWorker_p0-w0: resuming experience collection (19950 times) [2024-06-15 16:05:12,834][1651669] Updated weights for policy 0, policy_version 379536 (0.0090) [2024-06-15 16:05:14,138][1651669] Updated weights for policy 0, policy_version 379586 (0.0009) [2024-06-15 16:05:15,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48605.7, 300 sec: 48096.7). Total num frames: 777551872. Throughput: 0: 11730.5. Samples: 194431488. 
Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:15,767][1648981] Avg episode reward: [(0, '354.020')] [2024-06-15 16:05:15,866][1651669] Updated weights for policy 0, policy_version 379665 (0.0010) [2024-06-15 16:05:20,786][1648981] Fps is (10 sec: 52327.3, 60 sec: 44222.4, 300 sec: 47538.2). Total num frames: 777650176. Throughput: 0: 11702.6. Samples: 194507776. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:20,787][1648981] Avg episode reward: [(0, '365.240')] [2024-06-15 16:05:22,303][1651669] Updated weights for policy 0, policy_version 379731 (0.0063) [2024-06-15 16:05:23,824][1651669] Updated weights for policy 0, policy_version 379794 (0.0012) [2024-06-15 16:05:25,716][1651669] Updated weights for policy 0, policy_version 379877 (0.0012) [2024-06-15 16:05:25,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 49151.9, 300 sec: 47874.6). Total num frames: 777977856. Throughput: 0: 11687.3. Samples: 194540544. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:25,767][1648981] Avg episode reward: [(0, '372.910')] [2024-06-15 16:05:26,977][1651669] Updated weights for policy 0, policy_version 379905 (0.0011) [2024-06-15 16:05:30,769][1648981] Fps is (10 sec: 52520.1, 60 sec: 46419.6, 300 sec: 47543.0). Total num frames: 778174464. Throughput: 0: 11627.5. Samples: 194609664. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:30,769][1648981] Avg episode reward: [(0, '393.110')] [2024-06-15 16:05:33,064][1651669] Updated weights for policy 0, policy_version 379970 (0.0011) [2024-06-15 16:05:35,059][1651669] Updated weights for policy 0, policy_version 380050 (0.0069) [2024-06-15 16:05:35,773][1648981] Fps is (10 sec: 42572.5, 60 sec: 47508.6, 300 sec: 47874.2). Total num frames: 778403840. Throughput: 0: 11421.8. Samples: 194679808. 
Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:35,775][1648981] Avg episode reward: [(0, '399.490')] [2024-06-15 16:05:36,028][1651669] Updated weights for policy 0, policy_version 380103 (0.0017) [2024-06-15 16:05:37,086][1651669] Updated weights for policy 0, policy_version 380159 (0.0011) [2024-06-15 16:05:38,428][1651669] Updated weights for policy 0, policy_version 380214 (0.0011) [2024-06-15 16:05:40,766][1648981] Fps is (10 sec: 52439.7, 60 sec: 46974.4, 300 sec: 47763.5). Total num frames: 778698752. Throughput: 0: 11607.4. Samples: 194715648. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:40,767][1648981] Avg episode reward: [(0, '390.450')] [2024-06-15 16:05:45,164][1651669] Updated weights for policy 0, policy_version 380274 (0.0114) [2024-06-15 16:05:45,772][1648981] Fps is (10 sec: 45877.2, 60 sec: 46416.9, 300 sec: 47651.5). Total num frames: 778862592. Throughput: 0: 11660.7. Samples: 194797568. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:45,773][1648981] Avg episode reward: [(0, '392.330')] [2024-06-15 16:05:46,548][1651669] Updated weights for policy 0, policy_version 380347 (0.0084) [2024-06-15 16:05:47,781][1651669] Updated weights for policy 0, policy_version 380400 (0.0011) [2024-06-15 16:05:48,413][1651274] Signal inference workers to stop experience collection... (20000 times) [2024-06-15 16:05:48,504][1651669] InferenceWorker_p0-w0: stopping experience collection (20000 times) [2024-06-15 16:05:48,634][1651274] Signal inference workers to resume experience collection... (20000 times) [2024-06-15 16:05:48,634][1651669] InferenceWorker_p0-w0: resuming experience collection (20000 times) [2024-06-15 16:05:49,024][1651669] Updated weights for policy 0, policy_version 380448 (0.0011) [2024-06-15 16:05:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48606.0, 300 sec: 47985.7). Total num frames: 779223040. Throughput: 0: 11958.1. Samples: 194861568. 
Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:50,767][1648981] Avg episode reward: [(0, '401.930')] [2024-06-15 16:05:54,726][1651669] Updated weights for policy 0, policy_version 380512 (0.0012) [2024-06-15 16:05:55,766][1648981] Fps is (10 sec: 49180.3, 60 sec: 45881.3, 300 sec: 47541.4). Total num frames: 779354112. Throughput: 0: 12026.4. Samples: 194907648. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:05:55,767][1648981] Avg episode reward: [(0, '403.470')] [2024-06-15 16:05:56,030][1651669] Updated weights for policy 0, policy_version 380560 (0.0012) [2024-06-15 16:05:57,718][1651669] Updated weights for policy 0, policy_version 380628 (0.0013) [2024-06-15 16:05:58,562][1651669] Updated weights for policy 0, policy_version 380672 (0.0012) [2024-06-15 16:05:59,962][1651669] Updated weights for policy 0, policy_version 380736 (0.0033) [2024-06-15 16:06:00,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 50790.1, 300 sec: 48318.9). Total num frames: 779747328. Throughput: 0: 12060.4. Samples: 194974208. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:06:00,768][1648981] Avg episode reward: [(0, '406.370')] [2024-06-15 16:06:05,456][1651669] Updated weights for policy 0, policy_version 380791 (0.0126) [2024-06-15 16:06:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47541.3). Total num frames: 779878400. Throughput: 0: 12190.9. Samples: 195056128. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:06:05,767][1648981] Avg episode reward: [(0, '388.890')] [2024-06-15 16:06:07,290][1651669] Updated weights for policy 0, policy_version 380863 (0.0012) [2024-06-15 16:06:09,061][1651669] Updated weights for policy 0, policy_version 380917 (0.0061) [2024-06-15 16:06:10,459][1651669] Updated weights for policy 0, policy_version 380983 (0.0011) [2024-06-15 16:06:10,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 52429.0, 300 sec: 48318.9). Total num frames: 780271616. 
Throughput: 0: 12128.7. Samples: 195086336. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:06:10,767][1648981] Avg episode reward: [(0, '401.360')] [2024-06-15 16:06:15,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 45875.3, 300 sec: 47430.3). Total num frames: 780304384. Throughput: 0: 12345.5. Samples: 195165184. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:06:15,767][1648981] Avg episode reward: [(0, '386.830')] [2024-06-15 16:06:16,464][1651669] Updated weights for policy 0, policy_version 381043 (0.0012) [2024-06-15 16:06:17,129][1651669] Updated weights for policy 0, policy_version 381072 (0.0012) [2024-06-15 16:06:18,322][1651669] Updated weights for policy 0, policy_version 381118 (0.0013) [2024-06-15 16:06:19,414][1651669] Updated weights for policy 0, policy_version 381168 (0.0012) [2024-06-15 16:06:20,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 50806.9, 300 sec: 48318.9). Total num frames: 780697600. Throughput: 0: 12335.2. Samples: 195234816. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:06:20,767][1648981] Avg episode reward: [(0, '410.180')] [2024-06-15 16:06:21,098][1651669] Updated weights for policy 0, policy_version 381219 (0.0097) [2024-06-15 16:06:25,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 780795904. Throughput: 0: 12288.0. Samples: 195268608. Policy #0 lag: (min: 22.0, avg: 189.8, max: 326.0) [2024-06-15 16:06:25,767][1648981] Avg episode reward: [(0, '383.700')] [2024-06-15 16:06:26,811][1651669] Updated weights for policy 0, policy_version 381302 (0.0013) [2024-06-15 16:06:28,784][1651669] Updated weights for policy 0, policy_version 381360 (0.0011) [2024-06-15 16:06:29,022][1651274] Signal inference workers to stop experience collection... 
(20050 times) [2024-06-15 16:06:29,084][1651669] InferenceWorker_p0-w0: stopping experience collection (20050 times) [2024-06-15 16:06:29,245][1651274] Signal inference workers to resume experience collection... (20050 times) [2024-06-15 16:06:29,247][1651669] InferenceWorker_p0-w0: resuming experience collection (20050 times) [2024-06-15 16:06:29,753][1651669] Updated weights for policy 0, policy_version 381395 (0.0012) [2024-06-15 16:06:30,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 50245.9, 300 sec: 48318.9). Total num frames: 781189120. Throughput: 0: 12084.7. Samples: 195341312. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:06:30,767][1648981] Avg episode reward: [(0, '396.480')] [2024-06-15 16:06:31,610][1651669] Updated weights for policy 0, policy_version 381456 (0.0027) [2024-06-15 16:06:32,620][1651669] Updated weights for policy 0, policy_version 381503 (0.0012) [2024-06-15 16:06:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48610.8, 300 sec: 47655.0). Total num frames: 781320192. Throughput: 0: 12424.5. Samples: 195420672. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:06:35,767][1648981] Avg episode reward: [(0, '400.490')] [2024-06-15 16:06:37,205][1651669] Updated weights for policy 0, policy_version 381561 (0.0016) [2024-06-15 16:06:39,426][1651669] Updated weights for policy 0, policy_version 381629 (0.0046) [2024-06-15 16:06:40,769][1648981] Fps is (10 sec: 42587.8, 60 sec: 48603.7, 300 sec: 48096.3). Total num frames: 781615104. Throughput: 0: 12230.4. Samples: 195458048. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:06:40,770][1648981] Avg episode reward: [(0, '386.430')] [2024-06-15 16:06:41,274][1651669] Updated weights for policy 0, policy_version 381680 (0.0014) [2024-06-15 16:06:42,818][1651669] Updated weights for policy 0, policy_version 381744 (0.0030) [2024-06-15 16:06:45,782][1648981] Fps is (10 sec: 52348.1, 60 sec: 49690.1, 300 sec: 47872.1). Total num frames: 781844480. 
Throughput: 0: 12363.5. Samples: 195530752. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:06:45,783][1648981] Avg episode reward: [(0, '397.190')] [2024-06-15 16:06:47,437][1651669] Updated weights for policy 0, policy_version 381782 (0.0012) [2024-06-15 16:06:48,757][1651669] Updated weights for policy 0, policy_version 381842 (0.0011) [2024-06-15 16:06:50,268][1651669] Updated weights for policy 0, policy_version 381890 (0.0033) [2024-06-15 16:06:50,766][1648981] Fps is (10 sec: 52443.0, 60 sec: 48605.8, 300 sec: 48096.7). Total num frames: 782139392. Throughput: 0: 12219.7. Samples: 195606016. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:06:50,767][1648981] Avg episode reward: [(0, '400.670')] [2024-06-15 16:06:52,653][1651669] Updated weights for policy 0, policy_version 381972 (0.0013) [2024-06-15 16:06:55,766][1648981] Fps is (10 sec: 52509.4, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 782368768. Throughput: 0: 12140.1. Samples: 195632640. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:06:55,767][1648981] Avg episode reward: [(0, '392.080')] [2024-06-15 16:06:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000382016_782368768.pth... [2024-06-15 16:06:55,822][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000376384_770834432.pth [2024-06-15 16:06:57,857][1651669] Updated weights for policy 0, policy_version 382018 (0.0011) [2024-06-15 16:06:59,243][1651669] Updated weights for policy 0, policy_version 382080 (0.0012) [2024-06-15 16:07:00,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48060.0, 300 sec: 47985.7). Total num frames: 782630912. Throughput: 0: 12310.7. Samples: 195719168. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:00,767][1648981] Avg episode reward: [(0, '389.600')] [2024-06-15 16:07:01,565][1651669] Updated weights for policy 0, policy_version 382148 (0.0011) [2024-06-15 16:07:03,363][1651669] Updated weights for policy 0, policy_version 382224 (0.0105) [2024-06-15 16:07:05,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 50244.3, 300 sec: 48319.0). Total num frames: 782893056. Throughput: 0: 12185.6. Samples: 195783168. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:05,767][1648981] Avg episode reward: [(0, '402.760')] [2024-06-15 16:07:08,820][1651669] Updated weights for policy 0, policy_version 382288 (0.0019) [2024-06-15 16:07:10,334][1651274] Signal inference workers to stop experience collection... (20100 times) [2024-06-15 16:07:10,392][1651669] InferenceWorker_p0-w0: stopping experience collection (20100 times) [2024-06-15 16:07:10,612][1651274] Signal inference workers to resume experience collection... (20100 times) [2024-06-15 16:07:10,613][1651669] InferenceWorker_p0-w0: resuming experience collection (20100 times) [2024-06-15 16:07:10,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 46421.2, 300 sec: 47877.8). Total num frames: 783056896. Throughput: 0: 12526.9. Samples: 195832320. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:10,767][1648981] Avg episode reward: [(0, '394.070')] [2024-06-15 16:07:10,819][1651669] Updated weights for policy 0, policy_version 382354 (0.0012) [2024-06-15 16:07:11,934][1651669] Updated weights for policy 0, policy_version 382405 (0.0013) [2024-06-15 16:07:13,922][1651669] Updated weights for policy 0, policy_version 382481 (0.0015) [2024-06-15 16:07:15,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 51882.6, 300 sec: 48318.9). Total num frames: 783417344. Throughput: 0: 12231.2. Samples: 195891712. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:15,767][1648981] Avg episode reward: [(0, '401.560')] [2024-06-15 16:07:19,987][1651669] Updated weights for policy 0, policy_version 382563 (0.0015) [2024-06-15 16:07:20,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 783548416. Throughput: 0: 12333.5. Samples: 195975680. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:20,767][1648981] Avg episode reward: [(0, '385.710')] [2024-06-15 16:07:21,135][1651669] Updated weights for policy 0, policy_version 382608 (0.0012) [2024-06-15 16:07:22,448][1651669] Updated weights for policy 0, policy_version 382659 (0.0018) [2024-06-15 16:07:23,549][1651669] Updated weights for policy 0, policy_version 382720 (0.0013) [2024-06-15 16:07:25,277][1651669] Updated weights for policy 0, policy_version 382781 (0.0013) [2024-06-15 16:07:25,780][1648981] Fps is (10 sec: 52355.1, 60 sec: 52416.5, 300 sec: 48650.5). Total num frames: 783941632. Throughput: 0: 12228.0. Samples: 196008448. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:25,781][1648981] Avg episode reward: [(0, '375.500')] [2024-06-15 16:07:30,698][1651669] Updated weights for policy 0, policy_version 382840 (0.0159) [2024-06-15 16:07:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.8, 300 sec: 47874.6). Total num frames: 784039936. Throughput: 0: 12531.2. Samples: 196094464. 
Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:30,767][1648981] Avg episode reward: [(0, '381.310')] [2024-06-15 16:07:31,904][1651669] Updated weights for policy 0, policy_version 382882 (0.0011) [2024-06-15 16:07:33,390][1651669] Updated weights for policy 0, policy_version 382929 (0.0016) [2024-06-15 16:07:34,611][1651669] Updated weights for policy 0, policy_version 382983 (0.0011) [2024-06-15 16:07:35,520][1651669] Updated weights for policy 0, policy_version 383035 (0.0012) [2024-06-15 16:07:35,766][1648981] Fps is (10 sec: 52502.5, 60 sec: 52428.8, 300 sec: 48767.1). Total num frames: 784465920. Throughput: 0: 12265.2. Samples: 196157952. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:35,767][1648981] Avg episode reward: [(0, '396.860')] [2024-06-15 16:07:40,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 48061.6, 300 sec: 47874.5). Total num frames: 784498688. Throughput: 0: 12561.0. Samples: 196197888. Policy #0 lag: (min: 15.0, avg: 90.9, max: 271.0) [2024-06-15 16:07:40,768][1648981] Avg episode reward: [(0, '421.870')] [2024-06-15 16:07:41,708][1651669] Updated weights for policy 0, policy_version 383104 (0.0016) [2024-06-15 16:07:42,990][1651669] Updated weights for policy 0, policy_version 383161 (0.0011) [2024-06-15 16:07:44,935][1651669] Updated weights for policy 0, policy_version 383216 (0.0013) [2024-06-15 16:07:45,703][1651669] Updated weights for policy 0, policy_version 383254 (0.0012) [2024-06-15 16:07:45,775][1648981] Fps is (10 sec: 42560.0, 60 sec: 50795.8, 300 sec: 48540.9). Total num frames: 784891904. Throughput: 0: 12217.3. Samples: 196269056. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:07:45,776][1648981] Avg episode reward: [(0, '426.890')] [2024-06-15 16:07:50,766][1648981] Fps is (10 sec: 49153.8, 60 sec: 47513.6, 300 sec: 47987.0). Total num frames: 784990208. Throughput: 0: 12367.6. Samples: 196339712. 
Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:07:50,767][1648981] Avg episode reward: [(0, '438.250')] [2024-06-15 16:07:51,761][1651274] Signal inference workers to stop experience collection... (20150 times) [2024-06-15 16:07:51,785][1651669] Updated weights for policy 0, policy_version 383313 (0.0095) [2024-06-15 16:07:51,829][1651669] InferenceWorker_p0-w0: stopping experience collection (20150 times) [2024-06-15 16:07:52,080][1651274] Signal inference workers to resume experience collection... (20150 times) [2024-06-15 16:07:52,081][1651669] InferenceWorker_p0-w0: resuming experience collection (20150 times) [2024-06-15 16:07:53,790][1651669] Updated weights for policy 0, policy_version 383395 (0.0012) [2024-06-15 16:07:55,299][1651669] Updated weights for policy 0, policy_version 383429 (0.0018) [2024-06-15 16:07:55,766][1648981] Fps is (10 sec: 39357.3, 60 sec: 48606.0, 300 sec: 48096.7). Total num frames: 785285120. Throughput: 0: 12015.0. Samples: 196372992. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:07:55,767][1648981] Avg episode reward: [(0, '442.350')] [2024-06-15 16:07:56,504][1651669] Updated weights for policy 0, policy_version 383475 (0.0012) [2024-06-15 16:08:00,780][1648981] Fps is (10 sec: 52359.6, 60 sec: 48049.1, 300 sec: 47985.4). Total num frames: 785514496. Throughput: 0: 12216.2. Samples: 196441600. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:00,781][1648981] Avg episode reward: [(0, '430.880')] [2024-06-15 16:08:02,749][1651669] Updated weights for policy 0, policy_version 383554 (0.0033) [2024-06-15 16:08:03,945][1651669] Updated weights for policy 0, policy_version 383603 (0.0014) [2024-06-15 16:08:05,650][1651669] Updated weights for policy 0, policy_version 383678 (0.0242) [2024-06-15 16:08:05,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 785776640. Throughput: 0: 11867.0. Samples: 196509696. 
Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:05,767][1648981] Avg episode reward: [(0, '437.060')] [2024-06-15 16:08:08,790][1651669] Updated weights for policy 0, policy_version 383760 (0.0011) [2024-06-15 16:08:10,766][1648981] Fps is (10 sec: 52497.7, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 786038784. Throughput: 0: 11950.4. Samples: 196546048. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:10,767][1648981] Avg episode reward: [(0, '428.770')] [2024-06-15 16:08:14,849][1651669] Updated weights for policy 0, policy_version 383829 (0.0012) [2024-06-15 16:08:15,783][1648981] Fps is (10 sec: 35985.2, 60 sec: 45316.6, 300 sec: 47760.8). Total num frames: 786137088. Throughput: 0: 11760.3. Samples: 196623872. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:15,783][1648981] Avg episode reward: [(0, '430.970')] [2024-06-15 16:08:16,664][1651669] Updated weights for policy 0, policy_version 383891 (0.0021) [2024-06-15 16:08:18,600][1651669] Updated weights for policy 0, policy_version 383953 (0.0012) [2024-06-15 16:08:20,266][1651669] Updated weights for policy 0, policy_version 384032 (0.0011) [2024-06-15 16:08:20,789][1648981] Fps is (10 sec: 52312.6, 60 sec: 50225.7, 300 sec: 48093.1). Total num frames: 786563072. Throughput: 0: 11667.8. Samples: 196683264. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:20,789][1648981] Avg episode reward: [(0, '457.560')] [2024-06-15 16:08:25,766][1648981] Fps is (10 sec: 42669.3, 60 sec: 43701.0, 300 sec: 47765.6). Total num frames: 786563072. Throughput: 0: 11594.1. Samples: 196719616. 
Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:25,767][1648981] Avg episode reward: [(0, '451.650')] [2024-06-15 16:08:26,853][1651669] Updated weights for policy 0, policy_version 384112 (0.0013) [2024-06-15 16:08:28,417][1651669] Updated weights for policy 0, policy_version 384164 (0.0012) [2024-06-15 16:08:30,710][1651274] Signal inference workers to stop experience collection... (20200 times) [2024-06-15 16:08:30,727][1651669] Updated weights for policy 0, policy_version 384225 (0.0011) [2024-06-15 16:08:30,758][1651669] InferenceWorker_p0-w0: stopping experience collection (20200 times) [2024-06-15 16:08:30,766][1648981] Fps is (10 sec: 32841.6, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 786890752. Throughput: 0: 11448.4. Samples: 196784128. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:30,767][1648981] Avg episode reward: [(0, '450.980')] [2024-06-15 16:08:30,954][1651274] Signal inference workers to resume experience collection... (20200 times) [2024-06-15 16:08:30,955][1651669] InferenceWorker_p0-w0: resuming experience collection (20200 times) [2024-06-15 16:08:31,956][1651669] Updated weights for policy 0, policy_version 384289 (0.0109) [2024-06-15 16:08:35,782][1648981] Fps is (10 sec: 52346.0, 60 sec: 43679.2, 300 sec: 47983.1). Total num frames: 787087360. Throughput: 0: 11624.0. Samples: 196862976. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:35,783][1648981] Avg episode reward: [(0, '446.220')] [2024-06-15 16:08:37,827][1651669] Updated weights for policy 0, policy_version 384352 (0.0011) [2024-06-15 16:08:40,035][1651669] Updated weights for policy 0, policy_version 384432 (0.0014) [2024-06-15 16:08:40,767][1648981] Fps is (10 sec: 45871.9, 60 sec: 47513.4, 300 sec: 47763.4). Total num frames: 787349504. Throughput: 0: 11650.7. Samples: 196897280. 
Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:40,767][1648981] Avg episode reward: [(0, '446.640')] [2024-06-15 16:08:41,235][1651669] Updated weights for policy 0, policy_version 384480 (0.0012) [2024-06-15 16:08:42,224][1651669] Updated weights for policy 0, policy_version 384515 (0.0010) [2024-06-15 16:08:43,431][1651669] Updated weights for policy 0, policy_version 384576 (0.0013) [2024-06-15 16:08:45,802][1648981] Fps is (10 sec: 52324.1, 60 sec: 45308.9, 300 sec: 47979.9). Total num frames: 787611648. Throughput: 0: 11576.8. Samples: 196962816. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:45,803][1648981] Avg episode reward: [(0, '422.800')] [2024-06-15 16:08:49,916][1651669] Updated weights for policy 0, policy_version 384640 (0.0014) [2024-06-15 16:08:50,766][1648981] Fps is (10 sec: 42600.9, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 787775488. Throughput: 0: 11616.7. Samples: 197032448. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:50,767][1648981] Avg episode reward: [(0, '430.230')] [2024-06-15 16:08:51,730][1651669] Updated weights for policy 0, policy_version 384704 (0.0011) [2024-06-15 16:08:53,097][1651669] Updated weights for policy 0, policy_version 384763 (0.0014) [2024-06-15 16:08:54,777][1651669] Updated weights for policy 0, policy_version 384816 (0.0013) [2024-06-15 16:08:55,766][1648981] Fps is (10 sec: 52616.4, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 788135936. Throughput: 0: 11480.2. Samples: 197062656. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:08:55,767][1648981] Avg episode reward: [(0, '424.370')] [2024-06-15 16:08:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000384832_788135936.pth... 
[2024-06-15 16:08:55,814][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000379200_776601600.pth [2024-06-15 16:09:00,256][1651669] Updated weights for policy 0, policy_version 384880 (0.0011) [2024-06-15 16:09:00,770][1648981] Fps is (10 sec: 49136.5, 60 sec: 45882.9, 300 sec: 47540.9). Total num frames: 788267008. Throughput: 0: 11449.4. Samples: 197138944. Policy #0 lag: (min: 61.0, avg: 191.9, max: 317.0) [2024-06-15 16:09:00,770][1648981] Avg episode reward: [(0, '426.130')] [2024-06-15 16:09:02,443][1651669] Updated weights for policy 0, policy_version 384914 (0.0010) [2024-06-15 16:09:04,811][1651669] Updated weights for policy 0, policy_version 385012 (0.0104) [2024-06-15 16:09:05,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 788529152. Throughput: 0: 11337.9. Samples: 197193216. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:05,767][1648981] Avg episode reward: [(0, '430.760')] [2024-06-15 16:09:07,426][1651669] Updated weights for policy 0, policy_version 385078 (0.0012) [2024-06-15 16:09:10,764][1651669] Updated weights for policy 0, policy_version 385120 (0.0011) [2024-06-15 16:09:10,766][1648981] Fps is (10 sec: 45890.1, 60 sec: 44783.0, 300 sec: 47763.5). Total num frames: 788725760. Throughput: 0: 11355.0. Samples: 197230592. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:10,767][1648981] Avg episode reward: [(0, '429.470')] [2024-06-15 16:09:13,755][1651274] Signal inference workers to stop experience collection... (20250 times) [2024-06-15 16:09:13,808][1651669] InferenceWorker_p0-w0: stopping experience collection (20250 times) [2024-06-15 16:09:14,106][1651274] Signal inference workers to resume experience collection... 
(20250 times) [2024-06-15 16:09:14,107][1651669] InferenceWorker_p0-w0: resuming experience collection (20250 times) [2024-06-15 16:09:14,245][1651669] Updated weights for policy 0, policy_version 385184 (0.0150) [2024-06-15 16:09:15,598][1651669] Updated weights for policy 0, policy_version 385234 (0.0012) [2024-06-15 16:09:15,769][1648981] Fps is (10 sec: 42587.9, 60 sec: 46978.5, 300 sec: 47318.8). Total num frames: 788955136. Throughput: 0: 11593.3. Samples: 197305856. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:15,769][1648981] Avg episode reward: [(0, '409.900')] [2024-06-15 16:09:17,492][1651669] Updated weights for policy 0, policy_version 385281 (0.0011) [2024-06-15 16:09:18,619][1651669] Updated weights for policy 0, policy_version 385344 (0.0011) [2024-06-15 16:09:20,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 43706.9, 300 sec: 47985.7). Total num frames: 789184512. Throughput: 0: 11495.6. Samples: 197380096. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:20,769][1648981] Avg episode reward: [(0, '404.530')] [2024-06-15 16:09:24,539][1651669] Updated weights for policy 0, policy_version 385440 (0.0016) [2024-06-15 16:09:25,767][1648981] Fps is (10 sec: 49162.8, 60 sec: 48059.4, 300 sec: 47652.4). Total num frames: 789446656. Throughput: 0: 11525.8. Samples: 197415936. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:25,767][1648981] Avg episode reward: [(0, '401.750')] [2024-06-15 16:09:26,116][1651669] Updated weights for policy 0, policy_version 385488 (0.0011) [2024-06-15 16:09:27,149][1651669] Updated weights for policy 0, policy_version 385536 (0.0011) [2024-06-15 16:09:30,269][1651669] Updated weights for policy 0, policy_version 385600 (0.0012) [2024-06-15 16:09:30,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 46967.2, 300 sec: 47985.6). Total num frames: 789708800. Throughput: 0: 11648.7. Samples: 197486592. 
Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:30,767][1648981] Avg episode reward: [(0, '400.810')] [2024-06-15 16:09:32,364][1651669] Updated weights for policy 0, policy_version 385658 (0.0012) [2024-06-15 16:09:35,032][1651669] Updated weights for policy 0, policy_version 385712 (0.0012) [2024-06-15 16:09:35,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 48072.3, 300 sec: 47765.0). Total num frames: 789970944. Throughput: 0: 11593.9. Samples: 197554176. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:35,767][1648981] Avg episode reward: [(0, '415.410')] [2024-06-15 16:09:38,320][1651669] Updated weights for policy 0, policy_version 385784 (0.0015) [2024-06-15 16:09:40,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 46421.8, 300 sec: 47652.4). Total num frames: 790134784. Throughput: 0: 11832.9. Samples: 197595136. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:40,767][1648981] Avg episode reward: [(0, '417.740')] [2024-06-15 16:09:42,071][1651669] Updated weights for policy 0, policy_version 385857 (0.0066) [2024-06-15 16:09:43,068][1651669] Updated weights for policy 0, policy_version 385906 (0.0010) [2024-06-15 16:09:45,510][1651669] Updated weights for policy 0, policy_version 385974 (0.0012) [2024-06-15 16:09:45,770][1648981] Fps is (10 sec: 52408.6, 60 sec: 48085.2, 300 sec: 48096.1). Total num frames: 790495232. Throughput: 0: 11741.7. Samples: 197667328. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:45,771][1648981] Avg episode reward: [(0, '412.670')] [2024-06-15 16:09:49,212][1651669] Updated weights for policy 0, policy_version 386032 (0.0012) [2024-06-15 16:09:50,769][1648981] Fps is (10 sec: 49140.7, 60 sec: 47511.8, 300 sec: 47542.3). Total num frames: 790626304. Throughput: 0: 12162.2. Samples: 197740544. 
Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:50,769][1648981] Avg episode reward: [(0, '434.000')] [2024-06-15 16:09:52,239][1651669] Updated weights for policy 0, policy_version 386084 (0.0086) [2024-06-15 16:09:53,740][1651669] Updated weights for policy 0, policy_version 386144 (0.0011) [2024-06-15 16:09:55,706][1651669] Updated weights for policy 0, policy_version 386192 (0.0019) [2024-06-15 16:09:55,766][1648981] Fps is (10 sec: 42615.3, 60 sec: 46421.4, 300 sec: 48207.8). Total num frames: 790921216. Throughput: 0: 12014.9. Samples: 197771264. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:09:55,767][1648981] Avg episode reward: [(0, '420.360')] [2024-06-15 16:09:55,862][1651274] Signal inference workers to stop experience collection... (20300 times) [2024-06-15 16:09:55,940][1651669] InferenceWorker_p0-w0: stopping experience collection (20300 times) [2024-06-15 16:09:56,163][1651274] Signal inference workers to resume experience collection... (20300 times) [2024-06-15 16:09:56,164][1651669] InferenceWorker_p0-w0: resuming experience collection (20300 times) [2024-06-15 16:09:56,830][1651669] Updated weights for policy 0, policy_version 386237 (0.0014) [2024-06-15 16:10:00,555][1651669] Updated weights for policy 0, policy_version 386291 (0.0011) [2024-06-15 16:10:00,766][1648981] Fps is (10 sec: 52441.4, 60 sec: 48062.3, 300 sec: 47541.4). Total num frames: 791150592. Throughput: 0: 12072.5. Samples: 197849088. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:10:00,767][1648981] Avg episode reward: [(0, '414.480')] [2024-06-15 16:10:03,226][1651669] Updated weights for policy 0, policy_version 386338 (0.0024) [2024-06-15 16:10:05,118][1651669] Updated weights for policy 0, policy_version 386416 (0.0056) [2024-06-15 16:10:05,777][1648981] Fps is (10 sec: 49101.4, 60 sec: 48051.5, 300 sec: 48428.3). Total num frames: 791412736. Throughput: 0: 11898.5. Samples: 197915648. 
Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:10:05,779][1648981] Avg episode reward: [(0, '417.080')] [2024-06-15 16:10:06,414][1651669] Updated weights for policy 0, policy_version 386468 (0.0012) [2024-06-15 16:10:06,941][1651669] Updated weights for policy 0, policy_version 386494 (0.0012) [2024-06-15 16:10:10,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 791576576. Throughput: 0: 12015.0. Samples: 197956608. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:10:10,767][1648981] Avg episode reward: [(0, '419.340')] [2024-06-15 16:10:11,478][1651669] Updated weights for policy 0, policy_version 386553 (0.0011) [2024-06-15 16:10:13,772][1651669] Updated weights for policy 0, policy_version 386594 (0.0012) [2024-06-15 16:10:15,149][1651669] Updated weights for policy 0, policy_version 386656 (0.0093) [2024-06-15 16:10:15,766][1648981] Fps is (10 sec: 49202.7, 60 sec: 49154.1, 300 sec: 48322.1). Total num frames: 791904256. Throughput: 0: 12140.2. Samples: 198032896. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:10:15,767][1648981] Avg episode reward: [(0, '422.330')] [2024-06-15 16:10:16,718][1651669] Updated weights for policy 0, policy_version 386720 (0.0012) [2024-06-15 16:10:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 792068096. Throughput: 0: 12333.5. Samples: 198109184. Policy #0 lag: (min: 88.0, avg: 159.9, max: 289.0) [2024-06-15 16:10:20,767][1648981] Avg episode reward: [(0, '425.650')] [2024-06-15 16:10:21,847][1651669] Updated weights for policy 0, policy_version 386786 (0.0012) [2024-06-15 16:10:24,032][1651669] Updated weights for policy 0, policy_version 386848 (0.0012) [2024-06-15 16:10:25,532][1651669] Updated weights for policy 0, policy_version 386912 (0.0144) [2024-06-15 16:10:25,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49152.2, 300 sec: 48208.2). Total num frames: 792395776. 
Throughput: 0: 12333.5. Samples: 198150144. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:10:25,767][1648981] Avg episode reward: [(0, '424.590')] [2024-06-15 16:10:26,788][1651669] Updated weights for policy 0, policy_version 386961 (0.0012) [2024-06-15 16:10:27,926][1651669] Updated weights for policy 0, policy_version 387002 (0.0010) [2024-06-15 16:10:30,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 48097.7). Total num frames: 792592384. Throughput: 0: 12186.6. Samples: 198215680. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:10:30,767][1648981] Avg episode reward: [(0, '429.570')] [2024-06-15 16:10:33,078][1651669] Updated weights for policy 0, policy_version 387068 (0.0099) [2024-06-15 16:10:34,936][1651274] Signal inference workers to stop experience collection... (20350 times) [2024-06-15 16:10:35,099][1651669] InferenceWorker_p0-w0: stopping experience collection (20350 times) [2024-06-15 16:10:35,145][1651274] Signal inference workers to resume experience collection... (20350 times) [2024-06-15 16:10:35,155][1651669] InferenceWorker_p0-w0: resuming experience collection (20350 times) [2024-06-15 16:10:35,312][1651669] Updated weights for policy 0, policy_version 387140 (0.0095) [2024-06-15 16:10:35,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 792887296. Throughput: 0: 12254.5. Samples: 198291968. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:10:35,767][1648981] Avg episode reward: [(0, '416.230')] [2024-06-15 16:10:36,518][1651669] Updated weights for policy 0, policy_version 387193 (0.0107) [2024-06-15 16:10:37,665][1651669] Updated weights for policy 0, policy_version 387236 (0.0011) [2024-06-15 16:10:40,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 49698.2, 300 sec: 48319.9). Total num frames: 793116672. Throughput: 0: 12288.0. Samples: 198324224. 
Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:10:40,767][1648981] Avg episode reward: [(0, '392.190')] [2024-06-15 16:10:43,790][1651669] Updated weights for policy 0, policy_version 387297 (0.0013) [2024-06-15 16:10:44,780][1651669] Updated weights for policy 0, policy_version 387348 (0.0014) [2024-06-15 16:10:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48062.9, 300 sec: 47985.7). Total num frames: 793378816. Throughput: 0: 12379.0. Samples: 198406144. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:10:45,767][1648981] Avg episode reward: [(0, '394.310')] [2024-06-15 16:10:45,776][1651669] Updated weights for policy 0, policy_version 387394 (0.0026) [2024-06-15 16:10:47,292][1651669] Updated weights for policy 0, policy_version 387457 (0.0013) [2024-06-15 16:10:48,633][1651669] Updated weights for policy 0, policy_version 387508 (0.0011) [2024-06-15 16:10:50,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50246.2, 300 sec: 48430.0). Total num frames: 793640960. Throughput: 0: 12416.0. Samples: 198474240. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:10:50,767][1648981] Avg episode reward: [(0, '387.850')] [2024-06-15 16:10:54,841][1651669] Updated weights for policy 0, policy_version 387552 (0.0013) [2024-06-15 16:10:55,767][1648981] Fps is (10 sec: 39320.8, 60 sec: 47513.4, 300 sec: 47541.4). Total num frames: 793772032. Throughput: 0: 12424.5. Samples: 198515712. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:10:55,767][1648981] Avg episode reward: [(0, '377.420')] [2024-06-15 16:10:56,195][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000387616_793837568.pth... 
[2024-06-15 16:10:56,308][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000382016_782368768.pth [2024-06-15 16:10:56,313][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000387616_793837568.pth [2024-06-15 16:10:56,609][1651669] Updated weights for policy 0, policy_version 387632 (0.0010) [2024-06-15 16:10:58,511][1651669] Updated weights for policy 0, policy_version 387710 (0.0014) [2024-06-15 16:11:00,335][1651669] Updated weights for policy 0, policy_version 387771 (0.0012) [2024-06-15 16:11:00,769][1648981] Fps is (10 sec: 52414.9, 60 sec: 50242.0, 300 sec: 48429.6). Total num frames: 794165248. Throughput: 0: 11968.7. Samples: 198571520. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:00,770][1648981] Avg episode reward: [(0, '383.530')] [2024-06-15 16:11:05,778][1648981] Fps is (10 sec: 39275.9, 60 sec: 45874.0, 300 sec: 47095.2). Total num frames: 794165248. Throughput: 0: 12057.3. Samples: 198651904. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:05,779][1648981] Avg episode reward: [(0, '371.400')] [2024-06-15 16:11:06,940][1651669] Updated weights for policy 0, policy_version 387824 (0.0116) [2024-06-15 16:11:08,444][1651669] Updated weights for policy 0, policy_version 387879 (0.0014) [2024-06-15 16:11:09,869][1651669] Updated weights for policy 0, policy_version 387936 (0.0012) [2024-06-15 16:11:10,766][1648981] Fps is (10 sec: 36054.3, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 794525696. Throughput: 0: 11821.5. Samples: 198682112. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:10,767][1648981] Avg episode reward: [(0, '377.940')] [2024-06-15 16:11:11,798][1651274] Signal inference workers to stop experience collection... 
(20400 times) [2024-06-15 16:11:11,819][1651669] Updated weights for policy 0, policy_version 388002 (0.0011) [2024-06-15 16:11:11,856][1651669] InferenceWorker_p0-w0: stopping experience collection (20400 times) [2024-06-15 16:11:12,021][1651274] Signal inference workers to resume experience collection... (20400 times) [2024-06-15 16:11:12,022][1651669] InferenceWorker_p0-w0: resuming experience collection (20400 times) [2024-06-15 16:11:12,463][1651669] Updated weights for policy 0, policy_version 388031 (0.0031) [2024-06-15 16:11:15,766][1648981] Fps is (10 sec: 52490.6, 60 sec: 46421.3, 300 sec: 47430.3). Total num frames: 794689536. Throughput: 0: 11878.4. Samples: 198750208. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:15,767][1648981] Avg episode reward: [(0, '378.010')] [2024-06-15 16:11:19,223][1651669] Updated weights for policy 0, policy_version 388116 (0.0011) [2024-06-15 16:11:20,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 48605.8, 300 sec: 48096.7). Total num frames: 794984448. Throughput: 0: 11684.9. Samples: 198817792. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:20,767][1648981] Avg episode reward: [(0, '392.370')] [2024-06-15 16:11:21,138][1651669] Updated weights for policy 0, policy_version 388192 (0.0014) [2024-06-15 16:11:22,897][1651669] Updated weights for policy 0, policy_version 388258 (0.0085) [2024-06-15 16:11:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 795213824. Throughput: 0: 11514.3. Samples: 198842368. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:25,767][1648981] Avg episode reward: [(0, '386.550')] [2024-06-15 16:11:28,535][1651669] Updated weights for policy 0, policy_version 388320 (0.0011) [2024-06-15 16:11:30,770][1648981] Fps is (10 sec: 39306.3, 60 sec: 46418.3, 300 sec: 47651.8). Total num frames: 795377664. Throughput: 0: 11592.9. Samples: 198927872. 
Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:30,771][1648981] Avg episode reward: [(0, '404.020')] [2024-06-15 16:11:30,972][1651669] Updated weights for policy 0, policy_version 388387 (0.0012) [2024-06-15 16:11:32,734][1651669] Updated weights for policy 0, policy_version 388464 (0.0127) [2024-06-15 16:11:34,523][1651669] Updated weights for policy 0, policy_version 388528 (0.0053) [2024-06-15 16:11:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47875.1). Total num frames: 795738112. Throughput: 0: 11377.8. Samples: 198986240. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:35,767][1648981] Avg episode reward: [(0, '410.080')] [2024-06-15 16:11:38,997][1651669] Updated weights for policy 0, policy_version 388549 (0.0013) [2024-06-15 16:11:40,791][1648981] Fps is (10 sec: 49053.5, 60 sec: 45856.8, 300 sec: 47540.0). Total num frames: 795869184. Throughput: 0: 11610.5. Samples: 199038464. Policy #0 lag: (min: 31.0, avg: 118.4, max: 287.0) [2024-06-15 16:11:40,791][1648981] Avg episode reward: [(0, '413.200')] [2024-06-15 16:11:40,919][1651669] Updated weights for policy 0, policy_version 388609 (0.0109) [2024-06-15 16:11:42,270][1651669] Updated weights for policy 0, policy_version 388677 (0.0012) [2024-06-15 16:11:44,692][1651669] Updated weights for policy 0, policy_version 388775 (0.0100) [2024-06-15 16:11:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 796262400. Throughput: 0: 11697.1. Samples: 199097856. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:11:45,767][1648981] Avg episode reward: [(0, '416.580')] [2024-06-15 16:11:49,506][1651669] Updated weights for policy 0, policy_version 388808 (0.0012) [2024-06-15 16:11:50,521][1651669] Updated weights for policy 0, policy_version 388861 (0.0020) [2024-06-15 16:11:50,767][1648981] Fps is (10 sec: 52554.2, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 796393472. 
Throughput: 0: 11892.8. Samples: 199186944. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:11:50,767][1648981] Avg episode reward: [(0, '427.780')] [2024-06-15 16:11:52,729][1651669] Updated weights for policy 0, policy_version 388928 (0.0016) [2024-06-15 16:11:52,839][1651274] Signal inference workers to stop experience collection... (20450 times) [2024-06-15 16:11:52,892][1651669] InferenceWorker_p0-w0: stopping experience collection (20450 times) [2024-06-15 16:11:53,075][1651274] Signal inference workers to resume experience collection... (20450 times) [2024-06-15 16:11:53,076][1651669] InferenceWorker_p0-w0: resuming experience collection (20450 times) [2024-06-15 16:11:54,049][1651669] Updated weights for policy 0, policy_version 388991 (0.0014) [2024-06-15 16:11:55,582][1651669] Updated weights for policy 0, policy_version 389049 (0.0012) [2024-06-15 16:11:55,774][1648981] Fps is (10 sec: 52387.4, 60 sec: 50237.8, 300 sec: 47984.4). Total num frames: 796786688. Throughput: 0: 11865.0. Samples: 199216128. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:11:55,775][1648981] Avg episode reward: [(0, '416.130')] [2024-06-15 16:12:00,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 44784.7, 300 sec: 47319.1). Total num frames: 796852224. Throughput: 0: 12105.9. Samples: 199294976. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:00,770][1648981] Avg episode reward: [(0, '424.620')] [2024-06-15 16:12:01,091][1651669] Updated weights for policy 0, policy_version 389116 (0.0014) [2024-06-15 16:12:02,619][1651669] Updated weights for policy 0, policy_version 389168 (0.0011) [2024-06-15 16:12:03,584][1651669] Updated weights for policy 0, policy_version 389200 (0.0011) [2024-06-15 16:12:04,494][1651669] Updated weights for policy 0, policy_version 389244 (0.0017) [2024-06-15 16:12:05,766][1648981] Fps is (10 sec: 45911.4, 60 sec: 51346.7, 300 sec: 48096.8). Total num frames: 797245440. Throughput: 0: 12106.0. 
Samples: 199362560. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:05,767][1648981] Avg episode reward: [(0, '424.750')] [2024-06-15 16:12:06,119][1651669] Updated weights for policy 0, policy_version 389301 (0.0109) [2024-06-15 16:12:10,762][1651669] Updated weights for policy 0, policy_version 389330 (0.0011) [2024-06-15 16:12:10,766][1648981] Fps is (10 sec: 49153.9, 60 sec: 46967.5, 300 sec: 47208.1). Total num frames: 797343744. Throughput: 0: 12470.1. Samples: 199403520. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:10,767][1648981] Avg episode reward: [(0, '425.680')] [2024-06-15 16:12:11,977][1651669] Updated weights for policy 0, policy_version 389377 (0.0012) [2024-06-15 16:12:13,119][1651669] Updated weights for policy 0, policy_version 389439 (0.0011) [2024-06-15 16:12:15,597][1651669] Updated weights for policy 0, policy_version 389493 (0.0012) [2024-06-15 16:12:15,780][1648981] Fps is (10 sec: 42541.8, 60 sec: 49687.2, 300 sec: 47872.5). Total num frames: 797671424. Throughput: 0: 12262.7. Samples: 199479808. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:15,780][1648981] Avg episode reward: [(0, '419.070')] [2024-06-15 16:12:16,574][1651669] Updated weights for policy 0, policy_version 389536 (0.0011) [2024-06-15 16:12:20,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 47513.7, 300 sec: 47099.3). Total num frames: 797835264. Throughput: 0: 12583.8. Samples: 199552512. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:20,767][1648981] Avg episode reward: [(0, '407.720')] [2024-06-15 16:12:21,467][1651669] Updated weights for policy 0, policy_version 389603 (0.0036) [2024-06-15 16:12:23,062][1651669] Updated weights for policy 0, policy_version 389651 (0.0011) [2024-06-15 16:12:23,758][1651669] Updated weights for policy 0, policy_version 389692 (0.0012) [2024-06-15 16:12:25,766][1648981] Fps is (10 sec: 49217.3, 60 sec: 49152.0, 300 sec: 47874.6). 
Total num frames: 798162944. Throughput: 0: 12271.8. Samples: 199590400. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:25,767][1648981] Avg episode reward: [(0, '410.970')] [2024-06-15 16:12:25,986][1651669] Updated weights for policy 0, policy_version 389752 (0.0013) [2024-06-15 16:12:27,228][1651669] Updated weights for policy 0, policy_version 389808 (0.0012) [2024-06-15 16:12:30,766][1648981] Fps is (10 sec: 55705.7, 60 sec: 50247.6, 300 sec: 47208.1). Total num frames: 798392320. Throughput: 0: 12663.5. Samples: 199667712. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:30,767][1648981] Avg episode reward: [(0, '419.710')] [2024-06-15 16:12:30,942][1651669] Updated weights for policy 0, policy_version 389845 (0.0011) [2024-06-15 16:12:32,503][1651669] Updated weights for policy 0, policy_version 389889 (0.0012) [2024-06-15 16:12:33,245][1651274] Signal inference workers to stop experience collection... (20500 times) [2024-06-15 16:12:33,296][1651669] InferenceWorker_p0-w0: stopping experience collection (20500 times) [2024-06-15 16:12:33,494][1651274] Signal inference workers to resume experience collection... (20500 times) [2024-06-15 16:12:33,494][1651669] InferenceWorker_p0-w0: resuming experience collection (20500 times) [2024-06-15 16:12:35,715][1651669] Updated weights for policy 0, policy_version 389955 (0.0014) [2024-06-15 16:12:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47874.7). Total num frames: 798621696. Throughput: 0: 12470.1. Samples: 199748096. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:35,767][1648981] Avg episode reward: [(0, '422.550')] [2024-06-15 16:12:37,826][1651669] Updated weights for policy 0, policy_version 390035 (0.0012) [2024-06-15 16:12:40,770][1648981] Fps is (10 sec: 49134.6, 60 sec: 50261.5, 300 sec: 47431.2). Total num frames: 798883840. Throughput: 0: 12357.5. Samples: 199772160. 
Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:40,770][1648981] Avg episode reward: [(0, '435.000')] [2024-06-15 16:12:41,970][1651669] Updated weights for policy 0, policy_version 390099 (0.0012) [2024-06-15 16:12:43,922][1651669] Updated weights for policy 0, policy_version 390177 (0.0020) [2024-06-15 16:12:44,473][1651669] Updated weights for policy 0, policy_version 390208 (0.0014) [2024-06-15 16:12:45,778][1648981] Fps is (10 sec: 52366.9, 60 sec: 48050.3, 300 sec: 47983.8). Total num frames: 799145984. Throughput: 0: 12239.4. Samples: 199845888. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:45,779][1648981] Avg episode reward: [(0, '438.570')] [2024-06-15 16:12:47,620][1651669] Updated weights for policy 0, policy_version 390261 (0.0012) [2024-06-15 16:12:48,951][1651669] Updated weights for policy 0, policy_version 390307 (0.0015) [2024-06-15 16:12:50,767][1648981] Fps is (10 sec: 52444.7, 60 sec: 50244.0, 300 sec: 47874.5). Total num frames: 799408128. Throughput: 0: 12447.1. Samples: 199922688. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:50,769][1648981] Avg episode reward: [(0, '456.810')] [2024-06-15 16:12:52,524][1651669] Updated weights for policy 0, policy_version 390354 (0.0014) [2024-06-15 16:12:53,860][1651669] Updated weights for policy 0, policy_version 390405 (0.0012) [2024-06-15 16:12:54,992][1651669] Updated weights for policy 0, policy_version 390462 (0.0011) [2024-06-15 16:12:55,772][1648981] Fps is (10 sec: 52458.7, 60 sec: 48061.1, 300 sec: 47986.8). Total num frames: 799670272. Throughput: 0: 12297.7. Samples: 199956992. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:12:55,773][1648981] Avg episode reward: [(0, '473.440')] [2024-06-15 16:12:55,778][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000390464_799670272.pth... 
[2024-06-15 16:12:55,832][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000384832_788135936.pth [2024-06-15 16:12:58,725][1651669] Updated weights for policy 0, policy_version 390521 (0.0011) [2024-06-15 16:12:59,758][1651669] Updated weights for policy 0, policy_version 390576 (0.0013) [2024-06-15 16:13:00,774][1648981] Fps is (10 sec: 52390.0, 60 sec: 51330.1, 300 sec: 47984.4). Total num frames: 799932416. Throughput: 0: 12323.6. Samples: 200034304. Policy #0 lag: (min: 97.0, avg: 200.7, max: 369.0) [2024-06-15 16:13:00,775][1648981] Avg episode reward: [(0, '460.590')] [2024-06-15 16:13:03,194][1651669] Updated weights for policy 0, policy_version 390611 (0.0145) [2024-06-15 16:13:05,176][1651669] Updated weights for policy 0, policy_version 390696 (0.0112) [2024-06-15 16:13:05,767][1648981] Fps is (10 sec: 52459.7, 60 sec: 49151.8, 300 sec: 47985.7). Total num frames: 800194560. Throughput: 0: 12219.7. Samples: 200102400. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:05,768][1648981] Avg episode reward: [(0, '458.060')] [2024-06-15 16:13:08,742][1651669] Updated weights for policy 0, policy_version 390739 (0.0012) [2024-06-15 16:13:10,342][1651669] Updated weights for policy 0, policy_version 390806 (0.0011) [2024-06-15 16:13:10,767][1648981] Fps is (10 sec: 45909.9, 60 sec: 50790.1, 300 sec: 48321.6). Total num frames: 800391168. Throughput: 0: 12310.7. Samples: 200144384. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:10,767][1648981] Avg episode reward: [(0, '443.900')] [2024-06-15 16:13:13,210][1651669] Updated weights for policy 0, policy_version 390849 (0.0013) [2024-06-15 16:13:13,843][1651274] Signal inference workers to stop experience collection... (20550 times) [2024-06-15 16:13:13,883][1651669] InferenceWorker_p0-w0: stopping experience collection (20550 times) [2024-06-15 16:13:14,077][1651274] Signal inference workers to resume experience collection... 
(20550 times) [2024-06-15 16:13:14,078][1651669] InferenceWorker_p0-w0: resuming experience collection (20550 times) [2024-06-15 16:13:14,429][1651669] Updated weights for policy 0, policy_version 390911 (0.0013) [2024-06-15 16:13:15,770][1648981] Fps is (10 sec: 45857.6, 60 sec: 49705.8, 300 sec: 47766.5). Total num frames: 800653312. Throughput: 0: 12161.7. Samples: 200215040. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:15,771][1648981] Avg episode reward: [(0, '451.200')] [2024-06-15 16:13:16,153][1651669] Updated weights for policy 0, policy_version 390960 (0.0044) [2024-06-15 16:13:19,948][1651669] Updated weights for policy 0, policy_version 391008 (0.0012) [2024-06-15 16:13:20,766][1648981] Fps is (10 sec: 42600.1, 60 sec: 49698.2, 300 sec: 48318.9). Total num frames: 800817152. Throughput: 0: 11923.9. Samples: 200284672. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:20,767][1648981] Avg episode reward: [(0, '454.430')] [2024-06-15 16:13:21,536][1651669] Updated weights for policy 0, policy_version 391074 (0.0010) [2024-06-15 16:13:25,411][1651669] Updated weights for policy 0, policy_version 391136 (0.0011) [2024-06-15 16:13:25,766][1648981] Fps is (10 sec: 39337.4, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 801046528. Throughput: 0: 12152.4. Samples: 200318976. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:25,767][1648981] Avg episode reward: [(0, '433.420')] [2024-06-15 16:13:26,156][1651669] Updated weights for policy 0, policy_version 391167 (0.0011) [2024-06-15 16:13:27,756][1651669] Updated weights for policy 0, policy_version 391216 (0.0011) [2024-06-15 16:13:30,773][1648981] Fps is (10 sec: 42568.9, 60 sec: 47508.1, 300 sec: 47987.1). Total num frames: 801243136. Throughput: 0: 12118.7. Samples: 200391168. 
Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:30,774][1648981] Avg episode reward: [(0, '437.440')] [2024-06-15 16:13:31,149][1651669] Updated weights for policy 0, policy_version 391256 (0.0032) [2024-06-15 16:13:33,520][1651669] Updated weights for policy 0, policy_version 391351 (0.0010) [2024-06-15 16:13:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47985.8). Total num frames: 801505280. Throughput: 0: 11855.8. Samples: 200456192. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:35,767][1648981] Avg episode reward: [(0, '430.460')] [2024-06-15 16:13:36,569][1651669] Updated weights for policy 0, policy_version 391408 (0.0111) [2024-06-15 16:13:37,774][1651669] Updated weights for policy 0, policy_version 391440 (0.0011) [2024-06-15 16:13:38,930][1651669] Updated weights for policy 0, policy_version 391488 (0.0011) [2024-06-15 16:13:40,766][1648981] Fps is (10 sec: 52465.1, 60 sec: 48062.6, 300 sec: 47991.5). Total num frames: 801767424. Throughput: 0: 11823.1. Samples: 200488960. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:40,767][1648981] Avg episode reward: [(0, '425.510')] [2024-06-15 16:13:44,513][1651669] Updated weights for policy 0, policy_version 391568 (0.0012) [2024-06-15 16:13:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48069.2, 300 sec: 48318.9). Total num frames: 802029568. Throughput: 0: 11869.1. Samples: 200568320. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:45,767][1648981] Avg episode reward: [(0, '409.690')] [2024-06-15 16:13:46,411][1651669] Updated weights for policy 0, policy_version 391632 (0.0012) [2024-06-15 16:13:47,593][1651669] Updated weights for policy 0, policy_version 391677 (0.0030) [2024-06-15 16:13:49,738][1651669] Updated weights for policy 0, policy_version 391736 (0.0010) [2024-06-15 16:13:50,778][1648981] Fps is (10 sec: 52366.9, 60 sec: 48050.7, 300 sec: 47983.8). Total num frames: 802291712. 
Throughput: 0: 11784.3. Samples: 200632832. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:50,779][1648981] Avg episode reward: [(0, '382.380')] [2024-06-15 16:13:54,210][1651669] Updated weights for policy 0, policy_version 391763 (0.0012) [2024-06-15 16:13:55,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 46426.1, 300 sec: 48097.3). Total num frames: 802455552. Throughput: 0: 11924.0. Samples: 200680960. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:13:55,767][1648981] Avg episode reward: [(0, '388.800')] [2024-06-15 16:13:55,827][1651669] Updated weights for policy 0, policy_version 391840 (0.0011) [2024-06-15 16:13:56,416][1651274] Signal inference workers to stop experience collection... (20600 times) [2024-06-15 16:13:56,473][1651669] InferenceWorker_p0-w0: stopping experience collection (20600 times) [2024-06-15 16:13:56,748][1651274] Signal inference workers to resume experience collection... (20600 times) [2024-06-15 16:13:56,748][1651669] InferenceWorker_p0-w0: resuming experience collection (20600 times) [2024-06-15 16:13:57,504][1651669] Updated weights for policy 0, policy_version 391898 (0.0127) [2024-06-15 16:13:59,395][1651669] Updated weights for policy 0, policy_version 391947 (0.0013) [2024-06-15 16:14:00,776][1648981] Fps is (10 sec: 52442.8, 60 sec: 48058.7, 300 sec: 48428.5). Total num frames: 802816000. Throughput: 0: 11683.7. Samples: 200740864. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:14:00,776][1648981] Avg episode reward: [(0, '387.560')] [2024-06-15 16:14:04,704][1651669] Updated weights for policy 0, policy_version 392019 (0.0013) [2024-06-15 16:14:05,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45875.4, 300 sec: 48207.8). Total num frames: 802947072. Throughput: 0: 12162.8. Samples: 200832000. 
Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:14:05,767][1648981] Avg episode reward: [(0, '398.980')] [2024-06-15 16:14:06,069][1651669] Updated weights for policy 0, policy_version 392084 (0.0111) [2024-06-15 16:14:07,879][1651669] Updated weights for policy 0, policy_version 392145 (0.0012) [2024-06-15 16:14:09,747][1651669] Updated weights for policy 0, policy_version 392211 (0.0011) [2024-06-15 16:14:10,767][1648981] Fps is (10 sec: 52475.6, 60 sec: 49152.1, 300 sec: 48763.6). Total num frames: 803340288. Throughput: 0: 12003.5. Samples: 200859136. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:14:10,767][1648981] Avg episode reward: [(0, '406.960')] [2024-06-15 16:14:15,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 45332.2, 300 sec: 48096.8). Total num frames: 803373056. Throughput: 0: 12073.7. Samples: 200934400. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:14:15,767][1648981] Avg episode reward: [(0, '389.900')] [2024-06-15 16:14:15,823][1651669] Updated weights for policy 0, policy_version 392278 (0.0019) [2024-06-15 16:14:17,218][1651669] Updated weights for policy 0, policy_version 392344 (0.0206) [2024-06-15 16:14:19,429][1651669] Updated weights for policy 0, policy_version 392432 (0.0011) [2024-06-15 16:14:20,440][1651669] Updated weights for policy 0, policy_version 392464 (0.0032) [2024-06-15 16:14:20,772][1648981] Fps is (10 sec: 42574.4, 60 sec: 49147.2, 300 sec: 48540.2). Total num frames: 803766272. Throughput: 0: 12036.1. Samples: 200997888. Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:14:20,773][1648981] Avg episode reward: [(0, '409.010')] [2024-06-15 16:14:25,774][1648981] Fps is (10 sec: 49113.5, 60 sec: 46961.4, 300 sec: 47984.5). Total num frames: 803864576. Throughput: 0: 12035.6. Samples: 201030656. 
Policy #0 lag: (min: 15.0, avg: 112.3, max: 271.0) [2024-06-15 16:14:25,775][1648981] Avg episode reward: [(0, '423.680')] [2024-06-15 16:14:26,450][1651669] Updated weights for policy 0, policy_version 392514 (0.0012) [2024-06-15 16:14:27,820][1651669] Updated weights for policy 0, policy_version 392578 (0.0011) [2024-06-15 16:14:28,847][1651669] Updated weights for policy 0, policy_version 392637 (0.0011) [2024-06-15 16:14:30,331][1651669] Updated weights for policy 0, policy_version 392699 (0.0096) [2024-06-15 16:14:30,772][1648981] Fps is (10 sec: 49150.9, 60 sec: 50244.9, 300 sec: 48429.0). Total num frames: 804257792. Throughput: 0: 12081.6. Samples: 201112064. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:14:30,773][1648981] Avg episode reward: [(0, '432.190')] [2024-06-15 16:14:32,260][1651669] Updated weights for policy 0, policy_version 392758 (0.0028) [2024-06-15 16:14:35,778][1648981] Fps is (10 sec: 52407.8, 60 sec: 48050.3, 300 sec: 48317.0). Total num frames: 804388864. Throughput: 0: 12162.9. Samples: 201180160. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:14:35,779][1648981] Avg episode reward: [(0, '437.660')] [2024-06-15 16:14:38,175][1651274] Signal inference workers to stop experience collection... (20650 times) [2024-06-15 16:14:38,199][1651669] InferenceWorker_p0-w0: stopping experience collection (20650 times) [2024-06-15 16:14:38,446][1651274] Signal inference workers to resume experience collection... (20650 times) [2024-06-15 16:14:38,447][1651669] InferenceWorker_p0-w0: resuming experience collection (20650 times) [2024-06-15 16:14:38,449][1651669] Updated weights for policy 0, policy_version 392816 (0.0012) [2024-06-15 16:14:39,987][1651669] Updated weights for policy 0, policy_version 392887 (0.0011) [2024-06-15 16:14:40,766][1648981] Fps is (10 sec: 42624.2, 60 sec: 48605.8, 300 sec: 48097.4). Total num frames: 804683776. Throughput: 0: 11958.0. Samples: 201219072. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:14:40,767][1648981] Avg episode reward: [(0, '431.970')] [2024-06-15 16:14:41,201][1651669] Updated weights for policy 0, policy_version 392931 (0.0015) [2024-06-15 16:14:42,859][1651669] Updated weights for policy 0, policy_version 392976 (0.0016) [2024-06-15 16:14:45,766][1648981] Fps is (10 sec: 52490.4, 60 sec: 48059.7, 300 sec: 48430.4). Total num frames: 804913152. Throughput: 0: 12085.6. Samples: 201284608. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:14:45,767][1648981] Avg episode reward: [(0, '424.200')] [2024-06-15 16:14:48,668][1651669] Updated weights for policy 0, policy_version 393040 (0.0015) [2024-06-15 16:14:50,530][1651669] Updated weights for policy 0, policy_version 393106 (0.0011) [2024-06-15 16:14:50,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 46430.5, 300 sec: 47985.7). Total num frames: 805076992. Throughput: 0: 11707.7. Samples: 201358848. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:14:50,767][1648981] Avg episode reward: [(0, '408.420')] [2024-06-15 16:14:52,218][1651669] Updated weights for policy 0, policy_version 393169 (0.0011) [2024-06-15 16:14:54,041][1651669] Updated weights for policy 0, policy_version 393223 (0.0011) [2024-06-15 16:14:55,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 49697.8, 300 sec: 48429.9). Total num frames: 805437440. Throughput: 0: 11787.4. Samples: 201389568. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:14:55,767][1648981] Avg episode reward: [(0, '428.490')] [2024-06-15 16:14:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000393280_805437440.pth... 
[2024-06-15 16:14:55,814][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000387616_793837568.pth [2024-06-15 16:14:59,731][1651669] Updated weights for policy 0, policy_version 393298 (0.0013) [2024-06-15 16:15:00,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 45336.0, 300 sec: 47876.3). Total num frames: 805535744. Throughput: 0: 11753.2. Samples: 201463296. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:00,767][1648981] Avg episode reward: [(0, '437.760')] [2024-06-15 16:15:01,307][1651669] Updated weights for policy 0, policy_version 393360 (0.0099) [2024-06-15 16:15:02,398][1651669] Updated weights for policy 0, policy_version 393402 (0.0011) [2024-06-15 16:15:04,871][1651669] Updated weights for policy 0, policy_version 393456 (0.0013) [2024-06-15 16:15:05,766][1648981] Fps is (10 sec: 42600.0, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 805863424. Throughput: 0: 11777.6. Samples: 201527808. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:05,767][1648981] Avg episode reward: [(0, '439.590')] [2024-06-15 16:15:06,598][1651669] Updated weights for policy 0, policy_version 393521 (0.0012) [2024-06-15 16:15:10,767][1648981] Fps is (10 sec: 42596.5, 60 sec: 43690.5, 300 sec: 47652.4). Total num frames: 805961728. Throughput: 0: 11777.9. Samples: 201560576. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:10,768][1648981] Avg episode reward: [(0, '436.080')] [2024-06-15 16:15:11,656][1651669] Updated weights for policy 0, policy_version 393584 (0.0012) [2024-06-15 16:15:12,922][1651669] Updated weights for policy 0, policy_version 393632 (0.0122) [2024-06-15 16:15:15,148][1651669] Updated weights for policy 0, policy_version 393667 (0.0012) [2024-06-15 16:15:15,767][1648981] Fps is (10 sec: 42594.5, 60 sec: 48605.1, 300 sec: 48207.7). Total num frames: 806289408. Throughput: 0: 11686.3. Samples: 201637888. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:15,768][1648981] Avg episode reward: [(0, '433.160')] [2024-06-15 16:15:16,701][1651669] Updated weights for policy 0, policy_version 393734 (0.0061) [2024-06-15 16:15:17,432][1651274] Signal inference workers to stop experience collection... (20700 times) [2024-06-15 16:15:17,485][1651669] InferenceWorker_p0-w0: stopping experience collection (20700 times) [2024-06-15 16:15:17,685][1651274] Signal inference workers to resume experience collection... (20700 times) [2024-06-15 16:15:17,686][1651669] InferenceWorker_p0-w0: resuming experience collection (20700 times) [2024-06-15 16:15:20,766][1648981] Fps is (10 sec: 52431.2, 60 sec: 45333.5, 300 sec: 47763.5). Total num frames: 806486016. Throughput: 0: 11653.9. Samples: 201704448. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:20,767][1648981] Avg episode reward: [(0, '419.880')] [2024-06-15 16:15:22,229][1651669] Updated weights for policy 0, policy_version 393808 (0.0012) [2024-06-15 16:15:24,288][1651669] Updated weights for policy 0, policy_version 393888 (0.0012) [2024-06-15 16:15:25,766][1648981] Fps is (10 sec: 45879.0, 60 sec: 48066.0, 300 sec: 47985.7). Total num frames: 806748160. Throughput: 0: 11468.8. Samples: 201735168. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:25,767][1648981] Avg episode reward: [(0, '415.080')] [2024-06-15 16:15:27,428][1651669] Updated weights for policy 0, policy_version 393941 (0.0012) [2024-06-15 16:15:29,034][1651669] Updated weights for policy 0, policy_version 394006 (0.0012) [2024-06-15 16:15:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45879.8, 300 sec: 47874.6). Total num frames: 807010304. Throughput: 0: 11514.3. Samples: 201802752. 
Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:30,767][1648981] Avg episode reward: [(0, '400.000')] [2024-06-15 16:15:33,514][1651669] Updated weights for policy 0, policy_version 394064 (0.0013) [2024-06-15 16:15:34,546][1651669] Updated weights for policy 0, policy_version 394112 (0.0010) [2024-06-15 16:15:35,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 47522.9, 300 sec: 47874.6). Total num frames: 807239680. Throughput: 0: 11480.2. Samples: 201875456. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:35,767][1648981] Avg episode reward: [(0, '391.840')] [2024-06-15 16:15:35,902][1651669] Updated weights for policy 0, policy_version 394174 (0.0109) [2024-06-15 16:15:39,340][1651669] Updated weights for policy 0, policy_version 394241 (0.0013) [2024-06-15 16:15:40,605][1651669] Updated weights for policy 0, policy_version 394299 (0.0011) [2024-06-15 16:15:40,787][1648981] Fps is (10 sec: 52322.9, 60 sec: 47497.6, 300 sec: 47982.4). Total num frames: 807534592. Throughput: 0: 11668.4. Samples: 201914880. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:40,788][1648981] Avg episode reward: [(0, '395.560')] [2024-06-15 16:15:45,548][1651669] Updated weights for policy 0, policy_version 394384 (0.0012) [2024-06-15 16:15:45,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 807698432. Throughput: 0: 11616.7. Samples: 201986048. Policy #0 lag: (min: 15.0, avg: 74.7, max: 271.0) [2024-06-15 16:15:45,767][1648981] Avg episode reward: [(0, '404.220')] [2024-06-15 16:15:48,919][1651669] Updated weights for policy 0, policy_version 394433 (0.0060) [2024-06-15 16:15:50,315][1651669] Updated weights for policy 0, policy_version 394490 (0.0011) [2024-06-15 16:15:50,767][1648981] Fps is (10 sec: 39400.6, 60 sec: 47513.4, 300 sec: 47985.7). Total num frames: 807927808. Throughput: 0: 11787.3. Samples: 202058240. 
Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:15:50,767][1648981] Avg episode reward: [(0, '422.240')] [2024-06-15 16:15:51,515][1651669] Updated weights for policy 0, policy_version 394535 (0.0012) [2024-06-15 16:15:54,542][1651669] Updated weights for policy 0, policy_version 394563 (0.0018) [2024-06-15 16:15:55,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45329.3, 300 sec: 47430.7). Total num frames: 808157184. Throughput: 0: 11969.5. Samples: 202099200. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:15:55,767][1648981] Avg episode reward: [(0, '420.500')] [2024-06-15 16:15:56,107][1651669] Updated weights for policy 0, policy_version 394626 (0.0014) [2024-06-15 16:15:59,619][1651669] Updated weights for policy 0, policy_version 394705 (0.0015) [2024-06-15 16:16:00,615][1651669] Updated weights for policy 0, policy_version 394752 (0.0017) [2024-06-15 16:16:00,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 48605.9, 300 sec: 48431.9). Total num frames: 808452096. Throughput: 0: 11764.9. Samples: 202167296. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:00,767][1648981] Avg episode reward: [(0, '399.690')] [2024-06-15 16:16:01,084][1651274] Signal inference workers to stop experience collection... (20750 times) [2024-06-15 16:16:01,126][1651669] InferenceWorker_p0-w0: stopping experience collection (20750 times) [2024-06-15 16:16:01,406][1651274] Signal inference workers to resume experience collection... (20750 times) [2024-06-15 16:16:01,407][1651669] InferenceWorker_p0-w0: resuming experience collection (20750 times) [2024-06-15 16:16:02,504][1651669] Updated weights for policy 0, policy_version 394815 (0.0013) [2024-06-15 16:16:05,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 45329.0, 300 sec: 47652.5). Total num frames: 808583168. Throughput: 0: 11969.4. Samples: 202243072. 
Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:05,767][1648981] Avg episode reward: [(0, '408.870')] [2024-06-15 16:16:07,252][1651669] Updated weights for policy 0, policy_version 394869 (0.0012) [2024-06-15 16:16:08,609][1651669] Updated weights for policy 0, policy_version 394943 (0.0013) [2024-06-15 16:16:10,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 48606.2, 300 sec: 48096.7). Total num frames: 808878080. Throughput: 0: 11787.3. Samples: 202265600. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:10,767][1648981] Avg episode reward: [(0, '414.700')] [2024-06-15 16:16:11,562][1651669] Updated weights for policy 0, policy_version 395006 (0.0013) [2024-06-15 16:16:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46968.1, 300 sec: 47874.6). Total num frames: 809107456. Throughput: 0: 11855.7. Samples: 202336256. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:15,767][1648981] Avg episode reward: [(0, '409.130')] [2024-06-15 16:16:17,811][1651669] Updated weights for policy 0, policy_version 395074 (0.0015) [2024-06-15 16:16:19,133][1651669] Updated weights for policy 0, policy_version 395136 (0.0013) [2024-06-15 16:16:20,413][1651669] Updated weights for policy 0, policy_version 395199 (0.0038) [2024-06-15 16:16:20,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 809369600. Throughput: 0: 12003.6. Samples: 202415616. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:20,767][1648981] Avg episode reward: [(0, '412.700')] [2024-06-15 16:16:22,124][1651669] Updated weights for policy 0, policy_version 395256 (0.0013) [2024-06-15 16:16:23,951][1651669] Updated weights for policy 0, policy_version 395297 (0.0014) [2024-06-15 16:16:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48319.6). Total num frames: 809631744. Throughput: 0: 11986.2. Samples: 202454016. 
Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:25,767][1648981] Avg episode reward: [(0, '416.780')] [2024-06-15 16:16:28,491][1651669] Updated weights for policy 0, policy_version 395348 (0.0015) [2024-06-15 16:16:30,073][1651669] Updated weights for policy 0, policy_version 395411 (0.0024) [2024-06-15 16:16:30,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 809861120. Throughput: 0: 12049.1. Samples: 202528256. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:30,767][1648981] Avg episode reward: [(0, '420.000')] [2024-06-15 16:16:30,948][1651669] Updated weights for policy 0, policy_version 395456 (0.0012) [2024-06-15 16:16:32,520][1651669] Updated weights for policy 0, policy_version 395509 (0.0031) [2024-06-15 16:16:33,939][1651669] Updated weights for policy 0, policy_version 395538 (0.0012) [2024-06-15 16:16:34,968][1651669] Updated weights for policy 0, policy_version 395584 (0.0041) [2024-06-15 16:16:35,795][1648981] Fps is (10 sec: 52277.4, 60 sec: 48582.4, 300 sec: 48429.2). Total num frames: 810156032. Throughput: 0: 12075.5. Samples: 202601984. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:35,797][1648981] Avg episode reward: [(0, '409.370')] [2024-06-15 16:16:40,401][1651669] Updated weights for policy 0, policy_version 395651 (0.0013) [2024-06-15 16:16:40,767][1648981] Fps is (10 sec: 45873.5, 60 sec: 46436.7, 300 sec: 47652.4). Total num frames: 810319872. Throughput: 0: 12128.6. Samples: 202644992. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:40,767][1648981] Avg episode reward: [(0, '405.430')] [2024-06-15 16:16:41,635][1651669] Updated weights for policy 0, policy_version 395706 (0.0021) [2024-06-15 16:16:42,798][1651274] Signal inference workers to stop experience collection... 
(20800 times) [2024-06-15 16:16:42,852][1651669] InferenceWorker_p0-w0: stopping experience collection (20800 times) [2024-06-15 16:16:43,034][1651274] Signal inference workers to resume experience collection... (20800 times) [2024-06-15 16:16:43,034][1651669] InferenceWorker_p0-w0: resuming experience collection (20800 times) [2024-06-15 16:16:43,376][1651669] Updated weights for policy 0, policy_version 395760 (0.0013) [2024-06-15 16:16:45,672][1651669] Updated weights for policy 0, policy_version 395831 (0.0011) [2024-06-15 16:16:45,767][1648981] Fps is (10 sec: 49294.0, 60 sec: 49151.9, 300 sec: 48318.9). Total num frames: 810647552. Throughput: 0: 12003.5. Samples: 202707456. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:45,768][1648981] Avg episode reward: [(0, '394.530')] [2024-06-15 16:16:50,222][1651669] Updated weights for policy 0, policy_version 395861 (0.0014) [2024-06-15 16:16:50,766][1648981] Fps is (10 sec: 45877.1, 60 sec: 47513.8, 300 sec: 47431.6). Total num frames: 810778624. Throughput: 0: 11958.1. Samples: 202781184. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:50,767][1648981] Avg episode reward: [(0, '404.010')] [2024-06-15 16:16:51,777][1651669] Updated weights for policy 0, policy_version 395936 (0.0061) [2024-06-15 16:16:53,635][1651669] Updated weights for policy 0, policy_version 395984 (0.0041) [2024-06-15 16:16:54,758][1651669] Updated weights for policy 0, policy_version 396032 (0.0014) [2024-06-15 16:16:55,787][1648981] Fps is (10 sec: 45780.2, 60 sec: 49134.9, 300 sec: 48315.5). Total num frames: 811106304. Throughput: 0: 12168.6. Samples: 202813440. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:16:55,788][1648981] Avg episode reward: [(0, '410.400')] [2024-06-15 16:16:56,072][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000396064_811139072.pth... 
[2024-06-15 16:16:56,286][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000390464_799670272.pth [2024-06-15 16:17:00,789][1648981] Fps is (10 sec: 42503.9, 60 sec: 45858.2, 300 sec: 47315.6). Total num frames: 811204608. Throughput: 0: 12190.9. Samples: 202885120. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:17:00,789][1648981] Avg episode reward: [(0, '416.010')] [2024-06-15 16:17:01,423][1651669] Updated weights for policy 0, policy_version 396098 (0.0014) [2024-06-15 16:17:02,773][1651669] Updated weights for policy 0, policy_version 396163 (0.0012) [2024-06-15 16:17:04,159][1651669] Updated weights for policy 0, policy_version 396215 (0.0013) [2024-06-15 16:17:05,266][1651669] Updated weights for policy 0, policy_version 396260 (0.0015) [2024-06-15 16:17:05,770][1648981] Fps is (10 sec: 49236.1, 60 sec: 50241.1, 300 sec: 48318.3). Total num frames: 811597824. Throughput: 0: 11888.8. Samples: 202950656. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:17:05,771][1648981] Avg episode reward: [(0, '413.110')] [2024-06-15 16:17:07,080][1651669] Updated weights for policy 0, policy_version 396306 (0.0015) [2024-06-15 16:17:10,766][1648981] Fps is (10 sec: 52545.7, 60 sec: 47513.7, 300 sec: 47654.6). Total num frames: 811728896. Throughput: 0: 11855.6. Samples: 202987520. Policy #0 lag: (min: 15.0, avg: 114.4, max: 271.0) [2024-06-15 16:17:10,767][1648981] Avg episode reward: [(0, '386.830')] [2024-06-15 16:17:12,778][1651669] Updated weights for policy 0, policy_version 396386 (0.0011) [2024-06-15 16:17:14,912][1651669] Updated weights for policy 0, policy_version 396464 (0.0012) [2024-06-15 16:17:15,770][1648981] Fps is (10 sec: 39321.4, 60 sec: 48056.6, 300 sec: 47985.1). Total num frames: 811991040. Throughput: 0: 11820.5. Samples: 203060224. 
Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:15,771][1648981] Avg episode reward: [(0, '396.620')] [2024-06-15 16:17:16,452][1651669] Updated weights for policy 0, policy_version 396516 (0.0088) [2024-06-15 16:17:17,829][1651669] Updated weights for policy 0, policy_version 396549 (0.0024) [2024-06-15 16:17:18,792][1651669] Updated weights for policy 0, policy_version 396604 (0.0014) [2024-06-15 16:17:20,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 812253184. Throughput: 0: 11908.8. Samples: 203137536. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:20,767][1648981] Avg episode reward: [(0, '398.800')] [2024-06-15 16:17:23,687][1651669] Updated weights for policy 0, policy_version 396668 (0.0015) [2024-06-15 16:17:25,669][1651669] Updated weights for policy 0, policy_version 396733 (0.0045) [2024-06-15 16:17:25,784][1648981] Fps is (10 sec: 52357.3, 60 sec: 48045.7, 300 sec: 47871.8). Total num frames: 812515328. Throughput: 0: 11691.9. Samples: 203171328. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:25,785][1648981] Avg episode reward: [(0, '419.890')] [2024-06-15 16:17:26,749][1651274] Signal inference workers to stop experience collection... (20850 times) [2024-06-15 16:17:26,793][1651669] InferenceWorker_p0-w0: stopping experience collection (20850 times) [2024-06-15 16:17:27,047][1651274] Signal inference workers to resume experience collection... (20850 times) [2024-06-15 16:17:27,047][1651669] InferenceWorker_p0-w0: resuming experience collection (20850 times) [2024-06-15 16:17:27,715][1651669] Updated weights for policy 0, policy_version 396796 (0.0014) [2024-06-15 16:17:30,339][1651669] Updated weights for policy 0, policy_version 396848 (0.0041) [2024-06-15 16:17:30,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 812777472. Throughput: 0: 11821.6. Samples: 203239424. 
Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:30,767][1648981] Avg episode reward: [(0, '430.100')] [2024-06-15 16:17:35,180][1651669] Updated weights for policy 0, policy_version 396898 (0.0012) [2024-06-15 16:17:35,770][1648981] Fps is (10 sec: 36094.6, 60 sec: 45348.1, 300 sec: 47430.3). Total num frames: 812875776. Throughput: 0: 11775.0. Samples: 203311104. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:35,771][1648981] Avg episode reward: [(0, '438.430')] [2024-06-15 16:17:36,061][1651669] Updated weights for policy 0, policy_version 396929 (0.0013) [2024-06-15 16:17:37,748][1651669] Updated weights for policy 0, policy_version 397008 (0.0018) [2024-06-15 16:17:40,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 47513.9, 300 sec: 47543.3). Total num frames: 813170688. Throughput: 0: 11588.0. Samples: 203334656. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:40,767][1648981] Avg episode reward: [(0, '435.580')] [2024-06-15 16:17:41,138][1651669] Updated weights for policy 0, policy_version 397073 (0.0076) [2024-06-15 16:17:45,702][1651669] Updated weights for policy 0, policy_version 397136 (0.0013) [2024-06-15 16:17:45,766][1648981] Fps is (10 sec: 45892.4, 60 sec: 44783.0, 300 sec: 47208.2). Total num frames: 813334528. Throughput: 0: 11781.8. Samples: 203415040. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:45,767][1648981] Avg episode reward: [(0, '408.950')] [2024-06-15 16:17:46,896][1651669] Updated weights for policy 0, policy_version 397176 (0.0012) [2024-06-15 16:17:48,623][1651669] Updated weights for policy 0, policy_version 397248 (0.0130) [2024-06-15 16:17:50,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48605.8, 300 sec: 47542.3). Total num frames: 813694976. Throughput: 0: 11720.1. Samples: 203478016. 
Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:50,767][1648981] Avg episode reward: [(0, '412.540')] [2024-06-15 16:17:51,964][1651669] Updated weights for policy 0, policy_version 397328 (0.0012) [2024-06-15 16:17:52,734][1651669] Updated weights for policy 0, policy_version 397370 (0.0012) [2024-06-15 16:17:55,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45344.9, 300 sec: 47098.3). Total num frames: 813826048. Throughput: 0: 11844.3. Samples: 203520512. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:17:55,767][1648981] Avg episode reward: [(0, '412.700')] [2024-06-15 16:17:57,759][1651669] Updated weights for policy 0, policy_version 397438 (0.0013) [2024-06-15 16:17:59,182][1651669] Updated weights for policy 0, policy_version 397488 (0.0011) [2024-06-15 16:18:00,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49170.2, 300 sec: 47319.3). Total num frames: 814153728. Throughput: 0: 11913.6. Samples: 203596288. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:18:00,767][1648981] Avg episode reward: [(0, '409.630')] [2024-06-15 16:18:00,959][1651669] Updated weights for policy 0, policy_version 397558 (0.0025) [2024-06-15 16:18:03,389][1651669] Updated weights for policy 0, policy_version 397606 (0.0011) [2024-06-15 16:18:05,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45878.1, 300 sec: 47319.3). Total num frames: 814350336. Throughput: 0: 11844.3. Samples: 203670528. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:18:05,767][1648981] Avg episode reward: [(0, '406.360')] [2024-06-15 16:18:07,595][1651669] Updated weights for policy 0, policy_version 397649 (0.0011) [2024-06-15 16:18:08,445][1651274] Signal inference workers to stop experience collection... (20900 times) [2024-06-15 16:18:08,491][1651669] InferenceWorker_p0-w0: stopping experience collection (20900 times) [2024-06-15 16:18:08,693][1651274] Signal inference workers to resume experience collection... 
(20900 times) [2024-06-15 16:18:08,693][1651669] InferenceWorker_p0-w0: resuming experience collection (20900 times) [2024-06-15 16:18:08,806][1651669] Updated weights for policy 0, policy_version 397696 (0.0124) [2024-06-15 16:18:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 47319.9). Total num frames: 814612480. Throughput: 0: 11939.9. Samples: 203708416. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:18:10,767][1648981] Avg episode reward: [(0, '402.590')] [2024-06-15 16:18:10,827][1651669] Updated weights for policy 0, policy_version 397761 (0.0011) [2024-06-15 16:18:12,227][1651669] Updated weights for policy 0, policy_version 397816 (0.0012) [2024-06-15 16:18:14,802][1651669] Updated weights for policy 0, policy_version 397872 (0.0012) [2024-06-15 16:18:15,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48062.8, 300 sec: 47652.4). Total num frames: 814874624. Throughput: 0: 11628.1. Samples: 203762688. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:18:15,767][1648981] Avg episode reward: [(0, '396.160')] [2024-06-15 16:18:19,852][1651669] Updated weights for policy 0, policy_version 397921 (0.0013) [2024-06-15 16:18:20,560][1651669] Updated weights for policy 0, policy_version 397949 (0.0030) [2024-06-15 16:18:20,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45875.3, 300 sec: 47319.2). Total num frames: 815005696. Throughput: 0: 11708.7. Samples: 203837952. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:18:20,767][1648981] Avg episode reward: [(0, '376.650')] [2024-06-15 16:18:21,885][1651669] Updated weights for policy 0, policy_version 398005 (0.0011) [2024-06-15 16:18:23,172][1651669] Updated weights for policy 0, policy_version 398049 (0.0029) [2024-06-15 16:18:25,771][1648981] Fps is (10 sec: 45856.1, 60 sec: 46977.9, 300 sec: 47764.0). Total num frames: 815333376. Throughput: 0: 11831.8. Samples: 203867136. 
Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:18:25,771][1648981] Avg episode reward: [(0, '369.110')] [2024-06-15 16:18:26,020][1651669] Updated weights for policy 0, policy_version 398128 (0.0011) [2024-06-15 16:18:30,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 47097.1). Total num frames: 815398912. Throughput: 0: 11673.6. Samples: 203940352. Policy #0 lag: (min: 1.0, avg: 76.2, max: 257.0) [2024-06-15 16:18:30,767][1648981] Avg episode reward: [(0, '353.230')] [2024-06-15 16:18:31,738][1651669] Updated weights for policy 0, policy_version 398192 (0.0012) [2024-06-15 16:18:33,086][1651669] Updated weights for policy 0, policy_version 398225 (0.0011) [2024-06-15 16:18:34,798][1651669] Updated weights for policy 0, policy_version 398289 (0.0012) [2024-06-15 16:18:35,729][1651669] Updated weights for policy 0, policy_version 398332 (0.0010) [2024-06-15 16:18:35,774][1648981] Fps is (10 sec: 42585.0, 60 sec: 48056.8, 300 sec: 47429.1). Total num frames: 815759360. Throughput: 0: 11671.7. Samples: 204003328. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:18:35,774][1648981] Avg episode reward: [(0, '353.610')] [2024-06-15 16:18:37,158][1651669] Updated weights for policy 0, policy_version 398391 (0.0014) [2024-06-15 16:18:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 815923200. Throughput: 0: 11594.0. Samples: 204042240. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:18:40,767][1648981] Avg episode reward: [(0, '365.560')] [2024-06-15 16:18:42,753][1651669] Updated weights for policy 0, policy_version 398437 (0.0015) [2024-06-15 16:18:44,341][1651669] Updated weights for policy 0, policy_version 398485 (0.0014) [2024-06-15 16:18:45,767][1648981] Fps is (10 sec: 42628.9, 60 sec: 47513.4, 300 sec: 47098.9). Total num frames: 816185344. Throughput: 0: 11582.5. Samples: 204117504. 
Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:18:45,767][1648981] Avg episode reward: [(0, '367.060')] [2024-06-15 16:18:45,995][1651669] Updated weights for policy 0, policy_version 398546 (0.0025) [2024-06-15 16:18:47,760][1651274] Signal inference workers to stop experience collection... (20950 times) [2024-06-15 16:18:47,809][1651669] InferenceWorker_p0-w0: stopping experience collection (20950 times) [2024-06-15 16:18:47,817][1651669] Updated weights for policy 0, policy_version 398611 (0.0101) [2024-06-15 16:18:48,002][1651274] Signal inference workers to resume experience collection... (20950 times) [2024-06-15 16:18:48,003][1651669] InferenceWorker_p0-w0: resuming experience collection (20950 times) [2024-06-15 16:18:50,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 45875.1, 300 sec: 47430.3). Total num frames: 816447488. Throughput: 0: 11389.1. Samples: 204183040. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:18:50,767][1648981] Avg episode reward: [(0, '373.860')] [2024-06-15 16:18:53,660][1651669] Updated weights for policy 0, policy_version 398675 (0.0135) [2024-06-15 16:18:54,913][1651669] Updated weights for policy 0, policy_version 398723 (0.0013) [2024-06-15 16:18:55,794][1648981] Fps is (10 sec: 45750.6, 60 sec: 46946.0, 300 sec: 46872.0). Total num frames: 816644096. Throughput: 0: 11473.2. Samples: 204225024. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:18:55,794][1648981] Avg episode reward: [(0, '384.860')] [2024-06-15 16:18:56,078][1651669] Updated weights for policy 0, policy_version 398782 (0.0160) [2024-06-15 16:18:56,098][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000398784_816709632.pth... 
[2024-06-15 16:18:56,164][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000393280_805437440.pth [2024-06-15 16:18:57,581][1651669] Updated weights for policy 0, policy_version 398820 (0.0011) [2024-06-15 16:18:59,383][1651669] Updated weights for policy 0, policy_version 398896 (0.0018) [2024-06-15 16:19:00,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 816971776. Throughput: 0: 11480.2. Samples: 204279296. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:00,767][1648981] Avg episode reward: [(0, '413.920')] [2024-06-15 16:19:04,467][1651669] Updated weights for policy 0, policy_version 398918 (0.0022) [2024-06-15 16:19:05,767][1648981] Fps is (10 sec: 42714.8, 60 sec: 45328.9, 300 sec: 46541.7). Total num frames: 817070080. Throughput: 0: 11810.1. Samples: 204369408. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:05,767][1648981] Avg episode reward: [(0, '411.630')] [2024-06-15 16:19:06,206][1651669] Updated weights for policy 0, policy_version 398992 (0.0095) [2024-06-15 16:19:08,143][1651669] Updated weights for policy 0, policy_version 399057 (0.0017) [2024-06-15 16:19:09,530][1651669] Updated weights for policy 0, policy_version 399120 (0.0013) [2024-06-15 16:19:10,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48059.9, 300 sec: 47874.6). Total num frames: 817496064. Throughput: 0: 11788.5. Samples: 204397568. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:10,767][1648981] Avg episode reward: [(0, '422.360')] [2024-06-15 16:19:15,433][1651669] Updated weights for policy 0, policy_version 399184 (0.0012) [2024-06-15 16:19:15,769][1648981] Fps is (10 sec: 45865.6, 60 sec: 44235.2, 300 sec: 46653.3). Total num frames: 817528832. Throughput: 0: 11911.9. Samples: 204476416. 
Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:15,769][1648981] Avg episode reward: [(0, '419.700')] [2024-06-15 16:19:17,192][1651669] Updated weights for policy 0, policy_version 399256 (0.0012) [2024-06-15 16:19:18,779][1651669] Updated weights for policy 0, policy_version 399312 (0.0011) [2024-06-15 16:19:19,601][1651669] Updated weights for policy 0, policy_version 399359 (0.0015) [2024-06-15 16:19:20,767][1648981] Fps is (10 sec: 42596.8, 60 sec: 48605.7, 300 sec: 47653.7). Total num frames: 817922048. Throughput: 0: 11903.1. Samples: 204538880. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:20,767][1648981] Avg episode reward: [(0, '412.770')] [2024-06-15 16:19:25,766][1648981] Fps is (10 sec: 49162.9, 60 sec: 44786.0, 300 sec: 46653.7). Total num frames: 818020352. Throughput: 0: 11867.0. Samples: 204576256. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:25,767][1648981] Avg episode reward: [(0, '444.010')] [2024-06-15 16:19:26,352][1651669] Updated weights for policy 0, policy_version 399440 (0.0087) [2024-06-15 16:19:28,965][1651669] Updated weights for policy 0, policy_version 399536 (0.0012) [2024-06-15 16:19:29,992][1651274] Signal inference workers to stop experience collection... (21000 times) [2024-06-15 16:19:30,051][1651669] InferenceWorker_p0-w0: stopping experience collection (21000 times) [2024-06-15 16:19:30,157][1651274] Signal inference workers to resume experience collection... (21000 times) [2024-06-15 16:19:30,159][1651669] InferenceWorker_p0-w0: resuming experience collection (21000 times) [2024-06-15 16:19:30,341][1651669] Updated weights for policy 0, policy_version 399587 (0.0014) [2024-06-15 16:19:30,770][1648981] Fps is (10 sec: 49133.9, 60 sec: 50241.0, 300 sec: 47542.6). Total num frames: 818413568. Throughput: 0: 11570.3. Samples: 204638208. 
Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:30,771][1648981] Avg episode reward: [(0, '442.910')] [2024-06-15 16:19:33,264][1651669] Updated weights for policy 0, policy_version 399652 (0.0012) [2024-06-15 16:19:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46427.0, 300 sec: 46986.0). Total num frames: 818544640. Throughput: 0: 11958.1. Samples: 204721152. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:35,767][1648981] Avg episode reward: [(0, '444.300')] [2024-06-15 16:19:37,176][1651669] Updated weights for policy 0, policy_version 399712 (0.0015) [2024-06-15 16:19:39,854][1651669] Updated weights for policy 0, policy_version 399776 (0.0013) [2024-06-15 16:19:40,766][1648981] Fps is (10 sec: 39336.6, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 818806784. Throughput: 0: 11840.1. Samples: 204757504. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:40,767][1648981] Avg episode reward: [(0, '441.030')] [2024-06-15 16:19:41,330][1651669] Updated weights for policy 0, policy_version 399844 (0.0032) [2024-06-15 16:19:44,211][1651669] Updated weights for policy 0, policy_version 399922 (0.0012) [2024-06-15 16:19:45,770][1648981] Fps is (10 sec: 52409.3, 60 sec: 48056.9, 300 sec: 47429.7). Total num frames: 819068928. Throughput: 0: 11979.8. Samples: 204818432. Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:45,771][1648981] Avg episode reward: [(0, '424.670')] [2024-06-15 16:19:48,106][1651669] Updated weights for policy 0, policy_version 399968 (0.0012) [2024-06-15 16:19:50,104][1651669] Updated weights for policy 0, policy_version 400003 (0.0013) [2024-06-15 16:19:50,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 46967.6, 300 sec: 46875.0). Total num frames: 819265536. Throughput: 0: 11764.7. Samples: 204898816. 
Policy #0 lag: (min: 15.0, avg: 102.0, max: 218.0) [2024-06-15 16:19:50,767][1648981] Avg episode reward: [(0, '411.450')] [2024-06-15 16:19:51,929][1651669] Updated weights for policy 0, policy_version 400080 (0.0013) [2024-06-15 16:19:55,322][1651669] Updated weights for policy 0, policy_version 400144 (0.0022) [2024-06-15 16:19:55,766][1648981] Fps is (10 sec: 45892.1, 60 sec: 48081.7, 300 sec: 47430.3). Total num frames: 819527680. Throughput: 0: 11707.7. Samples: 204924416. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:19:55,767][1648981] Avg episode reward: [(0, '417.410')] [2024-06-15 16:19:58,702][1651669] Updated weights for policy 0, policy_version 400208 (0.0029) [2024-06-15 16:20:00,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 819724288. Throughput: 0: 11674.2. Samples: 205001728. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:00,767][1648981] Avg episode reward: [(0, '406.710')] [2024-06-15 16:20:01,709][1651669] Updated weights for policy 0, policy_version 400275 (0.0013) [2024-06-15 16:20:03,104][1651669] Updated weights for policy 0, policy_version 400336 (0.0012) [2024-06-15 16:20:04,049][1651669] Updated weights for policy 0, policy_version 400384 (0.0012) [2024-06-15 16:20:05,768][1648981] Fps is (10 sec: 49145.0, 60 sec: 49150.9, 300 sec: 47652.3). Total num frames: 820019200. Throughput: 0: 11787.0. Samples: 205069312. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:05,768][1648981] Avg episode reward: [(0, '409.580')] [2024-06-15 16:20:10,377][1651669] Updated weights for policy 0, policy_version 400480 (0.0120) [2024-06-15 16:20:10,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 45328.9, 300 sec: 47208.3). Total num frames: 820215808. Throughput: 0: 11650.9. Samples: 205100544. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:10,767][1648981] Avg episode reward: [(0, '426.520')] [2024-06-15 16:20:12,922][1651669] Updated weights for policy 0, policy_version 400528 (0.0010) [2024-06-15 16:20:13,830][1651274] Signal inference workers to stop experience collection... (21050 times) [2024-06-15 16:20:13,865][1651669] InferenceWorker_p0-w0: stopping experience collection (21050 times) [2024-06-15 16:20:14,059][1651274] Signal inference workers to resume experience collection... (21050 times) [2024-06-15 16:20:14,060][1651669] InferenceWorker_p0-w0: resuming experience collection (21050 times) [2024-06-15 16:20:14,062][1651669] Updated weights for policy 0, policy_version 400576 (0.0013) [2024-06-15 16:20:15,774][1648981] Fps is (10 sec: 49120.7, 60 sec: 49693.5, 300 sec: 47540.1). Total num frames: 820510720. Throughput: 0: 12059.4. Samples: 205180928. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:15,775][1648981] Avg episode reward: [(0, '421.780')] [2024-06-15 16:20:17,019][1651669] Updated weights for policy 0, policy_version 400658 (0.0013) [2024-06-15 16:20:20,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 45329.1, 300 sec: 47097.0). Total num frames: 820641792. Throughput: 0: 11798.7. Samples: 205252096. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:20,767][1648981] Avg episode reward: [(0, '417.310')] [2024-06-15 16:20:20,934][1651669] Updated weights for policy 0, policy_version 400705 (0.0013) [2024-06-15 16:20:22,335][1651669] Updated weights for policy 0, policy_version 400765 (0.0150) [2024-06-15 16:20:25,141][1651669] Updated weights for policy 0, policy_version 400840 (0.0015) [2024-06-15 16:20:25,772][1648981] Fps is (10 sec: 45885.7, 60 sec: 49147.5, 300 sec: 47318.3). Total num frames: 820969472. Throughput: 0: 11797.3. Samples: 205288448. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:25,773][1648981] Avg episode reward: [(0, '395.700')] [2024-06-15 16:20:26,119][1651669] Updated weights for policy 0, policy_version 400895 (0.0011) [2024-06-15 16:20:29,037][1651669] Updated weights for policy 0, policy_version 400957 (0.0114) [2024-06-15 16:20:30,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 45877.9, 300 sec: 47208.1). Total num frames: 821166080. Throughput: 0: 11913.4. Samples: 205354496. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:30,768][1648981] Avg episode reward: [(0, '403.420')] [2024-06-15 16:20:33,019][1651669] Updated weights for policy 0, policy_version 401012 (0.0014) [2024-06-15 16:20:34,424][1651669] Updated weights for policy 0, policy_version 401026 (0.0011) [2024-06-15 16:20:35,766][1648981] Fps is (10 sec: 42621.9, 60 sec: 47513.6, 300 sec: 46989.2). Total num frames: 821395456. Throughput: 0: 11753.2. Samples: 205427712. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:35,767][1648981] Avg episode reward: [(0, '408.770')] [2024-06-15 16:20:36,132][1651669] Updated weights for policy 0, policy_version 401104 (0.0011) [2024-06-15 16:20:39,252][1651669] Updated weights for policy 0, policy_version 401160 (0.0130) [2024-06-15 16:20:40,766][1648981] Fps is (10 sec: 52430.5, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 821690368. Throughput: 0: 11912.5. Samples: 205460480. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:40,767][1648981] Avg episode reward: [(0, '397.770')] [2024-06-15 16:20:43,767][1651669] Updated weights for policy 0, policy_version 401220 (0.0012) [2024-06-15 16:20:45,169][1651669] Updated weights for policy 0, policy_version 401280 (0.0014) [2024-06-15 16:20:45,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 45878.1, 300 sec: 47097.1). Total num frames: 821821440. Throughput: 0: 11776.0. Samples: 205531648. 
Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:45,767][1648981] Avg episode reward: [(0, '398.120')] [2024-06-15 16:20:47,063][1651669] Updated weights for policy 0, policy_version 401329 (0.0011) [2024-06-15 16:20:48,503][1651669] Updated weights for policy 0, policy_version 401394 (0.0014) [2024-06-15 16:20:50,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 46967.4, 300 sec: 47208.1). Total num frames: 822083584. Throughput: 0: 11844.6. Samples: 205602304. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:50,767][1648981] Avg episode reward: [(0, '391.200')] [2024-06-15 16:20:50,992][1651669] Updated weights for policy 0, policy_version 401425 (0.0011) [2024-06-15 16:20:52,052][1651669] Updated weights for policy 0, policy_version 401467 (0.0011) [2024-06-15 16:20:55,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 45875.0, 300 sec: 46874.8). Total num frames: 822280192. Throughput: 0: 11946.6. Samples: 205638144. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:20:55,767][1648981] Avg episode reward: [(0, '386.260')] [2024-06-15 16:20:56,168][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000401536_822345728.pth... [2024-06-15 16:20:56,184][1651669] Updated weights for policy 0, policy_version 401536 (0.0129) [2024-06-15 16:20:56,223][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000396064_811139072.pth [2024-06-15 16:20:57,240][1651274] Signal inference workers to stop experience collection... (21100 times) [2024-06-15 16:20:57,267][1651669] InferenceWorker_p0-w0: stopping experience collection (21100 times) [2024-06-15 16:20:57,497][1651274] Signal inference workers to resume experience collection... 
(21100 times) [2024-06-15 16:20:57,498][1651669] InferenceWorker_p0-w0: resuming experience collection (21100 times) [2024-06-15 16:20:58,911][1651669] Updated weights for policy 0, policy_version 401620 (0.0089) [2024-06-15 16:21:00,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 822607872. Throughput: 0: 11413.9. Samples: 205694464. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:21:00,767][1648981] Avg episode reward: [(0, '402.950')] [2024-06-15 16:21:01,844][1651669] Updated weights for policy 0, policy_version 401680 (0.0012) [2024-06-15 16:21:02,724][1651669] Updated weights for policy 0, policy_version 401722 (0.0012) [2024-06-15 16:21:05,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 45330.1, 300 sec: 46986.0). Total num frames: 822738944. Throughput: 0: 11787.4. Samples: 205782528. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:21:05,767][1648981] Avg episode reward: [(0, '404.400')] [2024-06-15 16:21:07,008][1651669] Updated weights for policy 0, policy_version 401776 (0.0014) [2024-06-15 16:21:07,897][1651669] Updated weights for policy 0, policy_version 401808 (0.0010) [2024-06-15 16:21:08,742][1651669] Updated weights for policy 0, policy_version 401848 (0.0011) [2024-06-15 16:21:10,522][1651669] Updated weights for policy 0, policy_version 401905 (0.0012) [2024-06-15 16:21:10,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.8, 300 sec: 47541.3). Total num frames: 823132160. Throughput: 0: 11697.8. Samples: 205814784. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:21:10,767][1648981] Avg episode reward: [(0, '402.030')] [2024-06-15 16:21:12,503][1651669] Updated weights for policy 0, policy_version 401945 (0.0020) [2024-06-15 16:21:13,314][1651669] Updated weights for policy 0, policy_version 401979 (0.0012) [2024-06-15 16:21:15,777][1648981] Fps is (10 sec: 52372.0, 60 sec: 45872.9, 300 sec: 47095.3). Total num frames: 823263232. 
Throughput: 0: 11887.0. Samples: 205889536. Policy #0 lag: (min: 15.0, avg: 146.0, max: 271.0) [2024-06-15 16:21:15,778][1648981] Avg episode reward: [(0, '404.650')] [2024-06-15 16:21:17,246][1651669] Updated weights for policy 0, policy_version 402032 (0.0014) [2024-06-15 16:21:18,097][1651669] Updated weights for policy 0, policy_version 402065 (0.0013) [2024-06-15 16:21:19,022][1651669] Updated weights for policy 0, policy_version 402110 (0.0012) [2024-06-15 16:21:20,397][1651669] Updated weights for policy 0, policy_version 402170 (0.0012) [2024-06-15 16:21:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 50244.4, 300 sec: 47541.4). Total num frames: 823656448. Throughput: 0: 11867.0. Samples: 205961728. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:20,767][1648981] Avg episode reward: [(0, '393.990')] [2024-06-15 16:21:23,223][1651669] Updated weights for policy 0, policy_version 402224 (0.0011) [2024-06-15 16:21:25,766][1648981] Fps is (10 sec: 52485.7, 60 sec: 46971.8, 300 sec: 47208.1). Total num frames: 823787520. Throughput: 0: 12037.7. Samples: 206002176. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:25,767][1648981] Avg episode reward: [(0, '404.640')] [2024-06-15 16:21:27,124][1651669] Updated weights for policy 0, policy_version 402260 (0.0017) [2024-06-15 16:21:28,347][1651669] Updated weights for policy 0, policy_version 402320 (0.0023) [2024-06-15 16:21:29,475][1651669] Updated weights for policy 0, policy_version 402366 (0.0012) [2024-06-15 16:21:30,786][1648981] Fps is (10 sec: 42514.1, 60 sec: 48590.1, 300 sec: 47209.6). Total num frames: 824082432. Throughput: 0: 12146.1. Samples: 206078464. 
Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:30,787][1648981] Avg episode reward: [(0, '441.480')] [2024-06-15 16:21:31,361][1651669] Updated weights for policy 0, policy_version 402425 (0.0013) [2024-06-15 16:21:34,030][1651669] Updated weights for policy 0, policy_version 402494 (0.0014) [2024-06-15 16:21:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 47430.4). Total num frames: 824311808. Throughput: 0: 12219.7. Samples: 206152192. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:35,767][1648981] Avg episode reward: [(0, '428.020')] [2024-06-15 16:21:37,625][1651274] Signal inference workers to stop experience collection... (21150 times) [2024-06-15 16:21:37,688][1651669] InferenceWorker_p0-w0: stopping experience collection (21150 times) [2024-06-15 16:21:37,886][1651274] Signal inference workers to resume experience collection... (21150 times) [2024-06-15 16:21:37,906][1651669] InferenceWorker_p0-w0: resuming experience collection (21150 times) [2024-06-15 16:21:38,254][1651669] Updated weights for policy 0, policy_version 402544 (0.0013) [2024-06-15 16:21:40,197][1651669] Updated weights for policy 0, policy_version 402618 (0.0103) [2024-06-15 16:21:40,767][1648981] Fps is (10 sec: 49248.4, 60 sec: 48059.6, 300 sec: 47208.1). Total num frames: 824573952. Throughput: 0: 12299.4. Samples: 206191616. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:40,767][1648981] Avg episode reward: [(0, '461.790')] [2024-06-15 16:21:42,392][1651669] Updated weights for policy 0, policy_version 402672 (0.0013) [2024-06-15 16:21:45,293][1651669] Updated weights for policy 0, policy_version 402736 (0.0015) [2024-06-15 16:21:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.1, 300 sec: 47652.4). Total num frames: 824836096. Throughput: 0: 12435.9. Samples: 206254080. 
Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:45,767][1648981] Avg episode reward: [(0, '454.370')] [2024-06-15 16:21:49,700][1651669] Updated weights for policy 0, policy_version 402785 (0.0014) [2024-06-15 16:21:50,766][1648981] Fps is (10 sec: 39322.7, 60 sec: 48059.8, 300 sec: 46989.3). Total num frames: 824967168. Throughput: 0: 12128.7. Samples: 206328320. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:50,767][1648981] Avg episode reward: [(0, '454.890')] [2024-06-15 16:21:51,232][1651669] Updated weights for policy 0, policy_version 402848 (0.0012) [2024-06-15 16:21:53,740][1651669] Updated weights for policy 0, policy_version 402914 (0.0011) [2024-06-15 16:21:55,111][1651669] Updated weights for policy 0, policy_version 402948 (0.0020) [2024-06-15 16:21:55,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 50244.5, 300 sec: 47767.1). Total num frames: 825294848. Throughput: 0: 12071.9. Samples: 206358016. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:21:55,767][1648981] Avg episode reward: [(0, '465.850')] [2024-06-15 16:21:56,280][1651669] Updated weights for policy 0, policy_version 403008 (0.0011) [2024-06-15 16:22:00,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46421.5, 300 sec: 46764.4). Total num frames: 825393152. Throughput: 0: 12154.4. Samples: 206436352. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:00,767][1648981] Avg episode reward: [(0, '457.450')] [2024-06-15 16:22:01,899][1651669] Updated weights for policy 0, policy_version 403073 (0.0012) [2024-06-15 16:22:03,116][1651669] Updated weights for policy 0, policy_version 403131 (0.0014) [2024-06-15 16:22:05,062][1651669] Updated weights for policy 0, policy_version 403190 (0.0013) [2024-06-15 16:22:05,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 50244.2, 300 sec: 47541.3). Total num frames: 825753600. Throughput: 0: 12014.9. Samples: 206502400. 
Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:05,767][1648981] Avg episode reward: [(0, '456.920')] [2024-06-15 16:22:06,794][1651669] Updated weights for policy 0, policy_version 403233 (0.0011) [2024-06-15 16:22:10,788][1648981] Fps is (10 sec: 52317.3, 60 sec: 46405.0, 300 sec: 47205.4). Total num frames: 825917440. Throughput: 0: 11895.6. Samples: 206537728. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:10,788][1648981] Avg episode reward: [(0, '450.560')] [2024-06-15 16:22:10,789][1651669] Updated weights for policy 0, policy_version 403280 (0.0018) [2024-06-15 16:22:11,438][1651669] Updated weights for policy 0, policy_version 403318 (0.0012) [2024-06-15 16:22:12,734][1651669] Updated weights for policy 0, policy_version 403362 (0.0012) [2024-06-15 16:22:14,543][1651669] Updated weights for policy 0, policy_version 403411 (0.0012) [2024-06-15 16:22:15,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 50253.4, 300 sec: 47541.4). Total num frames: 826277888. Throughput: 0: 12088.5. Samples: 206622208. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:15,767][1648981] Avg episode reward: [(0, '447.080')] [2024-06-15 16:22:17,055][1651669] Updated weights for policy 0, policy_version 403457 (0.0016) [2024-06-15 16:22:18,042][1651669] Updated weights for policy 0, policy_version 403506 (0.0024) [2024-06-15 16:22:20,733][1651274] Signal inference workers to stop experience collection... (21200 times) [2024-06-15 16:22:20,766][1648981] Fps is (10 sec: 49256.8, 60 sec: 45875.2, 300 sec: 47099.9). Total num frames: 826408960. Throughput: 0: 12128.7. Samples: 206697984. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:20,767][1648981] Avg episode reward: [(0, '430.140')] [2024-06-15 16:22:20,778][1651669] InferenceWorker_p0-w0: stopping experience collection (21200 times) [2024-06-15 16:22:21,105][1651274] Signal inference workers to resume experience collection... 
(21200 times) [2024-06-15 16:22:21,105][1651669] InferenceWorker_p0-w0: resuming experience collection (21200 times) [2024-06-15 16:22:21,836][1651669] Updated weights for policy 0, policy_version 403555 (0.0025) [2024-06-15 16:22:23,433][1651669] Updated weights for policy 0, policy_version 403636 (0.0013) [2024-06-15 16:22:25,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 826736640. Throughput: 0: 11889.8. Samples: 206726656. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:25,768][1648981] Avg episode reward: [(0, '445.090')] [2024-06-15 16:22:26,027][1651669] Updated weights for policy 0, policy_version 403696 (0.0012) [2024-06-15 16:22:28,277][1651669] Updated weights for policy 0, policy_version 403744 (0.0012) [2024-06-15 16:22:30,790][1648981] Fps is (10 sec: 52304.3, 60 sec: 47510.5, 300 sec: 47649.2). Total num frames: 826933248. Throughput: 0: 11974.5. Samples: 206793216. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:30,791][1648981] Avg episode reward: [(0, '438.700')] [2024-06-15 16:22:32,591][1651669] Updated weights for policy 0, policy_version 403784 (0.0013) [2024-06-15 16:22:34,053][1651669] Updated weights for policy 0, policy_version 403856 (0.0014) [2024-06-15 16:22:35,209][1651669] Updated weights for policy 0, policy_version 403900 (0.0013) [2024-06-15 16:22:35,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 827195392. Throughput: 0: 11878.4. Samples: 206862848. Policy #0 lag: (min: 15.0, avg: 92.5, max: 271.0) [2024-06-15 16:22:35,767][1648981] Avg episode reward: [(0, '442.800')] [2024-06-15 16:22:37,602][1651669] Updated weights for policy 0, policy_version 403967 (0.0013) [2024-06-15 16:22:40,137][1651669] Updated weights for policy 0, policy_version 404016 (0.0018) [2024-06-15 16:22:40,783][1648981] Fps is (10 sec: 52468.5, 60 sec: 48047.0, 300 sec: 47872.0). Total num frames: 827457536. 
Throughput: 0: 12124.3. Samples: 206903808. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:22:40,783][1648981] Avg episode reward: [(0, '429.790')] [2024-06-15 16:22:43,511][1651669] Updated weights for policy 0, policy_version 404068 (0.0012) [2024-06-15 16:22:45,022][1651669] Updated weights for policy 0, policy_version 404157 (0.0015) [2024-06-15 16:22:45,767][1648981] Fps is (10 sec: 52427.4, 60 sec: 48059.6, 300 sec: 47541.3). Total num frames: 827719680. Throughput: 0: 12060.4. Samples: 206979072. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:22:45,767][1648981] Avg episode reward: [(0, '423.860')] [2024-06-15 16:22:48,564][1651669] Updated weights for policy 0, policy_version 404224 (0.0012) [2024-06-15 16:22:50,691][1651669] Updated weights for policy 0, policy_version 404283 (0.0038) [2024-06-15 16:22:50,767][1648981] Fps is (10 sec: 49230.6, 60 sec: 49697.9, 300 sec: 47874.6). Total num frames: 827949056. Throughput: 0: 12151.4. Samples: 207049216. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:22:50,768][1648981] Avg episode reward: [(0, '416.090')] [2024-06-15 16:22:55,141][1651669] Updated weights for policy 0, policy_version 404356 (0.0011) [2024-06-15 16:22:55,782][1648981] Fps is (10 sec: 45804.1, 60 sec: 48047.1, 300 sec: 47538.8). Total num frames: 828178432. Throughput: 0: 12391.9. Samples: 207095296. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:22:55,783][1648981] Avg episode reward: [(0, '439.990')] [2024-06-15 16:22:56,237][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000404416_828243968.pth... 
[2024-06-15 16:22:56,254][1651669] Updated weights for policy 0, policy_version 404416 (0.0016) [2024-06-15 16:22:56,278][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000398784_816709632.pth [2024-06-15 16:22:58,651][1651669] Updated weights for policy 0, policy_version 404474 (0.0040) [2024-06-15 16:23:00,137][1651274] Signal inference workers to stop experience collection... (21250 times) [2024-06-15 16:23:00,201][1651669] InferenceWorker_p0-w0: stopping experience collection (21250 times) [2024-06-15 16:23:00,399][1651274] Signal inference workers to resume experience collection... (21250 times) [2024-06-15 16:23:00,400][1651669] InferenceWorker_p0-w0: resuming experience collection (21250 times) [2024-06-15 16:23:00,688][1651669] Updated weights for policy 0, policy_version 404528 (0.0015) [2024-06-15 16:23:00,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 51336.5, 300 sec: 47874.6). Total num frames: 828473344. Throughput: 0: 12106.0. Samples: 207166976. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:00,767][1648981] Avg episode reward: [(0, '428.220')] [2024-06-15 16:23:04,999][1651669] Updated weights for policy 0, policy_version 404581 (0.0011) [2024-06-15 16:23:05,773][1648981] Fps is (10 sec: 45918.5, 60 sec: 48054.7, 300 sec: 47540.3). Total num frames: 828637184. Throughput: 0: 12127.0. Samples: 207243776. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:05,773][1648981] Avg episode reward: [(0, '428.550')] [2024-06-15 16:23:06,274][1651669] Updated weights for policy 0, policy_version 404640 (0.0108) [2024-06-15 16:23:08,503][1651669] Updated weights for policy 0, policy_version 404706 (0.0016) [2024-06-15 16:23:10,143][1651669] Updated weights for policy 0, policy_version 404752 (0.0036) [2024-06-15 16:23:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50808.4, 300 sec: 47763.5). Total num frames: 828964864. Throughput: 0: 12333.6. Samples: 207281664. 
Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:10,767][1648981] Avg episode reward: [(0, '431.470')] [2024-06-15 16:23:14,996][1651669] Updated weights for policy 0, policy_version 404818 (0.0012) [2024-06-15 16:23:15,766][1648981] Fps is (10 sec: 49183.6, 60 sec: 47513.7, 300 sec: 47874.6). Total num frames: 829128704. Throughput: 0: 12533.6. Samples: 207356928. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:15,767][1648981] Avg episode reward: [(0, '427.920')] [2024-06-15 16:23:16,197][1651669] Updated weights for policy 0, policy_version 404866 (0.0011) [2024-06-15 16:23:19,001][1651669] Updated weights for policy 0, policy_version 404933 (0.0012) [2024-06-15 16:23:20,126][1651669] Updated weights for policy 0, policy_version 404992 (0.0021) [2024-06-15 16:23:20,774][1648981] Fps is (10 sec: 45839.3, 60 sec: 50237.7, 300 sec: 47762.9). Total num frames: 829423616. Throughput: 0: 12422.4. Samples: 207421952. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:20,775][1648981] Avg episode reward: [(0, '432.740')] [2024-06-15 16:23:25,798][1648981] Fps is (10 sec: 42462.8, 60 sec: 46942.7, 300 sec: 47980.5). Total num frames: 829554688. Throughput: 0: 12295.1. Samples: 207457280. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:25,799][1648981] Avg episode reward: [(0, '433.600')] [2024-06-15 16:23:25,824][1651669] Updated weights for policy 0, policy_version 405058 (0.0016) [2024-06-15 16:23:26,998][1651669] Updated weights for policy 0, policy_version 405106 (0.0015) [2024-06-15 16:23:28,126][1651669] Updated weights for policy 0, policy_version 405155 (0.0011) [2024-06-15 16:23:30,563][1651669] Updated weights for policy 0, policy_version 405217 (0.0015) [2024-06-15 16:23:30,766][1648981] Fps is (10 sec: 49190.6, 60 sec: 49717.9, 300 sec: 47986.9). Total num frames: 829915136. Throughput: 0: 12276.7. Samples: 207531520. 
Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:30,767][1648981] Avg episode reward: [(0, '446.810')] [2024-06-15 16:23:32,170][1651669] Updated weights for policy 0, policy_version 405264 (0.0010) [2024-06-15 16:23:33,291][1651669] Updated weights for policy 0, policy_version 405305 (0.0010) [2024-06-15 16:23:35,788][1648981] Fps is (10 sec: 52482.9, 60 sec: 48042.5, 300 sec: 47982.2). Total num frames: 830078976. Throughput: 0: 12361.8. Samples: 207605760. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:35,788][1648981] Avg episode reward: [(0, '447.260')] [2024-06-15 16:23:38,003][1651669] Updated weights for policy 0, policy_version 405376 (0.0013) [2024-06-15 16:23:39,142][1651669] Updated weights for policy 0, policy_version 405435 (0.0012) [2024-06-15 16:23:40,767][1648981] Fps is (10 sec: 42596.5, 60 sec: 48072.4, 300 sec: 47985.7). Total num frames: 830341120. Throughput: 0: 12041.8. Samples: 207636992. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:40,768][1648981] Avg episode reward: [(0, '452.500')] [2024-06-15 16:23:41,700][1651274] Signal inference workers to stop experience collection... (21300 times) [2024-06-15 16:23:41,781][1651669] InferenceWorker_p0-w0: stopping experience collection (21300 times) [2024-06-15 16:23:41,790][1651669] Updated weights for policy 0, policy_version 405476 (0.0012) [2024-06-15 16:23:41,936][1651274] Signal inference workers to resume experience collection... (21300 times) [2024-06-15 16:23:41,937][1651669] InferenceWorker_p0-w0: resuming experience collection (21300 times) [2024-06-15 16:23:43,755][1651669] Updated weights for policy 0, policy_version 405558 (0.0013) [2024-06-15 16:23:45,766][1648981] Fps is (10 sec: 52542.0, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 830603264. Throughput: 0: 12105.9. Samples: 207711744. 
Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:45,767][1648981] Avg episode reward: [(0, '448.950')] [2024-06-15 16:23:48,417][1651669] Updated weights for policy 0, policy_version 405600 (0.0013) [2024-06-15 16:23:50,004][1651669] Updated weights for policy 0, policy_version 405669 (0.0011) [2024-06-15 16:23:50,786][1648981] Fps is (10 sec: 52328.0, 60 sec: 48590.1, 300 sec: 48209.1). Total num frames: 830865408. Throughput: 0: 11977.3. Samples: 207782912. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:50,787][1648981] Avg episode reward: [(0, '451.270')] [2024-06-15 16:23:51,949][1651669] Updated weights for policy 0, policy_version 405714 (0.0012) [2024-06-15 16:23:52,825][1651669] Updated weights for policy 0, policy_version 405755 (0.0008) [2024-06-15 16:23:54,444][1651669] Updated weights for policy 0, policy_version 405808 (0.0011) [2024-06-15 16:23:55,786][1648981] Fps is (10 sec: 52324.7, 60 sec: 49148.6, 300 sec: 47982.4). Total num frames: 831127552. Throughput: 0: 12066.5. Samples: 207824896. Policy #0 lag: (min: 47.0, avg: 173.2, max: 303.0) [2024-06-15 16:23:55,787][1648981] Avg episode reward: [(0, '462.130')] [2024-06-15 16:23:58,802][1651669] Updated weights for policy 0, policy_version 405859 (0.0012) [2024-06-15 16:24:00,295][1651669] Updated weights for policy 0, policy_version 405921 (0.0011) [2024-06-15 16:24:00,766][1648981] Fps is (10 sec: 52532.6, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 831389696. Throughput: 0: 12060.5. Samples: 207899648. 
Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:00,767][1648981] Avg episode reward: [(0, '462.680')] [2024-06-15 16:24:03,270][1651669] Updated weights for policy 0, policy_version 405987 (0.0011) [2024-06-15 16:24:04,039][1651669] Updated weights for policy 0, policy_version 406019 (0.0012) [2024-06-15 16:24:05,556][1651669] Updated weights for policy 0, policy_version 406074 (0.0012) [2024-06-15 16:24:05,766][1648981] Fps is (10 sec: 52533.5, 60 sec: 50249.6, 300 sec: 47985.7). Total num frames: 831651840. Throughput: 0: 12028.4. Samples: 207963136. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:05,767][1648981] Avg episode reward: [(0, '452.900')] [2024-06-15 16:24:10,278][1651669] Updated weights for policy 0, policy_version 406134 (0.0023) [2024-06-15 16:24:10,767][1648981] Fps is (10 sec: 39320.6, 60 sec: 46967.3, 300 sec: 48319.3). Total num frames: 831782912. Throughput: 0: 12239.7. Samples: 208007680. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:10,767][1648981] Avg episode reward: [(0, '435.290')] [2024-06-15 16:24:11,497][1651669] Updated weights for policy 0, policy_version 406176 (0.0014) [2024-06-15 16:24:14,534][1651669] Updated weights for policy 0, policy_version 406240 (0.0028) [2024-06-15 16:24:15,790][1648981] Fps is (10 sec: 42497.4, 60 sec: 49132.5, 300 sec: 47981.8). Total num frames: 832077824. Throughput: 0: 12099.5. Samples: 208076288. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:15,791][1648981] Avg episode reward: [(0, '439.830')] [2024-06-15 16:24:16,767][1651669] Updated weights for policy 0, policy_version 406320 (0.0012) [2024-06-15 16:24:20,207][1651669] Updated weights for policy 0, policy_version 406342 (0.0011) [2024-06-15 16:24:20,767][1648981] Fps is (10 sec: 45875.1, 60 sec: 46973.5, 300 sec: 48207.8). Total num frames: 832241664. Throughput: 0: 11929.6. Samples: 208142336. 
Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:20,767][1648981] Avg episode reward: [(0, '452.970')] [2024-06-15 16:24:22,403][1651669] Updated weights for policy 0, policy_version 406402 (0.0016) [2024-06-15 16:24:23,668][1651669] Updated weights for policy 0, policy_version 406464 (0.0010) [2024-06-15 16:24:25,739][1651274] Signal inference workers to stop experience collection... (21350 times) [2024-06-15 16:24:25,767][1648981] Fps is (10 sec: 36130.0, 60 sec: 48085.1, 300 sec: 47542.0). Total num frames: 832438272. Throughput: 0: 11878.4. Samples: 208171520. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:25,767][1648981] Avg episode reward: [(0, '444.760')] [2024-06-15 16:24:25,780][1651669] InferenceWorker_p0-w0: stopping experience collection (21350 times) [2024-06-15 16:24:25,927][1651274] Signal inference workers to resume experience collection... (21350 times) [2024-06-15 16:24:25,928][1651669] InferenceWorker_p0-w0: resuming experience collection (21350 times) [2024-06-15 16:24:27,437][1651669] Updated weights for policy 0, policy_version 406544 (0.0139) [2024-06-15 16:24:28,818][1651669] Updated weights for policy 0, policy_version 406592 (0.0011) [2024-06-15 16:24:30,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 832700416. Throughput: 0: 11707.7. Samples: 208238592. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:30,767][1648981] Avg episode reward: [(0, '459.880')] [2024-06-15 16:24:33,010][1651669] Updated weights for policy 0, policy_version 406652 (0.0012) [2024-06-15 16:24:34,708][1651669] Updated weights for policy 0, policy_version 406708 (0.0011) [2024-06-15 16:24:35,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48077.0, 300 sec: 47985.7). Total num frames: 832962560. Throughput: 0: 11701.5. Samples: 208309248. 
Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:35,767][1648981] Avg episode reward: [(0, '470.000')] [2024-06-15 16:24:37,310][1651669] Updated weights for policy 0, policy_version 406736 (0.0011) [2024-06-15 16:24:38,723][1651669] Updated weights for policy 0, policy_version 406800 (0.0131) [2024-06-15 16:24:39,854][1651669] Updated weights for policy 0, policy_version 406841 (0.0010) [2024-06-15 16:24:40,767][1648981] Fps is (10 sec: 52426.4, 60 sec: 48059.7, 300 sec: 47986.2). Total num frames: 833224704. Throughput: 0: 11599.0. Samples: 208346624. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:40,767][1648981] Avg episode reward: [(0, '476.950')] [2024-06-15 16:24:43,488][1651669] Updated weights for policy 0, policy_version 406881 (0.0011) [2024-06-15 16:24:44,963][1651669] Updated weights for policy 0, policy_version 406946 (0.0013) [2024-06-15 16:24:45,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 48059.6, 300 sec: 48207.8). Total num frames: 833486848. Throughput: 0: 11639.4. Samples: 208423424. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:45,767][1648981] Avg episode reward: [(0, '467.320')] [2024-06-15 16:24:48,146][1651669] Updated weights for policy 0, policy_version 406999 (0.0013) [2024-06-15 16:24:49,747][1651669] Updated weights for policy 0, policy_version 407074 (0.0014) [2024-06-15 16:24:50,802][1648981] Fps is (10 sec: 52244.3, 60 sec: 48046.8, 300 sec: 48202.0). Total num frames: 833748992. Throughput: 0: 11789.4. Samples: 208494080. 
Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:50,803][1648981] Avg episode reward: [(0, '454.140')] [2024-06-15 16:24:53,302][1651669] Updated weights for policy 0, policy_version 407143 (0.0011) [2024-06-15 16:24:54,449][1651669] Updated weights for policy 0, policy_version 407192 (0.0012) [2024-06-15 16:24:55,371][1651669] Updated weights for policy 0, policy_version 407227 (0.0031) [2024-06-15 16:24:55,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 48075.7, 300 sec: 48430.0). Total num frames: 834011136. Throughput: 0: 11889.8. Samples: 208542720. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:24:55,767][1648981] Avg episode reward: [(0, '429.190')] [2024-06-15 16:24:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000407232_834011136.pth... [2024-06-15 16:24:55,826][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000401536_822345728.pth [2024-06-15 16:24:58,945][1651669] Updated weights for policy 0, policy_version 407280 (0.0012) [2024-06-15 16:25:00,109][1651669] Updated weights for policy 0, policy_version 407333 (0.0015) [2024-06-15 16:25:00,766][1648981] Fps is (10 sec: 52617.0, 60 sec: 48059.6, 300 sec: 48319.2). Total num frames: 834273280. Throughput: 0: 11896.1. Samples: 208611328. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:25:00,767][1648981] Avg episode reward: [(0, '409.010')] [2024-06-15 16:25:03,772][1651274] Signal inference workers to stop experience collection... (21400 times) [2024-06-15 16:25:03,820][1651669] InferenceWorker_p0-w0: stopping experience collection (21400 times) [2024-06-15 16:25:04,042][1651274] Signal inference workers to resume experience collection... 
(21400 times) [2024-06-15 16:25:04,054][1651669] InferenceWorker_p0-w0: resuming experience collection (21400 times) [2024-06-15 16:25:04,056][1651669] Updated weights for policy 0, policy_version 407408 (0.0011) [2024-06-15 16:25:05,484][1651669] Updated weights for policy 0, policy_version 407457 (0.0012) [2024-06-15 16:25:05,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 46967.6, 300 sec: 48318.9). Total num frames: 834469888. Throughput: 0: 12094.7. Samples: 208686592. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:25:05,767][1648981] Avg episode reward: [(0, '416.700')] [2024-06-15 16:25:09,227][1651669] Updated weights for policy 0, policy_version 407520 (0.0013) [2024-06-15 16:25:10,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48606.0, 300 sec: 48098.0). Total num frames: 834699264. Throughput: 0: 12310.8. Samples: 208725504. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:25:10,767][1648981] Avg episode reward: [(0, '429.450')] [2024-06-15 16:25:11,455][1651669] Updated weights for policy 0, policy_version 407602 (0.0154) [2024-06-15 16:25:14,521][1651669] Updated weights for policy 0, policy_version 407632 (0.0012) [2024-06-15 16:25:15,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 47532.5, 300 sec: 48430.0). Total num frames: 834928640. Throughput: 0: 12356.3. Samples: 208794624. Policy #0 lag: (min: 2.0, avg: 80.8, max: 258.0) [2024-06-15 16:25:15,767][1648981] Avg episode reward: [(0, '431.040')] [2024-06-15 16:25:16,140][1651669] Updated weights for policy 0, policy_version 407682 (0.0012) [2024-06-15 16:25:17,422][1651669] Updated weights for policy 0, policy_version 407732 (0.0012) [2024-06-15 16:25:20,132][1651669] Updated weights for policy 0, policy_version 407776 (0.0013) [2024-06-15 16:25:20,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48606.0, 300 sec: 48097.7). Total num frames: 835158016. Throughput: 0: 12515.6. Samples: 208872448. 
Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:20,767][1648981] Avg episode reward: [(0, '424.220')] [2024-06-15 16:25:22,142][1651669] Updated weights for policy 0, policy_version 407864 (0.0116) [2024-06-15 16:25:25,400][1651669] Updated weights for policy 0, policy_version 407923 (0.0012) [2024-06-15 16:25:25,775][1648981] Fps is (10 sec: 52381.5, 60 sec: 50236.9, 300 sec: 48428.6). Total num frames: 835452928. Throughput: 0: 12388.1. Samples: 208904192. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:25,776][1648981] Avg episode reward: [(0, '417.670')] [2024-06-15 16:25:27,219][1651669] Updated weights for policy 0, policy_version 407984 (0.0020) [2024-06-15 16:25:30,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 48605.7, 300 sec: 48207.8). Total num frames: 835616768. Throughput: 0: 12401.8. Samples: 208981504. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:30,768][1648981] Avg episode reward: [(0, '424.930')] [2024-06-15 16:25:31,211][1651669] Updated weights for policy 0, policy_version 408032 (0.0011) [2024-06-15 16:25:32,696][1651669] Updated weights for policy 0, policy_version 408084 (0.0013) [2024-06-15 16:25:35,766][1648981] Fps is (10 sec: 39357.0, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 835846144. Throughput: 0: 12400.3. Samples: 209051648. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:35,767][1648981] Avg episode reward: [(0, '408.380')] [2024-06-15 16:25:36,291][1651669] Updated weights for policy 0, policy_version 408146 (0.0015) [2024-06-15 16:25:37,964][1651669] Updated weights for policy 0, policy_version 408209 (0.0012) [2024-06-15 16:25:40,767][1648981] Fps is (10 sec: 49152.2, 60 sec: 48060.0, 300 sec: 48430.0). Total num frames: 836108288. Throughput: 0: 11912.5. Samples: 209078784. 
Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:40,767][1648981] Avg episode reward: [(0, '432.120')] [2024-06-15 16:25:41,341][1651669] Updated weights for policy 0, policy_version 408262 (0.0037) [2024-06-15 16:25:43,234][1651669] Updated weights for policy 0, policy_version 408336 (0.0038) [2024-06-15 16:25:43,805][1651274] Signal inference workers to stop experience collection... (21450 times) [2024-06-15 16:25:43,826][1651669] InferenceWorker_p0-w0: stopping experience collection (21450 times) [2024-06-15 16:25:44,081][1651274] Signal inference workers to resume experience collection... (21450 times) [2024-06-15 16:25:44,082][1651669] InferenceWorker_p0-w0: resuming experience collection (21450 times) [2024-06-15 16:25:45,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 836370432. Throughput: 0: 11946.6. Samples: 209148928. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:45,768][1648981] Avg episode reward: [(0, '449.980')] [2024-06-15 16:25:47,328][1651669] Updated weights for policy 0, policy_version 408402 (0.0015) [2024-06-15 16:25:48,681][1651669] Updated weights for policy 0, policy_version 408464 (0.0011) [2024-06-15 16:25:49,646][1651669] Updated weights for policy 0, policy_version 408512 (0.0013) [2024-06-15 16:25:50,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48088.5, 300 sec: 48652.2). Total num frames: 836632576. Throughput: 0: 12060.4. Samples: 209229312. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:50,767][1648981] Avg episode reward: [(0, '452.490')] [2024-06-15 16:25:52,740][1651669] Updated weights for policy 0, policy_version 408574 (0.0011) [2024-06-15 16:25:54,538][1651669] Updated weights for policy 0, policy_version 408637 (0.0012) [2024-06-15 16:25:55,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 836894720. Throughput: 0: 12071.8. Samples: 209268736. 
Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:25:55,767][1648981] Avg episode reward: [(0, '448.860')] [2024-06-15 16:25:58,451][1651669] Updated weights for policy 0, policy_version 408695 (0.0012) [2024-06-15 16:26:00,447][1651669] Updated weights for policy 0, policy_version 408756 (0.0013) [2024-06-15 16:26:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 837156864. Throughput: 0: 12128.7. Samples: 209340416. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:00,767][1648981] Avg episode reward: [(0, '455.020')] [2024-06-15 16:26:02,963][1651669] Updated weights for policy 0, policy_version 408800 (0.0014) [2024-06-15 16:26:04,968][1651669] Updated weights for policy 0, policy_version 408835 (0.0013) [2024-06-15 16:26:05,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48059.7, 300 sec: 48207.9). Total num frames: 837353472. Throughput: 0: 11901.2. Samples: 209408000. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:05,767][1648981] Avg episode reward: [(0, '460.810')] [2024-06-15 16:26:08,237][1651669] Updated weights for policy 0, policy_version 408899 (0.0013) [2024-06-15 16:26:10,659][1651669] Updated weights for policy 0, policy_version 408976 (0.0012) [2024-06-15 16:26:10,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 48059.7, 300 sec: 48542.9). Total num frames: 837582848. Throughput: 0: 12028.7. Samples: 209445376. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:10,767][1648981] Avg episode reward: [(0, '453.180')] [2024-06-15 16:26:13,603][1651669] Updated weights for policy 0, policy_version 409026 (0.0012) [2024-06-15 16:26:14,547][1651669] Updated weights for policy 0, policy_version 409082 (0.0011) [2024-06-15 16:26:15,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 837812224. Throughput: 0: 11946.7. Samples: 209519104. 
Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:15,767][1648981] Avg episode reward: [(0, '458.390')] [2024-06-15 16:26:16,941][1651669] Updated weights for policy 0, policy_version 409136 (0.0014) [2024-06-15 16:26:19,447][1651669] Updated weights for policy 0, policy_version 409170 (0.0011) [2024-06-15 16:26:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 838074368. Throughput: 0: 11889.8. Samples: 209586688. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:20,767][1648981] Avg episode reward: [(0, '454.910')] [2024-06-15 16:26:21,900][1651669] Updated weights for policy 0, policy_version 409248 (0.0012) [2024-06-15 16:26:22,464][1651669] Updated weights for policy 0, policy_version 409280 (0.0010) [2024-06-15 16:26:25,536][1651669] Updated weights for policy 0, policy_version 409344 (0.0088) [2024-06-15 16:26:25,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 48066.8, 300 sec: 48322.1). Total num frames: 838336512. Throughput: 0: 12219.7. Samples: 209628672. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:25,767][1648981] Avg episode reward: [(0, '489.420')] [2024-06-15 16:26:27,009][1651669] Updated weights for policy 0, policy_version 409400 (0.0013) [2024-06-15 16:26:30,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 838500352. Throughput: 0: 12276.7. Samples: 209701376. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:30,767][1648981] Avg episode reward: [(0, '465.030')] [2024-06-15 16:26:30,834][1651274] Signal inference workers to stop experience collection... (21500 times) [2024-06-15 16:26:30,894][1651669] InferenceWorker_p0-w0: stopping experience collection (21500 times) [2024-06-15 16:26:31,138][1651274] Signal inference workers to resume experience collection... 
(21500 times) [2024-06-15 16:26:31,139][1651669] InferenceWorker_p0-w0: resuming experience collection (21500 times) [2024-06-15 16:26:31,141][1651669] Updated weights for policy 0, policy_version 409440 (0.0020) [2024-06-15 16:26:33,184][1651669] Updated weights for policy 0, policy_version 409532 (0.0179) [2024-06-15 16:26:35,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 838729728. Throughput: 0: 11992.2. Samples: 209768960. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:35,767][1648981] Avg episode reward: [(0, '457.860')] [2024-06-15 16:26:36,681][1651669] Updated weights for policy 0, policy_version 409594 (0.0021) [2024-06-15 16:26:38,275][1651669] Updated weights for policy 0, policy_version 409648 (0.0046) [2024-06-15 16:26:40,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 838991872. Throughput: 0: 11810.1. Samples: 209800192. Policy #0 lag: (min: 15.0, avg: 110.9, max: 271.0) [2024-06-15 16:26:40,767][1648981] Avg episode reward: [(0, '431.870')] [2024-06-15 16:26:42,071][1651669] Updated weights for policy 0, policy_version 409684 (0.0017) [2024-06-15 16:26:43,041][1651669] Updated weights for policy 0, policy_version 409728 (0.0012) [2024-06-15 16:26:45,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48060.0, 300 sec: 48430.0). Total num frames: 839254016. Throughput: 0: 11719.1. Samples: 209867776. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:26:45,767][1648981] Avg episode reward: [(0, '411.700')] [2024-06-15 16:26:46,919][1651669] Updated weights for policy 0, policy_version 409808 (0.0151) [2024-06-15 16:26:48,519][1651669] Updated weights for policy 0, policy_version 409872 (0.0115) [2024-06-15 16:26:49,400][1651669] Updated weights for policy 0, policy_version 409915 (0.0013) [2024-06-15 16:26:50,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 839516160. 
Throughput: 0: 12094.5. Samples: 209952256. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:26:50,767][1648981] Avg episode reward: [(0, '394.610')] [2024-06-15 16:26:53,186][1651669] Updated weights for policy 0, policy_version 409982 (0.0111) [2024-06-15 16:26:55,051][1651669] Updated weights for policy 0, policy_version 410045 (0.0013) [2024-06-15 16:26:55,790][1648981] Fps is (10 sec: 52303.2, 60 sec: 48040.6, 300 sec: 48759.3). Total num frames: 839778304. Throughput: 0: 11974.4. Samples: 209984512. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:26:55,791][1648981] Avg episode reward: [(0, '407.250')] [2024-06-15 16:26:55,800][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000410048_839778304.pth... [2024-06-15 16:26:55,855][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000404416_828243968.pth [2024-06-15 16:26:58,768][1651669] Updated weights for policy 0, policy_version 410106 (0.0011) [2024-06-15 16:27:00,617][1651669] Updated weights for policy 0, policy_version 410160 (0.0013) [2024-06-15 16:27:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.5, 300 sec: 48318.9). Total num frames: 840007680. Throughput: 0: 11958.0. Samples: 210057216. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:00,767][1648981] Avg episode reward: [(0, '410.100')] [2024-06-15 16:27:03,387][1651669] Updated weights for policy 0, policy_version 410197 (0.0060) [2024-06-15 16:27:05,573][1651669] Updated weights for policy 0, policy_version 410288 (0.0011) [2024-06-15 16:27:05,766][1648981] Fps is (10 sec: 49269.9, 60 sec: 48605.8, 300 sec: 48655.7). Total num frames: 840269824. Throughput: 0: 11969.4. Samples: 210125312. 
Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:05,767][1648981] Avg episode reward: [(0, '417.570')] [2024-06-15 16:27:08,082][1651669] Updated weights for policy 0, policy_version 410320 (0.0014) [2024-06-15 16:27:09,109][1651669] Updated weights for policy 0, policy_version 410364 (0.0011) [2024-06-15 16:27:10,786][1648981] Fps is (10 sec: 49056.6, 60 sec: 48590.1, 300 sec: 48204.7). Total num frames: 840499200. Throughput: 0: 12032.5. Samples: 210170368. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:10,786][1648981] Avg episode reward: [(0, '413.400')] [2024-06-15 16:27:11,142][1651669] Updated weights for policy 0, policy_version 410430 (0.0024) [2024-06-15 16:27:13,703][1651274] Signal inference workers to stop experience collection... (21550 times) [2024-06-15 16:27:13,769][1651669] InferenceWorker_p0-w0: stopping experience collection (21550 times) [2024-06-15 16:27:13,917][1651274] Signal inference workers to resume experience collection... (21550 times) [2024-06-15 16:27:13,926][1651669] InferenceWorker_p0-w0: resuming experience collection (21550 times) [2024-06-15 16:27:14,325][1651669] Updated weights for policy 0, policy_version 410480 (0.0015) [2024-06-15 16:27:15,770][1648981] Fps is (10 sec: 49133.5, 60 sec: 49148.9, 300 sec: 48651.5). Total num frames: 840761344. Throughput: 0: 12025.3. Samples: 210242560. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:15,771][1648981] Avg episode reward: [(0, '412.390')] [2024-06-15 16:27:15,923][1651669] Updated weights for policy 0, policy_version 410533 (0.0014) [2024-06-15 16:27:18,622][1651669] Updated weights for policy 0, policy_version 410576 (0.0102) [2024-06-15 16:27:20,766][1648981] Fps is (10 sec: 45965.2, 60 sec: 48059.8, 300 sec: 48207.9). Total num frames: 840957952. Throughput: 0: 12242.5. Samples: 210319872. 
Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:20,767][1648981] Avg episode reward: [(0, '406.160')] [2024-06-15 16:27:21,039][1651669] Updated weights for policy 0, policy_version 410643 (0.0014) [2024-06-15 16:27:22,136][1651669] Updated weights for policy 0, policy_version 410688 (0.0011) [2024-06-15 16:27:25,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 48059.8, 300 sec: 48433.9). Total num frames: 841220096. Throughput: 0: 12356.3. Samples: 210356224. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:25,767][1648981] Avg episode reward: [(0, '401.650')] [2024-06-15 16:27:26,208][1651669] Updated weights for policy 0, policy_version 410768 (0.0014) [2024-06-15 16:27:29,350][1651669] Updated weights for policy 0, policy_version 410819 (0.0012) [2024-06-15 16:27:30,687][1651669] Updated weights for policy 0, policy_version 410871 (0.0009) [2024-06-15 16:27:30,769][1648981] Fps is (10 sec: 49138.2, 60 sec: 49149.8, 300 sec: 48318.5). Total num frames: 841449472. Throughput: 0: 12435.1. Samples: 210427392. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:30,770][1648981] Avg episode reward: [(0, '389.170')] [2024-06-15 16:27:32,093][1651669] Updated weights for policy 0, policy_version 410916 (0.0011) [2024-06-15 16:27:35,319][1651669] Updated weights for policy 0, policy_version 410981 (0.0014) [2024-06-15 16:27:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 48432.7). Total num frames: 841744384. Throughput: 0: 12333.5. Samples: 210507264. 
Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:35,767][1648981] Avg episode reward: [(0, '397.810')] [2024-06-15 16:27:36,510][1651669] Updated weights for policy 0, policy_version 411027 (0.0029) [2024-06-15 16:27:39,513][1651669] Updated weights for policy 0, policy_version 411091 (0.0108) [2024-06-15 16:27:40,212][1651669] Updated weights for policy 0, policy_version 411132 (0.0012) [2024-06-15 16:27:40,766][1648981] Fps is (10 sec: 55721.1, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 842006528. Throughput: 0: 12397.0. Samples: 210542080. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:40,767][1648981] Avg episode reward: [(0, '405.520')] [2024-06-15 16:27:42,257][1651669] Updated weights for policy 0, policy_version 411191 (0.0010) [2024-06-15 16:27:45,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 49151.7, 300 sec: 48318.9). Total num frames: 842203136. Throughput: 0: 12652.0. Samples: 210626560. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:45,767][1648981] Avg episode reward: [(0, '441.610')] [2024-06-15 16:27:46,415][1651669] Updated weights for policy 0, policy_version 411257 (0.0010) [2024-06-15 16:27:47,804][1651669] Updated weights for policy 0, policy_version 411326 (0.0009) [2024-06-15 16:27:49,410][1651669] Updated weights for policy 0, policy_version 411381 (0.0010) [2024-06-15 16:27:50,770][1648981] Fps is (10 sec: 52408.5, 60 sec: 50241.1, 300 sec: 48654.1). Total num frames: 842530816. Throughput: 0: 12651.0. Samples: 210694656. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:50,771][1648981] Avg episode reward: [(0, '446.870')] [2024-06-15 16:27:53,269][1651669] Updated weights for policy 0, policy_version 411440 (0.0156) [2024-06-15 16:27:55,766][1648981] Fps is (10 sec: 45876.5, 60 sec: 48078.9, 300 sec: 48096.8). Total num frames: 842661888. Throughput: 0: 12441.3. Samples: 210729984. 
Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:27:55,767][1648981] Avg episode reward: [(0, '438.530')] [2024-06-15 16:27:56,823][1651669] Updated weights for policy 0, policy_version 411504 (0.0074) [2024-06-15 16:27:57,416][1651274] Signal inference workers to stop experience collection... (21600 times) [2024-06-15 16:27:57,484][1651669] InferenceWorker_p0-w0: stopping experience collection (21600 times) [2024-06-15 16:27:57,705][1651274] Signal inference workers to resume experience collection... (21600 times) [2024-06-15 16:27:57,706][1651669] InferenceWorker_p0-w0: resuming experience collection (21600 times) [2024-06-15 16:27:58,110][1651669] Updated weights for policy 0, policy_version 411552 (0.0013) [2024-06-15 16:28:00,202][1651669] Updated weights for policy 0, policy_version 411623 (0.0013) [2024-06-15 16:28:00,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 50790.4, 300 sec: 48875.4). Total num frames: 843055104. Throughput: 0: 12505.2. Samples: 210805248. Policy #0 lag: (min: 4.0, avg: 94.5, max: 260.0) [2024-06-15 16:28:00,767][1648981] Avg episode reward: [(0, '402.660')] [2024-06-15 16:28:02,950][1651669] Updated weights for policy 0, policy_version 411666 (0.0012) [2024-06-15 16:28:05,773][1648981] Fps is (10 sec: 52396.6, 60 sec: 48600.9, 300 sec: 48206.8). Total num frames: 843186176. Throughput: 0: 12365.9. Samples: 210876416. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:05,773][1648981] Avg episode reward: [(0, '385.190')] [2024-06-15 16:28:07,467][1651669] Updated weights for policy 0, policy_version 411745 (0.0013) [2024-06-15 16:28:09,348][1651669] Updated weights for policy 0, policy_version 411793 (0.0013) [2024-06-15 16:28:10,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 49168.0, 300 sec: 48541.1). Total num frames: 843448320. Throughput: 0: 12401.8. Samples: 210914304. 
Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:10,767][1648981] Avg episode reward: [(0, '398.430')] [2024-06-15 16:28:10,996][1651669] Updated weights for policy 0, policy_version 411856 (0.0013) [2024-06-15 16:28:14,318][1651669] Updated weights for policy 0, policy_version 411936 (0.0080) [2024-06-15 16:28:15,766][1648981] Fps is (10 sec: 52461.4, 60 sec: 49155.1, 300 sec: 48431.3). Total num frames: 843710464. Throughput: 0: 12231.9. Samples: 210977792. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:15,767][1648981] Avg episode reward: [(0, '405.050')] [2024-06-15 16:28:18,315][1651669] Updated weights for policy 0, policy_version 411984 (0.0018) [2024-06-15 16:28:19,494][1651669] Updated weights for policy 0, policy_version 412032 (0.0013) [2024-06-15 16:28:20,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48605.8, 300 sec: 48546.3). Total num frames: 843874304. Throughput: 0: 12162.8. Samples: 211054592. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:20,767][1648981] Avg episode reward: [(0, '408.040')] [2024-06-15 16:28:21,957][1651669] Updated weights for policy 0, policy_version 412097 (0.0012) [2024-06-15 16:28:23,030][1651669] Updated weights for policy 0, policy_version 412155 (0.0012) [2024-06-15 16:28:25,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 48318.9). Total num frames: 844169216. Throughput: 0: 12037.7. Samples: 211083776. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:25,767][1648981] Avg episode reward: [(0, '404.180')] [2024-06-15 16:28:26,090][1651669] Updated weights for policy 0, policy_version 412222 (0.0024) [2024-06-15 16:28:30,732][1651669] Updated weights for policy 0, policy_version 412282 (0.0012) [2024-06-15 16:28:30,779][1648981] Fps is (10 sec: 45819.8, 60 sec: 48052.2, 300 sec: 48320.5). Total num frames: 844333056. Throughput: 0: 11886.7. Samples: 211161600. 
Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:30,779][1648981] Avg episode reward: [(0, '409.520')] [2024-06-15 16:28:32,755][1651669] Updated weights for policy 0, policy_version 412339 (0.0011) [2024-06-15 16:28:34,187][1651669] Updated weights for policy 0, policy_version 412412 (0.0012) [2024-06-15 16:28:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 48430.1). Total num frames: 844627968. Throughput: 0: 11811.1. Samples: 211226112. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:35,767][1648981] Avg episode reward: [(0, '423.250')] [2024-06-15 16:28:36,786][1651669] Updated weights for policy 0, policy_version 412472 (0.0041) [2024-06-15 16:28:40,766][1648981] Fps is (10 sec: 42650.1, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 844759040. Throughput: 0: 11810.1. Samples: 211261440. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:40,767][1648981] Avg episode reward: [(0, '436.780')] [2024-06-15 16:28:41,570][1651274] Signal inference workers to stop experience collection... (21650 times) [2024-06-15 16:28:41,614][1651669] InferenceWorker_p0-w0: stopping experience collection (21650 times) [2024-06-15 16:28:41,854][1651274] Signal inference workers to resume experience collection... (21650 times) [2024-06-15 16:28:41,882][1651669] InferenceWorker_p0-w0: resuming experience collection (21650 times) [2024-06-15 16:28:41,884][1651669] Updated weights for policy 0, policy_version 412528 (0.0011) [2024-06-15 16:28:43,508][1651669] Updated weights for policy 0, policy_version 412580 (0.0011) [2024-06-15 16:28:44,772][1651669] Updated weights for policy 0, policy_version 412640 (0.0012) [2024-06-15 16:28:45,519][1651669] Updated weights for policy 0, policy_version 412670 (0.0014) [2024-06-15 16:28:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49152.2, 300 sec: 48433.2). Total num frames: 845152256. Throughput: 0: 11696.4. Samples: 211331584. 
Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:45,767][1648981] Avg episode reward: [(0, '410.350')] [2024-06-15 16:28:47,701][1651669] Updated weights for policy 0, policy_version 412720 (0.0012) [2024-06-15 16:28:50,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 45878.1, 300 sec: 47988.9). Total num frames: 845283328. Throughput: 0: 11811.7. Samples: 211407872. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:50,767][1648981] Avg episode reward: [(0, '394.440')] [2024-06-15 16:28:53,167][1651669] Updated weights for policy 0, policy_version 412791 (0.0013) [2024-06-15 16:28:54,287][1651669] Updated weights for policy 0, policy_version 412834 (0.0036) [2024-06-15 16:28:55,767][1648981] Fps is (10 sec: 45872.4, 60 sec: 49151.5, 300 sec: 48207.7). Total num frames: 845611008. Throughput: 0: 11775.8. Samples: 211444224. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:28:55,768][1648981] Avg episode reward: [(0, '411.300')] [2024-06-15 16:28:55,837][1651669] Updated weights for policy 0, policy_version 412897 (0.0094) [2024-06-15 16:28:56,139][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000412912_845643776.pth... [2024-06-15 16:28:56,197][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000407232_834011136.pth [2024-06-15 16:28:58,020][1651669] Updated weights for policy 0, policy_version 412944 (0.0011) [2024-06-15 16:28:58,971][1651669] Updated weights for policy 0, policy_version 412992 (0.0015) [2024-06-15 16:29:00,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 45875.3, 300 sec: 47985.7). Total num frames: 845807616. Throughput: 0: 11844.3. Samples: 211510784. 
Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:29:00,767][1648981] Avg episode reward: [(0, '417.510')] [2024-06-15 16:29:04,104][1651669] Updated weights for policy 0, policy_version 413043 (0.0013) [2024-06-15 16:29:05,574][1651669] Updated weights for policy 0, policy_version 413104 (0.0021) [2024-06-15 16:29:05,766][1648981] Fps is (10 sec: 42601.3, 60 sec: 47518.5, 300 sec: 48318.9). Total num frames: 846036992. Throughput: 0: 11832.9. Samples: 211587072. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:29:05,767][1648981] Avg episode reward: [(0, '412.030')] [2024-06-15 16:29:07,473][1651669] Updated weights for policy 0, policy_version 413173 (0.0014) [2024-06-15 16:29:09,039][1651669] Updated weights for policy 0, policy_version 413216 (0.0011) [2024-06-15 16:29:09,776][1651669] Updated weights for policy 0, policy_version 413245 (0.0011) [2024-06-15 16:29:10,768][1648981] Fps is (10 sec: 52427.3, 60 sec: 48059.6, 300 sec: 48322.8). Total num frames: 846331904. Throughput: 0: 11889.7. Samples: 211618816. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:29:10,769][1648981] Avg episode reward: [(0, '415.100')] [2024-06-15 16:29:14,683][1651669] Updated weights for policy 0, policy_version 413283 (0.0041) [2024-06-15 16:29:15,343][1651669] Updated weights for policy 0, policy_version 413312 (0.0012) [2024-06-15 16:29:15,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 48318.9). Total num frames: 846495744. Throughput: 0: 11881.6. Samples: 211696128. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:29:15,768][1648981] Avg episode reward: [(0, '434.290')] [2024-06-15 16:29:16,450][1651669] Updated weights for policy 0, policy_version 413362 (0.0011) [2024-06-15 16:29:17,846][1651669] Updated weights for policy 0, policy_version 413430 (0.0012) [2024-06-15 16:29:19,623][1651274] Signal inference workers to stop experience collection... 
(21700 times) [2024-06-15 16:29:19,664][1651669] InferenceWorker_p0-w0: stopping experience collection (21700 times) [2024-06-15 16:29:19,866][1651274] Signal inference workers to resume experience collection... (21700 times) [2024-06-15 16:29:19,867][1651669] InferenceWorker_p0-w0: resuming experience collection (21700 times) [2024-06-15 16:29:20,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 846790656. Throughput: 0: 11923.9. Samples: 211762688. Policy #0 lag: (min: 15.0, avg: 136.6, max: 271.0) [2024-06-15 16:29:20,767][1648981] Avg episode reward: [(0, '452.790')] [2024-06-15 16:29:20,878][1651669] Updated weights for policy 0, policy_version 413488 (0.0113) [2024-06-15 16:29:24,356][1651669] Updated weights for policy 0, policy_version 413520 (0.0015) [2024-06-15 16:29:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 846987264. Throughput: 0: 12071.8. Samples: 211804672. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:29:25,767][1648981] Avg episode reward: [(0, '461.930')] [2024-06-15 16:29:27,272][1651669] Updated weights for policy 0, policy_version 413607 (0.0014) [2024-06-15 16:29:28,436][1651669] Updated weights for policy 0, policy_version 413669 (0.0016) [2024-06-15 16:29:30,770][1648981] Fps is (10 sec: 49132.5, 60 sec: 49158.7, 300 sec: 48540.4). Total num frames: 847282176. Throughput: 0: 12059.4. Samples: 211874304. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:29:30,771][1648981] Avg episode reward: [(0, '448.730')] [2024-06-15 16:29:30,873][1651669] Updated weights for policy 0, policy_version 413714 (0.0027) [2024-06-15 16:29:31,889][1651669] Updated weights for policy 0, policy_version 413759 (0.0012) [2024-06-15 16:29:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 48096.8). Total num frames: 847413248. Throughput: 0: 12071.8. Samples: 211951104. 
Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:29:35,767][1648981] Avg episode reward: [(0, '447.650')] [2024-06-15 16:29:36,291][1651669] Updated weights for policy 0, policy_version 413811 (0.0011) [2024-06-15 16:29:37,351][1651669] Updated weights for policy 0, policy_version 413840 (0.0083) [2024-06-15 16:29:38,313][1651669] Updated weights for policy 0, policy_version 413888 (0.0013) [2024-06-15 16:29:39,901][1651669] Updated weights for policy 0, policy_version 413951 (0.0012) [2024-06-15 16:29:40,772][1648981] Fps is (10 sec: 49144.6, 60 sec: 50239.6, 300 sec: 48429.1). Total num frames: 847773696. Throughput: 0: 12127.4. Samples: 211990016. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:29:40,772][1648981] Avg episode reward: [(0, '447.870')] [2024-06-15 16:29:42,458][1651669] Updated weights for policy 0, policy_version 414000 (0.0015) [2024-06-15 16:29:45,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 47991.5). Total num frames: 847904768. Throughput: 0: 12094.6. Samples: 212055040. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:29:45,767][1648981] Avg episode reward: [(0, '474.990')] [2024-06-15 16:29:46,703][1651669] Updated weights for policy 0, policy_version 414048 (0.0146) [2024-06-15 16:29:47,328][1651669] Updated weights for policy 0, policy_version 414080 (0.0013) [2024-06-15 16:29:49,752][1651669] Updated weights for policy 0, policy_version 414176 (0.0012) [2024-06-15 16:29:50,341][1651669] Updated weights for policy 0, policy_version 414202 (0.0011) [2024-06-15 16:29:50,766][1648981] Fps is (10 sec: 52457.3, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 848297984. Throughput: 0: 12083.2. Samples: 212130816. 
Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:29:50,767][1648981] Avg episode reward: [(0, '455.480')] [2024-06-15 16:29:52,027][1651669] Updated weights for policy 0, policy_version 414229 (0.0012) [2024-06-15 16:29:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46967.9, 300 sec: 47985.7). Total num frames: 848429056. Throughput: 0: 12276.7. Samples: 212171264. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:29:55,767][1648981] Avg episode reward: [(0, '465.140')] [2024-06-15 16:29:56,853][1651669] Updated weights for policy 0, policy_version 414304 (0.0089) [2024-06-15 16:29:59,109][1651669] Updated weights for policy 0, policy_version 414355 (0.0011) [2024-06-15 16:30:00,588][1651274] Signal inference workers to stop experience collection... (21750 times) [2024-06-15 16:30:00,624][1651669] InferenceWorker_p0-w0: stopping experience collection (21750 times) [2024-06-15 16:30:00,769][1648981] Fps is (10 sec: 42586.2, 60 sec: 48603.5, 300 sec: 48318.4). Total num frames: 848723968. Throughput: 0: 12150.7. Samples: 212242944. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:00,770][1648981] Avg episode reward: [(0, '490.100')] [2024-06-15 16:30:00,897][1651274] Signal inference workers to resume experience collection... (21750 times) [2024-06-15 16:30:00,899][1651669] InferenceWorker_p0-w0: resuming experience collection (21750 times) [2024-06-15 16:30:01,024][1651669] Updated weights for policy 0, policy_version 414433 (0.0126) [2024-06-15 16:30:03,402][1651669] Updated weights for policy 0, policy_version 414497 (0.0010) [2024-06-15 16:30:05,774][1648981] Fps is (10 sec: 52388.3, 60 sec: 48599.5, 300 sec: 48317.6). Total num frames: 848953344. Throughput: 0: 12217.6. Samples: 212312576. 
Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:05,775][1648981] Avg episode reward: [(0, '487.100')] [2024-06-15 16:30:07,335][1651669] Updated weights for policy 0, policy_version 414530 (0.0010) [2024-06-15 16:30:08,785][1651669] Updated weights for policy 0, policy_version 414589 (0.0011) [2024-06-15 16:30:10,213][1651669] Updated weights for policy 0, policy_version 414627 (0.0012) [2024-06-15 16:30:10,766][1648981] Fps is (10 sec: 49166.7, 60 sec: 48060.0, 300 sec: 48430.0). Total num frames: 849215488. Throughput: 0: 12049.1. Samples: 212346880. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:10,767][1648981] Avg episode reward: [(0, '469.100')] [2024-06-15 16:30:11,553][1651669] Updated weights for policy 0, policy_version 414691 (0.0102) [2024-06-15 16:30:12,093][1651669] Updated weights for policy 0, policy_version 414720 (0.0013) [2024-06-15 16:30:14,515][1651669] Updated weights for policy 0, policy_version 414774 (0.0010) [2024-06-15 16:30:15,774][1648981] Fps is (10 sec: 52429.0, 60 sec: 49691.8, 300 sec: 48539.8). Total num frames: 849477632. Throughput: 0: 12241.5. Samples: 212425216. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:15,775][1648981] Avg episode reward: [(0, '488.370')] [2024-06-15 16:30:17,807][1651669] Updated weights for policy 0, policy_version 414809 (0.0010) [2024-06-15 16:30:19,147][1651669] Updated weights for policy 0, policy_version 414851 (0.0014) [2024-06-15 16:30:20,600][1651669] Updated weights for policy 0, policy_version 414912 (0.0013) [2024-06-15 16:30:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49152.1, 300 sec: 48431.5). Total num frames: 849739776. Throughput: 0: 12174.2. Samples: 212498944. 
Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:20,767][1648981] Avg episode reward: [(0, '479.420')] [2024-06-15 16:30:22,546][1651669] Updated weights for policy 0, policy_version 414974 (0.0012) [2024-06-15 16:30:25,660][1651669] Updated weights for policy 0, policy_version 415029 (0.0017) [2024-06-15 16:30:25,768][1648981] Fps is (10 sec: 49180.2, 60 sec: 49696.5, 300 sec: 48651.8). Total num frames: 849969152. Throughput: 0: 12141.0. Samples: 212536320. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:25,769][1648981] Avg episode reward: [(0, '470.490')] [2024-06-15 16:30:28,208][1651669] Updated weights for policy 0, policy_version 415057 (0.0012) [2024-06-15 16:30:29,627][1651669] Updated weights for policy 0, policy_version 415107 (0.0012) [2024-06-15 16:30:30,776][1648981] Fps is (10 sec: 49105.9, 60 sec: 49147.6, 300 sec: 48761.7). Total num frames: 850231296. Throughput: 0: 12422.0. Samples: 212614144. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:30,776][1648981] Avg episode reward: [(0, '461.330')] [2024-06-15 16:30:31,964][1651669] Updated weights for policy 0, policy_version 415170 (0.0011) [2024-06-15 16:30:33,222][1651669] Updated weights for policy 0, policy_version 415223 (0.0011) [2024-06-15 16:30:35,775][1648981] Fps is (10 sec: 45846.1, 60 sec: 50237.3, 300 sec: 48539.7). Total num frames: 850427904. Throughput: 0: 12388.1. Samples: 212688384. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:35,776][1648981] Avg episode reward: [(0, '456.760')] [2024-06-15 16:30:36,127][1651669] Updated weights for policy 0, policy_version 415264 (0.0013) [2024-06-15 16:30:38,781][1651669] Updated weights for policy 0, policy_version 415329 (0.0015) [2024-06-15 16:30:39,730][1651669] Updated weights for policy 0, policy_version 415376 (0.0013) [2024-06-15 16:30:40,775][1648981] Fps is (10 sec: 55711.6, 60 sec: 50242.0, 300 sec: 48873.0). Total num frames: 850788352. 
Throughput: 0: 12376.8. Samples: 212728320. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:40,775][1648981] Avg episode reward: [(0, '434.880')] [2024-06-15 16:30:42,795][1651669] Updated weights for policy 0, policy_version 415440 (0.0029) [2024-06-15 16:30:42,955][1651274] Signal inference workers to stop experience collection... (21800 times) [2024-06-15 16:30:42,983][1651669] InferenceWorker_p0-w0: stopping experience collection (21800 times) [2024-06-15 16:30:43,228][1651274] Signal inference workers to resume experience collection... (21800 times) [2024-06-15 16:30:43,230][1651669] InferenceWorker_p0-w0: resuming experience collection (21800 times) [2024-06-15 16:30:44,080][1651669] Updated weights for policy 0, policy_version 415488 (0.0011) [2024-06-15 16:30:45,766][1648981] Fps is (10 sec: 49193.7, 60 sec: 50244.4, 300 sec: 48430.0). Total num frames: 850919424. Throughput: 0: 12300.2. Samples: 212796416. Policy #0 lag: (min: 15.0, avg: 89.4, max: 271.0) [2024-06-15 16:30:45,767][1648981] Avg episode reward: [(0, '428.950')] [2024-06-15 16:30:47,365][1651669] Updated weights for policy 0, policy_version 415547 (0.0026) [2024-06-15 16:30:49,597][1651669] Updated weights for policy 0, policy_version 415608 (0.0011) [2024-06-15 16:30:50,767][1648981] Fps is (10 sec: 45911.1, 60 sec: 49151.7, 300 sec: 48652.1). Total num frames: 851247104. Throughput: 0: 12472.1. Samples: 212873728. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:30:50,768][1648981] Avg episode reward: [(0, '422.380')] [2024-06-15 16:30:51,351][1651669] Updated weights for policy 0, policy_version 415680 (0.0011) [2024-06-15 16:30:54,682][1651669] Updated weights for policy 0, policy_version 415739 (0.0029) [2024-06-15 16:30:55,766][1648981] Fps is (10 sec: 52427.8, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 851443712. Throughput: 0: 12538.3. Samples: 212911104. 
Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:30:55,767][1648981] Avg episode reward: [(0, '411.800')] [2024-06-15 16:30:55,777][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000415744_851443712.pth... [2024-06-15 16:30:55,867][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000410048_839778304.pth [2024-06-15 16:30:55,872][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000415744_851443712.pth [2024-06-15 16:30:57,664][1651669] Updated weights for policy 0, policy_version 415779 (0.0012) [2024-06-15 16:30:59,397][1651669] Updated weights for policy 0, policy_version 415829 (0.0012) [2024-06-15 16:31:00,766][1648981] Fps is (10 sec: 45877.1, 60 sec: 49700.5, 300 sec: 48652.1). Total num frames: 851705856. Throughput: 0: 12438.0. Samples: 212984832. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:00,767][1648981] Avg episode reward: [(0, '409.430')] [2024-06-15 16:31:01,525][1651669] Updated weights for policy 0, policy_version 415904 (0.0012) [2024-06-15 16:31:03,924][1651669] Updated weights for policy 0, policy_version 415952 (0.0012) [2024-06-15 16:31:04,995][1651669] Updated weights for policy 0, policy_version 415999 (0.0012) [2024-06-15 16:31:05,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50250.8, 300 sec: 48763.2). Total num frames: 851968000. Throughput: 0: 12413.1. Samples: 213057536. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:05,767][1648981] Avg episode reward: [(0, '422.190')] [2024-06-15 16:31:09,138][1651669] Updated weights for policy 0, policy_version 416064 (0.0013) [2024-06-15 16:31:10,768][1648981] Fps is (10 sec: 49145.8, 60 sec: 49697.0, 300 sec: 48763.0). Total num frames: 852197376. Throughput: 0: 12413.4. Samples: 213094912. 
Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:10,769][1648981] Avg episode reward: [(0, '405.410')] [2024-06-15 16:31:10,821][1651669] Updated weights for policy 0, policy_version 416124 (0.0012) [2024-06-15 16:31:12,897][1651669] Updated weights for policy 0, policy_version 416164 (0.0012) [2024-06-15 16:31:15,136][1651669] Updated weights for policy 0, policy_version 416210 (0.0013) [2024-06-15 16:31:15,790][1648981] Fps is (10 sec: 49035.1, 60 sec: 49684.8, 300 sec: 48759.3). Total num frames: 852459520. Throughput: 0: 12272.7. Samples: 213166592. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:15,791][1648981] Avg episode reward: [(0, '395.680')] [2024-06-15 16:31:16,012][1651669] Updated weights for policy 0, policy_version 416256 (0.0029) [2024-06-15 16:31:19,966][1651669] Updated weights for policy 0, policy_version 416315 (0.0118) [2024-06-15 16:31:20,772][1648981] Fps is (10 sec: 42584.7, 60 sec: 48056.1, 300 sec: 48429.3). Total num frames: 852623360. Throughput: 0: 12175.3. Samples: 213236224. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:20,779][1648981] Avg episode reward: [(0, '395.510')] [2024-06-15 16:31:21,894][1651669] Updated weights for policy 0, policy_version 416377 (0.0147) [2024-06-15 16:31:24,210][1651669] Updated weights for policy 0, policy_version 416446 (0.0063) [2024-06-15 16:31:25,766][1648981] Fps is (10 sec: 42700.5, 60 sec: 48607.6, 300 sec: 48763.3). Total num frames: 852885504. Throughput: 0: 12051.3. Samples: 213270528. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:25,767][1648981] Avg episode reward: [(0, '422.800')] [2024-06-15 16:31:26,705][1651669] Updated weights for policy 0, policy_version 416485 (0.0031) [2024-06-15 16:31:27,392][1651669] Updated weights for policy 0, policy_version 416512 (0.0011) [2024-06-15 16:31:29,628][1651274] Signal inference workers to stop experience collection... 
(21850 times) [2024-06-15 16:31:29,667][1651669] InferenceWorker_p0-w0: stopping experience collection (21850 times) [2024-06-15 16:31:29,822][1651274] Signal inference workers to resume experience collection... (21850 times) [2024-06-15 16:31:29,823][1651669] InferenceWorker_p0-w0: resuming experience collection (21850 times) [2024-06-15 16:31:30,767][1648981] Fps is (10 sec: 49173.1, 60 sec: 48067.0, 300 sec: 48763.2). Total num frames: 853114880. Throughput: 0: 12196.9. Samples: 213345280. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:30,767][1648981] Avg episode reward: [(0, '445.190')] [2024-06-15 16:31:31,055][1651669] Updated weights for policy 0, policy_version 416569 (0.0014) [2024-06-15 16:31:33,503][1651669] Updated weights for policy 0, policy_version 416629 (0.0012) [2024-06-15 16:31:34,648][1651669] Updated weights for policy 0, policy_version 416658 (0.0011) [2024-06-15 16:31:35,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 49705.0, 300 sec: 48874.3). Total num frames: 853409792. Throughput: 0: 11889.9. Samples: 213408768. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:35,767][1648981] Avg episode reward: [(0, '449.380')] [2024-06-15 16:31:36,631][1651669] Updated weights for policy 0, policy_version 416705 (0.0011) [2024-06-15 16:31:37,621][1651669] Updated weights for policy 0, policy_version 416758 (0.0013) [2024-06-15 16:31:40,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 45881.5, 300 sec: 48430.0). Total num frames: 853540864. Throughput: 0: 11958.1. Samples: 213449216. 
Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:40,767][1648981] Avg episode reward: [(0, '465.790')] [2024-06-15 16:31:41,743][1651669] Updated weights for policy 0, policy_version 416801 (0.0013) [2024-06-15 16:31:42,970][1651669] Updated weights for policy 0, policy_version 416864 (0.0013) [2024-06-15 16:31:45,068][1651669] Updated weights for policy 0, policy_version 416913 (0.0011) [2024-06-15 16:31:45,779][1648981] Fps is (10 sec: 49087.9, 60 sec: 49687.2, 300 sec: 48761.1). Total num frames: 853901312. Throughput: 0: 12091.1. Samples: 213529088. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:45,780][1648981] Avg episode reward: [(0, '472.830')] [2024-06-15 16:31:46,101][1651669] Updated weights for policy 0, policy_version 416958 (0.0011) [2024-06-15 16:31:48,125][1651669] Updated weights for policy 0, policy_version 417015 (0.0012) [2024-06-15 16:31:50,772][1648981] Fps is (10 sec: 52407.8, 60 sec: 46964.7, 300 sec: 48433.3). Total num frames: 854065152. Throughput: 0: 12002.5. Samples: 213597696. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:50,777][1648981] Avg episode reward: [(0, '497.150')] [2024-06-15 16:31:53,280][1651669] Updated weights for policy 0, policy_version 417080 (0.0012) [2024-06-15 16:31:54,592][1651669] Updated weights for policy 0, policy_version 417122 (0.0012) [2024-06-15 16:31:55,766][1648981] Fps is (10 sec: 42654.5, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 854327296. Throughput: 0: 11969.8. Samples: 213633536. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:31:55,767][1648981] Avg episode reward: [(0, '493.570')] [2024-06-15 16:31:56,684][1651669] Updated weights for policy 0, policy_version 417185 (0.0119) [2024-06-15 16:31:58,479][1651669] Updated weights for policy 0, policy_version 417248 (0.0013) [2024-06-15 16:32:00,766][1648981] Fps is (10 sec: 52449.8, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 854589440. 
Throughput: 0: 11770.8. Samples: 213696000. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:32:00,767][1648981] Avg episode reward: [(0, '495.250')] [2024-06-15 16:32:03,025][1651669] Updated weights for policy 0, policy_version 417281 (0.0012) [2024-06-15 16:32:04,370][1651669] Updated weights for policy 0, policy_version 417342 (0.0029) [2024-06-15 16:32:05,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 47513.6, 300 sec: 48544.3). Total num frames: 854818816. Throughput: 0: 12073.0. Samples: 213779456. Policy #0 lag: (min: 13.0, avg: 120.2, max: 269.0) [2024-06-15 16:32:05,767][1648981] Avg episode reward: [(0, '487.820')] [2024-06-15 16:32:05,923][1651669] Updated weights for policy 0, policy_version 417407 (0.0012) [2024-06-15 16:32:08,594][1651669] Updated weights for policy 0, policy_version 417488 (0.0080) [2024-06-15 16:32:09,649][1651669] Updated weights for policy 0, policy_version 417536 (0.0013) [2024-06-15 16:32:10,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48606.9, 300 sec: 48652.8). Total num frames: 855113728. Throughput: 0: 12049.0. Samples: 213812736. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:10,767][1648981] Avg episode reward: [(0, '510.040')] [2024-06-15 16:32:13,953][1651274] Signal inference workers to stop experience collection... (21900 times) [2024-06-15 16:32:14,008][1651669] InferenceWorker_p0-w0: stopping experience collection (21900 times) [2024-06-15 16:32:14,188][1651274] Signal inference workers to resume experience collection... (21900 times) [2024-06-15 16:32:14,189][1651669] InferenceWorker_p0-w0: resuming experience collection (21900 times) [2024-06-15 16:32:15,658][1651669] Updated weights for policy 0, policy_version 417616 (0.0109) [2024-06-15 16:32:15,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 46985.9, 300 sec: 48541.0). Total num frames: 855277568. Throughput: 0: 12128.7. Samples: 213891072. 
Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:15,767][1648981] Avg episode reward: [(0, '496.690')] [2024-06-15 16:32:16,779][1651669] Updated weights for policy 0, policy_version 417661 (0.0011) [2024-06-15 16:32:18,695][1651669] Updated weights for policy 0, policy_version 417724 (0.0011) [2024-06-15 16:32:20,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49701.9, 300 sec: 48763.2). Total num frames: 855605248. Throughput: 0: 12037.7. Samples: 213950464. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:20,767][1648981] Avg episode reward: [(0, '482.840')] [2024-06-15 16:32:20,807][1651669] Updated weights for policy 0, policy_version 417787 (0.0012) [2024-06-15 16:32:25,773][1648981] Fps is (10 sec: 42569.6, 60 sec: 46961.9, 300 sec: 48318.2). Total num frames: 855703552. Throughput: 0: 12092.7. Samples: 213993472. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:25,774][1648981] Avg episode reward: [(0, '458.610')] [2024-06-15 16:32:25,802][1651669] Updated weights for policy 0, policy_version 417834 (0.0048) [2024-06-15 16:32:27,080][1651669] Updated weights for policy 0, policy_version 417893 (0.0013) [2024-06-15 16:32:27,551][1651669] Updated weights for policy 0, policy_version 417920 (0.0014) [2024-06-15 16:32:29,282][1651669] Updated weights for policy 0, policy_version 417979 (0.0051) [2024-06-15 16:32:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.2, 300 sec: 48541.1). Total num frames: 856064000. Throughput: 0: 11938.8. Samples: 214066176. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:30,767][1648981] Avg episode reward: [(0, '474.330')] [2024-06-15 16:32:31,548][1651669] Updated weights for policy 0, policy_version 418036 (0.0011) [2024-06-15 16:32:35,766][1648981] Fps is (10 sec: 45907.4, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 856162304. Throughput: 0: 12129.8. Samples: 214143488. 
Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:35,767][1648981] Avg episode reward: [(0, '449.400')] [2024-06-15 16:32:35,836][1651669] Updated weights for policy 0, policy_version 418064 (0.0015) [2024-06-15 16:32:37,219][1651669] Updated weights for policy 0, policy_version 418113 (0.0017) [2024-06-15 16:32:38,311][1651669] Updated weights for policy 0, policy_version 418170 (0.0011) [2024-06-15 16:32:39,887][1651669] Updated weights for policy 0, policy_version 418210 (0.0011) [2024-06-15 16:32:40,775][1648981] Fps is (10 sec: 49111.3, 60 sec: 50237.4, 300 sec: 48650.8). Total num frames: 856555520. Throughput: 0: 11978.6. Samples: 214172672. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:40,775][1648981] Avg episode reward: [(0, '438.690')] [2024-06-15 16:32:42,356][1651669] Updated weights for policy 0, policy_version 418272 (0.0013) [2024-06-15 16:32:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46431.4, 300 sec: 47986.3). Total num frames: 856686592. Throughput: 0: 12276.6. Samples: 214248448. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:45,767][1648981] Avg episode reward: [(0, '430.970')] [2024-06-15 16:32:46,649][1651669] Updated weights for policy 0, policy_version 418336 (0.0015) [2024-06-15 16:32:48,752][1651669] Updated weights for policy 0, policy_version 418400 (0.0012) [2024-06-15 16:32:50,541][1651669] Updated weights for policy 0, policy_version 418465 (0.0014) [2024-06-15 16:32:50,770][1648981] Fps is (10 sec: 45895.4, 60 sec: 49152.2, 300 sec: 48651.5). Total num frames: 857014272. Throughput: 0: 11934.3. Samples: 214316544. 
Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:50,771][1648981] Avg episode reward: [(0, '434.650')] [2024-06-15 16:32:51,224][1651669] Updated weights for policy 0, policy_version 418496 (0.0009) [2024-06-15 16:32:54,131][1651669] Updated weights for policy 0, policy_version 418553 (0.0036) [2024-06-15 16:32:55,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 857210880. Throughput: 0: 12083.2. Samples: 214356480. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:32:55,767][1648981] Avg episode reward: [(0, '432.240')] [2024-06-15 16:32:55,777][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000418560_857210880.pth... [2024-06-15 16:32:55,824][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000412912_845643776.pth [2024-06-15 16:32:57,191][1651274] Signal inference workers to stop experience collection... (21950 times) [2024-06-15 16:32:57,240][1651669] InferenceWorker_p0-w0: stopping experience collection (21950 times) [2024-06-15 16:32:57,545][1651274] Signal inference workers to resume experience collection... (21950 times) [2024-06-15 16:32:57,546][1651669] InferenceWorker_p0-w0: resuming experience collection (21950 times) [2024-06-15 16:32:58,758][1651669] Updated weights for policy 0, policy_version 418624 (0.0019) [2024-06-15 16:33:00,767][1648981] Fps is (10 sec: 45890.5, 60 sec: 48059.4, 300 sec: 48430.9). Total num frames: 857473024. Throughput: 0: 11844.2. Samples: 214424064. 
Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:33:00,768][1648981] Avg episode reward: [(0, '449.980')] [2024-06-15 16:33:00,776][1651669] Updated weights for policy 0, policy_version 418690 (0.0014) [2024-06-15 16:33:01,845][1651669] Updated weights for policy 0, policy_version 418750 (0.0011) [2024-06-15 16:33:05,477][1651669] Updated weights for policy 0, policy_version 418814 (0.0013) [2024-06-15 16:33:05,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 48605.7, 300 sec: 48430.0). Total num frames: 857735168. Throughput: 0: 12253.8. Samples: 214501888. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:33:05,767][1648981] Avg episode reward: [(0, '441.100')] [2024-06-15 16:33:10,068][1651669] Updated weights for policy 0, policy_version 418881 (0.0013) [2024-06-15 16:33:10,767][1648981] Fps is (10 sec: 45876.8, 60 sec: 46967.3, 300 sec: 48207.8). Total num frames: 857931776. Throughput: 0: 12141.9. Samples: 214539776. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:33:10,772][1648981] Avg episode reward: [(0, '433.940')] [2024-06-15 16:33:11,371][1651669] Updated weights for policy 0, policy_version 418944 (0.0011) [2024-06-15 16:33:13,154][1651669] Updated weights for policy 0, policy_version 418999 (0.0013) [2024-06-15 16:33:15,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 47513.8, 300 sec: 48318.9). Total num frames: 858128384. Throughput: 0: 11912.5. Samples: 214602240. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:33:15,767][1648981] Avg episode reward: [(0, '434.220')] [2024-06-15 16:33:16,399][1651669] Updated weights for policy 0, policy_version 419049 (0.0011) [2024-06-15 16:33:19,643][1651669] Updated weights for policy 0, policy_version 419104 (0.0015) [2024-06-15 16:33:20,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 46421.3, 300 sec: 48207.8). Total num frames: 858390528. Throughput: 0: 11935.3. Samples: 214680576. 
Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:33:20,767][1648981] Avg episode reward: [(0, '424.480')] [2024-06-15 16:33:21,184][1651669] Updated weights for policy 0, policy_version 419168 (0.0013) [2024-06-15 16:33:22,771][1651669] Updated weights for policy 0, policy_version 419216 (0.0027) [2024-06-15 16:33:23,757][1651669] Updated weights for policy 0, policy_version 419264 (0.0012) [2024-06-15 16:33:25,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 49157.6, 300 sec: 48543.0). Total num frames: 858652672. Throughput: 0: 11960.2. Samples: 214710784. Policy #0 lag: (min: 14.0, avg: 170.3, max: 270.0) [2024-06-15 16:33:25,768][1648981] Avg episode reward: [(0, '427.500')] [2024-06-15 16:33:30,122][1651669] Updated weights for policy 0, policy_version 419360 (0.0011) [2024-06-15 16:33:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 46967.4, 300 sec: 48318.9). Total num frames: 858882048. Throughput: 0: 12151.5. Samples: 214795264. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:33:30,767][1648981] Avg episode reward: [(0, '430.110')] [2024-06-15 16:33:31,246][1651669] Updated weights for policy 0, policy_version 419410 (0.0012) [2024-06-15 16:33:32,522][1651669] Updated weights for policy 0, policy_version 419457 (0.0011) [2024-06-15 16:33:33,865][1651669] Updated weights for policy 0, policy_version 419511 (0.0012) [2024-06-15 16:33:35,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 859176960. Throughput: 0: 12118.3. Samples: 214861824. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:33:35,767][1648981] Avg episode reward: [(0, '439.520')] [2024-06-15 16:33:37,783][1651274] Signal inference workers to stop experience collection... 
(22000 times) [2024-06-15 16:33:37,836][1651669] InferenceWorker_p0-w0: stopping experience collection (22000 times) [2024-06-15 16:33:37,839][1651669] Updated weights for policy 0, policy_version 419539 (0.0013) [2024-06-15 16:33:38,052][1651274] Signal inference workers to resume experience collection... (22000 times) [2024-06-15 16:33:38,053][1651669] InferenceWorker_p0-w0: resuming experience collection (22000 times) [2024-06-15 16:33:38,636][1651669] Updated weights for policy 0, policy_version 419583 (0.0042) [2024-06-15 16:33:40,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 47519.9, 300 sec: 48318.9). Total num frames: 859406336. Throughput: 0: 12105.9. Samples: 214901248. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:33:40,767][1648981] Avg episode reward: [(0, '467.210')] [2024-06-15 16:33:40,807][1651669] Updated weights for policy 0, policy_version 419637 (0.0017) [2024-06-15 16:33:42,685][1651669] Updated weights for policy 0, policy_version 419696 (0.0026) [2024-06-15 16:33:44,506][1651669] Updated weights for policy 0, policy_version 419765 (0.0015) [2024-06-15 16:33:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 859701248. Throughput: 0: 12117.5. Samples: 214969344. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:33:45,767][1648981] Avg episode reward: [(0, '468.780')] [2024-06-15 16:33:49,238][1651669] Updated weights for policy 0, policy_version 419815 (0.0012) [2024-06-15 16:33:50,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 46970.5, 300 sec: 48207.9). Total num frames: 859832320. Throughput: 0: 12128.8. Samples: 215047680. 
Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:33:50,767][1648981] Avg episode reward: [(0, '474.040')] [2024-06-15 16:33:51,486][1651669] Updated weights for policy 0, policy_version 419872 (0.0027) [2024-06-15 16:33:53,337][1651669] Updated weights for policy 0, policy_version 419922 (0.0013) [2024-06-15 16:33:54,813][1651669] Updated weights for policy 0, policy_version 419984 (0.0013) [2024-06-15 16:33:55,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 49698.0, 300 sec: 48763.2). Total num frames: 860192768. Throughput: 0: 12106.0. Samples: 215084544. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:33:55,769][1648981] Avg episode reward: [(0, '458.520')] [2024-06-15 16:33:59,327][1651669] Updated weights for policy 0, policy_version 420035 (0.0094) [2024-06-15 16:34:00,484][1651669] Updated weights for policy 0, policy_version 420096 (0.0012) [2024-06-15 16:34:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48060.1, 300 sec: 48541.1). Total num frames: 860356608. Throughput: 0: 12288.0. Samples: 215155200. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:00,767][1648981] Avg episode reward: [(0, '475.830')] [2024-06-15 16:34:02,994][1651669] Updated weights for policy 0, policy_version 420156 (0.0015) [2024-06-15 16:34:04,256][1651669] Updated weights for policy 0, policy_version 420192 (0.0037) [2024-06-15 16:34:05,763][1651669] Updated weights for policy 0, policy_version 420261 (0.0176) [2024-06-15 16:34:05,766][1648981] Fps is (10 sec: 49153.2, 60 sec: 49152.2, 300 sec: 48652.2). Total num frames: 860684288. Throughput: 0: 12083.2. Samples: 215224320. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:05,767][1648981] Avg episode reward: [(0, '471.930')] [2024-06-15 16:34:10,699][1651669] Updated weights for policy 0, policy_version 420320 (0.0012) [2024-06-15 16:34:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 860815360. 
Throughput: 0: 12231.2. Samples: 215261184. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:10,767][1648981] Avg episode reward: [(0, '458.460')] [2024-06-15 16:34:11,277][1651669] Updated weights for policy 0, policy_version 420350 (0.0013) [2024-06-15 16:34:14,108][1651669] Updated weights for policy 0, policy_version 420405 (0.0027) [2024-06-15 16:34:15,766][1648981] Fps is (10 sec: 36044.5, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 861044736. Throughput: 0: 11935.3. Samples: 215332352. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:15,767][1648981] Avg episode reward: [(0, '455.440')] [2024-06-15 16:34:15,975][1651669] Updated weights for policy 0, policy_version 420453 (0.0111) [2024-06-15 16:34:17,342][1651274] Signal inference workers to stop experience collection... (22050 times) [2024-06-15 16:34:17,349][1651669] Updated weights for policy 0, policy_version 420514 (0.0010) [2024-06-15 16:34:17,381][1651669] InferenceWorker_p0-w0: stopping experience collection (22050 times) [2024-06-15 16:34:17,570][1651274] Signal inference workers to resume experience collection... (22050 times) [2024-06-15 16:34:17,571][1651669] InferenceWorker_p0-w0: resuming experience collection (22050 times) [2024-06-15 16:34:20,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 861274112. Throughput: 0: 12083.2. Samples: 215405568. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:20,767][1648981] Avg episode reward: [(0, '457.850')] [2024-06-15 16:34:22,854][1651669] Updated weights for policy 0, policy_version 420594 (0.0016) [2024-06-15 16:34:24,517][1651669] Updated weights for policy 0, policy_version 420625 (0.0020) [2024-06-15 16:34:25,658][1651669] Updated weights for policy 0, policy_version 420672 (0.0012) [2024-06-15 16:34:25,774][1648981] Fps is (10 sec: 49116.7, 60 sec: 48054.2, 300 sec: 48318.4). Total num frames: 861536256. Throughput: 0: 11967.6. 
Samples: 215439872. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:25,777][1648981] Avg episode reward: [(0, '447.990')] [2024-06-15 16:34:27,917][1651669] Updated weights for policy 0, policy_version 420738 (0.0011) [2024-06-15 16:34:29,157][1651669] Updated weights for policy 0, policy_version 420798 (0.0025) [2024-06-15 16:34:30,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48605.9, 300 sec: 48763.2). Total num frames: 861798400. Throughput: 0: 11810.2. Samples: 215500800. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:30,767][1648981] Avg episode reward: [(0, '430.650')] [2024-06-15 16:34:33,994][1651669] Updated weights for policy 0, policy_version 420848 (0.0013) [2024-06-15 16:34:34,805][1651669] Updated weights for policy 0, policy_version 420880 (0.0049) [2024-06-15 16:34:35,802][1648981] Fps is (10 sec: 49011.8, 60 sec: 47485.3, 300 sec: 48314.0). Total num frames: 862027776. Throughput: 0: 11903.1. Samples: 215583744. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:35,805][1648981] Avg episode reward: [(0, '399.340')] [2024-06-15 16:34:37,314][1651669] Updated weights for policy 0, policy_version 420944 (0.0013) [2024-06-15 16:34:39,022][1651669] Updated weights for policy 0, policy_version 421010 (0.0022) [2024-06-15 16:34:40,782][1648981] Fps is (10 sec: 52346.2, 60 sec: 48593.3, 300 sec: 48871.7). Total num frames: 862322688. Throughput: 0: 11726.4. Samples: 215612416. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:40,783][1648981] Avg episode reward: [(0, '407.510')] [2024-06-15 16:34:44,036][1651669] Updated weights for policy 0, policy_version 421072 (0.0012) [2024-06-15 16:34:45,769][1648981] Fps is (10 sec: 46027.4, 60 sec: 46419.3, 300 sec: 48096.3). Total num frames: 862486528. Throughput: 0: 11991.5. Samples: 215694848. 
Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:45,770][1648981] Avg episode reward: [(0, '385.180')] [2024-06-15 16:34:46,412][1651669] Updated weights for policy 0, policy_version 421168 (0.0014) [2024-06-15 16:34:49,406][1651669] Updated weights for policy 0, policy_version 421232 (0.0012) [2024-06-15 16:34:50,736][1651669] Updated weights for policy 0, policy_version 421312 (0.0101) [2024-06-15 16:34:50,766][1648981] Fps is (10 sec: 52511.8, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 862846976. Throughput: 0: 11685.0. Samples: 215750144. Policy #0 lag: (min: 31.0, avg: 119.2, max: 287.0) [2024-06-15 16:34:50,767][1648981] Avg episode reward: [(0, '371.320')] [2024-06-15 16:34:55,766][1648981] Fps is (10 sec: 42610.1, 60 sec: 45329.2, 300 sec: 48097.2). Total num frames: 862912512. Throughput: 0: 11969.4. Samples: 215799808. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:34:55,767][1648981] Avg episode reward: [(0, '354.150')] [2024-06-15 16:34:56,245][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000421376_862978048.pth... [2024-06-15 16:34:56,367][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000415744_851443712.pth [2024-06-15 16:34:56,935][1651669] Updated weights for policy 0, policy_version 421394 (0.0011) [2024-06-15 16:34:58,073][1651669] Updated weights for policy 0, policy_version 421439 (0.0011) [2024-06-15 16:34:59,598][1651274] Signal inference workers to stop experience collection... (22100 times) [2024-06-15 16:34:59,631][1651669] InferenceWorker_p0-w0: stopping experience collection (22100 times) [2024-06-15 16:34:59,839][1651274] Signal inference workers to resume experience collection... (22100 times) [2024-06-15 16:34:59,842][1651669] InferenceWorker_p0-w0: resuming experience collection (22100 times) [2024-06-15 16:35:00,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 48059.7, 300 sec: 48431.3). Total num frames: 863240192. 
Throughput: 0: 11787.4. Samples: 215862784. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:00,767][1648981] Avg episode reward: [(0, '358.150')] [2024-06-15 16:35:01,048][1651669] Updated weights for policy 0, policy_version 421523 (0.0179) [2024-06-15 16:35:05,767][1648981] Fps is (10 sec: 45873.4, 60 sec: 44782.6, 300 sec: 47985.6). Total num frames: 863371264. Throughput: 0: 11969.3. Samples: 215944192. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:05,767][1648981] Avg episode reward: [(0, '367.660')] [2024-06-15 16:35:06,322][1651669] Updated weights for policy 0, policy_version 421584 (0.0013) [2024-06-15 16:35:07,700][1651669] Updated weights for policy 0, policy_version 421636 (0.0012) [2024-06-15 16:35:10,009][1651669] Updated weights for policy 0, policy_version 421703 (0.0013) [2024-06-15 16:35:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 48209.1). Total num frames: 863698944. Throughput: 0: 11982.7. Samples: 215979008. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:10,767][1648981] Avg episode reward: [(0, '361.880')] [2024-06-15 16:35:11,815][1651669] Updated weights for policy 0, policy_version 421764 (0.0014) [2024-06-15 16:35:15,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 863895552. Throughput: 0: 12060.4. Samples: 216043520. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:15,767][1648981] Avg episode reward: [(0, '386.330')] [2024-06-15 16:35:17,648][1651669] Updated weights for policy 0, policy_version 421842 (0.0012) [2024-06-15 16:35:20,020][1651669] Updated weights for policy 0, policy_version 421941 (0.0011) [2024-06-15 16:35:20,778][1648981] Fps is (10 sec: 45821.4, 60 sec: 48050.3, 300 sec: 48095.2). Total num frames: 864157696. Throughput: 0: 11862.0. Samples: 216117248. 
Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:20,779][1648981] Avg episode reward: [(0, '393.550')] [2024-06-15 16:35:21,013][1651669] Updated weights for policy 0, policy_version 421954 (0.0011) [2024-06-15 16:35:22,769][1651669] Updated weights for policy 0, policy_version 422026 (0.0090) [2024-06-15 16:35:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48065.5, 300 sec: 48098.3). Total num frames: 864419840. Throughput: 0: 11871.2. Samples: 216146432. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:25,767][1648981] Avg episode reward: [(0, '404.680')] [2024-06-15 16:35:28,902][1651669] Updated weights for policy 0, policy_version 422103 (0.0013) [2024-06-15 16:35:30,766][1648981] Fps is (10 sec: 45929.5, 60 sec: 46967.5, 300 sec: 48098.1). Total num frames: 864616448. Throughput: 0: 11822.2. Samples: 216226816. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:30,767][1648981] Avg episode reward: [(0, '399.900')] [2024-06-15 16:35:31,188][1651669] Updated weights for policy 0, policy_version 422192 (0.0014) [2024-06-15 16:35:33,209][1651669] Updated weights for policy 0, policy_version 422241 (0.0010) [2024-06-15 16:35:35,203][1651669] Updated weights for policy 0, policy_version 422327 (0.0027) [2024-06-15 16:35:35,793][1648981] Fps is (10 sec: 52289.0, 60 sec: 48613.2, 300 sec: 47982.7). Total num frames: 864944128. Throughput: 0: 11848.6. Samples: 216283648. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:35,794][1648981] Avg episode reward: [(0, '393.940')] [2024-06-15 16:35:40,725][1651274] Signal inference workers to stop experience collection... (22150 times) [2024-06-15 16:35:40,755][1651669] InferenceWorker_p0-w0: stopping experience collection (22150 times) [2024-06-15 16:35:40,768][1648981] Fps is (10 sec: 39313.9, 60 sec: 44793.3, 300 sec: 47763.2). Total num frames: 865009664. Throughput: 0: 11695.9. Samples: 216326144. 
Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:40,769][1648981] Avg episode reward: [(0, '415.180')] [2024-06-15 16:35:40,981][1651274] Signal inference workers to resume experience collection... (22150 times) [2024-06-15 16:35:40,982][1651669] InferenceWorker_p0-w0: resuming experience collection (22150 times) [2024-06-15 16:35:41,135][1651669] Updated weights for policy 0, policy_version 422387 (0.0012) [2024-06-15 16:35:42,826][1651669] Updated weights for policy 0, policy_version 422460 (0.0011) [2024-06-15 16:35:45,470][1651669] Updated weights for policy 0, policy_version 422528 (0.0012) [2024-06-15 16:35:45,766][1648981] Fps is (10 sec: 39427.1, 60 sec: 47515.8, 300 sec: 47763.6). Total num frames: 865337344. Throughput: 0: 11787.4. Samples: 216393216. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:45,767][1648981] Avg episode reward: [(0, '452.140')] [2024-06-15 16:35:47,151][1651669] Updated weights for policy 0, policy_version 422592 (0.0014) [2024-06-15 16:35:50,766][1648981] Fps is (10 sec: 45884.2, 60 sec: 43690.7, 300 sec: 47541.4). Total num frames: 865468416. Throughput: 0: 11389.3. Samples: 216456704. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:50,767][1648981] Avg episode reward: [(0, '452.580')] [2024-06-15 16:35:52,980][1651669] Updated weights for policy 0, policy_version 422645 (0.0097) [2024-06-15 16:35:54,312][1651669] Updated weights for policy 0, policy_version 422719 (0.0011) [2024-06-15 16:35:55,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 865763328. Throughput: 0: 11457.4. Samples: 216494592. 
Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:35:55,767][1648981] Avg episode reward: [(0, '468.030')] [2024-06-15 16:35:56,799][1651669] Updated weights for policy 0, policy_version 422786 (0.0011) [2024-06-15 16:35:58,310][1651669] Updated weights for policy 0, policy_version 422846 (0.0108) [2024-06-15 16:36:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 865992704. Throughput: 0: 11502.9. Samples: 216561152. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:36:00,767][1648981] Avg episode reward: [(0, '468.670')] [2024-06-15 16:36:03,542][1651669] Updated weights for policy 0, policy_version 422896 (0.0011) [2024-06-15 16:36:05,184][1651669] Updated weights for policy 0, policy_version 422970 (0.0012) [2024-06-15 16:36:05,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48060.0, 300 sec: 47652.6). Total num frames: 866254848. Throughput: 0: 11505.9. Samples: 216634880. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:36:05,767][1648981] Avg episode reward: [(0, '470.820')] [2024-06-15 16:36:07,391][1651669] Updated weights for policy 0, policy_version 423024 (0.0012) [2024-06-15 16:36:09,322][1651669] Updated weights for policy 0, policy_version 423090 (0.0013) [2024-06-15 16:36:10,770][1648981] Fps is (10 sec: 52407.8, 60 sec: 46964.4, 300 sec: 47655.7). Total num frames: 866516992. Throughput: 0: 11661.2. Samples: 216671232. Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:36:10,771][1648981] Avg episode reward: [(0, '465.260')] [2024-06-15 16:36:13,789][1651669] Updated weights for policy 0, policy_version 423120 (0.0020) [2024-06-15 16:36:14,670][1651669] Updated weights for policy 0, policy_version 423168 (0.0041) [2024-06-15 16:36:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.4, 300 sec: 47764.3). Total num frames: 866713600. Throughput: 0: 11434.6. Samples: 216741376. 
Policy #0 lag: (min: 15.0, avg: 85.9, max: 271.0) [2024-06-15 16:36:15,767][1648981] Avg episode reward: [(0, '464.790')] [2024-06-15 16:36:16,226][1651669] Updated weights for policy 0, policy_version 423232 (0.0012) [2024-06-15 16:36:18,133][1651669] Updated weights for policy 0, policy_version 423291 (0.0013) [2024-06-15 16:36:19,502][1651274] Signal inference workers to stop experience collection... (22200 times) [2024-06-15 16:36:19,574][1651669] InferenceWorker_p0-w0: stopping experience collection (22200 times) [2024-06-15 16:36:19,770][1651274] Signal inference workers to resume experience collection... (22200 times) [2024-06-15 16:36:19,771][1651669] InferenceWorker_p0-w0: resuming experience collection (22200 times) [2024-06-15 16:36:20,000][1651669] Updated weights for policy 0, policy_version 423330 (0.0121) [2024-06-15 16:36:20,797][1648981] Fps is (10 sec: 52288.4, 60 sec: 48044.5, 300 sec: 47980.7). Total num frames: 867041280. Throughput: 0: 11729.4. Samples: 216811520. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:20,798][1648981] Avg episode reward: [(0, '475.880')] [2024-06-15 16:36:23,844][1651669] Updated weights for policy 0, policy_version 423379 (0.0011) [2024-06-15 16:36:25,753][1651669] Updated weights for policy 0, policy_version 423430 (0.0013) [2024-06-15 16:36:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 47652.5). Total num frames: 867172352. Throughput: 0: 11787.9. Samples: 216856576. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:25,767][1648981] Avg episode reward: [(0, '487.210')] [2024-06-15 16:36:26,664][1651669] Updated weights for policy 0, policy_version 423487 (0.0016) [2024-06-15 16:36:29,057][1651669] Updated weights for policy 0, policy_version 423543 (0.0012) [2024-06-15 16:36:30,766][1648981] Fps is (10 sec: 46017.1, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 867500032. Throughput: 0: 11844.3. Samples: 216926208. 
Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:30,767][1648981] Avg episode reward: [(0, '468.380')] [2024-06-15 16:36:30,899][1651669] Updated weights for policy 0, policy_version 423587 (0.0013) [2024-06-15 16:36:34,345][1651669] Updated weights for policy 0, policy_version 423632 (0.0014) [2024-06-15 16:36:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 45895.7, 300 sec: 47985.7). Total num frames: 867696640. Throughput: 0: 12231.1. Samples: 217007104. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:35,767][1648981] Avg episode reward: [(0, '467.760')] [2024-06-15 16:36:35,802][1651669] Updated weights for policy 0, policy_version 423682 (0.0021) [2024-06-15 16:36:36,817][1651669] Updated weights for policy 0, policy_version 423736 (0.0027) [2024-06-15 16:36:38,917][1651669] Updated weights for policy 0, policy_version 423792 (0.0013) [2024-06-15 16:36:40,776][1648981] Fps is (10 sec: 52378.9, 60 sec: 50237.9, 300 sec: 47875.2). Total num frames: 868024320. Throughput: 0: 12183.0. Samples: 217042944. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:40,776][1648981] Avg episode reward: [(0, '457.220')] [2024-06-15 16:36:41,199][1651669] Updated weights for policy 0, policy_version 423867 (0.0012) [2024-06-15 16:36:45,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 47513.5, 300 sec: 47875.3). Total num frames: 868188160. Throughput: 0: 12595.2. Samples: 217127936. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:45,767][1648981] Avg episode reward: [(0, '452.810')] [2024-06-15 16:36:46,157][1651669] Updated weights for policy 0, policy_version 423938 (0.0012) [2024-06-15 16:36:47,334][1651669] Updated weights for policy 0, policy_version 423996 (0.0013) [2024-06-15 16:36:50,291][1651669] Updated weights for policy 0, policy_version 424064 (0.0033) [2024-06-15 16:36:50,766][1648981] Fps is (10 sec: 45919.1, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 868483072. 
Throughput: 0: 12208.4. Samples: 217184256. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:50,767][1648981] Avg episode reward: [(0, '461.210')] [2024-06-15 16:36:52,550][1651669] Updated weights for policy 0, policy_version 424125 (0.0013) [2024-06-15 16:36:55,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 47513.3, 300 sec: 47541.3). Total num frames: 868614144. Throughput: 0: 12198.0. Samples: 217220096. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:36:55,767][1648981] Avg episode reward: [(0, '440.320')] [2024-06-15 16:36:55,793][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000424128_868614144.pth... [2024-06-15 16:36:56,022][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000418560_857210880.pth [2024-06-15 16:36:57,576][1651669] Updated weights for policy 0, policy_version 424192 (0.0014) [2024-06-15 16:36:59,759][1651669] Updated weights for policy 0, policy_version 424272 (0.0013) [2024-06-15 16:37:00,193][1651274] Signal inference workers to stop experience collection... (22250 times) [2024-06-15 16:37:00,226][1651669] InferenceWorker_p0-w0: stopping experience collection (22250 times) [2024-06-15 16:37:00,378][1651274] Signal inference workers to resume experience collection... (22250 times) [2024-06-15 16:37:00,379][1651669] InferenceWorker_p0-w0: resuming experience collection (22250 times) [2024-06-15 16:37:00,794][1648981] Fps is (10 sec: 52282.3, 60 sec: 50220.8, 300 sec: 48092.2). Total num frames: 869007360. Throughput: 0: 12178.0. Samples: 217289728. 
Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:00,795][1648981] Avg episode reward: [(0, '437.600')] [2024-06-15 16:37:02,552][1651669] Updated weights for policy 0, policy_version 424327 (0.0011) [2024-06-15 16:37:03,758][1651669] Updated weights for policy 0, policy_version 424383 (0.0015) [2024-06-15 16:37:05,766][1648981] Fps is (10 sec: 52430.6, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 869138432. Throughput: 0: 12387.5. Samples: 217368576. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:05,767][1648981] Avg episode reward: [(0, '446.280')] [2024-06-15 16:37:08,181][1651669] Updated weights for policy 0, policy_version 424433 (0.0013) [2024-06-15 16:37:09,868][1651669] Updated weights for policy 0, policy_version 424496 (0.0011) [2024-06-15 16:37:10,767][1648981] Fps is (10 sec: 42717.2, 60 sec: 48608.9, 300 sec: 47985.7). Total num frames: 869433344. Throughput: 0: 12276.6. Samples: 217409024. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:10,767][1648981] Avg episode reward: [(0, '445.630')] [2024-06-15 16:37:11,080][1651669] Updated weights for policy 0, policy_version 424544 (0.0012) [2024-06-15 16:37:13,763][1651669] Updated weights for policy 0, policy_version 424608 (0.0011) [2024-06-15 16:37:15,780][1648981] Fps is (10 sec: 52357.6, 60 sec: 49140.9, 300 sec: 47650.2). Total num frames: 869662720. Throughput: 0: 12136.4. Samples: 217472512. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:15,781][1648981] Avg episode reward: [(0, '447.980')] [2024-06-15 16:37:18,780][1651669] Updated weights for policy 0, policy_version 424674 (0.0017) [2024-06-15 16:37:20,731][1651669] Updated weights for policy 0, policy_version 424758 (0.0012) [2024-06-15 16:37:20,770][1648981] Fps is (10 sec: 45858.5, 60 sec: 47535.0, 300 sec: 48097.3). Total num frames: 869892096. Throughput: 0: 11991.1. Samples: 217546752. 
Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:20,771][1648981] Avg episode reward: [(0, '459.040')] [2024-06-15 16:37:22,379][1651669] Updated weights for policy 0, policy_version 424816 (0.0017) [2024-06-15 16:37:24,759][1651669] Updated weights for policy 0, policy_version 424889 (0.0013) [2024-06-15 16:37:25,766][1648981] Fps is (10 sec: 52500.2, 60 sec: 50244.3, 300 sec: 47874.6). Total num frames: 870187008. Throughput: 0: 11937.8. Samples: 217580032. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:25,767][1648981] Avg episode reward: [(0, '456.700')] [2024-06-15 16:37:30,638][1651669] Updated weights for policy 0, policy_version 424960 (0.0130) [2024-06-15 16:37:30,766][1648981] Fps is (10 sec: 42614.8, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 870318080. Throughput: 0: 11821.5. Samples: 217659904. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:30,767][1648981] Avg episode reward: [(0, '455.080')] [2024-06-15 16:37:32,114][1651669] Updated weights for policy 0, policy_version 425024 (0.0012) [2024-06-15 16:37:34,059][1651669] Updated weights for policy 0, policy_version 425088 (0.0013) [2024-06-15 16:37:35,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 47764.9). Total num frames: 870645760. Throughput: 0: 11889.8. Samples: 217719296. Policy #0 lag: (min: 2.0, avg: 123.8, max: 258.0) [2024-06-15 16:37:35,767][1648981] Avg episode reward: [(0, '454.000')] [2024-06-15 16:37:36,334][1651669] Updated weights for policy 0, policy_version 425147 (0.0010) [2024-06-15 16:37:40,770][1648981] Fps is (10 sec: 42581.4, 60 sec: 45333.3, 300 sec: 47651.8). Total num frames: 870744064. Throughput: 0: 12025.3. Samples: 217761280. 
Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:37:40,771][1648981] Avg episode reward: [(0, '453.280')] [2024-06-15 16:37:41,277][1651669] Updated weights for policy 0, policy_version 425203 (0.0012) [2024-06-15 16:37:42,435][1651669] Updated weights for policy 0, policy_version 425248 (0.0131) [2024-06-15 16:37:42,559][1651274] Signal inference workers to stop experience collection... (22300 times) [2024-06-15 16:37:42,610][1651669] InferenceWorker_p0-w0: stopping experience collection (22300 times) [2024-06-15 16:37:42,735][1651274] Signal inference workers to resume experience collection... (22300 times) [2024-06-15 16:37:42,736][1651669] InferenceWorker_p0-w0: resuming experience collection (22300 times) [2024-06-15 16:37:43,619][1651669] Updated weights for policy 0, policy_version 425296 (0.0011) [2024-06-15 16:37:45,368][1651669] Updated weights for policy 0, policy_version 425346 (0.0012) [2024-06-15 16:37:45,767][1648981] Fps is (10 sec: 49149.5, 60 sec: 49151.6, 300 sec: 47875.1). Total num frames: 871137280. Throughput: 0: 11954.0. Samples: 217827328. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:37:45,767][1648981] Avg episode reward: [(0, '460.360')] [2024-06-15 16:37:46,481][1651669] Updated weights for policy 0, policy_version 425394 (0.0033) [2024-06-15 16:37:50,057][1651669] Updated weights for policy 0, policy_version 425412 (0.0010) [2024-06-15 16:37:50,766][1648981] Fps is (10 sec: 55727.0, 60 sec: 46967.3, 300 sec: 47763.5). Total num frames: 871301120. Throughput: 0: 12288.0. Samples: 217921536. 
Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:37:50,767][1648981] Avg episode reward: [(0, '459.580')] [2024-06-15 16:37:51,170][1651669] Updated weights for policy 0, policy_version 425467 (0.0013) [2024-06-15 16:37:52,548][1651669] Updated weights for policy 0, policy_version 425520 (0.0016) [2024-06-15 16:37:54,639][1651669] Updated weights for policy 0, policy_version 425596 (0.0033) [2024-06-15 16:37:55,767][1648981] Fps is (10 sec: 52429.6, 60 sec: 50790.4, 300 sec: 48096.8). Total num frames: 871661568. Throughput: 0: 12071.8. Samples: 217952256. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:37:55,768][1648981] Avg episode reward: [(0, '470.370')] [2024-06-15 16:37:56,398][1651669] Updated weights for policy 0, policy_version 425648 (0.0012) [2024-06-15 16:38:00,192][1651669] Updated weights for policy 0, policy_version 425696 (0.0010) [2024-06-15 16:38:00,770][1648981] Fps is (10 sec: 58961.0, 60 sec: 48079.1, 300 sec: 47985.1). Total num frames: 871890944. Throughput: 0: 12450.0. Samples: 218032640. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:00,771][1648981] Avg episode reward: [(0, '475.600')] [2024-06-15 16:38:02,027][1651669] Updated weights for policy 0, policy_version 425744 (0.0011) [2024-06-15 16:38:04,027][1651669] Updated weights for policy 0, policy_version 425793 (0.0012) [2024-06-15 16:38:04,992][1651669] Updated weights for policy 0, policy_version 425841 (0.0011) [2024-06-15 16:38:05,770][1648981] Fps is (10 sec: 52411.3, 60 sec: 50787.3, 300 sec: 48318.3). Total num frames: 872185856. Throughput: 0: 12413.2. Samples: 218105344. 
Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:05,771][1648981] Avg episode reward: [(0, '475.530')] [2024-06-15 16:38:06,123][1651669] Updated weights for policy 0, policy_version 425891 (0.0101) [2024-06-15 16:38:06,606][1651669] Updated weights for policy 0, policy_version 425920 (0.0012) [2024-06-15 16:38:10,698][1651669] Updated weights for policy 0, policy_version 425969 (0.0011) [2024-06-15 16:38:10,766][1648981] Fps is (10 sec: 49170.2, 60 sec: 49152.1, 300 sec: 48318.9). Total num frames: 872382464. Throughput: 0: 12640.7. Samples: 218148864. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:10,767][1648981] Avg episode reward: [(0, '462.920')] [2024-06-15 16:38:12,680][1651669] Updated weights for policy 0, policy_version 426016 (0.0091) [2024-06-15 16:38:14,761][1651669] Updated weights for policy 0, policy_version 426083 (0.0021) [2024-06-15 16:38:15,766][1648981] Fps is (10 sec: 49170.1, 60 sec: 50255.7, 300 sec: 48430.0). Total num frames: 872677376. Throughput: 0: 12538.3. Samples: 218224128. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:15,767][1648981] Avg episode reward: [(0, '449.310')] [2024-06-15 16:38:16,621][1651669] Updated weights for policy 0, policy_version 426147 (0.0013) [2024-06-15 16:38:20,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49155.1, 300 sec: 48096.8). Total num frames: 872841216. Throughput: 0: 12834.1. Samples: 218296832. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:20,767][1648981] Avg episode reward: [(0, '438.890')] [2024-06-15 16:38:20,872][1651669] Updated weights for policy 0, policy_version 426208 (0.0011) [2024-06-15 16:38:23,443][1651669] Updated weights for policy 0, policy_version 426260 (0.0042) [2024-06-15 16:38:24,319][1651274] Signal inference workers to stop experience collection... 
(22350 times) [2024-06-15 16:38:24,378][1651669] InferenceWorker_p0-w0: stopping experience collection (22350 times) [2024-06-15 16:38:24,408][1651274] Signal inference workers to resume experience collection... (22350 times) [2024-06-15 16:38:24,430][1651669] InferenceWorker_p0-w0: resuming experience collection (22350 times) [2024-06-15 16:38:24,549][1651669] Updated weights for policy 0, policy_version 426306 (0.0012) [2024-06-15 16:38:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 48430.0). Total num frames: 873168896. Throughput: 0: 12767.0. Samples: 218335744. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:25,767][1648981] Avg episode reward: [(0, '448.540')] [2024-06-15 16:38:25,918][1651669] Updated weights for policy 0, policy_version 426368 (0.0014) [2024-06-15 16:38:27,697][1651669] Updated weights for policy 0, policy_version 426425 (0.0011) [2024-06-15 16:38:30,779][1648981] Fps is (10 sec: 49089.4, 60 sec: 50233.5, 300 sec: 47983.6). Total num frames: 873332736. Throughput: 0: 12876.1. Samples: 218406912. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:30,780][1648981] Avg episode reward: [(0, '440.590')] [2024-06-15 16:38:31,984][1651669] Updated weights for policy 0, policy_version 426489 (0.0011) [2024-06-15 16:38:34,725][1651669] Updated weights for policy 0, policy_version 426528 (0.0010) [2024-06-15 16:38:35,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 49152.0, 300 sec: 48096.8). Total num frames: 873594880. Throughput: 0: 12344.9. Samples: 218477056. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:35,767][1648981] Avg episode reward: [(0, '435.970')] [2024-06-15 16:38:36,334][1651669] Updated weights for policy 0, policy_version 426592 (0.0015) [2024-06-15 16:38:38,454][1651669] Updated weights for policy 0, policy_version 426656 (0.0012) [2024-06-15 16:38:40,767][1648981] Fps is (10 sec: 52495.5, 60 sec: 51886.0, 300 sec: 47985.7). Total num frames: 873857024. 
Throughput: 0: 12401.9. Samples: 218510336. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:40,767][1648981] Avg episode reward: [(0, '433.700')] [2024-06-15 16:38:41,748][1651669] Updated weights for policy 0, policy_version 426692 (0.0024) [2024-06-15 16:38:43,082][1651669] Updated weights for policy 0, policy_version 426749 (0.0013) [2024-06-15 16:38:45,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 48606.4, 300 sec: 48207.9). Total num frames: 874053632. Throughput: 0: 12277.7. Samples: 218585088. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:45,767][1648981] Avg episode reward: [(0, '433.470')] [2024-06-15 16:38:45,965][1651669] Updated weights for policy 0, policy_version 426788 (0.0012) [2024-06-15 16:38:46,540][1651669] Updated weights for policy 0, policy_version 426815 (0.0010) [2024-06-15 16:38:48,947][1651669] Updated weights for policy 0, policy_version 426899 (0.0013) [2024-06-15 16:38:50,770][1648981] Fps is (10 sec: 52409.0, 60 sec: 51333.4, 300 sec: 48096.2). Total num frames: 874381312. Throughput: 0: 12140.0. Samples: 218651648. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:50,771][1648981] Avg episode reward: [(0, '416.020')] [2024-06-15 16:38:53,609][1651669] Updated weights for policy 0, policy_version 426976 (0.0014) [2024-06-15 16:38:55,767][1648981] Fps is (10 sec: 45872.6, 60 sec: 47513.6, 300 sec: 47985.6). Total num frames: 874512384. Throughput: 0: 12014.8. Samples: 218689536. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:38:55,767][1648981] Avg episode reward: [(0, '425.250')] [2024-06-15 16:38:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000427008_874512384.pth... 
[2024-06-15 16:38:55,902][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000421376_862978048.pth [2024-06-15 16:38:56,709][1651669] Updated weights for policy 0, policy_version 427040 (0.0013) [2024-06-15 16:38:59,509][1651669] Updated weights for policy 0, policy_version 427094 (0.0014) [2024-06-15 16:39:00,766][1648981] Fps is (10 sec: 42614.3, 60 sec: 48608.8, 300 sec: 47874.6). Total num frames: 874807296. Throughput: 0: 11855.6. Samples: 218757632. Policy #0 lag: (min: 7.0, avg: 75.5, max: 263.0) [2024-06-15 16:39:00,767][1648981] Avg episode reward: [(0, '421.200')] [2024-06-15 16:39:01,554][1651669] Updated weights for policy 0, policy_version 427199 (0.0011) [2024-06-15 16:39:05,171][1651669] Updated weights for policy 0, policy_version 427252 (0.0012) [2024-06-15 16:39:05,767][1648981] Fps is (10 sec: 52429.4, 60 sec: 47516.3, 300 sec: 48207.8). Total num frames: 875036672. Throughput: 0: 11810.1. Samples: 218828288. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:05,768][1648981] Avg episode reward: [(0, '431.810')] [2024-06-15 16:39:08,060][1651669] Updated weights for policy 0, policy_version 427296 (0.0013) [2024-06-15 16:39:08,175][1651274] Signal inference workers to stop experience collection... (22400 times) [2024-06-15 16:39:08,234][1651669] InferenceWorker_p0-w0: stopping experience collection (22400 times) [2024-06-15 16:39:08,493][1651274] Signal inference workers to resume experience collection... (22400 times) [2024-06-15 16:39:08,494][1651669] InferenceWorker_p0-w0: resuming experience collection (22400 times) [2024-06-15 16:39:10,703][1651669] Updated weights for policy 0, policy_version 427349 (0.0011) [2024-06-15 16:39:10,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 875200512. Throughput: 0: 11696.3. Samples: 218862080. 
Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:10,767][1648981] Avg episode reward: [(0, '430.060')] [2024-06-15 16:39:12,332][1651669] Updated weights for policy 0, policy_version 427424 (0.0129) [2024-06-15 16:39:15,766][1648981] Fps is (10 sec: 45876.5, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 875495424. Throughput: 0: 11699.7. Samples: 218933248. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:15,767][1648981] Avg episode reward: [(0, '440.510')] [2024-06-15 16:39:15,821][1651669] Updated weights for policy 0, policy_version 427491 (0.0013) [2024-06-15 16:39:18,757][1651669] Updated weights for policy 0, policy_version 427536 (0.0032) [2024-06-15 16:39:20,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 47513.6, 300 sec: 47986.9). Total num frames: 875692032. Throughput: 0: 11730.5. Samples: 219004928. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:20,767][1648981] Avg episode reward: [(0, '439.270')] [2024-06-15 16:39:21,815][1651669] Updated weights for policy 0, policy_version 427607 (0.0012) [2024-06-15 16:39:23,148][1651669] Updated weights for policy 0, policy_version 427664 (0.0012) [2024-06-15 16:39:25,776][1648981] Fps is (10 sec: 45832.0, 60 sec: 46414.1, 300 sec: 47984.2). Total num frames: 875954176. Throughput: 0: 11637.1. Samples: 219034112. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:25,781][1648981] Avg episode reward: [(0, '462.970')] [2024-06-15 16:39:26,519][1651669] Updated weights for policy 0, policy_version 427717 (0.0013) [2024-06-15 16:39:27,888][1651669] Updated weights for policy 0, policy_version 427776 (0.0014) [2024-06-15 16:39:30,641][1651669] Updated weights for policy 0, policy_version 427839 (0.0013) [2024-06-15 16:39:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48069.9, 300 sec: 48102.6). Total num frames: 876216320. Throughput: 0: 11730.4. Samples: 219112960. 
Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:30,767][1648981] Avg episode reward: [(0, '442.280')] [2024-06-15 16:39:34,212][1651669] Updated weights for policy 0, policy_version 427904 (0.0011) [2024-06-15 16:39:35,521][1651669] Updated weights for policy 0, policy_version 427955 (0.0032) [2024-06-15 16:39:35,766][1648981] Fps is (10 sec: 52477.8, 60 sec: 48059.7, 300 sec: 47988.2). Total num frames: 876478464. Throughput: 0: 11731.5. Samples: 219179520. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:35,767][1648981] Avg episode reward: [(0, '439.510')] [2024-06-15 16:39:37,202][1651669] Updated weights for policy 0, policy_version 427985 (0.0011) [2024-06-15 16:39:40,165][1651669] Updated weights for policy 0, policy_version 428036 (0.0040) [2024-06-15 16:39:40,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 46967.6, 300 sec: 48097.2). Total num frames: 876675072. Throughput: 0: 11833.0. Samples: 219222016. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:40,767][1648981] Avg episode reward: [(0, '452.470')] [2024-06-15 16:39:41,234][1651669] Updated weights for policy 0, policy_version 428090 (0.0012) [2024-06-15 16:39:43,532][1651669] Updated weights for policy 0, policy_version 428131 (0.0104) [2024-06-15 16:39:45,357][1651669] Updated weights for policy 0, policy_version 428208 (0.0012) [2024-06-15 16:39:45,770][1648981] Fps is (10 sec: 52409.3, 60 sec: 49148.8, 300 sec: 47985.1). Total num frames: 877002752. Throughput: 0: 11945.7. Samples: 219295232. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:45,771][1648981] Avg episode reward: [(0, '442.170')] [2024-06-15 16:39:47,315][1651669] Updated weights for policy 0, policy_version 428242 (0.0106) [2024-06-15 16:39:50,766][1648981] Fps is (10 sec: 45874.3, 60 sec: 45878.0, 300 sec: 48207.8). Total num frames: 877133824. Throughput: 0: 12128.7. Samples: 219374080. 
Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:50,767][1648981] Avg episode reward: [(0, '451.680')] [2024-06-15 16:39:50,807][1651274] Signal inference workers to stop experience collection... (22450 times) [2024-06-15 16:39:50,852][1651669] InferenceWorker_p0-w0: stopping experience collection (22450 times) [2024-06-15 16:39:50,874][1651669] Updated weights for policy 0, policy_version 428289 (0.0014) [2024-06-15 16:39:51,087][1651274] Signal inference workers to resume experience collection... (22450 times) [2024-06-15 16:39:51,094][1651669] InferenceWorker_p0-w0: resuming experience collection (22450 times) [2024-06-15 16:39:52,246][1651669] Updated weights for policy 0, policy_version 428352 (0.0011) [2024-06-15 16:39:54,564][1651669] Updated weights for policy 0, policy_version 428410 (0.0013) [2024-06-15 16:39:55,753][1651669] Updated weights for policy 0, policy_version 428451 (0.0013) [2024-06-15 16:39:55,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 49152.3, 300 sec: 48207.8). Total num frames: 877461504. Throughput: 0: 12162.9. Samples: 219409408. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:39:55,767][1648981] Avg episode reward: [(0, '438.180')] [2024-06-15 16:39:58,026][1651669] Updated weights for policy 0, policy_version 428498 (0.0013) [2024-06-15 16:40:00,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 47513.7, 300 sec: 48430.1). Total num frames: 877658112. Throughput: 0: 12037.7. Samples: 219474944. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:40:00,767][1648981] Avg episode reward: [(0, '439.520')] [2024-06-15 16:40:02,827][1651669] Updated weights for policy 0, policy_version 428560 (0.0014) [2024-06-15 16:40:04,718][1651669] Updated weights for policy 0, policy_version 428624 (0.0016) [2024-06-15 16:40:05,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 877920256. Throughput: 0: 12014.8. Samples: 219545600. 
Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:40:05,769][1648981] Avg episode reward: [(0, '432.360')] [2024-06-15 16:40:05,932][1651669] Updated weights for policy 0, policy_version 428675 (0.0011) [2024-06-15 16:40:07,176][1651669] Updated weights for policy 0, policy_version 428731 (0.0011) [2024-06-15 16:40:09,923][1651669] Updated weights for policy 0, policy_version 428792 (0.0120) [2024-06-15 16:40:10,810][1648981] Fps is (10 sec: 52199.9, 60 sec: 49661.9, 300 sec: 48422.8). Total num frames: 878182400. Throughput: 0: 12199.0. Samples: 219583488. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:40:10,811][1648981] Avg episode reward: [(0, '434.310')] [2024-06-15 16:40:14,868][1651669] Updated weights for policy 0, policy_version 428848 (0.0011) [2024-06-15 16:40:15,715][1651669] Updated weights for policy 0, policy_version 428880 (0.0012) [2024-06-15 16:40:15,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 47513.6, 300 sec: 48098.7). Total num frames: 878346240. Throughput: 0: 12117.3. Samples: 219658240. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:40:15,767][1648981] Avg episode reward: [(0, '424.820')] [2024-06-15 16:40:17,881][1651669] Updated weights for policy 0, policy_version 428976 (0.0016) [2024-06-15 16:40:20,079][1651669] Updated weights for policy 0, policy_version 429012 (0.0011) [2024-06-15 16:40:20,767][1648981] Fps is (10 sec: 49366.4, 60 sec: 49697.8, 300 sec: 48318.8). Total num frames: 878673920. Throughput: 0: 12060.4. Samples: 219722240. Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:40:20,767][1648981] Avg episode reward: [(0, '422.600')] [2024-06-15 16:40:25,283][1651669] Updated weights for policy 0, policy_version 429076 (0.0013) [2024-06-15 16:40:25,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 46974.8, 300 sec: 47985.7). Total num frames: 878772224. Throughput: 0: 12083.2. Samples: 219765760. 
Policy #0 lag: (min: 7.0, avg: 113.3, max: 263.0) [2024-06-15 16:40:25,767][1648981] Avg episode reward: [(0, '409.650')] [2024-06-15 16:40:27,108][1651669] Updated weights for policy 0, policy_version 429141 (0.0012) [2024-06-15 16:40:28,458][1651669] Updated weights for policy 0, policy_version 429203 (0.0014) [2024-06-15 16:40:30,582][1651669] Updated weights for policy 0, policy_version 429255 (0.0011) [2024-06-15 16:40:30,766][1648981] Fps is (10 sec: 45876.5, 60 sec: 48605.8, 300 sec: 48101.1). Total num frames: 879132672. Throughput: 0: 11970.4. Samples: 219833856. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:40:30,767][1648981] Avg episode reward: [(0, '414.930')] [2024-06-15 16:40:31,257][1651274] Signal inference workers to stop experience collection... (22500 times) [2024-06-15 16:40:31,318][1651669] InferenceWorker_p0-w0: stopping experience collection (22500 times) [2024-06-15 16:40:31,559][1651274] Signal inference workers to resume experience collection... (22500 times) [2024-06-15 16:40:31,560][1651669] InferenceWorker_p0-w0: resuming experience collection (22500 times) [2024-06-15 16:40:31,813][1651669] Updated weights for policy 0, policy_version 429303 (0.0132) [2024-06-15 16:40:35,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 48208.1). Total num frames: 879230976. Throughput: 0: 12003.6. Samples: 219914240. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:40:35,767][1648981] Avg episode reward: [(0, '403.130')] [2024-06-15 16:40:36,433][1651669] Updated weights for policy 0, policy_version 429360 (0.0012) [2024-06-15 16:40:37,974][1651669] Updated weights for policy 0, policy_version 429424 (0.0154) [2024-06-15 16:40:39,140][1651669] Updated weights for policy 0, policy_version 429472 (0.0012) [2024-06-15 16:40:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49151.9, 300 sec: 48430.0). Total num frames: 879624192. Throughput: 0: 11889.8. Samples: 219944448. 
Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:40:40,767][1648981] Avg episode reward: [(0, '405.730')] [2024-06-15 16:40:42,366][1651669] Updated weights for policy 0, policy_version 429536 (0.0013) [2024-06-15 16:40:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45878.0, 300 sec: 48430.0). Total num frames: 879755264. Throughput: 0: 12049.0. Samples: 220017152. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:40:45,767][1648981] Avg episode reward: [(0, '398.890')] [2024-06-15 16:40:46,315][1651669] Updated weights for policy 0, policy_version 429573 (0.0023) [2024-06-15 16:40:48,136][1651669] Updated weights for policy 0, policy_version 429652 (0.0073) [2024-06-15 16:40:48,880][1651669] Updated weights for policy 0, policy_version 429696 (0.0116) [2024-06-15 16:40:50,770][1648981] Fps is (10 sec: 49133.9, 60 sec: 49695.1, 300 sec: 48651.5). Total num frames: 880115712. Throughput: 0: 12059.5. Samples: 220088320. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:40:50,771][1648981] Avg episode reward: [(0, '401.200')] [2024-06-15 16:40:52,570][1651669] Updated weights for policy 0, policy_version 429762 (0.0011) [2024-06-15 16:40:53,797][1651669] Updated weights for policy 0, policy_version 429817 (0.0012) [2024-06-15 16:40:55,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 46967.3, 300 sec: 48430.0). Total num frames: 880279552. Throughput: 0: 12072.1. Samples: 220126208. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:40:55,768][1648981] Avg episode reward: [(0, '414.690')] [2024-06-15 16:40:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000429824_880279552.pth... 
[2024-06-15 16:40:55,825][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000424128_868614144.pth [2024-06-15 16:40:57,357][1651669] Updated weights for policy 0, policy_version 429872 (0.0012) [2024-06-15 16:40:59,507][1651669] Updated weights for policy 0, policy_version 429922 (0.0012) [2024-06-15 16:41:00,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 880574464. Throughput: 0: 12026.3. Samples: 220199424. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:00,767][1648981] Avg episode reward: [(0, '419.160')] [2024-06-15 16:41:00,893][1651669] Updated weights for policy 0, policy_version 429971 (0.0011) [2024-06-15 16:41:01,646][1651669] Updated weights for policy 0, policy_version 430014 (0.0035) [2024-06-15 16:41:03,752][1651669] Updated weights for policy 0, policy_version 430052 (0.0093) [2024-06-15 16:41:05,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48059.9, 300 sec: 48430.6). Total num frames: 880803840. Throughput: 0: 12390.5. Samples: 220279808. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:05,767][1648981] Avg episode reward: [(0, '409.870')] [2024-06-15 16:41:07,358][1651669] Updated weights for policy 0, policy_version 430128 (0.0013) [2024-06-15 16:41:09,867][1651669] Updated weights for policy 0, policy_version 430176 (0.0012) [2024-06-15 16:41:10,770][1648981] Fps is (10 sec: 49132.5, 60 sec: 48091.7, 300 sec: 48651.5). Total num frames: 881065984. Throughput: 0: 12116.3. Samples: 220311040. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:10,771][1648981] Avg episode reward: [(0, '407.820')] [2024-06-15 16:41:12,192][1651669] Updated weights for policy 0, policy_version 430230 (0.0013) [2024-06-15 16:41:14,837][1651274] Signal inference workers to stop experience collection... 
(22550 times) [2024-06-15 16:41:14,888][1651669] InferenceWorker_p0-w0: stopping experience collection (22550 times) [2024-06-15 16:41:14,917][1651669] Updated weights for policy 0, policy_version 430305 (0.0012) [2024-06-15 16:41:15,176][1651274] Signal inference workers to resume experience collection... (22550 times) [2024-06-15 16:41:15,177][1651669] InferenceWorker_p0-w0: resuming experience collection (22550 times) [2024-06-15 16:41:15,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 49698.0, 300 sec: 48435.0). Total num frames: 881328128. Throughput: 0: 12253.9. Samples: 220385280. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:15,767][1648981] Avg episode reward: [(0, '428.680')] [2024-06-15 16:41:17,176][1651669] Updated weights for policy 0, policy_version 430352 (0.0011) [2024-06-15 16:41:20,478][1651669] Updated weights for policy 0, policy_version 430416 (0.0012) [2024-06-15 16:41:20,766][1648981] Fps is (10 sec: 42615.4, 60 sec: 46967.8, 300 sec: 48541.1). Total num frames: 881491968. Throughput: 0: 12071.8. Samples: 220457472. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:20,767][1648981] Avg episode reward: [(0, '435.160')] [2024-06-15 16:41:21,376][1651669] Updated weights for policy 0, policy_version 430463 (0.0013) [2024-06-15 16:41:23,820][1651669] Updated weights for policy 0, policy_version 430523 (0.0012) [2024-06-15 16:41:25,163][1651669] Updated weights for policy 0, policy_version 430587 (0.0130) [2024-06-15 16:41:25,784][1648981] Fps is (10 sec: 52334.9, 60 sec: 51321.1, 300 sec: 48649.2). Total num frames: 881852416. Throughput: 0: 12203.5. Samples: 220493824. 
Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:25,785][1648981] Avg episode reward: [(0, '449.460')] [2024-06-15 16:41:28,455][1651669] Updated weights for policy 0, policy_version 430625 (0.0011) [2024-06-15 16:41:30,731][1651669] Updated weights for policy 0, policy_version 430659 (0.0015) [2024-06-15 16:41:30,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 47513.6, 300 sec: 48430.0). Total num frames: 881983488. Throughput: 0: 12322.1. Samples: 220571648. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:30,767][1648981] Avg episode reward: [(0, '440.770')] [2024-06-15 16:41:32,013][1651669] Updated weights for policy 0, policy_version 430713 (0.0012) [2024-06-15 16:41:33,747][1651669] Updated weights for policy 0, policy_version 430757 (0.0013) [2024-06-15 16:41:34,950][1651669] Updated weights for policy 0, policy_version 430802 (0.0014) [2024-06-15 16:41:35,766][1648981] Fps is (10 sec: 52523.4, 60 sec: 52428.8, 300 sec: 48653.7). Total num frames: 882376704. Throughput: 0: 12323.2. Samples: 220642816. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:35,767][1648981] Avg episode reward: [(0, '438.500')] [2024-06-15 16:41:37,813][1651669] Updated weights for policy 0, policy_version 430849 (0.0013) [2024-06-15 16:41:39,063][1651669] Updated weights for policy 0, policy_version 430906 (0.0106) [2024-06-15 16:41:40,768][1648981] Fps is (10 sec: 52423.3, 60 sec: 48058.9, 300 sec: 48540.9). Total num frames: 882507776. Throughput: 0: 12424.3. Samples: 220685312. 
Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:40,768][1648981] Avg episode reward: [(0, '452.210')] [2024-06-15 16:41:41,711][1651669] Updated weights for policy 0, policy_version 430931 (0.0011) [2024-06-15 16:41:43,395][1651669] Updated weights for policy 0, policy_version 430992 (0.0145) [2024-06-15 16:41:44,318][1651669] Updated weights for policy 0, policy_version 431039 (0.0013) [2024-06-15 16:41:45,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 51336.5, 300 sec: 48652.1). Total num frames: 882835456. Throughput: 0: 12481.4. Samples: 220761088. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:45,767][1648981] Avg episode reward: [(0, '431.950')] [2024-06-15 16:41:46,045][1651669] Updated weights for policy 0, policy_version 431096 (0.0012) [2024-06-15 16:41:48,245][1651669] Updated weights for policy 0, policy_version 431139 (0.0013) [2024-06-15 16:41:50,766][1648981] Fps is (10 sec: 52435.0, 60 sec: 48609.0, 300 sec: 48874.4). Total num frames: 883032064. Throughput: 0: 12595.2. Samples: 220846592. Policy #0 lag: (min: 65.0, avg: 133.1, max: 321.0) [2024-06-15 16:41:50,767][1648981] Avg episode reward: [(0, '448.600')] [2024-06-15 16:41:51,924][1651669] Updated weights for policy 0, policy_version 431194 (0.0145) [2024-06-15 16:41:53,866][1651669] Updated weights for policy 0, policy_version 431248 (0.0012) [2024-06-15 16:41:55,410][1651669] Updated weights for policy 0, policy_version 431312 (0.0010) [2024-06-15 16:41:55,779][1648981] Fps is (10 sec: 49088.5, 60 sec: 50779.6, 300 sec: 48543.5). Total num frames: 883326976. Throughput: 0: 12660.9. Samples: 220880896. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:41:55,780][1648981] Avg episode reward: [(0, '457.150')] [2024-06-15 16:41:58,041][1651274] Signal inference workers to stop experience collection... 
(22600 times) [2024-06-15 16:41:58,187][1651669] InferenceWorker_p0-w0: stopping experience collection (22600 times) [2024-06-15 16:41:58,227][1651274] Signal inference workers to resume experience collection... (22600 times) [2024-06-15 16:41:58,239][1651669] InferenceWorker_p0-w0: resuming experience collection (22600 times) [2024-06-15 16:41:58,243][1651669] Updated weights for policy 0, policy_version 431376 (0.0011) [2024-06-15 16:42:00,767][1648981] Fps is (10 sec: 52426.7, 60 sec: 49697.8, 300 sec: 48874.2). Total num frames: 883556352. Throughput: 0: 12538.3. Samples: 220949504. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:00,767][1648981] Avg episode reward: [(0, '453.290')] [2024-06-15 16:42:02,253][1651669] Updated weights for policy 0, policy_version 431426 (0.0012) [2024-06-15 16:42:04,603][1651669] Updated weights for policy 0, policy_version 431504 (0.0016) [2024-06-15 16:42:05,766][1648981] Fps is (10 sec: 45934.9, 60 sec: 49698.2, 300 sec: 48652.2). Total num frames: 883785728. Throughput: 0: 12663.5. Samples: 221027328. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:05,767][1648981] Avg episode reward: [(0, '450.620')] [2024-06-15 16:42:06,642][1651669] Updated weights for policy 0, policy_version 431584 (0.0013) [2024-06-15 16:42:09,392][1651669] Updated weights for policy 0, policy_version 431649 (0.0013) [2024-06-15 16:42:10,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 50247.6, 300 sec: 48876.6). Total num frames: 884080640. Throughput: 0: 12577.5. Samples: 221059584. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:10,767][1648981] Avg episode reward: [(0, '440.210')] [2024-06-15 16:42:13,738][1651669] Updated weights for policy 0, policy_version 431724 (0.0014) [2024-06-15 16:42:15,413][1651669] Updated weights for policy 0, policy_version 431761 (0.0011) [2024-06-15 16:42:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49152.1, 300 sec: 48763.9). Total num frames: 884277248. 
Throughput: 0: 12572.5. Samples: 221137408. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:15,767][1648981] Avg episode reward: [(0, '432.640')] [2024-06-15 16:42:17,422][1651669] Updated weights for policy 0, policy_version 431844 (0.0012) [2024-06-15 16:42:20,140][1651669] Updated weights for policy 0, policy_version 431906 (0.0012) [2024-06-15 16:42:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 51882.7, 300 sec: 48874.3). Total num frames: 884604928. Throughput: 0: 12367.7. Samples: 221199360. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:20,767][1648981] Avg episode reward: [(0, '439.430')] [2024-06-15 16:42:24,515][1651669] Updated weights for policy 0, policy_version 431953 (0.0028) [2024-06-15 16:42:25,539][1651669] Updated weights for policy 0, policy_version 432000 (0.0082) [2024-06-15 16:42:25,786][1648981] Fps is (10 sec: 45783.8, 60 sec: 48058.2, 300 sec: 48871.0). Total num frames: 884736000. Throughput: 0: 12408.0. Samples: 221243904. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:25,787][1648981] Avg episode reward: [(0, '462.170')] [2024-06-15 16:42:28,058][1651669] Updated weights for policy 0, policy_version 432080 (0.0013) [2024-06-15 16:42:29,560][1651669] Updated weights for policy 0, policy_version 432129 (0.0011) [2024-06-15 16:42:30,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 51882.7, 300 sec: 48985.4). Total num frames: 885096448. Throughput: 0: 12288.0. Samples: 221314048. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:30,767][1648981] Avg episode reward: [(0, '474.580')] [2024-06-15 16:42:30,873][1651669] Updated weights for policy 0, policy_version 432189 (0.0111) [2024-06-15 16:42:35,766][1648981] Fps is (10 sec: 42683.6, 60 sec: 46421.4, 300 sec: 48875.0). Total num frames: 885161984. Throughput: 0: 12140.1. Samples: 221392896. 
Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:35,767][1648981] Avg episode reward: [(0, '470.810')] [2024-06-15 16:42:36,568][1651669] Updated weights for policy 0, policy_version 432241 (0.0010) [2024-06-15 16:42:37,532][1651669] Updated weights for policy 0, policy_version 432272 (0.0019) [2024-06-15 16:42:39,153][1651274] Signal inference workers to stop experience collection... (22650 times) [2024-06-15 16:42:39,282][1651669] InferenceWorker_p0-w0: stopping experience collection (22650 times) [2024-06-15 16:42:39,382][1651274] Signal inference workers to resume experience collection... (22650 times) [2024-06-15 16:42:39,383][1651669] InferenceWorker_p0-w0: resuming experience collection (22650 times) [2024-06-15 16:42:39,573][1651669] Updated weights for policy 0, policy_version 432355 (0.0013) [2024-06-15 16:42:40,736][1651669] Updated weights for policy 0, policy_version 432403 (0.0012) [2024-06-15 16:42:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 50791.3, 300 sec: 48874.4). Total num frames: 885555200. Throughput: 0: 12029.8. Samples: 221422080. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:40,767][1648981] Avg episode reward: [(0, '476.830')] [2024-06-15 16:42:45,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46967.5, 300 sec: 48652.2). Total num frames: 885653504. Throughput: 0: 12276.7. Samples: 221501952. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:45,767][1648981] Avg episode reward: [(0, '468.060')] [2024-06-15 16:42:46,345][1651669] Updated weights for policy 0, policy_version 432450 (0.0014) [2024-06-15 16:42:48,457][1651669] Updated weights for policy 0, policy_version 432530 (0.0013) [2024-06-15 16:42:50,135][1651669] Updated weights for policy 0, policy_version 432608 (0.0012) [2024-06-15 16:42:50,776][1648981] Fps is (10 sec: 49103.5, 60 sec: 50235.9, 300 sec: 48761.6). Total num frames: 886046720. Throughput: 0: 11875.8. Samples: 221561856. 
Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:50,777][1648981] Avg episode reward: [(0, '481.870')] [2024-06-15 16:42:52,031][1651669] Updated weights for policy 0, policy_version 432659 (0.0124) [2024-06-15 16:42:52,884][1651669] Updated weights for policy 0, policy_version 432704 (0.0116) [2024-06-15 16:42:55,769][1648981] Fps is (10 sec: 52413.1, 60 sec: 47521.5, 300 sec: 48430.1). Total num frames: 886177792. Throughput: 0: 11934.5. Samples: 221596672. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:42:55,770][1648981] Avg episode reward: [(0, '469.140')] [2024-06-15 16:42:55,777][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000432704_886177792.pth... [2024-06-15 16:42:55,838][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000427008_874512384.pth [2024-06-15 16:42:59,023][1651669] Updated weights for policy 0, policy_version 432771 (0.0012) [2024-06-15 16:43:00,780][1648981] Fps is (10 sec: 39307.5, 60 sec: 48049.2, 300 sec: 48317.3). Total num frames: 886439936. Throughput: 0: 12011.3. Samples: 221678080. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:43:00,781][1648981] Avg episode reward: [(0, '466.820')] [2024-06-15 16:43:01,672][1651669] Updated weights for policy 0, policy_version 432880 (0.0013) [2024-06-15 16:43:03,872][1651669] Updated weights for policy 0, policy_version 432951 (0.0014) [2024-06-15 16:43:05,766][1648981] Fps is (10 sec: 52444.6, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 886702080. Throughput: 0: 11969.4. Samples: 221737984. Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:43:05,767][1648981] Avg episode reward: [(0, '489.150')] [2024-06-15 16:43:09,399][1651669] Updated weights for policy 0, policy_version 433008 (0.0013) [2024-06-15 16:43:10,766][1648981] Fps is (10 sec: 39375.0, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 886833152. Throughput: 0: 12077.2. Samples: 221787136. 
Policy #0 lag: (min: 4.0, avg: 89.8, max: 260.0) [2024-06-15 16:43:10,767][1648981] Avg episode reward: [(0, '487.690')] [2024-06-15 16:43:11,638][1651669] Updated weights for policy 0, policy_version 433057 (0.0014) [2024-06-15 16:43:13,149][1651669] Updated weights for policy 0, policy_version 433136 (0.0096) [2024-06-15 16:43:14,446][1651669] Updated weights for policy 0, policy_version 433209 (0.0014) [2024-06-15 16:43:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 48763.2). Total num frames: 887226368. Throughput: 0: 11810.1. Samples: 221845504. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:15,767][1648981] Avg episode reward: [(0, '494.630')] [2024-06-15 16:43:20,267][1651669] Updated weights for policy 0, policy_version 433277 (0.0013) [2024-06-15 16:43:20,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45875.3, 300 sec: 48096.8). Total num frames: 887357440. Throughput: 0: 11855.7. Samples: 221926400. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:20,767][1648981] Avg episode reward: [(0, '494.840')] [2024-06-15 16:43:21,581][1651274] Signal inference workers to stop experience collection... (22700 times) [2024-06-15 16:43:21,650][1651669] InferenceWorker_p0-w0: stopping experience collection (22700 times) [2024-06-15 16:43:21,749][1651274] Signal inference workers to resume experience collection... (22700 times) [2024-06-15 16:43:21,750][1651669] InferenceWorker_p0-w0: resuming experience collection (22700 times) [2024-06-15 16:43:22,259][1651669] Updated weights for policy 0, policy_version 433315 (0.0013) [2024-06-15 16:43:24,270][1651669] Updated weights for policy 0, policy_version 433408 (0.0017) [2024-06-15 16:43:25,791][1648981] Fps is (10 sec: 52302.5, 60 sec: 50240.7, 300 sec: 48872.4). Total num frames: 887750656. Throughput: 0: 11883.4. Samples: 221957120. 
Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:25,791][1648981] Avg episode reward: [(0, '487.970')] [2024-06-15 16:43:29,765][1651669] Updated weights for policy 0, policy_version 433474 (0.0012) [2024-06-15 16:43:30,767][1648981] Fps is (10 sec: 49150.7, 60 sec: 45875.1, 300 sec: 48318.9). Total num frames: 887848960. Throughput: 0: 11980.8. Samples: 222041088. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:30,767][1648981] Avg episode reward: [(0, '471.970')] [2024-06-15 16:43:30,960][1651669] Updated weights for policy 0, policy_version 433532 (0.0010) [2024-06-15 16:43:33,000][1651669] Updated weights for policy 0, policy_version 433574 (0.0013) [2024-06-15 16:43:35,126][1651669] Updated weights for policy 0, policy_version 433680 (0.0095) [2024-06-15 16:43:35,766][1648981] Fps is (10 sec: 45986.3, 60 sec: 50790.4, 300 sec: 48652.2). Total num frames: 888209408. Throughput: 0: 12006.2. Samples: 222102016. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:35,767][1648981] Avg episode reward: [(0, '461.710')] [2024-06-15 16:43:36,320][1651669] Updated weights for policy 0, policy_version 433728 (0.0011) [2024-06-15 16:43:40,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 45875.3, 300 sec: 48318.9). Total num frames: 888307712. Throughput: 0: 12163.7. Samples: 222144000. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:40,767][1648981] Avg episode reward: [(0, '457.740')] [2024-06-15 16:43:41,754][1651669] Updated weights for policy 0, policy_version 433783 (0.0019) [2024-06-15 16:43:43,601][1651669] Updated weights for policy 0, policy_version 433824 (0.0010) [2024-06-15 16:43:44,264][1651669] Updated weights for policy 0, policy_version 433852 (0.0012) [2024-06-15 16:43:45,767][1648981] Fps is (10 sec: 42597.0, 60 sec: 49697.9, 300 sec: 48319.5). Total num frames: 888635392. Throughput: 0: 11961.6. Samples: 222216192. 
Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:45,767][1648981] Avg episode reward: [(0, '432.480')] [2024-06-15 16:43:46,298][1651669] Updated weights for policy 0, policy_version 433936 (0.0223) [2024-06-15 16:43:47,443][1651669] Updated weights for policy 0, policy_version 433984 (0.0018) [2024-06-15 16:43:50,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45882.8, 300 sec: 48430.0). Total num frames: 888799232. Throughput: 0: 12390.4. Samples: 222295552. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:50,767][1648981] Avg episode reward: [(0, '437.030')] [2024-06-15 16:43:52,885][1651669] Updated weights for policy 0, policy_version 434041 (0.0016) [2024-06-15 16:43:54,459][1651669] Updated weights for policy 0, policy_version 434096 (0.0098) [2024-06-15 16:43:55,772][1648981] Fps is (10 sec: 49125.3, 60 sec: 49149.7, 300 sec: 48540.1). Total num frames: 889126912. Throughput: 0: 12058.9. Samples: 222329856. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:43:55,773][1648981] Avg episode reward: [(0, '439.420')] [2024-06-15 16:43:55,874][1651669] Updated weights for policy 0, policy_version 434145 (0.0015) [2024-06-15 16:43:56,946][1651274] Signal inference workers to stop experience collection... (22750 times) [2024-06-15 16:43:57,046][1651669] InferenceWorker_p0-w0: stopping experience collection (22750 times) [2024-06-15 16:43:57,159][1651274] Signal inference workers to resume experience collection... (22750 times) [2024-06-15 16:43:57,161][1651669] InferenceWorker_p0-w0: resuming experience collection (22750 times) [2024-06-15 16:43:57,312][1651669] Updated weights for policy 0, policy_version 434210 (0.0125) [2024-06-15 16:44:00,782][1648981] Fps is (10 sec: 52346.8, 60 sec: 48058.0, 300 sec: 48427.5). Total num frames: 889323520. Throughput: 0: 12261.0. Samples: 222397440. 
Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:00,783][1648981] Avg episode reward: [(0, '424.580')] [2024-06-15 16:44:03,703][1651669] Updated weights for policy 0, policy_version 434294 (0.0013) [2024-06-15 16:44:04,774][1651669] Updated weights for policy 0, policy_version 434336 (0.0011) [2024-06-15 16:44:05,766][1648981] Fps is (10 sec: 45901.5, 60 sec: 48059.7, 300 sec: 48763.2). Total num frames: 889585664. Throughput: 0: 12174.2. Samples: 222474240. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:05,767][1648981] Avg episode reward: [(0, '446.370')] [2024-06-15 16:44:05,861][1651669] Updated weights for policy 0, policy_version 434384 (0.0013) [2024-06-15 16:44:07,119][1651669] Updated weights for policy 0, policy_version 434437 (0.0010) [2024-06-15 16:44:08,377][1651669] Updated weights for policy 0, policy_version 434496 (0.0011) [2024-06-15 16:44:10,770][1648981] Fps is (10 sec: 52491.3, 60 sec: 50241.1, 300 sec: 48651.5). Total num frames: 889847808. Throughput: 0: 12145.6. Samples: 222503424. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:10,771][1648981] Avg episode reward: [(0, '437.930')] [2024-06-15 16:44:13,918][1651669] Updated weights for policy 0, policy_version 434555 (0.0014) [2024-06-15 16:44:15,778][1648981] Fps is (10 sec: 39274.9, 60 sec: 45866.1, 300 sec: 48428.0). Total num frames: 889978880. Throughput: 0: 12091.4. Samples: 222585344. 
Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:15,779][1648981] Avg episode reward: [(0, '414.650')] [2024-06-15 16:44:16,724][1651669] Updated weights for policy 0, policy_version 434608 (0.0121) [2024-06-15 16:44:18,021][1651669] Updated weights for policy 0, policy_version 434658 (0.0014) [2024-06-15 16:44:19,091][1651669] Updated weights for policy 0, policy_version 434707 (0.0010) [2024-06-15 16:44:19,974][1651669] Updated weights for policy 0, policy_version 434751 (0.0011) [2024-06-15 16:44:20,790][1648981] Fps is (10 sec: 52324.1, 60 sec: 50224.3, 300 sec: 48871.9). Total num frames: 890372096. Throughput: 0: 12292.9. Samples: 222655488. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:20,791][1648981] Avg episode reward: [(0, '416.490')] [2024-06-15 16:44:24,536][1651669] Updated weights for policy 0, policy_version 434815 (0.0012) [2024-06-15 16:44:25,780][1648981] Fps is (10 sec: 52420.0, 60 sec: 45883.3, 300 sec: 48427.8). Total num frames: 890503168. Throughput: 0: 12398.0. Samples: 222702080. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:25,781][1648981] Avg episode reward: [(0, '424.880')] [2024-06-15 16:44:27,413][1651669] Updated weights for policy 0, policy_version 434885 (0.0013) [2024-06-15 16:44:29,356][1651669] Updated weights for policy 0, policy_version 434961 (0.0035) [2024-06-15 16:44:30,198][1651669] Updated weights for policy 0, policy_version 435006 (0.0011) [2024-06-15 16:44:30,766][1648981] Fps is (10 sec: 52553.8, 60 sec: 50790.5, 300 sec: 48874.3). Total num frames: 890896384. Throughput: 0: 12162.9. Samples: 222763520. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:30,767][1648981] Avg episode reward: [(0, '432.400')] [2024-06-15 16:44:34,982][1651669] Updated weights for policy 0, policy_version 435059 (0.0012) [2024-06-15 16:44:35,782][1648981] Fps is (10 sec: 52424.4, 60 sec: 46956.2, 300 sec: 48649.8). Total num frames: 891027456. 
Throughput: 0: 12204.4. Samples: 222844928. Policy #0 lag: (min: 144.0, avg: 224.1, max: 384.0) [2024-06-15 16:44:35,783][1648981] Avg episode reward: [(0, '444.780')] [2024-06-15 16:44:37,914][1651669] Updated weights for policy 0, policy_version 435120 (0.0012) [2024-06-15 16:44:38,001][1651274] Signal inference workers to stop experience collection... (22800 times) [2024-06-15 16:44:38,048][1651669] InferenceWorker_p0-w0: stopping experience collection (22800 times) [2024-06-15 16:44:38,161][1651274] Signal inference workers to resume experience collection... (22800 times) [2024-06-15 16:44:38,166][1651669] InferenceWorker_p0-w0: resuming experience collection (22800 times) [2024-06-15 16:44:39,351][1651669] Updated weights for policy 0, policy_version 435189 (0.0011) [2024-06-15 16:44:40,515][1651669] Updated weights for policy 0, policy_version 435250 (0.0052) [2024-06-15 16:44:40,792][1648981] Fps is (10 sec: 52297.5, 60 sec: 51860.9, 300 sec: 48870.8). Total num frames: 891420672. Throughput: 0: 12214.5. Samples: 222879744. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:44:40,792][1648981] Avg episode reward: [(0, '454.880')] [2024-06-15 16:44:45,344][1651669] Updated weights for policy 0, policy_version 435320 (0.0130) [2024-06-15 16:44:45,766][1648981] Fps is (10 sec: 52504.4, 60 sec: 48606.1, 300 sec: 48874.3). Total num frames: 891551744. Throughput: 0: 12679.3. Samples: 222967808. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:44:45,767][1648981] Avg episode reward: [(0, '449.150')] [2024-06-15 16:44:47,601][1651669] Updated weights for policy 0, policy_version 435376 (0.0115) [2024-06-15 16:44:49,535][1651669] Updated weights for policy 0, policy_version 435460 (0.0036) [2024-06-15 16:44:50,479][1651669] Updated weights for policy 0, policy_version 435510 (0.0013) [2024-06-15 16:44:50,766][1648981] Fps is (10 sec: 52561.1, 60 sec: 52428.9, 300 sec: 49096.5). Total num frames: 891944960. Throughput: 0: 12185.6. 
Samples: 223022592. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:44:50,767][1648981] Avg episode reward: [(0, '449.380')] [2024-06-15 16:44:55,776][1648981] Fps is (10 sec: 39285.8, 60 sec: 46964.8, 300 sec: 48428.5). Total num frames: 891944960. Throughput: 0: 12639.2. Samples: 223072256. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:44:55,776][1648981] Avg episode reward: [(0, '432.160')] [2024-06-15 16:44:56,045][1651669] Updated weights for policy 0, policy_version 435542 (0.0031) [2024-06-15 16:44:56,313][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000435552_892010496.pth... [2024-06-15 16:44:56,545][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000429824_880279552.pth [2024-06-15 16:44:57,697][1651669] Updated weights for policy 0, policy_version 435590 (0.0018) [2024-06-15 16:44:59,588][1651669] Updated weights for policy 0, policy_version 435666 (0.0017) [2024-06-15 16:45:00,678][1651669] Updated weights for policy 0, policy_version 435728 (0.0011) [2024-06-15 16:45:00,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 50803.7, 300 sec: 48985.4). Total num frames: 892370944. Throughput: 0: 12154.7. Samples: 223132160. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:00,767][1648981] Avg episode reward: [(0, '411.410')] [2024-06-15 16:45:01,604][1651669] Updated weights for policy 0, policy_version 435776 (0.0011) [2024-06-15 16:45:05,766][1648981] Fps is (10 sec: 52476.1, 60 sec: 48059.6, 300 sec: 48437.2). Total num frames: 892469248. Throughput: 0: 12488.0. Samples: 223217152. 
Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:05,767][1648981] Avg episode reward: [(0, '424.770')] [2024-06-15 16:45:07,965][1651669] Updated weights for policy 0, policy_version 435829 (0.0013) [2024-06-15 16:45:08,930][1651669] Updated weights for policy 0, policy_version 435859 (0.0113) [2024-06-15 16:45:10,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 49155.1, 300 sec: 48985.4). Total num frames: 892796928. Throughput: 0: 12257.6. Samples: 223253504. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:10,767][1648981] Avg episode reward: [(0, '437.760')] [2024-06-15 16:45:10,867][1651669] Updated weights for policy 0, policy_version 435937 (0.0013) [2024-06-15 16:45:12,094][1651274] Signal inference workers to stop experience collection... (22850 times) [2024-06-15 16:45:12,096][1651669] Updated weights for policy 0, policy_version 436000 (0.0100) [2024-06-15 16:45:12,154][1651669] InferenceWorker_p0-w0: stopping experience collection (22850 times) [2024-06-15 16:45:12,295][1651274] Signal inference workers to resume experience collection... (22850 times) [2024-06-15 16:45:12,296][1651669] InferenceWorker_p0-w0: resuming experience collection (22850 times) [2024-06-15 16:45:15,768][1648981] Fps is (10 sec: 52420.3, 60 sec: 50252.8, 300 sec: 48540.8). Total num frames: 892993536. Throughput: 0: 12321.7. Samples: 223318016. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:15,769][1648981] Avg episode reward: [(0, '482.960')] [2024-06-15 16:45:18,013][1651669] Updated weights for policy 0, policy_version 436064 (0.0011) [2024-06-15 16:45:18,855][1651669] Updated weights for policy 0, policy_version 436096 (0.0017) [2024-06-15 16:45:20,215][1651669] Updated weights for policy 0, policy_version 436131 (0.0011) [2024-06-15 16:45:20,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 47532.3, 300 sec: 48985.4). Total num frames: 893222912. Throughput: 0: 12212.2. Samples: 223394304. 
Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:20,767][1648981] Avg episode reward: [(0, '473.070')] [2024-06-15 16:45:22,368][1651669] Updated weights for policy 0, policy_version 436224 (0.0012) [2024-06-15 16:45:23,685][1651669] Updated weights for policy 0, policy_version 436285 (0.0048) [2024-06-15 16:45:25,767][1648981] Fps is (10 sec: 52432.9, 60 sec: 50254.8, 300 sec: 48763.1). Total num frames: 893517824. Throughput: 0: 11975.8. Samples: 223418368. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:25,768][1648981] Avg episode reward: [(0, '474.160')] [2024-06-15 16:45:29,790][1651669] Updated weights for policy 0, policy_version 436352 (0.0013) [2024-06-15 16:45:30,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 48874.3). Total num frames: 893648896. Throughput: 0: 11753.2. Samples: 223496704. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:30,767][1648981] Avg episode reward: [(0, '468.230')] [2024-06-15 16:45:31,944][1651669] Updated weights for policy 0, policy_version 436404 (0.0012) [2024-06-15 16:45:33,251][1651669] Updated weights for policy 0, policy_version 436464 (0.0013) [2024-06-15 16:45:34,579][1651669] Updated weights for policy 0, policy_version 436528 (0.0011) [2024-06-15 16:45:35,766][1648981] Fps is (10 sec: 52434.1, 60 sec: 50256.4, 300 sec: 48874.3). Total num frames: 894042112. Throughput: 0: 12117.3. Samples: 223567872. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:35,767][1648981] Avg episode reward: [(0, '470.320')] [2024-06-15 16:45:39,597][1651669] Updated weights for policy 0, policy_version 436576 (0.0011) [2024-06-15 16:45:40,774][1648981] Fps is (10 sec: 52388.3, 60 sec: 45888.5, 300 sec: 48873.0). Total num frames: 894173184. Throughput: 0: 12049.4. Samples: 223614464. 
Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:40,775][1648981] Avg episode reward: [(0, '482.890')] [2024-06-15 16:45:40,982][1651669] Updated weights for policy 0, policy_version 436624 (0.0076) [2024-06-15 16:45:41,996][1651669] Updated weights for policy 0, policy_version 436675 (0.0013) [2024-06-15 16:45:43,319][1651669] Updated weights for policy 0, policy_version 436732 (0.0014) [2024-06-15 16:45:44,686][1651669] Updated weights for policy 0, policy_version 436791 (0.0013) [2024-06-15 16:45:45,772][1648981] Fps is (10 sec: 52397.8, 60 sec: 50239.4, 300 sec: 48985.0). Total num frames: 894566400. Throughput: 0: 12058.9. Samples: 223674880. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:45,773][1648981] Avg episode reward: [(0, '479.290')] [2024-06-15 16:45:49,513][1651669] Updated weights for policy 0, policy_version 436848 (0.0022) [2024-06-15 16:45:50,766][1648981] Fps is (10 sec: 52469.5, 60 sec: 45875.1, 300 sec: 48874.3). Total num frames: 894697472. Throughput: 0: 12117.4. Samples: 223762432. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:50,767][1648981] Avg episode reward: [(0, '503.880')] [2024-06-15 16:45:51,812][1651669] Updated weights for policy 0, policy_version 436897 (0.0014) [2024-06-15 16:45:52,537][1651274] Signal inference workers to stop experience collection... (22900 times) [2024-06-15 16:45:52,615][1651669] InferenceWorker_p0-w0: stopping experience collection (22900 times) [2024-06-15 16:45:52,734][1651274] Signal inference workers to resume experience collection... 
(22900 times) [2024-06-15 16:45:52,735][1651669] InferenceWorker_p0-w0: resuming experience collection (22900 times) [2024-06-15 16:45:52,889][1651669] Updated weights for policy 0, policy_version 436948 (0.0148) [2024-06-15 16:45:53,666][1651669] Updated weights for policy 0, policy_version 436993 (0.0013) [2024-06-15 16:45:54,754][1651669] Updated weights for policy 0, policy_version 437054 (0.0013) [2024-06-15 16:45:55,767][1648981] Fps is (10 sec: 52457.8, 60 sec: 52436.5, 300 sec: 49207.5). Total num frames: 895090688. Throughput: 0: 12117.2. Samples: 223798784. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:45:55,767][1648981] Avg episode reward: [(0, '499.790')] [2024-06-15 16:45:59,663][1651669] Updated weights for policy 0, policy_version 437111 (0.0012) [2024-06-15 16:46:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 48874.3). Total num frames: 895221760. Throughput: 0: 12516.0. Samples: 223881216. Policy #0 lag: (min: 15.0, avg: 100.7, max: 271.0) [2024-06-15 16:46:00,767][1648981] Avg episode reward: [(0, '506.810')] [2024-06-15 16:46:02,125][1651669] Updated weights for policy 0, policy_version 437152 (0.0104) [2024-06-15 16:46:03,637][1651669] Updated weights for policy 0, policy_version 437223 (0.0164) [2024-06-15 16:46:04,977][1651669] Updated weights for policy 0, policy_version 437296 (0.0021) [2024-06-15 16:46:05,799][1648981] Fps is (10 sec: 52262.2, 60 sec: 52400.8, 300 sec: 49313.9). Total num frames: 895614976. Throughput: 0: 12370.2. Samples: 223951360. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:05,799][1648981] Avg episode reward: [(0, '516.810')] [2024-06-15 16:46:10,049][1651669] Updated weights for policy 0, policy_version 437360 (0.0010) [2024-06-15 16:46:10,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 49151.9, 300 sec: 48874.3). Total num frames: 895746048. Throughput: 0: 12845.8. Samples: 223996416. 
Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:10,767][1648981] Avg episode reward: [(0, '536.380')] [2024-06-15 16:46:10,772][1651274] Saving new best policy, reward=536.380! [2024-06-15 16:46:12,710][1651669] Updated weights for policy 0, policy_version 437408 (0.0068) [2024-06-15 16:46:14,279][1651669] Updated weights for policy 0, policy_version 437477 (0.0012) [2024-06-15 16:46:15,781][1648981] Fps is (10 sec: 49237.4, 60 sec: 51871.3, 300 sec: 49538.3). Total num frames: 896106496. Throughput: 0: 12659.3. Samples: 224066560. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:15,785][1648981] Avg episode reward: [(0, '500.610')] [2024-06-15 16:46:15,809][1651669] Updated weights for policy 0, policy_version 437557 (0.0014) [2024-06-15 16:46:19,633][1651669] Updated weights for policy 0, policy_version 437600 (0.0014) [2024-06-15 16:46:20,385][1651669] Updated weights for policy 0, policy_version 437632 (0.0011) [2024-06-15 16:46:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 50790.5, 300 sec: 48877.3). Total num frames: 896270336. Throughput: 0: 12845.5. Samples: 224145920. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:20,767][1648981] Avg episode reward: [(0, '477.310')] [2024-06-15 16:46:24,245][1651669] Updated weights for policy 0, policy_version 437712 (0.0157) [2024-06-15 16:46:25,528][1651669] Updated weights for policy 0, policy_version 437776 (0.0024) [2024-06-15 16:46:25,766][1648981] Fps is (10 sec: 45943.4, 60 sec: 50791.2, 300 sec: 49429.7). Total num frames: 896565248. Throughput: 0: 12608.7. Samples: 224181760. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:25,767][1648981] Avg episode reward: [(0, '467.380')] [2024-06-15 16:46:26,447][1651669] Updated weights for policy 0, policy_version 437820 (0.0019) [2024-06-15 16:46:30,427][1651274] Signal inference workers to stop experience collection... 
(22950 times) [2024-06-15 16:46:30,490][1651669] InferenceWorker_p0-w0: stopping experience collection (22950 times) [2024-06-15 16:46:30,690][1651274] Signal inference workers to resume experience collection... (22950 times) [2024-06-15 16:46:30,691][1651669] InferenceWorker_p0-w0: resuming experience collection (22950 times) [2024-06-15 16:46:30,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 51336.6, 300 sec: 48652.2). Total num frames: 896729088. Throughput: 0: 12835.8. Samples: 224252416. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:30,767][1648981] Avg episode reward: [(0, '471.980')] [2024-06-15 16:46:31,073][1651669] Updated weights for policy 0, policy_version 437872 (0.0012) [2024-06-15 16:46:33,714][1651669] Updated weights for policy 0, policy_version 437905 (0.0012) [2024-06-15 16:46:35,584][1651669] Updated weights for policy 0, policy_version 437984 (0.0012) [2024-06-15 16:46:35,769][1648981] Fps is (10 sec: 42589.8, 60 sec: 49150.3, 300 sec: 49096.3). Total num frames: 896991232. Throughput: 0: 12344.3. Samples: 224317952. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:35,771][1648981] Avg episode reward: [(0, '479.580')] [2024-06-15 16:46:38,367][1651669] Updated weights for policy 0, policy_version 438077 (0.0012) [2024-06-15 16:46:40,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 50250.8, 300 sec: 48652.1). Total num frames: 897187840. Throughput: 0: 12140.2. Samples: 224345088. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:40,767][1648981] Avg episode reward: [(0, '507.660')] [2024-06-15 16:46:43,053][1651669] Updated weights for policy 0, policy_version 438131 (0.0013) [2024-06-15 16:46:44,777][1651669] Updated weights for policy 0, policy_version 438164 (0.0013) [2024-06-15 16:46:45,770][1648981] Fps is (10 sec: 42591.1, 60 sec: 47515.3, 300 sec: 48762.6). Total num frames: 897417216. Throughput: 0: 12173.2. Samples: 224429056. 
Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:45,771][1648981] Avg episode reward: [(0, '492.640')] [2024-06-15 16:46:46,502][1651669] Updated weights for policy 0, policy_version 438228 (0.0011) [2024-06-15 16:46:48,121][1651669] Updated weights for policy 0, policy_version 438288 (0.0012) [2024-06-15 16:46:50,768][1648981] Fps is (10 sec: 52419.5, 60 sec: 50242.8, 300 sec: 48765.1). Total num frames: 897712128. Throughput: 0: 12000.3. Samples: 224491008. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:50,769][1648981] Avg episode reward: [(0, '486.740')] [2024-06-15 16:46:53,492][1651669] Updated weights for policy 0, policy_version 438358 (0.0012) [2024-06-15 16:46:55,766][1648981] Fps is (10 sec: 42614.4, 60 sec: 45875.5, 300 sec: 48430.0). Total num frames: 897843200. Throughput: 0: 11912.6. Samples: 224532480. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:46:55,767][1648981] Avg episode reward: [(0, '476.470')] [2024-06-15 16:46:56,235][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000438432_897908736.pth... [2024-06-15 16:46:56,397][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000432704_886177792.pth [2024-06-15 16:46:56,750][1651669] Updated weights for policy 0, policy_version 438448 (0.0013) [2024-06-15 16:46:58,698][1651669] Updated weights for policy 0, policy_version 438527 (0.0014) [2024-06-15 16:47:00,777][1648981] Fps is (10 sec: 45834.9, 60 sec: 49143.3, 300 sec: 48761.5). Total num frames: 898170880. Throughput: 0: 11663.3. Samples: 224591360. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:47:00,778][1648981] Avg episode reward: [(0, '478.440')] [2024-06-15 16:47:05,324][1651669] Updated weights for policy 0, policy_version 438608 (0.0013) [2024-06-15 16:47:05,773][1648981] Fps is (10 sec: 45844.4, 60 sec: 44802.0, 300 sec: 48206.7). Total num frames: 898301952. Throughput: 0: 11512.6. 
Samples: 224664064. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:47:05,774][1648981] Avg episode reward: [(0, '483.890')] [2024-06-15 16:47:08,053][1651669] Updated weights for policy 0, policy_version 438688 (0.0024) [2024-06-15 16:47:09,837][1651669] Updated weights for policy 0, policy_version 438753 (0.0012) [2024-06-15 16:47:10,766][1648981] Fps is (10 sec: 45923.6, 60 sec: 48059.8, 300 sec: 48652.1). Total num frames: 898629632. Throughput: 0: 11457.4. Samples: 224697344. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:47:10,767][1648981] Avg episode reward: [(0, '455.900')] [2024-06-15 16:47:11,121][1651669] Updated weights for policy 0, policy_version 438788 (0.0011) [2024-06-15 16:47:11,891][1651274] Signal inference workers to stop experience collection... (23000 times) [2024-06-15 16:47:11,919][1651669] InferenceWorker_p0-w0: stopping experience collection (23000 times) [2024-06-15 16:47:12,149][1651274] Signal inference workers to resume experience collection... (23000 times) [2024-06-15 16:47:12,150][1651669] InferenceWorker_p0-w0: resuming experience collection (23000 times) [2024-06-15 16:47:15,767][1648981] Fps is (10 sec: 45905.5, 60 sec: 44247.7, 300 sec: 47985.7). Total num frames: 898760704. Throughput: 0: 11355.0. Samples: 224763392. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:47:15,768][1648981] Avg episode reward: [(0, '458.450')] [2024-06-15 16:47:16,897][1651669] Updated weights for policy 0, policy_version 438864 (0.0110) [2024-06-15 16:47:17,767][1651669] Updated weights for policy 0, policy_version 438912 (0.0012) [2024-06-15 16:47:20,399][1651669] Updated weights for policy 0, policy_version 438992 (0.0101) [2024-06-15 16:47:20,767][1648981] Fps is (10 sec: 45870.6, 60 sec: 46966.6, 300 sec: 48655.3). Total num frames: 899088384. Throughput: 0: 11469.1. Samples: 224834048. 
Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:47:20,768][1648981] Avg episode reward: [(0, '447.580')] [2024-06-15 16:47:21,475][1651669] Updated weights for policy 0, policy_version 439035 (0.0012) [2024-06-15 16:47:23,811][1651669] Updated weights for policy 0, policy_version 439093 (0.0101) [2024-06-15 16:47:25,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 45329.1, 300 sec: 48096.8). Total num frames: 899284992. Throughput: 0: 11571.2. Samples: 224865792. Policy #0 lag: (min: 10.0, avg: 78.8, max: 218.0) [2024-06-15 16:47:25,767][1648981] Avg episode reward: [(0, '456.010')] [2024-06-15 16:47:27,506][1651669] Updated weights for policy 0, policy_version 439105 (0.0009) [2024-06-15 16:47:29,515][1651669] Updated weights for policy 0, policy_version 439184 (0.0017) [2024-06-15 16:47:30,767][1648981] Fps is (10 sec: 45878.1, 60 sec: 46967.1, 300 sec: 48763.2). Total num frames: 899547136. Throughput: 0: 11458.3. Samples: 224944640. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:47:30,768][1648981] Avg episode reward: [(0, '448.220')] [2024-06-15 16:47:31,330][1651669] Updated weights for policy 0, policy_version 439251 (0.0015) [2024-06-15 16:47:34,768][1651669] Updated weights for policy 0, policy_version 439313 (0.0010) [2024-06-15 16:47:35,776][1648981] Fps is (10 sec: 49107.5, 60 sec: 46415.9, 300 sec: 48206.4). Total num frames: 899776512. Throughput: 0: 11489.7. Samples: 225008128. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:47:35,776][1648981] Avg episode reward: [(0, '452.240')] [2024-06-15 16:47:35,804][1651669] Updated weights for policy 0, policy_version 439356 (0.0010) [2024-06-15 16:47:39,862][1651669] Updated weights for policy 0, policy_version 439408 (0.0012) [2024-06-15 16:47:40,766][1648981] Fps is (10 sec: 42600.1, 60 sec: 46421.4, 300 sec: 48541.1). Total num frames: 899973120. Throughput: 0: 11582.6. Samples: 225053696. 
Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:47:40,767][1648981] Avg episode reward: [(0, '451.590')] [2024-06-15 16:47:41,305][1651669] Updated weights for policy 0, policy_version 439460 (0.0012) [2024-06-15 16:47:43,005][1651669] Updated weights for policy 0, policy_version 439522 (0.0014) [2024-06-15 16:47:45,155][1651669] Updated weights for policy 0, policy_version 439554 (0.0028) [2024-06-15 16:47:45,766][1648981] Fps is (10 sec: 45917.0, 60 sec: 46970.4, 300 sec: 48098.4). Total num frames: 900235264. Throughput: 0: 11642.2. Samples: 225115136. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:47:45,767][1648981] Avg episode reward: [(0, '452.940')] [2024-06-15 16:47:46,554][1651669] Updated weights for policy 0, policy_version 439616 (0.0015) [2024-06-15 16:47:50,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 44238.1, 300 sec: 48097.3). Total num frames: 900366336. Throughput: 0: 11789.2. Samples: 225194496. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:47:50,767][1648981] Avg episode reward: [(0, '439.770')] [2024-06-15 16:47:51,394][1651669] Updated weights for policy 0, policy_version 439664 (0.0012) [2024-06-15 16:47:52,991][1651669] Updated weights for policy 0, policy_version 439728 (0.0038) [2024-06-15 16:47:54,437][1651669] Updated weights for policy 0, policy_version 439801 (0.0010) [2024-06-15 16:47:55,710][1651274] Signal inference workers to stop experience collection... (23050 times) [2024-06-15 16:47:55,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 48432.2). Total num frames: 900726784. Throughput: 0: 11673.6. Samples: 225222656. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:47:55,767][1648981] Avg episode reward: [(0, '429.820')] [2024-06-15 16:47:55,776][1651669] InferenceWorker_p0-w0: stopping experience collection (23050 times) [2024-06-15 16:47:56,070][1651274] Signal inference workers to resume experience collection... 
(23050 times) [2024-06-15 16:47:56,071][1651669] InferenceWorker_p0-w0: resuming experience collection (23050 times) [2024-06-15 16:47:56,877][1651669] Updated weights for policy 0, policy_version 439856 (0.0014) [2024-06-15 16:48:00,738][1651669] Updated weights for policy 0, policy_version 439888 (0.0013) [2024-06-15 16:48:00,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 45337.0, 300 sec: 48096.7). Total num frames: 900890624. Throughput: 0: 11946.7. Samples: 225300992. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:00,767][1648981] Avg episode reward: [(0, '443.820')] [2024-06-15 16:48:01,598][1651669] Updated weights for policy 0, policy_version 439926 (0.0011) [2024-06-15 16:48:02,729][1651669] Updated weights for policy 0, policy_version 439968 (0.0012) [2024-06-15 16:48:04,541][1651669] Updated weights for policy 0, policy_version 440037 (0.0012) [2024-06-15 16:48:05,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49157.5, 300 sec: 48874.3). Total num frames: 901251072. Throughput: 0: 12049.3. Samples: 225376256. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:05,767][1648981] Avg episode reward: [(0, '456.460')] [2024-06-15 16:48:06,572][1651669] Updated weights for policy 0, policy_version 440098 (0.0017) [2024-06-15 16:48:10,774][1648981] Fps is (10 sec: 49113.9, 60 sec: 45869.3, 300 sec: 47984.4). Total num frames: 901382144. Throughput: 0: 12103.9. Samples: 225410560. 
Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:10,775][1648981] Avg episode reward: [(0, '443.580')] [2024-06-15 16:48:11,211][1651669] Updated weights for policy 0, policy_version 440132 (0.0014) [2024-06-15 16:48:12,586][1651669] Updated weights for policy 0, policy_version 440187 (0.0011) [2024-06-15 16:48:14,219][1651669] Updated weights for policy 0, policy_version 440228 (0.0012) [2024-06-15 16:48:15,549][1651669] Updated weights for policy 0, policy_version 440304 (0.0094) [2024-06-15 16:48:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.4, 300 sec: 48874.3). Total num frames: 901775360. Throughput: 0: 12015.0. Samples: 225485312. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:15,767][1648981] Avg episode reward: [(0, '437.360')] [2024-06-15 16:48:17,405][1651669] Updated weights for policy 0, policy_version 440368 (0.0012) [2024-06-15 16:48:20,766][1648981] Fps is (10 sec: 52469.3, 60 sec: 46968.2, 300 sec: 47989.6). Total num frames: 901906432. Throughput: 0: 12290.5. Samples: 225561088. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:20,767][1648981] Avg episode reward: [(0, '435.870')] [2024-06-15 16:48:22,669][1651669] Updated weights for policy 0, policy_version 440432 (0.0012) [2024-06-15 16:48:25,335][1651669] Updated weights for policy 0, policy_version 440496 (0.0012) [2024-06-15 16:48:25,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 902168576. Throughput: 0: 12049.1. Samples: 225595904. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:25,767][1648981] Avg episode reward: [(0, '440.560')] [2024-06-15 16:48:26,938][1651669] Updated weights for policy 0, policy_version 440563 (0.0014) [2024-06-15 16:48:28,244][1651669] Updated weights for policy 0, policy_version 440610 (0.0012) [2024-06-15 16:48:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48060.0, 300 sec: 48207.8). Total num frames: 902430720. 
Throughput: 0: 12265.2. Samples: 225667072. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:30,767][1648981] Avg episode reward: [(0, '455.560')] [2024-06-15 16:48:32,255][1651669] Updated weights for policy 0, policy_version 440657 (0.0016) [2024-06-15 16:48:35,683][1651669] Updated weights for policy 0, policy_version 440736 (0.0144) [2024-06-15 16:48:35,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 47520.8, 300 sec: 48541.1). Total num frames: 902627328. Throughput: 0: 12265.2. Samples: 225746432. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:35,767][1648981] Avg episode reward: [(0, '450.240')] [2024-06-15 16:48:37,659][1651274] Signal inference workers to stop experience collection... (23100 times) [2024-06-15 16:48:37,708][1651669] InferenceWorker_p0-w0: stopping experience collection (23100 times) [2024-06-15 16:48:37,838][1651274] Signal inference workers to resume experience collection... (23100 times) [2024-06-15 16:48:37,839][1651669] InferenceWorker_p0-w0: resuming experience collection (23100 times) [2024-06-15 16:48:37,994][1651669] Updated weights for policy 0, policy_version 440823 (0.0012) [2024-06-15 16:48:39,416][1651669] Updated weights for policy 0, policy_version 440872 (0.0010) [2024-06-15 16:48:40,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 902955008. Throughput: 0: 12242.5. Samples: 225773568. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:40,767][1648981] Avg episode reward: [(0, '450.680')] [2024-06-15 16:48:43,369][1651669] Updated weights for policy 0, policy_version 440912 (0.0013) [2024-06-15 16:48:44,419][1651669] Updated weights for policy 0, policy_version 440953 (0.0011) [2024-06-15 16:48:45,774][1648981] Fps is (10 sec: 45839.4, 60 sec: 47507.4, 300 sec: 48428.7). Total num frames: 903086080. Throughput: 0: 12126.6. Samples: 225846784. 
Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:45,775][1648981] Avg episode reward: [(0, '453.440')] [2024-06-15 16:48:46,722][1651669] Updated weights for policy 0, policy_version 440993 (0.0015) [2024-06-15 16:48:48,544][1651669] Updated weights for policy 0, policy_version 441058 (0.0100) [2024-06-15 16:48:50,275][1651669] Updated weights for policy 0, policy_version 441110 (0.0012) [2024-06-15 16:48:50,773][1648981] Fps is (10 sec: 45848.2, 60 sec: 50785.4, 300 sec: 48430.0). Total num frames: 903413760. Throughput: 0: 12002.0. Samples: 225916416. Policy #0 lag: (min: 14.0, avg: 90.8, max: 270.0) [2024-06-15 16:48:50,776][1648981] Avg episode reward: [(0, '452.050')] [2024-06-15 16:48:53,846][1651669] Updated weights for policy 0, policy_version 441155 (0.0013) [2024-06-15 16:48:55,793][1648981] Fps is (10 sec: 52332.5, 60 sec: 48038.7, 300 sec: 48428.3). Total num frames: 903610368. Throughput: 0: 12203.4. Samples: 225959936. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:48:55,793][1648981] Avg episode reward: [(0, '457.260')] [2024-06-15 16:48:55,799][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000441216_903610368.pth... [2024-06-15 16:48:55,883][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000435552_892010496.pth [2024-06-15 16:48:57,355][1651669] Updated weights for policy 0, policy_version 441234 (0.0014) [2024-06-15 16:48:58,377][1651669] Updated weights for policy 0, policy_version 441274 (0.0012) [2024-06-15 16:48:59,551][1651669] Updated weights for policy 0, policy_version 441328 (0.0022) [2024-06-15 16:49:00,766][1648981] Fps is (10 sec: 49181.1, 60 sec: 50244.3, 300 sec: 48541.1). Total num frames: 903905280. Throughput: 0: 12037.7. Samples: 226027008. 
Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:00,767][1648981] Avg episode reward: [(0, '431.700')] [2024-06-15 16:49:00,772][1651669] Updated weights for policy 0, policy_version 441376 (0.0011) [2024-06-15 16:49:05,325][1651669] Updated weights for policy 0, policy_version 441412 (0.0012) [2024-06-15 16:49:05,770][1648981] Fps is (10 sec: 45978.1, 60 sec: 46964.5, 300 sec: 48207.8). Total num frames: 904069120. Throughput: 0: 12025.3. Samples: 226102272. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:05,771][1648981] Avg episode reward: [(0, '438.360')] [2024-06-15 16:49:06,331][1651669] Updated weights for policy 0, policy_version 441471 (0.0136) [2024-06-15 16:49:08,773][1651669] Updated weights for policy 0, policy_version 441508 (0.0045) [2024-06-15 16:49:10,433][1651669] Updated weights for policy 0, policy_version 441569 (0.0012) [2024-06-15 16:49:10,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 49704.6, 300 sec: 48765.2). Total num frames: 904364032. Throughput: 0: 12128.7. Samples: 226141696. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:10,767][1648981] Avg episode reward: [(0, '461.490')] [2024-06-15 16:49:12,212][1651669] Updated weights for policy 0, policy_version 441648 (0.0010) [2024-06-15 16:49:15,767][1648981] Fps is (10 sec: 45891.7, 60 sec: 45875.0, 300 sec: 47989.5). Total num frames: 904527872. Throughput: 0: 11946.6. Samples: 226204672. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:15,767][1648981] Avg episode reward: [(0, '481.750')] [2024-06-15 16:49:17,696][1651669] Updated weights for policy 0, policy_version 441697 (0.0147) [2024-06-15 16:49:19,650][1651669] Updated weights for policy 0, policy_version 441764 (0.0013) [2024-06-15 16:49:19,911][1651274] Signal inference workers to stop experience collection... 
(23150 times) [2024-06-15 16:49:19,967][1651669] InferenceWorker_p0-w0: stopping experience collection (23150 times) [2024-06-15 16:49:20,089][1651274] Signal inference workers to resume experience collection... (23150 times) [2024-06-15 16:49:20,090][1651669] InferenceWorker_p0-w0: resuming experience collection (23150 times) [2024-06-15 16:49:20,664][1651669] Updated weights for policy 0, policy_version 441816 (0.0011) [2024-06-15 16:49:20,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49152.1, 300 sec: 48654.4). Total num frames: 904855552. Throughput: 0: 11787.4. Samples: 226276864. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:20,767][1648981] Avg episode reward: [(0, '489.190')] [2024-06-15 16:49:21,434][1651669] Updated weights for policy 0, policy_version 441853 (0.0012) [2024-06-15 16:49:23,087][1651669] Updated weights for policy 0, policy_version 441904 (0.0011) [2024-06-15 16:49:25,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 905052160. Throughput: 0: 12026.3. Samples: 226314752. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:25,767][1648981] Avg episode reward: [(0, '492.700')] [2024-06-15 16:49:27,888][1651669] Updated weights for policy 0, policy_version 441968 (0.0012) [2024-06-15 16:49:29,523][1651669] Updated weights for policy 0, policy_version 442041 (0.0012) [2024-06-15 16:49:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 48654.5). Total num frames: 905379840. Throughput: 0: 12153.6. Samples: 226393600. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:30,767][1648981] Avg episode reward: [(0, '486.250')] [2024-06-15 16:49:31,012][1651669] Updated weights for policy 0, policy_version 442096 (0.0017) [2024-06-15 16:49:33,143][1651669] Updated weights for policy 0, policy_version 442144 (0.0029) [2024-06-15 16:49:35,785][1648981] Fps is (10 sec: 52333.8, 60 sec: 49137.1, 300 sec: 47986.8). 
Total num frames: 905576448. Throughput: 0: 12421.1. Samples: 226475520. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:35,785][1648981] Avg episode reward: [(0, '517.510')] [2024-06-15 16:49:36,792][1651669] Updated weights for policy 0, policy_version 442180 (0.0012) [2024-06-15 16:49:38,095][1651669] Updated weights for policy 0, policy_version 442232 (0.0014) [2024-06-15 16:49:40,177][1651669] Updated weights for policy 0, policy_version 442290 (0.0011) [2024-06-15 16:49:40,767][1648981] Fps is (10 sec: 49151.9, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 905871360. Throughput: 0: 12306.5. Samples: 226513408. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:40,768][1648981] Avg episode reward: [(0, '507.290')] [2024-06-15 16:49:43,591][1651669] Updated weights for policy 0, policy_version 442385 (0.0012) [2024-06-15 16:49:45,782][1648981] Fps is (10 sec: 52441.5, 60 sec: 50237.6, 300 sec: 47983.1). Total num frames: 906100736. Throughput: 0: 12386.1. Samples: 226584576. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:45,783][1648981] Avg episode reward: [(0, '513.150')] [2024-06-15 16:49:47,250][1651669] Updated weights for policy 0, policy_version 442448 (0.0012) [2024-06-15 16:49:48,397][1651669] Updated weights for policy 0, policy_version 442495 (0.0013) [2024-06-15 16:49:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48064.4, 300 sec: 48653.7). Total num frames: 906297344. Throughput: 0: 12368.7. Samples: 226658816. 
Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:50,767][1648981] Avg episode reward: [(0, '515.570')] [2024-06-15 16:49:51,350][1651669] Updated weights for policy 0, policy_version 442558 (0.0013) [2024-06-15 16:49:52,674][1651669] Updated weights for policy 0, policy_version 442612 (0.0012) [2024-06-15 16:49:54,642][1651669] Updated weights for policy 0, policy_version 442672 (0.0012) [2024-06-15 16:49:55,778][1648981] Fps is (10 sec: 52450.5, 60 sec: 50256.5, 300 sec: 48317.0). Total num frames: 906625024. Throughput: 0: 12250.7. Samples: 226693120. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:49:55,779][1648981] Avg episode reward: [(0, '512.550')] [2024-06-15 16:49:57,490][1651669] Updated weights for policy 0, policy_version 442704 (0.0014) [2024-06-15 16:49:58,726][1651669] Updated weights for policy 0, policy_version 442747 (0.0020) [2024-06-15 16:50:00,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 906788864. Throughput: 0: 12527.0. Samples: 226768384. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:50:00,767][1648981] Avg episode reward: [(0, '509.780')] [2024-06-15 16:50:01,423][1651669] Updated weights for policy 0, policy_version 442800 (0.0014) [2024-06-15 16:50:02,720][1651274] Signal inference workers to stop experience collection... (23200 times) [2024-06-15 16:50:02,773][1651669] InferenceWorker_p0-w0: stopping experience collection (23200 times) [2024-06-15 16:50:02,922][1651274] Signal inference workers to resume experience collection... (23200 times) [2024-06-15 16:50:02,924][1651669] InferenceWorker_p0-w0: resuming experience collection (23200 times) [2024-06-15 16:50:03,683][1651669] Updated weights for policy 0, policy_version 442869 (0.0013) [2024-06-15 16:50:05,086][1651669] Updated weights for policy 0, policy_version 442913 (0.0053) [2024-06-15 16:50:05,769][1648981] Fps is (10 sec: 52476.4, 60 sec: 51337.7, 300 sec: 48651.7). 
Total num frames: 907149312. Throughput: 0: 12503.5. Samples: 226839552. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:50:05,770][1648981] Avg episode reward: [(0, '511.640')] [2024-06-15 16:50:07,776][1651669] Updated weights for policy 0, policy_version 442960 (0.0013) [2024-06-15 16:50:09,007][1651669] Updated weights for policy 0, policy_version 443005 (0.0016) [2024-06-15 16:50:10,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48605.9, 300 sec: 48430.3). Total num frames: 907280384. Throughput: 0: 12481.4. Samples: 226876416. Policy #0 lag: (min: 42.0, avg: 183.7, max: 298.0) [2024-06-15 16:50:10,767][1648981] Avg episode reward: [(0, '533.560')] [2024-06-15 16:50:12,507][1651669] Updated weights for policy 0, policy_version 443061 (0.0018) [2024-06-15 16:50:14,774][1651669] Updated weights for policy 0, policy_version 443121 (0.0013) [2024-06-15 16:50:15,766][1648981] Fps is (10 sec: 45887.1, 60 sec: 51336.8, 300 sec: 48763.3). Total num frames: 907608064. Throughput: 0: 12333.5. Samples: 226948608. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:15,767][1648981] Avg episode reward: [(0, '540.960')] [2024-06-15 16:50:16,083][1651669] Updated weights for policy 0, policy_version 443196 (0.0012) [2024-06-15 16:50:16,168][1651274] Saving new best policy, reward=540.960! [2024-06-15 16:50:20,068][1651669] Updated weights for policy 0, policy_version 443258 (0.0030) [2024-06-15 16:50:20,770][1648981] Fps is (10 sec: 52409.3, 60 sec: 49148.9, 300 sec: 48429.5). Total num frames: 907804672. Throughput: 0: 12144.0. Samples: 227021824. 
Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:20,771][1648981] Avg episode reward: [(0, '525.060')] [2024-06-15 16:50:23,678][1651669] Updated weights for policy 0, policy_version 443316 (0.0090) [2024-06-15 16:50:25,247][1651669] Updated weights for policy 0, policy_version 443360 (0.0017) [2024-06-15 16:50:25,785][1648981] Fps is (10 sec: 42517.4, 60 sec: 49682.4, 300 sec: 48760.1). Total num frames: 908034048. Throughput: 0: 12021.2. Samples: 227054592. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:25,786][1648981] Avg episode reward: [(0, '516.340')] [2024-06-15 16:50:26,078][1651669] Updated weights for policy 0, policy_version 443392 (0.0012) [2024-06-15 16:50:27,850][1651669] Updated weights for policy 0, policy_version 443453 (0.0012) [2024-06-15 16:50:30,766][1648981] Fps is (10 sec: 49170.9, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 908296192. Throughput: 0: 12167.1. Samples: 227131904. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:30,767][1648981] Avg episode reward: [(0, '528.690')] [2024-06-15 16:50:30,839][1651669] Updated weights for policy 0, policy_version 443516 (0.0011) [2024-06-15 16:50:34,605][1651669] Updated weights for policy 0, policy_version 443578 (0.0018) [2024-06-15 16:50:35,721][1651669] Updated weights for policy 0, policy_version 443618 (0.0015) [2024-06-15 16:50:35,766][1648981] Fps is (10 sec: 49245.6, 60 sec: 49166.9, 300 sec: 48653.4). Total num frames: 908525568. Throughput: 0: 12026.3. Samples: 227200000. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:35,767][1648981] Avg episode reward: [(0, '523.060')] [2024-06-15 16:50:37,347][1651669] Updated weights for policy 0, policy_version 443649 (0.0025) [2024-06-15 16:50:38,414][1651669] Updated weights for policy 0, policy_version 443705 (0.0011) [2024-06-15 16:50:40,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 47513.6, 300 sec: 47986.6). Total num frames: 908722176. 
Throughput: 0: 12018.0. Samples: 227233792. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:40,767][1648981] Avg episode reward: [(0, '515.720')] [2024-06-15 16:50:41,355][1651669] Updated weights for policy 0, policy_version 443760 (0.0026) [2024-06-15 16:50:44,792][1651669] Updated weights for policy 0, policy_version 443824 (0.0012) [2024-06-15 16:50:45,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48072.3, 300 sec: 48430.0). Total num frames: 908984320. Throughput: 0: 12151.5. Samples: 227315200. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:45,767][1648981] Avg episode reward: [(0, '490.790')] [2024-06-15 16:50:46,114][1651274] Signal inference workers to stop experience collection... (23250 times) [2024-06-15 16:50:46,221][1651669] InferenceWorker_p0-w0: stopping experience collection (23250 times) [2024-06-15 16:50:46,337][1651274] Signal inference workers to resume experience collection... (23250 times) [2024-06-15 16:50:46,337][1651669] InferenceWorker_p0-w0: resuming experience collection (23250 times) [2024-06-15 16:50:46,340][1651669] Updated weights for policy 0, policy_version 443872 (0.0011) [2024-06-15 16:50:48,225][1651669] Updated weights for policy 0, policy_version 443931 (0.0012) [2024-06-15 16:50:50,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 49151.8, 300 sec: 47985.7). Total num frames: 909246464. Throughput: 0: 12117.9. Samples: 227384832. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:50,768][1648981] Avg episode reward: [(0, '467.740')] [2024-06-15 16:50:51,679][1651669] Updated weights for policy 0, policy_version 443971 (0.0068) [2024-06-15 16:50:52,693][1651669] Updated weights for policy 0, policy_version 444032 (0.0012) [2024-06-15 16:50:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48069.0, 300 sec: 48430.0). Total num frames: 909508608. Throughput: 0: 12299.4. Samples: 227429888. 
Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:50:55,767][1648981] Avg episode reward: [(0, '479.800')] [2024-06-15 16:50:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000444096_909508608.pth... [2024-06-15 16:50:55,857][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000438432_897908736.pth [2024-06-15 16:50:55,871][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000444096_909508608.pth [2024-06-15 16:50:56,539][1651669] Updated weights for policy 0, policy_version 444099 (0.0030) [2024-06-15 16:50:57,922][1651669] Updated weights for policy 0, policy_version 444158 (0.0120) [2024-06-15 16:50:59,693][1651669] Updated weights for policy 0, policy_version 444217 (0.0100) [2024-06-15 16:51:00,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 49698.2, 300 sec: 47990.9). Total num frames: 909770752. Throughput: 0: 12037.7. Samples: 227490304. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:00,767][1648981] Avg episode reward: [(0, '482.060')] [2024-06-15 16:51:04,713][1651669] Updated weights for policy 0, policy_version 444304 (0.0014) [2024-06-15 16:51:05,700][1651669] Updated weights for policy 0, policy_version 444344 (0.0010) [2024-06-15 16:51:05,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 47515.5, 300 sec: 48318.9). Total num frames: 910000128. Throughput: 0: 12152.5. Samples: 227568640. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:05,767][1648981] Avg episode reward: [(0, '459.230')] [2024-06-15 16:51:08,268][1651669] Updated weights for policy 0, policy_version 444400 (0.0012) [2024-06-15 16:51:09,604][1651669] Updated weights for policy 0, policy_version 444450 (0.0010) [2024-06-15 16:51:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 48099.2). Total num frames: 910295040. Throughput: 0: 12224.9. Samples: 227604480. 
Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:10,767][1648981] Avg episode reward: [(0, '443.810')] [2024-06-15 16:51:13,556][1651669] Updated weights for policy 0, policy_version 444484 (0.0012) [2024-06-15 16:51:14,562][1651669] Updated weights for policy 0, policy_version 444534 (0.0013) [2024-06-15 16:51:15,435][1651669] Updated weights for policy 0, policy_version 444564 (0.0063) [2024-06-15 16:51:15,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 910491648. Throughput: 0: 12367.6. Samples: 227688448. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:15,767][1648981] Avg episode reward: [(0, '425.980')] [2024-06-15 16:51:16,278][1651669] Updated weights for policy 0, policy_version 444602 (0.0014) [2024-06-15 16:51:19,217][1651669] Updated weights for policy 0, policy_version 444676 (0.0014) [2024-06-15 16:51:20,317][1651669] Updated weights for policy 0, policy_version 444736 (0.0094) [2024-06-15 16:51:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50247.5, 300 sec: 48318.9). Total num frames: 910819328. Throughput: 0: 12231.1. Samples: 227750400. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:20,767][1648981] Avg episode reward: [(0, '441.910')] [2024-06-15 16:51:24,737][1651669] Updated weights for policy 0, policy_version 444784 (0.0012) [2024-06-15 16:51:25,770][1648981] Fps is (10 sec: 49133.4, 60 sec: 49164.4, 300 sec: 48318.3). Total num frames: 910983168. Throughput: 0: 12605.5. Samples: 227801088. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:25,771][1648981] Avg episode reward: [(0, '449.030')] [2024-06-15 16:51:26,313][1651669] Updated weights for policy 0, policy_version 444848 (0.0017) [2024-06-15 16:51:27,990][1651274] Signal inference workers to stop experience collection... 
(23300 times) [2024-06-15 16:51:28,057][1651669] InferenceWorker_p0-w0: stopping experience collection (23300 times) [2024-06-15 16:51:28,205][1651274] Signal inference workers to resume experience collection... (23300 times) [2024-06-15 16:51:28,205][1651669] InferenceWorker_p0-w0: resuming experience collection (23300 times) [2024-06-15 16:51:28,841][1651669] Updated weights for policy 0, policy_version 444900 (0.0013) [2024-06-15 16:51:30,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 49151.9, 300 sec: 48319.2). Total num frames: 911245312. Throughput: 0: 12219.7. Samples: 227865088. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:30,767][1648981] Avg episode reward: [(0, '455.690')] [2024-06-15 16:51:30,777][1651669] Updated weights for policy 0, policy_version 444946 (0.0026) [2024-06-15 16:51:35,328][1651669] Updated weights for policy 0, policy_version 444997 (0.0014) [2024-06-15 16:51:35,766][1648981] Fps is (10 sec: 39336.9, 60 sec: 47513.7, 300 sec: 48096.8). Total num frames: 911376384. Throughput: 0: 12492.9. Samples: 227947008. Policy #0 lag: (min: 15.0, avg: 122.0, max: 271.0) [2024-06-15 16:51:35,767][1648981] Avg episode reward: [(0, '455.050')] [2024-06-15 16:51:36,400][1651669] Updated weights for policy 0, policy_version 445044 (0.0011) [2024-06-15 16:51:38,110][1651669] Updated weights for policy 0, policy_version 445112 (0.0013) [2024-06-15 16:51:39,966][1651669] Updated weights for policy 0, policy_version 445154 (0.0013) [2024-06-15 16:51:40,767][1648981] Fps is (10 sec: 49151.6, 60 sec: 50244.2, 300 sec: 48541.7). Total num frames: 911736832. Throughput: 0: 12094.6. Samples: 227974144. 
Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:51:40,767][1648981] Avg episode reward: [(0, '460.230')] [2024-06-15 16:51:41,193][1651669] Updated weights for policy 0, policy_version 445187 (0.0014) [2024-06-15 16:51:42,438][1651669] Updated weights for policy 0, policy_version 445247 (0.0033) [2024-06-15 16:51:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48605.9, 300 sec: 48097.1). Total num frames: 911900672. Throughput: 0: 12538.3. Samples: 228054528. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:51:45,767][1648981] Avg episode reward: [(0, '452.660')] [2024-06-15 16:51:47,792][1651669] Updated weights for policy 0, policy_version 445331 (0.0013) [2024-06-15 16:51:48,677][1651669] Updated weights for policy 0, policy_version 445372 (0.0012) [2024-06-15 16:51:50,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 48606.1, 300 sec: 48541.1). Total num frames: 912162816. Throughput: 0: 12367.7. Samples: 228125184. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:51:50,767][1648981] Avg episode reward: [(0, '454.890')] [2024-06-15 16:51:51,744][1651669] Updated weights for policy 0, policy_version 445429 (0.0012) [2024-06-15 16:51:53,221][1651669] Updated weights for policy 0, policy_version 445496 (0.0012) [2024-06-15 16:51:55,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.8, 300 sec: 48209.6). Total num frames: 912392192. Throughput: 0: 12208.3. Samples: 228153856. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:51:55,767][1648981] Avg episode reward: [(0, '460.430')] [2024-06-15 16:51:56,603][1651669] Updated weights for policy 0, policy_version 445536 (0.0011) [2024-06-15 16:51:58,277][1651669] Updated weights for policy 0, policy_version 445600 (0.0012) [2024-06-15 16:52:00,810][1648981] Fps is (10 sec: 48937.5, 60 sec: 48024.6, 300 sec: 48646.0). Total num frames: 912654336. Throughput: 0: 12026.0. Samples: 228230144. 
Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:00,811][1648981] Avg episode reward: [(0, '471.330')] [2024-06-15 16:52:01,408][1651669] Updated weights for policy 0, policy_version 445633 (0.0014) [2024-06-15 16:52:03,158][1651669] Updated weights for policy 0, policy_version 445712 (0.0111) [2024-06-15 16:52:04,207][1651669] Updated weights for policy 0, policy_version 445760 (0.0013) [2024-06-15 16:52:05,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 912916480. Throughput: 0: 12197.0. Samples: 228299264. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:05,767][1648981] Avg episode reward: [(0, '486.060')] [2024-06-15 16:52:08,097][1651669] Updated weights for policy 0, policy_version 445821 (0.0015) [2024-06-15 16:52:08,232][1651274] Signal inference workers to stop experience collection... (23350 times) [2024-06-15 16:52:08,241][1651274] Signal inference workers to resume experience collection... (23350 times) [2024-06-15 16:52:08,269][1651669] InferenceWorker_p0-w0: stopping experience collection (23350 times) [2024-06-15 16:52:08,270][1651669] InferenceWorker_p0-w0: resuming experience collection (23350 times) [2024-06-15 16:52:09,899][1651669] Updated weights for policy 0, policy_version 445884 (0.0015) [2024-06-15 16:52:10,766][1648981] Fps is (10 sec: 52659.5, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 913178624. Throughput: 0: 11981.8. Samples: 228340224. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:10,767][1648981] Avg episode reward: [(0, '486.880')] [2024-06-15 16:52:13,788][1651669] Updated weights for policy 0, policy_version 445936 (0.0012) [2024-06-15 16:52:15,582][1651669] Updated weights for policy 0, policy_version 446010 (0.0114) [2024-06-15 16:52:15,782][1648981] Fps is (10 sec: 52345.7, 60 sec: 49139.0, 300 sec: 48649.7). Total num frames: 913440768. Throughput: 0: 12010.7. Samples: 228405760. 
Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:15,783][1648981] Avg episode reward: [(0, '495.390')] [2024-06-15 16:52:18,680][1651669] Updated weights for policy 0, policy_version 446050 (0.0012) [2024-06-15 16:52:19,941][1651669] Updated weights for policy 0, policy_version 446099 (0.0017) [2024-06-15 16:52:20,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 48763.2). Total num frames: 913670144. Throughput: 0: 11923.9. Samples: 228483584. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:20,767][1648981] Avg episode reward: [(0, '493.750')] [2024-06-15 16:52:20,985][1651669] Updated weights for policy 0, policy_version 446143 (0.0017) [2024-06-15 16:52:24,276][1651669] Updated weights for policy 0, policy_version 446197 (0.0021) [2024-06-15 16:52:25,766][1648981] Fps is (10 sec: 45948.5, 60 sec: 48609.0, 300 sec: 48652.2). Total num frames: 913899520. Throughput: 0: 12140.2. Samples: 228520448. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:25,767][1648981] Avg episode reward: [(0, '484.990')] [2024-06-15 16:52:26,229][1651669] Updated weights for policy 0, policy_version 446272 (0.0014) [2024-06-15 16:52:29,750][1651669] Updated weights for policy 0, policy_version 446320 (0.0035) [2024-06-15 16:52:30,769][1648981] Fps is (10 sec: 45861.9, 60 sec: 48057.5, 300 sec: 48653.2). Total num frames: 914128896. Throughput: 0: 11945.9. Samples: 228592128. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:30,770][1648981] Avg episode reward: [(0, '469.610')] [2024-06-15 16:52:31,309][1651669] Updated weights for policy 0, policy_version 446384 (0.0019) [2024-06-15 16:52:34,895][1651669] Updated weights for policy 0, policy_version 446421 (0.0027) [2024-06-15 16:52:35,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 914325504. Throughput: 0: 11912.5. Samples: 228661248. 
Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:35,767][1648981] Avg episode reward: [(0, '442.560')] [2024-06-15 16:52:36,193][1651669] Updated weights for policy 0, policy_version 446480 (0.0123) [2024-06-15 16:52:36,985][1651669] Updated weights for policy 0, policy_version 446520 (0.0013) [2024-06-15 16:52:39,549][1651669] Updated weights for policy 0, policy_version 446560 (0.0013) [2024-06-15 16:52:40,745][1651669] Updated weights for policy 0, policy_version 446624 (0.0011) [2024-06-15 16:52:40,771][1648981] Fps is (10 sec: 55697.1, 60 sec: 49148.5, 300 sec: 48984.7). Total num frames: 914685952. Throughput: 0: 12218.5. Samples: 228703744. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:40,771][1648981] Avg episode reward: [(0, '444.920')] [2024-06-15 16:52:44,427][1651669] Updated weights for policy 0, policy_version 446657 (0.0036) [2024-06-15 16:52:45,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49151.9, 300 sec: 49096.4). Total num frames: 914849792. Throughput: 0: 12436.6. Samples: 228789248. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:45,767][1648981] Avg episode reward: [(0, '439.430')] [2024-06-15 16:52:46,307][1651274] Signal inference workers to stop experience collection... (23400 times) [2024-06-15 16:52:46,370][1651669] InferenceWorker_p0-w0: stopping experience collection (23400 times) [2024-06-15 16:52:46,582][1651274] Signal inference workers to resume experience collection... (23400 times) [2024-06-15 16:52:46,583][1651669] InferenceWorker_p0-w0: resuming experience collection (23400 times) [2024-06-15 16:52:46,745][1651669] Updated weights for policy 0, policy_version 446738 (0.0013) [2024-06-15 16:52:50,767][1648981] Fps is (10 sec: 32781.5, 60 sec: 47513.4, 300 sec: 48429.9). Total num frames: 915013632. Throughput: 0: 12174.1. Samples: 228847104. 
Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:50,767][1648981] Avg episode reward: [(0, '444.140')] [2024-06-15 16:52:51,044][1651669] Updated weights for policy 0, policy_version 446802 (0.0013) [2024-06-15 16:52:52,073][1651669] Updated weights for policy 0, policy_version 446864 (0.0015) [2024-06-15 16:52:55,492][1651669] Updated weights for policy 0, policy_version 446931 (0.0012) [2024-06-15 16:52:55,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 915341312. Throughput: 0: 12106.0. Samples: 228884992. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:52:55,767][1648981] Avg episode reward: [(0, '422.490')] [2024-06-15 16:52:56,228][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000446960_915374080.pth... [2024-06-15 16:52:56,269][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000441216_903610368.pth [2024-06-15 16:52:56,704][1651669] Updated weights for policy 0, policy_version 446976 (0.0015) [2024-06-15 16:52:58,416][1651669] Updated weights for policy 0, policy_version 447032 (0.0215) [2024-06-15 16:53:00,778][1648981] Fps is (10 sec: 52368.3, 60 sec: 48085.4, 300 sec: 48428.1). Total num frames: 915537920. Throughput: 0: 12289.1. Samples: 228958720. Policy #0 lag: (min: 94.0, avg: 152.0, max: 287.0) [2024-06-15 16:53:00,779][1648981] Avg episode reward: [(0, '414.520')] [2024-06-15 16:53:02,085][1651669] Updated weights for policy 0, policy_version 447091 (0.0013) [2024-06-15 16:53:03,546][1651669] Updated weights for policy 0, policy_version 447154 (0.0016) [2024-06-15 16:53:05,787][1648981] Fps is (10 sec: 45782.5, 60 sec: 48043.5, 300 sec: 48872.2). Total num frames: 915800064. Throughput: 0: 12362.1. Samples: 229040128. 
Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:05,787][1648981] Avg episode reward: [(0, '423.980')] [2024-06-15 16:53:06,258][1651669] Updated weights for policy 0, policy_version 447200 (0.0013) [2024-06-15 16:53:07,860][1651669] Updated weights for policy 0, policy_version 447251 (0.0013) [2024-06-15 16:53:08,782][1651669] Updated weights for policy 0, policy_version 447296 (0.0013) [2024-06-15 16:53:10,767][1648981] Fps is (10 sec: 52489.7, 60 sec: 48059.5, 300 sec: 48430.0). Total num frames: 916062208. Throughput: 0: 12128.6. Samples: 229066240. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:10,768][1648981] Avg episode reward: [(0, '435.460')] [2024-06-15 16:53:12,841][1651669] Updated weights for policy 0, policy_version 447357 (0.0013) [2024-06-15 16:53:14,076][1651669] Updated weights for policy 0, policy_version 447414 (0.0024) [2024-06-15 16:53:15,766][1648981] Fps is (10 sec: 52534.6, 60 sec: 48072.4, 300 sec: 48874.3). Total num frames: 916324352. Throughput: 0: 12322.9. Samples: 229146624. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:15,767][1648981] Avg episode reward: [(0, '422.230')] [2024-06-15 16:53:16,987][1651669] Updated weights for policy 0, policy_version 447478 (0.0024) [2024-06-15 16:53:18,880][1651669] Updated weights for policy 0, policy_version 447520 (0.0118) [2024-06-15 16:53:20,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48605.8, 300 sec: 48874.3). Total num frames: 916586496. Throughput: 0: 12424.5. Samples: 229220352. 
Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:20,767][1648981] Avg episode reward: [(0, '414.610')] [2024-06-15 16:53:22,270][1651669] Updated weights for policy 0, policy_version 447570 (0.0013) [2024-06-15 16:53:23,119][1651669] Updated weights for policy 0, policy_version 447616 (0.0035) [2024-06-15 16:53:24,866][1651669] Updated weights for policy 0, policy_version 447671 (0.0011) [2024-06-15 16:53:25,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 49151.9, 300 sec: 48874.3). Total num frames: 916848640. Throughput: 0: 12277.8. Samples: 229256192. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:25,767][1648981] Avg episode reward: [(0, '438.520')] [2024-06-15 16:53:26,686][1651274] Signal inference workers to stop experience collection... (23450 times) [2024-06-15 16:53:26,709][1651669] InferenceWorker_p0-w0: stopping experience collection (23450 times) [2024-06-15 16:53:26,872][1651274] Signal inference workers to resume experience collection... (23450 times) [2024-06-15 16:53:26,873][1651669] InferenceWorker_p0-w0: resuming experience collection (23450 times) [2024-06-15 16:53:27,377][1651669] Updated weights for policy 0, policy_version 447738 (0.0047) [2024-06-15 16:53:30,000][1651669] Updated weights for policy 0, policy_version 447799 (0.0011) [2024-06-15 16:53:30,801][1648981] Fps is (10 sec: 52249.0, 60 sec: 49672.0, 300 sec: 49090.7). Total num frames: 917110784. Throughput: 0: 12085.3. Samples: 229333504. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:30,801][1648981] Avg episode reward: [(0, '438.360')] [2024-06-15 16:53:32,191][1651669] Updated weights for policy 0, policy_version 447825 (0.0011) [2024-06-15 16:53:33,046][1651669] Updated weights for policy 0, policy_version 447868 (0.0041) [2024-06-15 16:53:35,385][1651669] Updated weights for policy 0, policy_version 447920 (0.0036) [2024-06-15 16:53:35,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 50790.2, 300 sec: 48874.3). 
Total num frames: 917372928. Throughput: 0: 12424.6. Samples: 229406208. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:35,768][1648981] Avg episode reward: [(0, '445.890')] [2024-06-15 16:53:36,979][1651669] Updated weights for policy 0, policy_version 447944 (0.0019) [2024-06-15 16:53:37,955][1651669] Updated weights for policy 0, policy_version 447999 (0.0018) [2024-06-15 16:53:40,766][1648981] Fps is (10 sec: 49321.7, 60 sec: 48609.4, 300 sec: 49208.8). Total num frames: 917602304. Throughput: 0: 12515.5. Samples: 229448192. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:40,767][1648981] Avg episode reward: [(0, '442.520')] [2024-06-15 16:53:40,771][1651669] Updated weights for policy 0, policy_version 448053 (0.0012) [2024-06-15 16:53:42,307][1651669] Updated weights for policy 0, policy_version 448096 (0.0010) [2024-06-15 16:53:45,608][1651669] Updated weights for policy 0, policy_version 448150 (0.0011) [2024-06-15 16:53:45,774][1648981] Fps is (10 sec: 45842.3, 60 sec: 49692.1, 300 sec: 48874.1). Total num frames: 917831680. Throughput: 0: 12516.8. Samples: 229521920. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:45,774][1648981] Avg episode reward: [(0, '456.560')] [2024-06-15 16:53:48,006][1651669] Updated weights for policy 0, policy_version 448224 (0.0012) [2024-06-15 16:53:50,363][1651669] Updated weights for policy 0, policy_version 448258 (0.0012) [2024-06-15 16:53:50,795][1648981] Fps is (10 sec: 45744.8, 60 sec: 50766.5, 300 sec: 48985.0). Total num frames: 918061056. Throughput: 0: 12365.4. Samples: 229596672. 
Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:50,795][1648981] Avg episode reward: [(0, '473.820')] [2024-06-15 16:53:52,585][1651669] Updated weights for policy 0, policy_version 448322 (0.0014) [2024-06-15 16:53:53,663][1651669] Updated weights for policy 0, policy_version 448369 (0.0014) [2024-06-15 16:53:55,766][1648981] Fps is (10 sec: 45909.0, 60 sec: 49152.0, 300 sec: 48763.2). Total num frames: 918290432. Throughput: 0: 12549.8. Samples: 229630976. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:53:55,767][1648981] Avg episode reward: [(0, '493.130')] [2024-06-15 16:53:56,241][1651669] Updated weights for policy 0, policy_version 448416 (0.0118) [2024-06-15 16:53:56,948][1651669] Updated weights for policy 0, policy_version 448447 (0.0011) [2024-06-15 16:53:59,960][1651669] Updated weights for policy 0, policy_version 448503 (0.0012) [2024-06-15 16:54:00,778][1648981] Fps is (10 sec: 52516.9, 60 sec: 50790.5, 300 sec: 49206.2). Total num frames: 918585344. Throughput: 0: 12455.4. Samples: 229707264. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:54:00,779][1648981] Avg episode reward: [(0, '485.560')] [2024-06-15 16:54:01,327][1651669] Updated weights for policy 0, policy_version 448568 (0.0011) [2024-06-15 16:54:04,640][1651669] Updated weights for policy 0, policy_version 448624 (0.0015) [2024-06-15 16:54:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50261.2, 300 sec: 48985.4). Total num frames: 918814720. Throughput: 0: 12390.4. Samples: 229777920. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:54:05,767][1648981] Avg episode reward: [(0, '475.610')] [2024-06-15 16:54:06,500][1651669] Updated weights for policy 0, policy_version 448656 (0.0011) [2024-06-15 16:54:09,573][1651669] Updated weights for policy 0, policy_version 448707 (0.0012) [2024-06-15 16:54:10,777][1648981] Fps is (10 sec: 45886.0, 60 sec: 49690.5, 300 sec: 49206.0). Total num frames: 919044096. 
Throughput: 0: 12421.9. Samples: 229815296. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:54:10,779][1648981] Avg episode reward: [(0, '484.020')] [2024-06-15 16:54:10,994][1651669] Updated weights for policy 0, policy_version 448765 (0.0011) [2024-06-15 16:54:11,432][1651274] Signal inference workers to stop experience collection... (23500 times) [2024-06-15 16:54:11,566][1651669] InferenceWorker_p0-w0: stopping experience collection (23500 times) [2024-06-15 16:54:11,688][1651274] Signal inference workers to resume experience collection... (23500 times) [2024-06-15 16:54:11,689][1651669] InferenceWorker_p0-w0: resuming experience collection (23500 times) [2024-06-15 16:54:12,614][1651669] Updated weights for policy 0, policy_version 448827 (0.0072) [2024-06-15 16:54:14,991][1651669] Updated weights for policy 0, policy_version 448869 (0.0012) [2024-06-15 16:54:15,770][1648981] Fps is (10 sec: 52409.1, 60 sec: 50241.2, 300 sec: 49095.8). Total num frames: 919339008. Throughput: 0: 12319.1. Samples: 229887488. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:54:15,771][1648981] Avg episode reward: [(0, '487.280')] [2024-06-15 16:54:16,584][1651669] Updated weights for policy 0, policy_version 448912 (0.0011) [2024-06-15 16:54:20,249][1651669] Updated weights for policy 0, policy_version 448965 (0.0011) [2024-06-15 16:54:20,766][1648981] Fps is (10 sec: 45918.3, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 919502848. Throughput: 0: 12447.3. Samples: 229966336. Policy #0 lag: (min: 5.0, avg: 108.1, max: 261.0) [2024-06-15 16:54:20,767][1648981] Avg episode reward: [(0, '496.230')] [2024-06-15 16:54:21,452][1651669] Updated weights for policy 0, policy_version 449018 (0.0011) [2024-06-15 16:54:23,016][1651669] Updated weights for policy 0, policy_version 449078 (0.0012) [2024-06-15 16:54:25,782][1648981] Fps is (10 sec: 45819.7, 60 sec: 49139.0, 300 sec: 48871.7). Total num frames: 919797760. Throughput: 0: 12249.6. 
Samples: 229999616. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:54:25,783][1648981] Avg episode reward: [(0, '496.440')] [2024-06-15 16:54:26,119][1651669] Updated weights for policy 0, policy_version 449152 (0.0012) [2024-06-15 16:54:30,356][1651669] Updated weights for policy 0, policy_version 449217 (0.0012) [2024-06-15 16:54:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48633.8, 300 sec: 48988.4). Total num frames: 920027136. Throughput: 0: 12267.3. Samples: 230073856. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:54:30,767][1648981] Avg episode reward: [(0, '497.360')] [2024-06-15 16:54:31,660][1651669] Updated weights for policy 0, policy_version 449276 (0.0011) [2024-06-15 16:54:33,120][1651669] Updated weights for policy 0, policy_version 449338 (0.0115) [2024-06-15 16:54:35,774][1648981] Fps is (10 sec: 45912.3, 60 sec: 48053.6, 300 sec: 48761.9). Total num frames: 920256512. Throughput: 0: 12316.4. Samples: 230150656. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:54:35,775][1648981] Avg episode reward: [(0, '498.120')] [2024-06-15 16:54:37,110][1651669] Updated weights for policy 0, policy_version 449395 (0.0012) [2024-06-15 16:54:38,172][1651669] Updated weights for policy 0, policy_version 449456 (0.0020) [2024-06-15 16:54:40,767][1648981] Fps is (10 sec: 49146.7, 60 sec: 48605.0, 300 sec: 48876.7). Total num frames: 920518656. Throughput: 0: 12310.5. Samples: 230184960. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:54:40,768][1648981] Avg episode reward: [(0, '458.440')] [2024-06-15 16:54:41,437][1651669] Updated weights for policy 0, policy_version 449491 (0.0012) [2024-06-15 16:54:42,526][1651669] Updated weights for policy 0, policy_version 449536 (0.0033) [2024-06-15 16:54:44,083][1651669] Updated weights for policy 0, policy_version 449598 (0.0012) [2024-06-15 16:54:45,766][1648981] Fps is (10 sec: 52469.4, 60 sec: 49158.0, 300 sec: 49096.5). 
Total num frames: 920780800. Throughput: 0: 12245.7. Samples: 230258176. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:54:45,767][1648981] Avg episode reward: [(0, '458.640')] [2024-06-15 16:54:49,134][1651669] Updated weights for policy 0, policy_version 449696 (0.0120) [2024-06-15 16:54:50,769][1648981] Fps is (10 sec: 52419.9, 60 sec: 49719.5, 300 sec: 48875.8). Total num frames: 921042944. Throughput: 0: 12173.5. Samples: 230325760. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:54:50,770][1648981] Avg episode reward: [(0, '460.510')] [2024-06-15 16:54:52,576][1651669] Updated weights for policy 0, policy_version 449744 (0.0012) [2024-06-15 16:54:52,683][1651274] Signal inference workers to stop experience collection... (23550 times) [2024-06-15 16:54:52,775][1651669] InferenceWorker_p0-w0: stopping experience collection (23550 times) [2024-06-15 16:54:52,950][1651274] Signal inference workers to resume experience collection... (23550 times) [2024-06-15 16:54:52,951][1651669] InferenceWorker_p0-w0: resuming experience collection (23550 times) [2024-06-15 16:54:53,896][1651669] Updated weights for policy 0, policy_version 449796 (0.0012) [2024-06-15 16:54:55,174][1651669] Updated weights for policy 0, policy_version 449852 (0.0011) [2024-06-15 16:54:55,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 50244.1, 300 sec: 49207.5). Total num frames: 921305088. Throughput: 0: 12245.0. Samples: 230366208. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:54:55,767][1648981] Avg episode reward: [(0, '474.280')] [2024-06-15 16:54:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000449856_921305088.pth... 
[2024-06-15 16:54:55,840][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000444096_909508608.pth [2024-06-15 16:54:59,639][1651669] Updated weights for policy 0, policy_version 449920 (0.0013) [2024-06-15 16:55:00,773][1648981] Fps is (10 sec: 49133.0, 60 sec: 49156.2, 300 sec: 48762.6). Total num frames: 921534464. Throughput: 0: 12139.3. Samples: 230433792. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:00,774][1648981] Avg episode reward: [(0, '477.390')] [2024-06-15 16:55:00,984][1651669] Updated weights for policy 0, policy_version 449984 (0.0014) [2024-06-15 16:55:05,476][1651669] Updated weights for policy 0, policy_version 450050 (0.0012) [2024-06-15 16:55:05,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 921731072. Throughput: 0: 11980.8. Samples: 230505472. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:05,767][1648981] Avg episode reward: [(0, '480.170')] [2024-06-15 16:55:06,731][1651669] Updated weights for policy 0, policy_version 450110 (0.0014) [2024-06-15 16:55:10,767][1648981] Fps is (10 sec: 39347.2, 60 sec: 48067.2, 300 sec: 48541.0). Total num frames: 921927680. Throughput: 0: 12019.1. Samples: 230540288. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:10,767][1648981] Avg episode reward: [(0, '474.910')] [2024-06-15 16:55:10,968][1651669] Updated weights for policy 0, policy_version 450168 (0.0015) [2024-06-15 16:55:12,521][1651669] Updated weights for policy 0, policy_version 450236 (0.0033) [2024-06-15 16:55:15,759][1651669] Updated weights for policy 0, policy_version 450297 (0.0013) [2024-06-15 16:55:15,777][1648981] Fps is (10 sec: 45828.3, 60 sec: 47508.5, 300 sec: 48762.2). Total num frames: 922189824. Throughput: 0: 11921.2. Samples: 230610432. 
Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:15,777][1648981] Avg episode reward: [(0, '456.540')] [2024-06-15 16:55:17,874][1651669] Updated weights for policy 0, policy_version 450336 (0.0012) [2024-06-15 16:55:20,104][1651669] Updated weights for policy 0, policy_version 450369 (0.0010) [2024-06-15 16:55:20,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 48605.9, 300 sec: 48766.4). Total num frames: 922419200. Throughput: 0: 11971.5. Samples: 230689280. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:20,767][1648981] Avg episode reward: [(0, '451.640')] [2024-06-15 16:55:21,540][1651669] Updated weights for policy 0, policy_version 450438 (0.0015) [2024-06-15 16:55:22,652][1651669] Updated weights for policy 0, policy_version 450493 (0.0113) [2024-06-15 16:55:25,766][1648981] Fps is (10 sec: 55762.6, 60 sec: 49165.0, 300 sec: 48985.4). Total num frames: 922746880. Throughput: 0: 12026.6. Samples: 230726144. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:25,767][1648981] Avg episode reward: [(0, '432.450')] [2024-06-15 16:55:27,616][1651669] Updated weights for policy 0, policy_version 450562 (0.0011) [2024-06-15 16:55:28,790][1651669] Updated weights for policy 0, policy_version 450621 (0.0034) [2024-06-15 16:55:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 922943488. Throughput: 0: 12106.0. Samples: 230802944. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:30,767][1648981] Avg episode reward: [(0, '434.430')] [2024-06-15 16:55:31,175][1651669] Updated weights for policy 0, policy_version 450673 (0.0010) [2024-06-15 16:55:31,438][1651274] Signal inference workers to stop experience collection... (23600 times) [2024-06-15 16:55:31,498][1651669] InferenceWorker_p0-w0: stopping experience collection (23600 times) [2024-06-15 16:55:31,684][1651274] Signal inference workers to resume experience collection... 
(23600 times) [2024-06-15 16:55:31,685][1651669] InferenceWorker_p0-w0: resuming experience collection (23600 times) [2024-06-15 16:55:32,483][1651669] Updated weights for policy 0, policy_version 450748 (0.0012) [2024-06-15 16:55:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49158.4, 300 sec: 49096.5). Total num frames: 923205632. Throughput: 0: 12322.9. Samples: 230880256. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:35,767][1648981] Avg episode reward: [(0, '418.780')] [2024-06-15 16:55:36,041][1651669] Updated weights for policy 0, policy_version 450800 (0.0012) [2024-06-15 16:55:38,033][1651669] Updated weights for policy 0, policy_version 450817 (0.0012) [2024-06-15 16:55:39,201][1651669] Updated weights for policy 0, policy_version 450878 (0.0011) [2024-06-15 16:55:40,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48606.7, 300 sec: 48985.4). Total num frames: 923435008. Throughput: 0: 12276.7. Samples: 230918656. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:40,767][1648981] Avg episode reward: [(0, '395.570')] [2024-06-15 16:55:41,505][1651669] Updated weights for policy 0, policy_version 450944 (0.0011) [2024-06-15 16:55:43,206][1651669] Updated weights for policy 0, policy_version 451008 (0.0011) [2024-06-15 16:55:45,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 48056.7, 300 sec: 48873.7). Total num frames: 923664384. Throughput: 0: 12118.1. Samples: 230979072. Policy #0 lag: (min: 15.0, avg: 128.5, max: 271.0) [2024-06-15 16:55:45,771][1648981] Avg episode reward: [(0, '398.500')] [2024-06-15 16:55:47,799][1651669] Updated weights for policy 0, policy_version 451065 (0.0012) [2024-06-15 16:55:50,499][1651669] Updated weights for policy 0, policy_version 451105 (0.0013) [2024-06-15 16:55:50,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47515.7, 300 sec: 48763.2). Total num frames: 923893760. Throughput: 0: 12162.8. Samples: 231052800. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:55:50,767][1648981] Avg episode reward: [(0, '395.570')] [2024-06-15 16:55:52,255][1651669] Updated weights for policy 0, policy_version 451168 (0.0012) [2024-06-15 16:55:54,030][1651669] Updated weights for policy 0, policy_version 451217 (0.0030) [2024-06-15 16:55:54,956][1651669] Updated weights for policy 0, policy_version 451257 (0.0010) [2024-06-15 16:55:55,814][1648981] Fps is (10 sec: 52199.3, 60 sec: 48021.6, 300 sec: 48866.4). Total num frames: 924188672. Throughput: 0: 12127.2. Samples: 231086592. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:55:55,815][1648981] Avg episode reward: [(0, '414.320')] [2024-06-15 16:56:00,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46426.4, 300 sec: 48541.1). Total num frames: 924319744. Throughput: 0: 12063.2. Samples: 231153152. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:00,767][1648981] Avg episode reward: [(0, '413.830')] [2024-06-15 16:56:01,342][1651669] Updated weights for policy 0, policy_version 451329 (0.0014) [2024-06-15 16:56:03,033][1651669] Updated weights for policy 0, policy_version 451408 (0.0015) [2024-06-15 16:56:04,210][1651669] Updated weights for policy 0, policy_version 451456 (0.0014) [2024-06-15 16:56:05,766][1648981] Fps is (10 sec: 46095.5, 60 sec: 48605.9, 300 sec: 48652.1). Total num frames: 924647424. Throughput: 0: 11832.9. Samples: 231221760. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:05,767][1648981] Avg episode reward: [(0, '404.690')] [2024-06-15 16:56:06,338][1651669] Updated weights for policy 0, policy_version 451520 (0.0066) [2024-06-15 16:56:09,478][1651669] Updated weights for policy 0, policy_version 451577 (0.0016) [2024-06-15 16:56:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48605.9, 300 sec: 48652.1). Total num frames: 924844032. Throughput: 0: 11935.3. Samples: 231263232. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:10,767][1648981] Avg episode reward: [(0, '410.190')] [2024-06-15 16:56:13,800][1651669] Updated weights for policy 0, policy_version 451619 (0.0013) [2024-06-15 16:56:15,149][1651669] Updated weights for policy 0, policy_version 451680 (0.0010) [2024-06-15 16:56:15,331][1651274] Signal inference workers to stop experience collection... (23650 times) [2024-06-15 16:56:15,399][1651669] InferenceWorker_p0-w0: stopping experience collection (23650 times) [2024-06-15 16:56:15,566][1651274] Signal inference workers to resume experience collection... (23650 times) [2024-06-15 16:56:15,567][1651669] InferenceWorker_p0-w0: resuming experience collection (23650 times) [2024-06-15 16:56:15,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48067.9, 300 sec: 48318.9). Total num frames: 925073408. Throughput: 0: 11776.0. Samples: 231332864. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:15,767][1648981] Avg episode reward: [(0, '423.120')] [2024-06-15 16:56:16,765][1651669] Updated weights for policy 0, policy_version 451744 (0.0092) [2024-06-15 16:56:19,158][1651669] Updated weights for policy 0, policy_version 451781 (0.0011) [2024-06-15 16:56:20,272][1651669] Updated weights for policy 0, policy_version 451837 (0.0139) [2024-06-15 16:56:20,775][1648981] Fps is (10 sec: 52383.4, 60 sec: 49144.8, 300 sec: 48762.4). Total num frames: 925368320. Throughput: 0: 11660.0. Samples: 231405056. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:20,776][1648981] Avg episode reward: [(0, '416.670')] [2024-06-15 16:56:24,808][1651669] Updated weights for policy 0, policy_version 451892 (0.0014) [2024-06-15 16:56:25,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 46967.6, 300 sec: 48541.1). Total num frames: 925564928. Throughput: 0: 11764.7. Samples: 231448064. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:25,767][1648981] Avg episode reward: [(0, '429.150')] [2024-06-15 16:56:25,927][1651669] Updated weights for policy 0, policy_version 451952 (0.0012) [2024-06-15 16:56:27,431][1651669] Updated weights for policy 0, policy_version 452004 (0.0011) [2024-06-15 16:56:30,433][1651669] Updated weights for policy 0, policy_version 452080 (0.0012) [2024-06-15 16:56:30,766][1648981] Fps is (10 sec: 49194.9, 60 sec: 48605.8, 300 sec: 49096.4). Total num frames: 925859840. Throughput: 0: 11924.9. Samples: 231515648. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:30,767][1648981] Avg episode reward: [(0, '440.780')] [2024-06-15 16:56:34,290][1651669] Updated weights for policy 0, policy_version 452097 (0.0011) [2024-06-15 16:56:35,385][1651669] Updated weights for policy 0, policy_version 452145 (0.0012) [2024-06-15 16:56:35,777][1648981] Fps is (10 sec: 45824.0, 60 sec: 46958.8, 300 sec: 48428.2). Total num frames: 926023680. Throughput: 0: 11932.4. Samples: 231589888. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:35,778][1648981] Avg episode reward: [(0, '460.280')] [2024-06-15 16:56:36,409][1651669] Updated weights for policy 0, policy_version 452199 (0.0012) [2024-06-15 16:56:38,050][1651669] Updated weights for policy 0, policy_version 452244 (0.0011) [2024-06-15 16:56:39,054][1651669] Updated weights for policy 0, policy_version 452284 (0.0023) [2024-06-15 16:56:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 926318592. Throughput: 0: 12016.3. Samples: 231626752. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:40,767][1648981] Avg episode reward: [(0, '485.340')] [2024-06-15 16:56:41,612][1651669] Updated weights for policy 0, policy_version 452345 (0.0016) [2024-06-15 16:56:44,816][1651669] Updated weights for policy 0, policy_version 452384 (0.0131) [2024-06-15 16:56:45,773][1648981] Fps is (10 sec: 52452.7, 60 sec: 48057.6, 300 sec: 48762.1). Total num frames: 926547968. Throughput: 0: 12320.3. Samples: 231707648. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:45,773][1648981] Avg episode reward: [(0, '488.010')] [2024-06-15 16:56:46,444][1651669] Updated weights for policy 0, policy_version 452448 (0.0016) [2024-06-15 16:56:47,933][1651669] Updated weights for policy 0, policy_version 452485 (0.0014) [2024-06-15 16:56:49,148][1651669] Updated weights for policy 0, policy_version 452544 (0.0108) [2024-06-15 16:56:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 926810112. Throughput: 0: 12276.6. Samples: 231774208. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:50,767][1648981] Avg episode reward: [(0, '520.190')] [2024-06-15 16:56:52,292][1651669] Updated weights for policy 0, policy_version 452604 (0.0011) [2024-06-15 16:56:55,563][1651669] Updated weights for policy 0, policy_version 452644 (0.0011) [2024-06-15 16:56:55,778][1648981] Fps is (10 sec: 49125.3, 60 sec: 47542.0, 300 sec: 48768.5). Total num frames: 927039488. Throughput: 0: 12193.8. Samples: 231812096. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:56:55,779][1648981] Avg episode reward: [(0, '518.500')] [2024-06-15 16:56:55,925][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000452672_927072256.pth... 
[2024-06-15 16:56:55,985][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000446960_915374080.pth [2024-06-15 16:56:56,688][1651274] Signal inference workers to stop experience collection... (23700 times) [2024-06-15 16:56:56,749][1651669] InferenceWorker_p0-w0: stopping experience collection (23700 times) [2024-06-15 16:56:56,901][1651274] Signal inference workers to resume experience collection... (23700 times) [2024-06-15 16:56:56,922][1651669] InferenceWorker_p0-w0: resuming experience collection (23700 times) [2024-06-15 16:56:56,924][1651669] Updated weights for policy 0, policy_version 452688 (0.0012) [2024-06-15 16:56:58,896][1651669] Updated weights for policy 0, policy_version 452739 (0.0014) [2024-06-15 16:57:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 927334400. Throughput: 0: 12253.9. Samples: 231884288. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:57:00,767][1648981] Avg episode reward: [(0, '517.880')] [2024-06-15 16:57:01,296][1651669] Updated weights for policy 0, policy_version 452801 (0.0020) [2024-06-15 16:57:02,240][1651669] Updated weights for policy 0, policy_version 452853 (0.0012) [2024-06-15 16:57:05,029][1651669] Updated weights for policy 0, policy_version 452883 (0.0021) [2024-06-15 16:57:05,766][1648981] Fps is (10 sec: 52491.5, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 927563776. Throughput: 0: 12449.7. Samples: 231965184. Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:57:05,767][1648981] Avg episode reward: [(0, '523.870')] [2024-06-15 16:57:08,129][1651669] Updated weights for policy 0, policy_version 452961 (0.0013) [2024-06-15 16:57:09,567][1651669] Updated weights for policy 0, policy_version 452993 (0.0013) [2024-06-15 16:57:10,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.2, 300 sec: 48765.8). Total num frames: 927825920. Throughput: 0: 12299.3. Samples: 232001536. 
Policy #0 lag: (min: 15.0, avg: 104.7, max: 271.0) [2024-06-15 16:57:10,767][1648981] Avg episode reward: [(0, '524.640')] [2024-06-15 16:57:10,993][1651669] Updated weights for policy 0, policy_version 453056 (0.0242) [2024-06-15 16:57:13,129][1651669] Updated weights for policy 0, policy_version 453120 (0.0101) [2024-06-15 16:57:15,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 49151.8, 300 sec: 48652.1). Total num frames: 928022528. Throughput: 0: 12390.3. Samples: 232073216. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:15,767][1648981] Avg episode reward: [(0, '526.980')] [2024-06-15 16:57:17,916][1651669] Updated weights for policy 0, policy_version 453191 (0.0012) [2024-06-15 16:57:18,770][1651669] Updated weights for policy 0, policy_version 453238 (0.0011) [2024-06-15 16:57:20,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49159.2, 300 sec: 48874.3). Total num frames: 928317440. Throughput: 0: 12370.7. Samples: 232146432. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:20,767][1648981] Avg episode reward: [(0, '510.990')] [2024-06-15 16:57:21,067][1651669] Updated weights for policy 0, policy_version 453296 (0.0012) [2024-06-15 16:57:23,457][1651669] Updated weights for policy 0, policy_version 453331 (0.0012) [2024-06-15 16:57:24,117][1651669] Updated weights for policy 0, policy_version 453376 (0.0010) [2024-06-15 16:57:25,766][1648981] Fps is (10 sec: 49154.2, 60 sec: 49152.0, 300 sec: 48763.7). Total num frames: 928514048. Throughput: 0: 12401.8. Samples: 232184832. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:25,767][1648981] Avg episode reward: [(0, '523.980')] [2024-06-15 16:57:28,948][1651669] Updated weights for policy 0, policy_version 453456 (0.0124) [2024-06-15 16:57:30,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48605.8, 300 sec: 48985.4). Total num frames: 928776192. Throughput: 0: 12153.2. Samples: 232254464. 
Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:30,767][1648981] Avg episode reward: [(0, '524.110')] [2024-06-15 16:57:31,836][1651669] Updated weights for policy 0, policy_version 453522 (0.0012) [2024-06-15 16:57:32,707][1651669] Updated weights for policy 0, policy_version 453566 (0.0011) [2024-06-15 16:57:34,392][1651669] Updated weights for policy 0, policy_version 453629 (0.0037) [2024-06-15 16:57:35,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50253.6, 300 sec: 48652.9). Total num frames: 929038336. Throughput: 0: 12322.1. Samples: 232328704. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:35,767][1648981] Avg episode reward: [(0, '539.800')] [2024-06-15 16:57:37,730][1651669] Updated weights for policy 0, policy_version 453680 (0.0013) [2024-06-15 16:57:40,414][1651274] Signal inference workers to stop experience collection... (23750 times) [2024-06-15 16:57:40,443][1651669] InferenceWorker_p0-w0: stopping experience collection (23750 times) [2024-06-15 16:57:40,613][1651274] Signal inference workers to resume experience collection... (23750 times) [2024-06-15 16:57:40,614][1651669] InferenceWorker_p0-w0: resuming experience collection (23750 times) [2024-06-15 16:57:40,616][1651669] Updated weights for policy 0, policy_version 453744 (0.0014) [2024-06-15 16:57:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 929267712. Throughput: 0: 12257.1. Samples: 232363520. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:40,767][1648981] Avg episode reward: [(0, '512.380')] [2024-06-15 16:57:42,723][1651669] Updated weights for policy 0, policy_version 453792 (0.0029) [2024-06-15 16:57:43,462][1651669] Updated weights for policy 0, policy_version 453818 (0.0012) [2024-06-15 16:57:44,870][1651669] Updated weights for policy 0, policy_version 453878 (0.0023) [2024-06-15 16:57:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50249.8, 300 sec: 49318.7). 
Total num frames: 929562624. Throughput: 0: 12401.8. Samples: 232442368. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:45,767][1648981] Avg episode reward: [(0, '510.220')] [2024-06-15 16:57:47,946][1651669] Updated weights for policy 0, policy_version 453920 (0.0014) [2024-06-15 16:57:50,299][1651669] Updated weights for policy 0, policy_version 453956 (0.0013) [2024-06-15 16:57:50,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 929726464. Throughput: 0: 12208.3. Samples: 232514560. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:50,767][1648981] Avg episode reward: [(0, '477.010')] [2024-06-15 16:57:51,516][1651669] Updated weights for policy 0, policy_version 454015 (0.0010) [2024-06-15 16:57:53,737][1651669] Updated weights for policy 0, policy_version 454080 (0.0012) [2024-06-15 16:57:55,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 50800.5, 300 sec: 49320.6). Total num frames: 930086912. Throughput: 0: 12242.5. Samples: 232552448. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:57:55,767][1648981] Avg episode reward: [(0, '493.510')] [2024-06-15 16:57:57,884][1651669] Updated weights for policy 0, policy_version 454149 (0.0012) [2024-06-15 16:58:00,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 48877.7). Total num frames: 930217984. Throughput: 0: 12162.9. Samples: 232620544. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:00,767][1648981] Avg episode reward: [(0, '501.040')] [2024-06-15 16:58:01,133][1651669] Updated weights for policy 0, policy_version 454209 (0.0013) [2024-06-15 16:58:02,573][1651669] Updated weights for policy 0, policy_version 454272 (0.0017) [2024-06-15 16:58:04,806][1651669] Updated weights for policy 0, policy_version 454328 (0.0010) [2024-06-15 16:58:05,774][1648981] Fps is (10 sec: 39291.0, 60 sec: 48599.5, 300 sec: 48873.0). Total num frames: 930480128. Throughput: 0: 12194.8. 
Samples: 232695296. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:05,775][1648981] Avg episode reward: [(0, '517.710')] [2024-06-15 16:58:06,588][1651669] Updated weights for policy 0, policy_version 454384 (0.0011) [2024-06-15 16:58:09,193][1651669] Updated weights for policy 0, policy_version 454416 (0.0012) [2024-06-15 16:58:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 930742272. Throughput: 0: 12288.0. Samples: 232737792. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:10,767][1648981] Avg episode reward: [(0, '515.710')] [2024-06-15 16:58:11,441][1651669] Updated weights for policy 0, policy_version 454466 (0.0133) [2024-06-15 16:58:12,628][1651669] Updated weights for policy 0, policy_version 454526 (0.0019) [2024-06-15 16:58:15,656][1651669] Updated weights for policy 0, policy_version 454584 (0.0013) [2024-06-15 16:58:15,814][1648981] Fps is (10 sec: 52219.8, 60 sec: 49658.8, 300 sec: 48866.4). Total num frames: 931004416. Throughput: 0: 12149.9. Samples: 232801792. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:15,815][1648981] Avg episode reward: [(0, '525.230')] [2024-06-15 16:58:17,481][1651669] Updated weights for policy 0, policy_version 454640 (0.0012) [2024-06-15 16:58:20,138][1651669] Updated weights for policy 0, policy_version 454688 (0.0011) [2024-06-15 16:58:20,767][1648981] Fps is (10 sec: 49151.8, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 931233792. Throughput: 0: 12208.3. Samples: 232878080. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:20,767][1648981] Avg episode reward: [(0, '504.270')] [2024-06-15 16:58:22,222][1651669] Updated weights for policy 0, policy_version 454724 (0.0019) [2024-06-15 16:58:25,256][1651274] Signal inference workers to stop experience collection... 
(23800 times) [2024-06-15 16:58:25,350][1651669] InferenceWorker_p0-w0: stopping experience collection (23800 times) [2024-06-15 16:58:25,356][1651669] Updated weights for policy 0, policy_version 454789 (0.0012) [2024-06-15 16:58:25,532][1651274] Signal inference workers to resume experience collection... (23800 times) [2024-06-15 16:58:25,532][1651669] InferenceWorker_p0-w0: resuming experience collection (23800 times) [2024-06-15 16:58:25,767][1648981] Fps is (10 sec: 42800.9, 60 sec: 48605.3, 300 sec: 48546.6). Total num frames: 931430400. Throughput: 0: 12219.6. Samples: 232913408. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:25,768][1648981] Avg episode reward: [(0, '484.810')] [2024-06-15 16:58:26,600][1651669] Updated weights for policy 0, policy_version 454842 (0.0083) [2024-06-15 16:58:28,582][1651669] Updated weights for policy 0, policy_version 454900 (0.0022) [2024-06-15 16:58:30,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 931692544. Throughput: 0: 12049.1. Samples: 232984576. Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:30,767][1648981] Avg episode reward: [(0, '479.730')] [2024-06-15 16:58:31,149][1651669] Updated weights for policy 0, policy_version 454948 (0.0010) [2024-06-15 16:58:31,597][1651669] Updated weights for policy 0, policy_version 454975 (0.0011) [2024-06-15 16:58:34,202][1651669] Updated weights for policy 0, policy_version 455040 (0.0013) [2024-06-15 16:58:35,766][1648981] Fps is (10 sec: 49155.0, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 931921920. Throughput: 0: 12117.3. Samples: 233059840. 
Policy #0 lag: (min: 7.0, avg: 119.3, max: 263.0) [2024-06-15 16:58:35,767][1648981] Avg episode reward: [(0, '481.510')] [2024-06-15 16:58:37,734][1651669] Updated weights for policy 0, policy_version 455099 (0.0049) [2024-06-15 16:58:39,116][1651669] Updated weights for policy 0, policy_version 455152 (0.0010) [2024-06-15 16:58:40,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 48605.8, 300 sec: 48653.3). Total num frames: 932184064. Throughput: 0: 12037.7. Samples: 233094144. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:58:40,767][1648981] Avg episode reward: [(0, '492.910')] [2024-06-15 16:58:41,436][1651669] Updated weights for policy 0, policy_version 455184 (0.0058) [2024-06-15 16:58:42,583][1651669] Updated weights for policy 0, policy_version 455228 (0.0010) [2024-06-15 16:58:44,331][1651669] Updated weights for policy 0, policy_version 455267 (0.0012) [2024-06-15 16:58:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 48768.0). Total num frames: 932446208. Throughput: 0: 12265.3. Samples: 233172480. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:58:45,767][1648981] Avg episode reward: [(0, '488.630')] [2024-06-15 16:58:47,124][1651669] Updated weights for policy 0, policy_version 455315 (0.0013) [2024-06-15 16:58:48,094][1651669] Updated weights for policy 0, policy_version 455356 (0.0108) [2024-06-15 16:58:49,871][1651669] Updated weights for policy 0, policy_version 455408 (0.0024) [2024-06-15 16:58:50,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 49698.3, 300 sec: 48874.3). Total num frames: 932708352. Throughput: 0: 12187.8. Samples: 233243648. 
Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:58:50,767][1648981] Avg episode reward: [(0, '453.370')] [2024-06-15 16:58:51,696][1651669] Updated weights for policy 0, policy_version 455428 (0.0011) [2024-06-15 16:58:52,702][1651669] Updated weights for policy 0, policy_version 455484 (0.0102) [2024-06-15 16:58:54,971][1651669] Updated weights for policy 0, policy_version 455536 (0.0012) [2024-06-15 16:58:55,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 48059.5, 300 sec: 48765.1). Total num frames: 932970496. Throughput: 0: 12265.2. Samples: 233289728. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:58:55,768][1648981] Avg episode reward: [(0, '453.100')] [2024-06-15 16:58:55,785][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000455552_932970496.pth... [2024-06-15 16:58:55,905][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000449856_921305088.pth [2024-06-15 16:58:58,495][1651669] Updated weights for policy 0, policy_version 455600 (0.0045) [2024-06-15 16:58:59,840][1651669] Updated weights for policy 0, policy_version 455649 (0.0011) [2024-06-15 16:59:00,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 50244.1, 300 sec: 48874.3). Total num frames: 933232640. Throughput: 0: 12255.5. Samples: 233352704. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:00,767][1648981] Avg episode reward: [(0, '444.420')] [2024-06-15 16:59:02,992][1651669] Updated weights for policy 0, policy_version 455701 (0.0053) [2024-06-15 16:59:03,703][1651669] Updated weights for policy 0, policy_version 455744 (0.0013) [2024-06-15 16:59:05,546][1651669] Updated weights for policy 0, policy_version 455808 (0.0010) [2024-06-15 16:59:05,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 50250.9, 300 sec: 48987.0). Total num frames: 933494784. Throughput: 0: 12401.8. Samples: 233436160. 
Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:05,767][1648981] Avg episode reward: [(0, '448.010')] [2024-06-15 16:59:08,233][1651274] Signal inference workers to stop experience collection... (23850 times) [2024-06-15 16:59:08,313][1651669] InferenceWorker_p0-w0: stopping experience collection (23850 times) [2024-06-15 16:59:08,506][1651274] Signal inference workers to resume experience collection... (23850 times) [2024-06-15 16:59:08,507][1651669] InferenceWorker_p0-w0: resuming experience collection (23850 times) [2024-06-15 16:59:09,509][1651669] Updated weights for policy 0, policy_version 455879 (0.0015) [2024-06-15 16:59:10,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 50244.3, 300 sec: 48874.9). Total num frames: 933756928. Throughput: 0: 12493.0. Samples: 233475584. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:10,767][1648981] Avg episode reward: [(0, '424.430')] [2024-06-15 16:59:13,033][1651669] Updated weights for policy 0, policy_version 455937 (0.0012) [2024-06-15 16:59:13,900][1651669] Updated weights for policy 0, policy_version 455987 (0.0017) [2024-06-15 16:59:15,364][1651669] Updated weights for policy 0, policy_version 456032 (0.0011) [2024-06-15 16:59:15,774][1648981] Fps is (10 sec: 49112.4, 60 sec: 49731.2, 300 sec: 49095.1). Total num frames: 933986304. Throughput: 0: 12513.3. Samples: 233547776. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:15,775][1648981] Avg episode reward: [(0, '425.340')] [2024-06-15 16:59:18,264][1651669] Updated weights for policy 0, policy_version 456065 (0.0011) [2024-06-15 16:59:20,111][1651669] Updated weights for policy 0, policy_version 456132 (0.0034) [2024-06-15 16:59:20,767][1648981] Fps is (10 sec: 42596.8, 60 sec: 49151.8, 300 sec: 48765.8). Total num frames: 934182912. Throughput: 0: 12435.8. Samples: 233619456. 
Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:20,768][1648981] Avg episode reward: [(0, '438.340')] [2024-06-15 16:59:21,499][1651669] Updated weights for policy 0, policy_version 456192 (0.0013) [2024-06-15 16:59:24,845][1651669] Updated weights for policy 0, policy_version 456256 (0.0014) [2024-06-15 16:59:25,767][1648981] Fps is (10 sec: 42631.5, 60 sec: 49698.4, 300 sec: 48763.2). Total num frames: 934412288. Throughput: 0: 12492.8. Samples: 233656320. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:25,768][1648981] Avg episode reward: [(0, '449.790')] [2024-06-15 16:59:26,851][1651669] Updated weights for policy 0, policy_version 456313 (0.0076) [2024-06-15 16:59:29,267][1651669] Updated weights for policy 0, policy_version 456358 (0.0011) [2024-06-15 16:59:30,766][1648981] Fps is (10 sec: 52430.9, 60 sec: 50244.3, 300 sec: 48986.7). Total num frames: 934707200. Throughput: 0: 12492.8. Samples: 233734656. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:30,767][1648981] Avg episode reward: [(0, '449.610')] [2024-06-15 16:59:31,282][1651669] Updated weights for policy 0, policy_version 456416 (0.0137) [2024-06-15 16:59:34,571][1651669] Updated weights for policy 0, policy_version 456466 (0.0012) [2024-06-15 16:59:35,301][1651669] Updated weights for policy 0, policy_version 456506 (0.0013) [2024-06-15 16:59:35,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 50244.2, 300 sec: 48874.5). Total num frames: 934936576. Throughput: 0: 12617.9. Samples: 233811456. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:35,767][1648981] Avg episode reward: [(0, '448.870')] [2024-06-15 16:59:36,356][1651669] Updated weights for policy 0, policy_version 456544 (0.0011) [2024-06-15 16:59:39,259][1651669] Updated weights for policy 0, policy_version 456624 (0.0011) [2024-06-15 16:59:40,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 50244.4, 300 sec: 48874.3). Total num frames: 935198720. 
Throughput: 0: 12458.8. Samples: 233850368. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:40,767][1648981] Avg episode reward: [(0, '456.180')] [2024-06-15 16:59:42,301][1651669] Updated weights for policy 0, policy_version 456674 (0.0080) [2024-06-15 16:59:45,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48605.9, 300 sec: 48541.5). Total num frames: 935362560. Throughput: 0: 12561.1. Samples: 233917952. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:45,767][1648981] Avg episode reward: [(0, '474.950')] [2024-06-15 16:59:46,150][1651669] Updated weights for policy 0, policy_version 456736 (0.0015) [2024-06-15 16:59:47,693][1651669] Updated weights for policy 0, policy_version 456800 (0.0011) [2024-06-15 16:59:47,831][1651274] Signal inference workers to stop experience collection... (23900 times) [2024-06-15 16:59:47,884][1651669] InferenceWorker_p0-w0: stopping experience collection (23900 times) [2024-06-15 16:59:48,142][1651274] Signal inference workers to resume experience collection... (23900 times) [2024-06-15 16:59:48,143][1651669] InferenceWorker_p0-w0: resuming experience collection (23900 times) [2024-06-15 16:59:49,178][1651669] Updated weights for policy 0, policy_version 456834 (0.0013) [2024-06-15 16:59:50,148][1651669] Updated weights for policy 0, policy_version 456896 (0.0068) [2024-06-15 16:59:50,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.1, 300 sec: 48874.3). Total num frames: 935723008. Throughput: 0: 12276.6. Samples: 233988608. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:50,769][1648981] Avg episode reward: [(0, '483.570')] [2024-06-15 16:59:53,280][1651669] Updated weights for policy 0, policy_version 456954 (0.0090) [2024-06-15 16:59:55,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 48059.9, 300 sec: 48542.1). Total num frames: 935854080. Throughput: 0: 12265.2. Samples: 234027520. 
Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 16:59:55,767][1648981] Avg episode reward: [(0, '490.030')] [2024-06-15 16:59:57,532][1651669] Updated weights for policy 0, policy_version 457024 (0.0092) [2024-06-15 16:59:59,190][1651669] Updated weights for policy 0, policy_version 457088 (0.0011) [2024-06-15 17:00:00,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.1, 300 sec: 48985.4). Total num frames: 936181760. Throughput: 0: 12199.2. Samples: 234096640. Policy #0 lag: (min: 63.0, avg: 193.2, max: 319.0) [2024-06-15 17:00:00,767][1648981] Avg episode reward: [(0, '491.490')] [2024-06-15 17:00:01,118][1651669] Updated weights for policy 0, policy_version 457152 (0.0091) [2024-06-15 17:00:04,347][1651669] Updated weights for policy 0, policy_version 457215 (0.0013) [2024-06-15 17:00:05,778][1648981] Fps is (10 sec: 52369.0, 60 sec: 48050.4, 300 sec: 48983.5). Total num frames: 936378368. Throughput: 0: 12353.2. Samples: 234175488. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:05,779][1648981] Avg episode reward: [(0, '490.360')] [2024-06-15 17:00:08,375][1651669] Updated weights for policy 0, policy_version 457281 (0.0011) [2024-06-15 17:00:09,818][1651669] Updated weights for policy 0, policy_version 457340 (0.0089) [2024-06-15 17:00:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 48987.1). Total num frames: 936640512. Throughput: 0: 12231.2. Samples: 234206720. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:10,767][1648981] Avg episode reward: [(0, '510.460')] [2024-06-15 17:00:12,988][1651669] Updated weights for policy 0, policy_version 457396 (0.0139) [2024-06-15 17:00:15,691][1651669] Updated weights for policy 0, policy_version 457446 (0.0012) [2024-06-15 17:00:15,766][1648981] Fps is (10 sec: 45928.3, 60 sec: 47519.9, 300 sec: 48874.3). Total num frames: 936837120. Throughput: 0: 12037.7. Samples: 234276352. 
Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:15,767][1648981] Avg episode reward: [(0, '515.090')] [2024-06-15 17:00:18,878][1651669] Updated weights for policy 0, policy_version 457507 (0.0014) [2024-06-15 17:00:20,767][1648981] Fps is (10 sec: 45871.4, 60 sec: 48605.4, 300 sec: 48652.0). Total num frames: 937099264. Throughput: 0: 11593.7. Samples: 234333184. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:20,768][1648981] Avg episode reward: [(0, '504.370')] [2024-06-15 17:00:20,994][1651669] Updated weights for policy 0, policy_version 457594 (0.0021) [2024-06-15 17:00:25,356][1651669] Updated weights for policy 0, policy_version 457649 (0.0012) [2024-06-15 17:00:25,767][1648981] Fps is (10 sec: 45870.5, 60 sec: 48059.1, 300 sec: 48652.0). Total num frames: 937295872. Throughput: 0: 11650.6. Samples: 234374656. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:25,768][1648981] Avg episode reward: [(0, '491.470')] [2024-06-15 17:00:26,487][1651669] Updated weights for policy 0, policy_version 457696 (0.0016) [2024-06-15 17:00:30,252][1651669] Updated weights for policy 0, policy_version 457762 (0.0073) [2024-06-15 17:00:30,766][1648981] Fps is (10 sec: 45879.2, 60 sec: 47513.5, 300 sec: 48652.2). Total num frames: 937558016. Throughput: 0: 11798.7. Samples: 234448896. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:30,767][1648981] Avg episode reward: [(0, '480.400')] [2024-06-15 17:00:30,868][1651274] Signal inference workers to stop experience collection... (23950 times) [2024-06-15 17:00:30,999][1651669] InferenceWorker_p0-w0: stopping experience collection (23950 times) [2024-06-15 17:00:31,195][1651274] Signal inference workers to resume experience collection... 
(23950 times) [2024-06-15 17:00:31,195][1651669] InferenceWorker_p0-w0: resuming experience collection (23950 times) [2024-06-15 17:00:31,382][1651669] Updated weights for policy 0, policy_version 457812 (0.0016) [2024-06-15 17:00:32,395][1651669] Updated weights for policy 0, policy_version 457855 (0.0089) [2024-06-15 17:00:35,766][1648981] Fps is (10 sec: 45880.2, 60 sec: 46967.5, 300 sec: 48541.1). Total num frames: 937754624. Throughput: 0: 11912.6. Samples: 234524672. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:35,767][1648981] Avg episode reward: [(0, '463.930')] [2024-06-15 17:00:36,062][1651669] Updated weights for policy 0, policy_version 457904 (0.0012) [2024-06-15 17:00:37,439][1651669] Updated weights for policy 0, policy_version 457953 (0.0041) [2024-06-15 17:00:40,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 48430.6). Total num frames: 937951232. Throughput: 0: 11673.6. Samples: 234552832. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:40,767][1648981] Avg episode reward: [(0, '464.580')] [2024-06-15 17:00:41,277][1651669] Updated weights for policy 0, policy_version 458016 (0.0013) [2024-06-15 17:00:42,801][1651669] Updated weights for policy 0, policy_version 458082 (0.0012) [2024-06-15 17:00:43,399][1651669] Updated weights for policy 0, policy_version 458109 (0.0011) [2024-06-15 17:00:45,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 938213376. Throughput: 0: 11787.4. Samples: 234627072. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:45,767][1648981] Avg episode reward: [(0, '438.610')] [2024-06-15 17:00:48,086][1651669] Updated weights for policy 0, policy_version 458182 (0.0215) [2024-06-15 17:00:50,772][1648981] Fps is (10 sec: 52399.5, 60 sec: 45871.0, 300 sec: 48436.9). Total num frames: 938475520. Throughput: 0: 11504.5. Samples: 234693120. 
Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:50,776][1648981] Avg episode reward: [(0, '425.890')] [2024-06-15 17:00:51,813][1651669] Updated weights for policy 0, policy_version 458256 (0.0034) [2024-06-15 17:00:53,140][1651669] Updated weights for policy 0, policy_version 458320 (0.0016) [2024-06-15 17:00:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.9, 300 sec: 48874.3). Total num frames: 938737664. Throughput: 0: 11594.0. Samples: 234728448. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:00:55,767][1648981] Avg episode reward: [(0, '414.550')] [2024-06-15 17:00:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000458368_938737664.pth... [2024-06-15 17:00:55,876][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000452672_927072256.pth [2024-06-15 17:00:57,496][1651669] Updated weights for policy 0, policy_version 458370 (0.0012) [2024-06-15 17:00:59,188][1651669] Updated weights for policy 0, policy_version 458434 (0.0014) [2024-06-15 17:01:00,466][1651669] Updated weights for policy 0, policy_version 458494 (0.0014) [2024-06-15 17:01:00,767][1648981] Fps is (10 sec: 52456.3, 60 sec: 46967.2, 300 sec: 48652.1). Total num frames: 938999808. Throughput: 0: 11844.2. Samples: 234809344. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:01:00,767][1648981] Avg episode reward: [(0, '428.740')] [2024-06-15 17:01:02,828][1651669] Updated weights for policy 0, policy_version 458532 (0.0016) [2024-06-15 17:01:04,509][1651669] Updated weights for policy 0, policy_version 458614 (0.0012) [2024-06-15 17:01:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48069.0, 300 sec: 48874.3). Total num frames: 939261952. Throughput: 0: 12128.9. Samples: 234878976. 
Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:01:05,767][1648981] Avg episode reward: [(0, '429.790')] [2024-06-15 17:01:08,940][1651669] Updated weights for policy 0, policy_version 458660 (0.0035) [2024-06-15 17:01:10,199][1651274] Signal inference workers to stop experience collection... (24000 times) [2024-06-15 17:01:10,201][1651669] Updated weights for policy 0, policy_version 458706 (0.0014) [2024-06-15 17:01:10,254][1651669] InferenceWorker_p0-w0: stopping experience collection (24000 times) [2024-06-15 17:01:10,441][1651274] Signal inference workers to resume experience collection... (24000 times) [2024-06-15 17:01:10,441][1651669] InferenceWorker_p0-w0: resuming experience collection (24000 times) [2024-06-15 17:01:10,766][1648981] Fps is (10 sec: 45876.9, 60 sec: 46967.5, 300 sec: 48763.2). Total num frames: 939458560. Throughput: 0: 12140.4. Samples: 234920960. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:01:10,767][1648981] Avg episode reward: [(0, '414.510')] [2024-06-15 17:01:11,112][1651669] Updated weights for policy 0, policy_version 458747 (0.0012) [2024-06-15 17:01:14,066][1651669] Updated weights for policy 0, policy_version 458816 (0.0013) [2024-06-15 17:01:15,262][1651669] Updated weights for policy 0, policy_version 458880 (0.0026) [2024-06-15 17:01:15,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 49151.9, 300 sec: 48875.7). Total num frames: 939786240. Throughput: 0: 12003.5. Samples: 234989056. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:01:15,767][1648981] Avg episode reward: [(0, '442.390')] [2024-06-15 17:01:19,941][1651669] Updated weights for policy 0, policy_version 458934 (0.0011) [2024-06-15 17:01:20,770][1648981] Fps is (10 sec: 49133.3, 60 sec: 47511.3, 300 sec: 48762.6). Total num frames: 939950080. Throughput: 0: 12116.3. Samples: 235069952. 
Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:01:20,771][1648981] Avg episode reward: [(0, '456.270')] [2024-06-15 17:01:21,452][1651669] Updated weights for policy 0, policy_version 458999 (0.0101) [2024-06-15 17:01:23,763][1651669] Updated weights for policy 0, policy_version 459041 (0.0011) [2024-06-15 17:01:25,136][1651669] Updated weights for policy 0, policy_version 459104 (0.0012) [2024-06-15 17:01:25,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 50245.2, 300 sec: 48985.4). Total num frames: 940310528. Throughput: 0: 12219.7. Samples: 235102720. Policy #0 lag: (min: 31.0, avg: 145.4, max: 287.0) [2024-06-15 17:01:25,767][1648981] Avg episode reward: [(0, '454.200')] [2024-06-15 17:01:30,766][1648981] Fps is (10 sec: 39336.9, 60 sec: 46421.4, 300 sec: 48542.9). Total num frames: 940343296. Throughput: 0: 12231.1. Samples: 235177472. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:01:30,767][1648981] Avg episode reward: [(0, '455.590')] [2024-06-15 17:01:30,921][1651669] Updated weights for policy 0, policy_version 459168 (0.0012) [2024-06-15 17:01:33,032][1651669] Updated weights for policy 0, policy_version 459251 (0.0014) [2024-06-15 17:01:33,699][1651669] Updated weights for policy 0, policy_version 459280 (0.0011) [2024-06-15 17:01:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 940769280. Throughput: 0: 12084.7. Samples: 235236864. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:01:35,767][1648981] Avg episode reward: [(0, '452.750')] [2024-06-15 17:01:36,005][1651669] Updated weights for policy 0, policy_version 459361 (0.0015) [2024-06-15 17:01:40,770][1648981] Fps is (10 sec: 49132.8, 60 sec: 48056.7, 300 sec: 48430.4). Total num frames: 940834816. Throughput: 0: 12093.6. Samples: 235272704. 
Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:01:40,771][1648981] Avg episode reward: [(0, '443.490')] [2024-06-15 17:01:42,170][1651669] Updated weights for policy 0, policy_version 459398 (0.0013) [2024-06-15 17:01:43,608][1651669] Updated weights for policy 0, policy_version 459459 (0.0087) [2024-06-15 17:01:45,164][1651669] Updated weights for policy 0, policy_version 459520 (0.0011) [2024-06-15 17:01:45,766][1648981] Fps is (10 sec: 36044.7, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 941129728. Throughput: 0: 12026.4. Samples: 235350528. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:01:45,768][1648981] Avg episode reward: [(0, '438.790')] [2024-06-15 17:01:46,898][1651274] Signal inference workers to stop experience collection... (24050 times) [2024-06-15 17:01:46,912][1651669] Updated weights for policy 0, policy_version 459585 (0.0017) [2024-06-15 17:01:46,949][1651669] InferenceWorker_p0-w0: stopping experience collection (24050 times) [2024-06-15 17:01:47,222][1651274] Signal inference workers to resume experience collection... (24050 times) [2024-06-15 17:01:47,224][1651669] InferenceWorker_p0-w0: resuming experience collection (24050 times) [2024-06-15 17:01:48,372][1651669] Updated weights for policy 0, policy_version 459648 (0.0104) [2024-06-15 17:01:50,766][1648981] Fps is (10 sec: 52448.8, 60 sec: 48064.2, 300 sec: 48543.0). Total num frames: 941359104. Throughput: 0: 11741.9. Samples: 235407360. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:01:50,767][1648981] Avg episode reward: [(0, '444.740')] [2024-06-15 17:01:54,528][1651669] Updated weights for policy 0, policy_version 459698 (0.0016) [2024-06-15 17:01:55,770][1648981] Fps is (10 sec: 42581.3, 60 sec: 46964.4, 300 sec: 48207.2). Total num frames: 941555712. Throughput: 0: 11854.6. Samples: 235454464. 
Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:01:55,771][1648981] Avg episode reward: [(0, '466.840')] [2024-06-15 17:01:56,000][1651669] Updated weights for policy 0, policy_version 459772 (0.0030) [2024-06-15 17:01:57,944][1651669] Updated weights for policy 0, policy_version 459832 (0.0106) [2024-06-15 17:02:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48060.0, 300 sec: 48541.1). Total num frames: 941883392. Throughput: 0: 11548.5. Samples: 235508736. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:00,767][1648981] Avg episode reward: [(0, '460.210')] [2024-06-15 17:02:04,530][1651669] Updated weights for policy 0, policy_version 459906 (0.0017) [2024-06-15 17:02:05,766][1648981] Fps is (10 sec: 45893.5, 60 sec: 45875.2, 300 sec: 48096.8). Total num frames: 942014464. Throughput: 0: 11617.7. Samples: 235592704. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:05,767][1648981] Avg episode reward: [(0, '444.290')] [2024-06-15 17:02:06,339][1651669] Updated weights for policy 0, policy_version 460000 (0.0013) [2024-06-15 17:02:07,838][1651669] Updated weights for policy 0, policy_version 460037 (0.0014) [2024-06-15 17:02:09,931][1651669] Updated weights for policy 0, policy_version 460112 (0.0011) [2024-06-15 17:02:10,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 942342144. Throughput: 0: 11639.5. Samples: 235626496. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:10,767][1648981] Avg episode reward: [(0, '428.600')] [2024-06-15 17:02:11,165][1651669] Updated weights for policy 0, policy_version 460158 (0.0019) [2024-06-15 17:02:15,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 43690.8, 300 sec: 47763.5). Total num frames: 942407680. Throughput: 0: 11571.2. Samples: 235698176. 
Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:15,767][1648981] Avg episode reward: [(0, '432.110')] [2024-06-15 17:02:16,345][1651669] Updated weights for policy 0, policy_version 460208 (0.0062) [2024-06-15 17:02:17,744][1651669] Updated weights for policy 0, policy_version 460279 (0.0012) [2024-06-15 17:02:19,175][1651669] Updated weights for policy 0, policy_version 460320 (0.0010) [2024-06-15 17:02:20,699][1651669] Updated weights for policy 0, policy_version 460369 (0.0013) [2024-06-15 17:02:20,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48062.8, 300 sec: 48541.1). Total num frames: 942833664. Throughput: 0: 11798.7. Samples: 235767808. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:20,767][1648981] Avg episode reward: [(0, '426.330')] [2024-06-15 17:02:25,630][1651669] Updated weights for policy 0, policy_version 460420 (0.0012) [2024-06-15 17:02:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 47985.7). Total num frames: 942931968. Throughput: 0: 11833.9. Samples: 235805184. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:25,767][1648981] Avg episode reward: [(0, '435.660')] [2024-06-15 17:02:27,146][1651274] Signal inference workers to stop experience collection... (24100 times) [2024-06-15 17:02:27,247][1651669] InferenceWorker_p0-w0: stopping experience collection (24100 times) [2024-06-15 17:02:27,374][1651274] Signal inference workers to resume experience collection... (24100 times) [2024-06-15 17:02:27,375][1651669] InferenceWorker_p0-w0: resuming experience collection (24100 times) [2024-06-15 17:02:27,506][1651669] Updated weights for policy 0, policy_version 460515 (0.0102) [2024-06-15 17:02:29,186][1651669] Updated weights for policy 0, policy_version 460576 (0.0012) [2024-06-15 17:02:30,602][1651669] Updated weights for policy 0, policy_version 460640 (0.0123) [2024-06-15 17:02:30,770][1648981] Fps is (10 sec: 55684.4, 60 sec: 50787.1, 300 sec: 48651.5). 
Total num frames: 943390720. Throughput: 0: 11877.4. Samples: 235885056. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:30,771][1648981] Avg episode reward: [(0, '437.460')] [2024-06-15 17:02:35,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 44783.0, 300 sec: 48096.8). Total num frames: 943456256. Throughput: 0: 12356.3. Samples: 235963392. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:35,767][1648981] Avg episode reward: [(0, '447.950')] [2024-06-15 17:02:36,940][1651669] Updated weights for policy 0, policy_version 460688 (0.0015) [2024-06-15 17:02:38,281][1651669] Updated weights for policy 0, policy_version 460757 (0.0012) [2024-06-15 17:02:39,338][1651669] Updated weights for policy 0, policy_version 460816 (0.0011) [2024-06-15 17:02:40,674][1651669] Updated weights for policy 0, policy_version 460880 (0.0012) [2024-06-15 17:02:40,778][1648981] Fps is (10 sec: 49112.9, 60 sec: 50783.7, 300 sec: 48539.1). Total num frames: 943882240. Throughput: 0: 12103.9. Samples: 235999232. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:40,779][1648981] Avg episode reward: [(0, '437.630')] [2024-06-15 17:02:41,835][1651669] Updated weights for policy 0, policy_version 460928 (0.0012) [2024-06-15 17:02:45,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 943980544. Throughput: 0: 12458.7. Samples: 236069376. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:45,767][1648981] Avg episode reward: [(0, '441.190')] [2024-06-15 17:02:48,477][1651669] Updated weights for policy 0, policy_version 460992 (0.0020) [2024-06-15 17:02:49,732][1651669] Updated weights for policy 0, policy_version 461056 (0.0012) [2024-06-15 17:02:50,772][1648981] Fps is (10 sec: 45902.8, 60 sec: 49693.4, 300 sec: 48318.0). Total num frames: 944340992. Throughput: 0: 12309.2. Samples: 236146688. 
Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:50,773][1648981] Avg episode reward: [(0, '436.510')] [2024-06-15 17:02:50,798][1651669] Updated weights for policy 0, policy_version 461116 (0.0018) [2024-06-15 17:02:51,859][1651669] Updated weights for policy 0, policy_version 461158 (0.0050) [2024-06-15 17:02:55,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49155.3, 300 sec: 48430.0). Total num frames: 944504832. Throughput: 0: 12435.9. Samples: 236186112. Policy #0 lag: (min: 15.0, avg: 87.8, max: 271.0) [2024-06-15 17:02:55,767][1648981] Avg episode reward: [(0, '437.480')] [2024-06-15 17:02:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000461184_944504832.pth... [2024-06-15 17:02:55,874][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000455552_932970496.pth [2024-06-15 17:02:58,454][1651669] Updated weights for policy 0, policy_version 461217 (0.0013) [2024-06-15 17:03:00,068][1651669] Updated weights for policy 0, policy_version 461298 (0.0012) [2024-06-15 17:03:00,779][1648981] Fps is (10 sec: 45842.5, 60 sec: 48595.5, 300 sec: 48540.2). Total num frames: 944799744. Throughput: 0: 12432.4. Samples: 236257792. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:00,780][1648981] Avg episode reward: [(0, '474.200')] [2024-06-15 17:03:01,366][1651669] Updated weights for policy 0, policy_version 461368 (0.0141) [2024-06-15 17:03:02,230][1651274] Signal inference workers to stop experience collection... (24150 times) [2024-06-15 17:03:02,267][1651669] InferenceWorker_p0-w0: stopping experience collection (24150 times) [2024-06-15 17:03:02,377][1651274] Signal inference workers to resume experience collection... 
(24150 times) [2024-06-15 17:03:02,377][1651669] InferenceWorker_p0-w0: resuming experience collection (24150 times) [2024-06-15 17:03:02,559][1651669] Updated weights for policy 0, policy_version 461409 (0.0010) [2024-06-15 17:03:05,783][1648981] Fps is (10 sec: 52342.3, 60 sec: 50230.4, 300 sec: 48427.3). Total num frames: 945029120. Throughput: 0: 12692.9. Samples: 236339200. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:05,784][1648981] Avg episode reward: [(0, '497.470')] [2024-06-15 17:03:08,107][1651669] Updated weights for policy 0, policy_version 461458 (0.0014) [2024-06-15 17:03:10,114][1651669] Updated weights for policy 0, policy_version 461521 (0.0026) [2024-06-15 17:03:10,766][1648981] Fps is (10 sec: 45934.3, 60 sec: 48605.8, 300 sec: 48326.8). Total num frames: 945258496. Throughput: 0: 12663.5. Samples: 236375040. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:10,767][1648981] Avg episode reward: [(0, '504.230')] [2024-06-15 17:03:11,330][1651669] Updated weights for policy 0, policy_version 461585 (0.0014) [2024-06-15 17:03:12,689][1651669] Updated weights for policy 0, policy_version 461648 (0.0040) [2024-06-15 17:03:13,481][1651669] Updated weights for policy 0, policy_version 461691 (0.0013) [2024-06-15 17:03:15,766][1648981] Fps is (10 sec: 52516.2, 60 sec: 52428.9, 300 sec: 48541.1). Total num frames: 945553408. Throughput: 0: 12459.7. Samples: 236445696. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:15,767][1648981] Avg episode reward: [(0, '495.190')] [2024-06-15 17:03:18,480][1651669] Updated weights for policy 0, policy_version 461733 (0.0011) [2024-06-15 17:03:20,451][1651669] Updated weights for policy 0, policy_version 461793 (0.0086) [2024-06-15 17:03:20,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 49151.8, 300 sec: 48652.2). Total num frames: 945782784. Throughput: 0: 12481.3. Samples: 236525056. 
Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:20,767][1648981] Avg episode reward: [(0, '495.540')] [2024-06-15 17:03:21,551][1651669] Updated weights for policy 0, policy_version 461856 (0.0011) [2024-06-15 17:03:22,824][1651669] Updated weights for policy 0, policy_version 461890 (0.0046) [2024-06-15 17:03:23,772][1651669] Updated weights for policy 0, policy_version 461942 (0.0014) [2024-06-15 17:03:25,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 52428.8, 300 sec: 48763.2). Total num frames: 946077696. Throughput: 0: 12473.3. Samples: 236560384. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:25,767][1648981] Avg episode reward: [(0, '492.000')] [2024-06-15 17:03:28,641][1651669] Updated weights for policy 0, policy_version 462013 (0.0105) [2024-06-15 17:03:30,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 47516.6, 300 sec: 48541.1). Total num frames: 946241536. Throughput: 0: 12754.5. Samples: 236643328. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:30,767][1648981] Avg episode reward: [(0, '521.580')] [2024-06-15 17:03:31,158][1651669] Updated weights for policy 0, policy_version 462064 (0.0013) [2024-06-15 17:03:32,925][1651669] Updated weights for policy 0, policy_version 462131 (0.0096) [2024-06-15 17:03:34,041][1651669] Updated weights for policy 0, policy_version 462160 (0.0010) [2024-06-15 17:03:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 52428.7, 300 sec: 48874.3). Total num frames: 946601984. Throughput: 0: 12437.5. Samples: 236706304. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:35,767][1648981] Avg episode reward: [(0, '521.060')] [2024-06-15 17:03:38,744][1651669] Updated weights for policy 0, policy_version 462215 (0.0117) [2024-06-15 17:03:40,160][1651669] Updated weights for policy 0, policy_version 462271 (0.0011) [2024-06-15 17:03:40,797][1648981] Fps is (10 sec: 49003.0, 60 sec: 47498.8, 300 sec: 48425.0). Total num frames: 946733056. 
Throughput: 0: 12609.4. Samples: 236753920. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:40,798][1648981] Avg episode reward: [(0, '521.460')] [2024-06-15 17:03:42,889][1651274] Signal inference workers to stop experience collection... (24200 times) [2024-06-15 17:03:42,949][1651669] InferenceWorker_p0-w0: stopping experience collection (24200 times) [2024-06-15 17:03:42,961][1651669] Updated weights for policy 0, policy_version 462356 (0.0012) [2024-06-15 17:03:43,167][1651274] Signal inference workers to resume experience collection... (24200 times) [2024-06-15 17:03:43,168][1651669] InferenceWorker_p0-w0: resuming experience collection (24200 times) [2024-06-15 17:03:44,105][1651669] Updated weights for policy 0, policy_version 462400 (0.0020) [2024-06-15 17:03:45,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 51336.5, 300 sec: 48652.1). Total num frames: 947060736. Throughput: 0: 12257.4. Samples: 236809216. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:45,767][1648981] Avg episode reward: [(0, '518.730')] [2024-06-15 17:03:45,988][1651669] Updated weights for policy 0, policy_version 462453 (0.0012) [2024-06-15 17:03:50,656][1651669] Updated weights for policy 0, policy_version 462497 (0.0014) [2024-06-15 17:03:50,782][1648981] Fps is (10 sec: 45942.6, 60 sec: 47505.7, 300 sec: 48205.3). Total num frames: 947191808. Throughput: 0: 12219.9. Samples: 236889088. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:50,783][1648981] Avg episode reward: [(0, '503.070')] [2024-06-15 17:03:53,104][1651669] Updated weights for policy 0, policy_version 462560 (0.0016) [2024-06-15 17:03:54,469][1651669] Updated weights for policy 0, policy_version 462610 (0.0011) [2024-06-15 17:03:55,419][1651669] Updated weights for policy 0, policy_version 462651 (0.0024) [2024-06-15 17:03:55,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 50241.1, 300 sec: 48429.4). Total num frames: 947519488. Throughput: 0: 12150.4. 
Samples: 236921856. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:03:55,771][1648981] Avg episode reward: [(0, '475.460')] [2024-06-15 17:03:56,967][1651669] Updated weights for policy 0, policy_version 462714 (0.0015) [2024-06-15 17:04:00,767][1648981] Fps is (10 sec: 45947.0, 60 sec: 47523.7, 300 sec: 47985.7). Total num frames: 947650560. Throughput: 0: 12071.8. Samples: 236988928. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:04:00,767][1648981] Avg episode reward: [(0, '456.790')] [2024-06-15 17:04:04,013][1651669] Updated weights for policy 0, policy_version 462800 (0.0013) [2024-06-15 17:04:05,766][1648981] Fps is (10 sec: 42614.6, 60 sec: 48619.2, 300 sec: 48096.7). Total num frames: 947945472. Throughput: 0: 11832.9. Samples: 237057536. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:04:05,767][1648981] Avg episode reward: [(0, '437.480')] [2024-06-15 17:04:05,813][1651669] Updated weights for policy 0, policy_version 462866 (0.0013) [2024-06-15 17:04:06,855][1651669] Updated weights for policy 0, policy_version 462912 (0.0012) [2024-06-15 17:04:08,804][1651669] Updated weights for policy 0, policy_version 462962 (0.0024) [2024-06-15 17:04:10,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48605.9, 300 sec: 48098.1). Total num frames: 948174848. Throughput: 0: 11764.6. Samples: 237089792. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:04:10,767][1648981] Avg episode reward: [(0, '442.850')] [2024-06-15 17:04:12,474][1651669] Updated weights for policy 0, policy_version 463027 (0.0014) [2024-06-15 17:04:15,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46967.4, 300 sec: 48096.8). Total num frames: 948371456. Throughput: 0: 11639.5. Samples: 237167104. 
Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:04:15,767][1648981] Avg episode reward: [(0, '424.450')] [2024-06-15 17:04:15,774][1651669] Updated weights for policy 0, policy_version 463088 (0.0012) [2024-06-15 17:04:17,318][1651669] Updated weights for policy 0, policy_version 463136 (0.0127) [2024-06-15 17:04:20,053][1651669] Updated weights for policy 0, policy_version 463188 (0.0011) [2024-06-15 17:04:20,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.9, 300 sec: 48319.0). Total num frames: 948666368. Throughput: 0: 11628.1. Samples: 237229568. Policy #0 lag: (min: 15.0, avg: 75.9, max: 271.0) [2024-06-15 17:04:20,767][1648981] Avg episode reward: [(0, '407.920')] [2024-06-15 17:04:22,242][1651669] Updated weights for policy 0, policy_version 463233 (0.0014) [2024-06-15 17:04:23,475][1651669] Updated weights for policy 0, policy_version 463292 (0.0012) [2024-06-15 17:04:25,767][1648981] Fps is (10 sec: 45873.1, 60 sec: 45874.8, 300 sec: 47874.5). Total num frames: 948830208. Throughput: 0: 11419.5. Samples: 237267456. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:04:25,768][1648981] Avg episode reward: [(0, '401.300')] [2024-06-15 17:04:27,528][1651274] Signal inference workers to stop experience collection... (24250 times) [2024-06-15 17:04:27,567][1651669] Updated weights for policy 0, policy_version 463346 (0.0011) [2024-06-15 17:04:27,580][1651669] InferenceWorker_p0-w0: stopping experience collection (24250 times) [2024-06-15 17:04:27,763][1651274] Signal inference workers to resume experience collection... (24250 times) [2024-06-15 17:04:27,763][1651669] InferenceWorker_p0-w0: resuming experience collection (24250 times) [2024-06-15 17:04:28,909][1651669] Updated weights for policy 0, policy_version 463394 (0.0011) [2024-06-15 17:04:30,770][1648981] Fps is (10 sec: 45857.8, 60 sec: 48056.7, 300 sec: 48096.2). Total num frames: 949125120. Throughput: 0: 11775.0. Samples: 237339136. 
Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:04:30,771][1648981] Avg episode reward: [(0, '408.100')] [2024-06-15 17:04:31,158][1651669] Updated weights for policy 0, policy_version 463459 (0.0013) [2024-06-15 17:04:33,783][1651669] Updated weights for policy 0, policy_version 463520 (0.0014) [2024-06-15 17:04:35,766][1648981] Fps is (10 sec: 52431.6, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 949354496. Throughput: 0: 11654.9. Samples: 237413376. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:04:35,767][1648981] Avg episode reward: [(0, '407.310')] [2024-06-15 17:04:38,119][1651669] Updated weights for policy 0, policy_version 463585 (0.0012) [2024-06-15 17:04:40,211][1651669] Updated weights for policy 0, policy_version 463675 (0.0010) [2024-06-15 17:04:40,770][1648981] Fps is (10 sec: 49151.7, 60 sec: 48081.1, 300 sec: 48318.3). Total num frames: 949616640. Throughput: 0: 11867.0. Samples: 237455872. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:04:40,771][1648981] Avg episode reward: [(0, '409.560')] [2024-06-15 17:04:41,644][1651669] Updated weights for policy 0, policy_version 463718 (0.0012) [2024-06-15 17:04:43,550][1651669] Updated weights for policy 0, policy_version 463747 (0.0016) [2024-06-15 17:04:44,929][1651669] Updated weights for policy 0, policy_version 463803 (0.0033) [2024-06-15 17:04:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 949878784. Throughput: 0: 11878.4. Samples: 237523456. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:04:45,767][1648981] Avg episode reward: [(0, '407.230')] [2024-06-15 17:04:49,625][1651669] Updated weights for policy 0, policy_version 463876 (0.0014) [2024-06-15 17:04:50,770][1648981] Fps is (10 sec: 49151.7, 60 sec: 48615.5, 300 sec: 48318.3). Total num frames: 950108160. Throughput: 0: 11979.8. Samples: 237596672. 
Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:04:50,771][1648981] Avg episode reward: [(0, '433.800')] [2024-06-15 17:04:50,957][1651669] Updated weights for policy 0, policy_version 463931 (0.0011) [2024-06-15 17:04:53,022][1651669] Updated weights for policy 0, policy_version 463992 (0.0016) [2024-06-15 17:04:54,856][1651669] Updated weights for policy 0, policy_version 464032 (0.0013) [2024-06-15 17:04:55,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 48062.6, 300 sec: 48207.8). Total num frames: 950403072. Throughput: 0: 12162.8. Samples: 237637120. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:04:55,767][1648981] Avg episode reward: [(0, '441.870')] [2024-06-15 17:04:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000464064_950403072.pth... [2024-06-15 17:04:55,808][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000458368_938737664.pth [2024-06-15 17:04:58,863][1651669] Updated weights for policy 0, policy_version 464096 (0.0013) [2024-06-15 17:05:00,225][1651669] Updated weights for policy 0, policy_version 464160 (0.0101) [2024-06-15 17:05:00,766][1648981] Fps is (10 sec: 52449.0, 60 sec: 49698.2, 300 sec: 48320.8). Total num frames: 950632448. Throughput: 0: 12094.6. Samples: 237711360. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:00,767][1648981] Avg episode reward: [(0, '432.490')] [2024-06-15 17:05:03,446][1651669] Updated weights for policy 0, policy_version 464224 (0.0013) [2024-06-15 17:05:05,363][1651669] Updated weights for policy 0, policy_version 464261 (0.0044) [2024-06-15 17:05:05,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 950829056. Throughput: 0: 12231.1. Samples: 237779968. 
Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:05,767][1648981] Avg episode reward: [(0, '430.960')] [2024-06-15 17:05:06,738][1651669] Updated weights for policy 0, policy_version 464319 (0.0020) [2024-06-15 17:05:09,769][1651274] Signal inference workers to stop experience collection... (24300 times) [2024-06-15 17:05:09,874][1651669] InferenceWorker_p0-w0: stopping experience collection (24300 times) [2024-06-15 17:05:10,048][1651274] Signal inference workers to resume experience collection... (24300 times) [2024-06-15 17:05:10,050][1651669] InferenceWorker_p0-w0: resuming experience collection (24300 times) [2024-06-15 17:05:10,617][1651669] Updated weights for policy 0, policy_version 464373 (0.0042) [2024-06-15 17:05:10,767][1648981] Fps is (10 sec: 42597.9, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 951058432. Throughput: 0: 12128.8. Samples: 237813248. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:10,769][1648981] Avg episode reward: [(0, '429.900')] [2024-06-15 17:05:12,744][1651669] Updated weights for policy 0, policy_version 464442 (0.0012) [2024-06-15 17:05:15,312][1651669] Updated weights for policy 0, policy_version 464499 (0.0016) [2024-06-15 17:05:15,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49152.0, 300 sec: 48208.0). Total num frames: 951320576. Throughput: 0: 12084.2. Samples: 237882880. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:15,767][1648981] Avg episode reward: [(0, '434.230')] [2024-06-15 17:05:17,152][1651669] Updated weights for policy 0, policy_version 464544 (0.0012) [2024-06-15 17:05:20,539][1651669] Updated weights for policy 0, policy_version 464595 (0.0017) [2024-06-15 17:05:20,770][1648981] Fps is (10 sec: 45858.5, 60 sec: 47510.6, 300 sec: 48207.4). Total num frames: 951517184. Throughput: 0: 12048.0. Samples: 237955584. 
Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:20,771][1648981] Avg episode reward: [(0, '440.360')] [2024-06-15 17:05:23,319][1651669] Updated weights for policy 0, policy_version 464672 (0.0011) [2024-06-15 17:05:23,929][1651669] Updated weights for policy 0, policy_version 464704 (0.0011) [2024-06-15 17:05:25,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 48060.2, 300 sec: 47985.7). Total num frames: 951713792. Throughput: 0: 11833.9. Samples: 237988352. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:25,767][1648981] Avg episode reward: [(0, '439.060')] [2024-06-15 17:05:26,899][1651669] Updated weights for policy 0, policy_version 464766 (0.0041) [2024-06-15 17:05:29,010][1651669] Updated weights for policy 0, policy_version 464826 (0.0035) [2024-06-15 17:05:30,766][1648981] Fps is (10 sec: 49171.5, 60 sec: 48062.9, 300 sec: 48318.9). Total num frames: 952008704. Throughput: 0: 11958.1. Samples: 238061568. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:30,767][1648981] Avg episode reward: [(0, '463.470')] [2024-06-15 17:05:31,683][1651669] Updated weights for policy 0, policy_version 464886 (0.0010) [2024-06-15 17:05:33,895][1651669] Updated weights for policy 0, policy_version 464928 (0.0014) [2024-06-15 17:05:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 952238080. Throughput: 0: 12027.4. Samples: 238137856. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:35,767][1648981] Avg episode reward: [(0, '464.310')] [2024-06-15 17:05:36,820][1651669] Updated weights for policy 0, policy_version 464976 (0.0029) [2024-06-15 17:05:38,202][1651669] Updated weights for policy 0, policy_version 465040 (0.0014) [2024-06-15 17:05:40,766][1648981] Fps is (10 sec: 49150.8, 60 sec: 48062.7, 300 sec: 48430.0). Total num frames: 952500224. Throughput: 0: 11980.9. Samples: 238176256. 
Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:40,767][1648981] Avg episode reward: [(0, '472.090')] [2024-06-15 17:05:40,795][1651669] Updated weights for policy 0, policy_version 465104 (0.0013) [2024-06-15 17:05:44,395][1651669] Updated weights for policy 0, policy_version 465156 (0.0013) [2024-06-15 17:05:45,577][1651669] Updated weights for policy 0, policy_version 465213 (0.0012) [2024-06-15 17:05:45,787][1648981] Fps is (10 sec: 52321.4, 60 sec: 48043.4, 300 sec: 48427.6). Total num frames: 952762368. Throughput: 0: 12214.2. Samples: 238261248. Policy #0 lag: (min: 15.0, avg: 126.4, max: 271.0) [2024-06-15 17:05:45,787][1648981] Avg episode reward: [(0, '460.190')] [2024-06-15 17:05:48,103][1651669] Updated weights for policy 0, policy_version 465282 (0.0017) [2024-06-15 17:05:49,279][1651669] Updated weights for policy 0, policy_version 465344 (0.0027) [2024-06-15 17:05:50,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 48608.9, 300 sec: 48430.0). Total num frames: 953024512. Throughput: 0: 12128.7. Samples: 238325760. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:05:50,767][1648981] Avg episode reward: [(0, '458.680')] [2024-06-15 17:05:51,405][1651274] Signal inference workers to stop experience collection... (24350 times) [2024-06-15 17:05:51,440][1651669] InferenceWorker_p0-w0: stopping experience collection (24350 times) [2024-06-15 17:05:51,690][1651274] Signal inference workers to resume experience collection... (24350 times) [2024-06-15 17:05:51,691][1651669] InferenceWorker_p0-w0: resuming experience collection (24350 times) [2024-06-15 17:05:52,315][1651669] Updated weights for policy 0, policy_version 465394 (0.0016) [2024-06-15 17:05:54,944][1651669] Updated weights for policy 0, policy_version 465428 (0.0010) [2024-06-15 17:05:55,766][1648981] Fps is (10 sec: 49252.8, 60 sec: 47513.8, 300 sec: 48319.0). Total num frames: 953253888. Throughput: 0: 12299.4. Samples: 238366720. 
Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:05:55,767][1648981] Avg episode reward: [(0, '430.940')] [2024-06-15 17:05:56,003][1651669] Updated weights for policy 0, policy_version 465469 (0.0013) [2024-06-15 17:05:57,892][1651669] Updated weights for policy 0, policy_version 465508 (0.0043) [2024-06-15 17:05:59,603][1651669] Updated weights for policy 0, policy_version 465592 (0.0053) [2024-06-15 17:06:00,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 48605.6, 300 sec: 48430.0). Total num frames: 953548800. Throughput: 0: 12344.8. Samples: 238438400. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:00,767][1648981] Avg episode reward: [(0, '437.340')] [2024-06-15 17:06:01,839][1651669] Updated weights for policy 0, policy_version 465662 (0.0013) [2024-06-15 17:06:05,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 953745408. Throughput: 0: 12653.2. Samples: 238524928. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:05,767][1648981] Avg episode reward: [(0, '479.190')] [2024-06-15 17:06:06,056][1651669] Updated weights for policy 0, policy_version 465712 (0.0012) [2024-06-15 17:06:07,807][1651669] Updated weights for policy 0, policy_version 465749 (0.0010) [2024-06-15 17:06:09,289][1651669] Updated weights for policy 0, policy_version 465813 (0.0011) [2024-06-15 17:06:10,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 954073088. Throughput: 0: 12743.1. Samples: 238561792. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:10,767][1648981] Avg episode reward: [(0, '465.300')] [2024-06-15 17:06:11,754][1651669] Updated weights for policy 0, policy_version 465877 (0.0045) [2024-06-15 17:06:12,400][1651669] Updated weights for policy 0, policy_version 465920 (0.0019) [2024-06-15 17:06:15,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48605.9, 300 sec: 48430.6). Total num frames: 954236928. 
Throughput: 0: 12731.7. Samples: 238634496. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:15,767][1648981] Avg episode reward: [(0, '449.290')] [2024-06-15 17:06:16,878][1651669] Updated weights for policy 0, policy_version 465976 (0.0011) [2024-06-15 17:06:19,008][1651669] Updated weights for policy 0, policy_version 466016 (0.0022) [2024-06-15 17:06:20,541][1651669] Updated weights for policy 0, policy_version 466080 (0.0012) [2024-06-15 17:06:20,767][1648981] Fps is (10 sec: 45875.3, 60 sec: 50247.4, 300 sec: 48207.8). Total num frames: 954531840. Throughput: 0: 12572.4. Samples: 238703616. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:20,768][1648981] Avg episode reward: [(0, '449.610')] [2024-06-15 17:06:22,054][1651669] Updated weights for policy 0, policy_version 466136 (0.0012) [2024-06-15 17:06:22,779][1651669] Updated weights for policy 0, policy_version 466175 (0.0012) [2024-06-15 17:06:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 48763.2). Total num frames: 954728448. Throughput: 0: 12504.2. Samples: 238738944. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:25,767][1648981] Avg episode reward: [(0, '433.700')] [2024-06-15 17:06:27,383][1651669] Updated weights for policy 0, policy_version 466229 (0.0012) [2024-06-15 17:06:29,420][1651669] Updated weights for policy 0, policy_version 466272 (0.0033) [2024-06-15 17:06:30,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 50244.2, 300 sec: 48318.9). Total num frames: 955023360. Throughput: 0: 12509.9. Samples: 238823936. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:30,767][1648981] Avg episode reward: [(0, '427.370')] [2024-06-15 17:06:30,976][1651669] Updated weights for policy 0, policy_version 466336 (0.0077) [2024-06-15 17:06:31,067][1651274] Signal inference workers to stop experience collection... 
(24400 times) [2024-06-15 17:06:31,116][1651669] InferenceWorker_p0-w0: stopping experience collection (24400 times) [2024-06-15 17:06:31,400][1651274] Signal inference workers to resume experience collection... (24400 times) [2024-06-15 17:06:31,430][1651669] InferenceWorker_p0-w0: resuming experience collection (24400 times) [2024-06-15 17:06:32,071][1651669] Updated weights for policy 0, policy_version 466371 (0.0010) [2024-06-15 17:06:35,778][1648981] Fps is (10 sec: 52367.0, 60 sec: 50234.4, 300 sec: 48873.0). Total num frames: 955252736. Throughput: 0: 12455.4. Samples: 238886400. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:35,779][1648981] Avg episode reward: [(0, '441.450')] [2024-06-15 17:06:37,591][1651669] Updated weights for policy 0, policy_version 466434 (0.0014) [2024-06-15 17:06:38,789][1651669] Updated weights for policy 0, policy_version 466496 (0.0012) [2024-06-15 17:06:40,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 955416576. Throughput: 0: 12390.4. Samples: 238924288. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:40,767][1648981] Avg episode reward: [(0, '432.970')] [2024-06-15 17:06:42,062][1651669] Updated weights for policy 0, policy_version 466564 (0.0012) [2024-06-15 17:06:44,958][1651669] Updated weights for policy 0, policy_version 466686 (0.0105) [2024-06-15 17:06:45,768][1648981] Fps is (10 sec: 52490.2, 60 sec: 50261.4, 300 sec: 48874.3). Total num frames: 955777024. Throughput: 0: 12049.1. Samples: 238980608. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:45,768][1648981] Avg episode reward: [(0, '430.450')] [2024-06-15 17:06:50,171][1651669] Updated weights for policy 0, policy_version 466742 (0.0013) [2024-06-15 17:06:50,782][1648981] Fps is (10 sec: 49073.1, 60 sec: 48047.0, 300 sec: 48650.2). Total num frames: 955908096. Throughput: 0: 11896.9. Samples: 239060480. 
Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:50,783][1648981] Avg episode reward: [(0, '427.690')] [2024-06-15 17:06:52,878][1651669] Updated weights for policy 0, policy_version 466818 (0.0011) [2024-06-15 17:06:54,207][1651669] Updated weights for policy 0, policy_version 466877 (0.0014) [2024-06-15 17:06:55,700][1651669] Updated weights for policy 0, policy_version 466936 (0.0012) [2024-06-15 17:06:55,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 956268544. Throughput: 0: 11855.6. Samples: 239095296. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:06:55,767][1648981] Avg episode reward: [(0, '437.110')] [2024-06-15 17:06:55,830][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000466944_956301312.pth... [2024-06-15 17:06:55,903][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000461184_944504832.pth [2024-06-15 17:07:00,766][1648981] Fps is (10 sec: 45949.0, 60 sec: 46967.7, 300 sec: 48652.2). Total num frames: 956366848. Throughput: 0: 11992.2. Samples: 239174144. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:07:00,767][1648981] Avg episode reward: [(0, '447.650')] [2024-06-15 17:07:01,051][1651669] Updated weights for policy 0, policy_version 466992 (0.0012) [2024-06-15 17:07:02,769][1651669] Updated weights for policy 0, policy_version 467041 (0.0013) [2024-06-15 17:07:04,678][1651669] Updated weights for policy 0, policy_version 467136 (0.0012) [2024-06-15 17:07:05,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 956760064. Throughput: 0: 11901.2. Samples: 239239168. 
Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:07:05,767][1648981] Avg episode reward: [(0, '432.180')] [2024-06-15 17:07:06,104][1651669] Updated weights for policy 0, policy_version 467199 (0.0162) [2024-06-15 17:07:10,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 48874.3). Total num frames: 956825600. Throughput: 0: 12083.2. Samples: 239282688. Policy #0 lag: (min: 31.0, avg: 133.3, max: 287.0) [2024-06-15 17:07:10,767][1648981] Avg episode reward: [(0, '408.120')] [2024-06-15 17:07:12,435][1651669] Updated weights for policy 0, policy_version 467253 (0.0014) [2024-06-15 17:07:13,006][1651274] Signal inference workers to stop experience collection... (24450 times) [2024-06-15 17:07:13,068][1651669] InferenceWorker_p0-w0: stopping experience collection (24450 times) [2024-06-15 17:07:13,208][1651274] Signal inference workers to resume experience collection... (24450 times) [2024-06-15 17:07:13,209][1651669] InferenceWorker_p0-w0: resuming experience collection (24450 times) [2024-06-15 17:07:13,686][1651669] Updated weights for policy 0, policy_version 467316 (0.0013) [2024-06-15 17:07:15,312][1651669] Updated weights for policy 0, policy_version 467395 (0.0112) [2024-06-15 17:07:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 957251584. Throughput: 0: 11844.2. Samples: 239356928. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:15,767][1648981] Avg episode reward: [(0, '415.460')] [2024-06-15 17:07:16,780][1651669] Updated weights for policy 0, policy_version 467454 (0.0011) [2024-06-15 17:07:20,790][1648981] Fps is (10 sec: 52304.0, 60 sec: 46948.8, 300 sec: 48870.4). Total num frames: 957349888. Throughput: 0: 12136.8. Samples: 239432704. 
Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:20,791][1648981] Avg episode reward: [(0, '414.950')] [2024-06-15 17:07:23,502][1651669] Updated weights for policy 0, policy_version 467522 (0.0012) [2024-06-15 17:07:24,904][1651669] Updated weights for policy 0, policy_version 467600 (0.0015) [2024-06-15 17:07:25,767][1648981] Fps is (10 sec: 45871.5, 60 sec: 49697.5, 300 sec: 48541.6). Total num frames: 957710336. Throughput: 0: 12219.5. Samples: 239474176. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:25,768][1648981] Avg episode reward: [(0, '405.690')] [2024-06-15 17:07:26,090][1651669] Updated weights for policy 0, policy_version 467660 (0.0045) [2024-06-15 17:07:26,945][1651669] Updated weights for policy 0, policy_version 467704 (0.0016) [2024-06-15 17:07:30,767][1648981] Fps is (10 sec: 52550.3, 60 sec: 47512.9, 300 sec: 48874.2). Total num frames: 957874176. Throughput: 0: 12777.0. Samples: 239555584. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:30,768][1648981] Avg episode reward: [(0, '408.590')] [2024-06-15 17:07:33,435][1651669] Updated weights for policy 0, policy_version 467762 (0.0013) [2024-06-15 17:07:35,279][1651669] Updated weights for policy 0, policy_version 467844 (0.0093) [2024-06-15 17:07:35,766][1648981] Fps is (10 sec: 49155.7, 60 sec: 49161.6, 300 sec: 48543.0). Total num frames: 958201856. Throughput: 0: 12554.2. Samples: 239625216. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:35,767][1648981] Avg episode reward: [(0, '399.030')] [2024-06-15 17:07:36,516][1651669] Updated weights for policy 0, policy_version 467906 (0.0013) [2024-06-15 17:07:40,766][1648981] Fps is (10 sec: 52432.5, 60 sec: 49698.0, 300 sec: 48874.3). Total num frames: 958398464. Throughput: 0: 12492.8. Samples: 239657472. 
Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:40,767][1648981] Avg episode reward: [(0, '410.180')] [2024-06-15 17:07:43,493][1651669] Updated weights for policy 0, policy_version 467970 (0.0012) [2024-06-15 17:07:44,894][1651669] Updated weights for policy 0, policy_version 468032 (0.0012) [2024-06-15 17:07:45,802][1648981] Fps is (10 sec: 39182.7, 60 sec: 46939.8, 300 sec: 48314.0). Total num frames: 958595072. Throughput: 0: 12630.7. Samples: 239742976. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:45,802][1648981] Avg episode reward: [(0, '432.300')] [2024-06-15 17:07:46,058][1651669] Updated weights for policy 0, policy_version 468085 (0.0061) [2024-06-15 17:07:46,918][1651274] Signal inference workers to stop experience collection... (24500 times) [2024-06-15 17:07:46,949][1651669] InferenceWorker_p0-w0: stopping experience collection (24500 times) [2024-06-15 17:07:47,123][1651274] Signal inference workers to resume experience collection... (24500 times) [2024-06-15 17:07:47,124][1651669] InferenceWorker_p0-w0: resuming experience collection (24500 times) [2024-06-15 17:07:47,422][1651669] Updated weights for policy 0, policy_version 468160 (0.0089) [2024-06-15 17:07:48,539][1651669] Updated weights for policy 0, policy_version 468221 (0.0013) [2024-06-15 17:07:50,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50257.7, 300 sec: 48874.3). Total num frames: 958922752. Throughput: 0: 12674.8. Samples: 239809536. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:50,767][1648981] Avg episode reward: [(0, '426.050')] [2024-06-15 17:07:55,579][1651669] Updated weights for policy 0, policy_version 468291 (0.0013) [2024-06-15 17:07:55,766][1648981] Fps is (10 sec: 49327.2, 60 sec: 46967.5, 300 sec: 48432.1). Total num frames: 959086592. Throughput: 0: 12629.4. Samples: 239851008. 
Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:07:55,767][1648981] Avg episode reward: [(0, '424.020')] [2024-06-15 17:07:57,188][1651669] Updated weights for policy 0, policy_version 468370 (0.0013) [2024-06-15 17:07:59,157][1651669] Updated weights for policy 0, policy_version 468464 (0.0013) [2024-06-15 17:08:00,770][1648981] Fps is (10 sec: 52410.8, 60 sec: 51333.6, 300 sec: 48876.5). Total num frames: 959447040. Throughput: 0: 12127.8. Samples: 239902720. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:00,770][1648981] Avg episode reward: [(0, '430.940')] [2024-06-15 17:08:05,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45329.1, 300 sec: 48207.8). Total num frames: 959479808. Throughput: 0: 12374.2. Samples: 239989248. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:05,767][1648981] Avg episode reward: [(0, '451.270')] [2024-06-15 17:08:06,837][1651669] Updated weights for policy 0, policy_version 468544 (0.0013) [2024-06-15 17:08:08,305][1651669] Updated weights for policy 0, policy_version 468612 (0.0013) [2024-06-15 17:08:10,265][1651669] Updated weights for policy 0, policy_version 468691 (0.0011) [2024-06-15 17:08:10,785][1648981] Fps is (10 sec: 45804.2, 60 sec: 51320.4, 300 sec: 48649.0). Total num frames: 959905792. Throughput: 0: 11998.7. Samples: 240014336. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:10,786][1648981] Avg episode reward: [(0, '464.990')] [2024-06-15 17:08:11,316][1651669] Updated weights for policy 0, policy_version 468736 (0.0011) [2024-06-15 17:08:15,778][1648981] Fps is (10 sec: 49094.2, 60 sec: 45320.2, 300 sec: 48094.9). Total num frames: 959971328. Throughput: 0: 11693.5. Samples: 240081920. 
Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:15,779][1648981] Avg episode reward: [(0, '490.030')] [2024-06-15 17:08:18,635][1651669] Updated weights for policy 0, policy_version 468816 (0.0109) [2024-06-15 17:08:20,379][1651669] Updated weights for policy 0, policy_version 468882 (0.0011) [2024-06-15 17:08:20,766][1648981] Fps is (10 sec: 39396.2, 60 sec: 49171.6, 300 sec: 48207.8). Total num frames: 960299008. Throughput: 0: 11685.0. Samples: 240151040. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:20,767][1648981] Avg episode reward: [(0, '486.900')] [2024-06-15 17:08:22,179][1651669] Updated weights for policy 0, policy_version 468947 (0.0013) [2024-06-15 17:08:25,766][1648981] Fps is (10 sec: 52490.7, 60 sec: 46422.0, 300 sec: 48318.9). Total num frames: 960495616. Throughput: 0: 11468.8. Samples: 240173568. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:25,767][1648981] Avg episode reward: [(0, '494.450')] [2024-06-15 17:08:28,703][1651274] Signal inference workers to stop experience collection... (24550 times) [2024-06-15 17:08:28,777][1651669] InferenceWorker_p0-w0: stopping experience collection (24550 times) [2024-06-15 17:08:28,945][1651274] Signal inference workers to resume experience collection... (24550 times) [2024-06-15 17:08:28,946][1651669] InferenceWorker_p0-w0: resuming experience collection (24550 times) [2024-06-15 17:08:29,128][1651669] Updated weights for policy 0, policy_version 469011 (0.0120) [2024-06-15 17:08:30,766][1648981] Fps is (10 sec: 36045.0, 60 sec: 46422.0, 300 sec: 47652.5). Total num frames: 960659456. Throughput: 0: 11443.7. Samples: 240257536. 
Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:30,767][1648981] Avg episode reward: [(0, '497.690')] [2024-06-15 17:08:31,108][1651669] Updated weights for policy 0, policy_version 469104 (0.0013) [2024-06-15 17:08:32,719][1651669] Updated weights for policy 0, policy_version 469154 (0.0013) [2024-06-15 17:08:34,580][1651669] Updated weights for policy 0, policy_version 469222 (0.0021) [2024-06-15 17:08:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46967.5, 300 sec: 48435.0). Total num frames: 961019904. Throughput: 0: 11013.7. Samples: 240305152. Policy #0 lag: (min: 47.0, avg: 110.0, max: 303.0) [2024-06-15 17:08:35,767][1648981] Avg episode reward: [(0, '496.160')] [2024-06-15 17:08:40,755][1651669] Updated weights for policy 0, policy_version 469307 (0.0012) [2024-06-15 17:08:40,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 45326.3, 300 sec: 47651.8). Total num frames: 961118208. Throughput: 0: 11126.5. Samples: 240351744. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:08:40,771][1648981] Avg episode reward: [(0, '490.610')] [2024-06-15 17:08:42,339][1651669] Updated weights for policy 0, policy_version 469349 (0.0040) [2024-06-15 17:08:44,293][1651669] Updated weights for policy 0, policy_version 469426 (0.0142) [2024-06-15 17:08:45,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48634.7, 300 sec: 48543.7). Total num frames: 961511424. Throughput: 0: 11390.0. Samples: 240415232. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:08:45,767][1648981] Avg episode reward: [(0, '484.360')] [2024-06-15 17:08:46,025][1651669] Updated weights for policy 0, policy_version 469501 (0.0013) [2024-06-15 17:08:50,786][1648981] Fps is (10 sec: 49073.4, 60 sec: 44768.2, 300 sec: 47760.9). Total num frames: 961609728. Throughput: 0: 11327.3. Samples: 240499200. 
Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:08:50,787][1648981] Avg episode reward: [(0, '458.900')] [2024-06-15 17:08:50,988][1651669] Updated weights for policy 0, policy_version 469558 (0.0011) [2024-06-15 17:08:53,123][1651669] Updated weights for policy 0, policy_version 469616 (0.0064) [2024-06-15 17:08:54,491][1651669] Updated weights for policy 0, policy_version 469680 (0.0021) [2024-06-15 17:08:55,766][1651669] Updated weights for policy 0, policy_version 469731 (0.0012) [2024-06-15 17:08:55,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48605.8, 300 sec: 48652.2). Total num frames: 962002944. Throughput: 0: 11507.8. Samples: 240531968. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:08:55,767][1648981] Avg episode reward: [(0, '455.360')] [2024-06-15 17:08:56,323][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000469760_962068480.pth... [2024-06-15 17:08:56,363][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000464064_950403072.pth [2024-06-15 17:09:00,766][1648981] Fps is (10 sec: 45966.1, 60 sec: 43693.2, 300 sec: 47874.6). Total num frames: 962068480. Throughput: 0: 11733.5. Samples: 240609792. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:00,767][1648981] Avg episode reward: [(0, '468.640')] [2024-06-15 17:09:01,457][1651669] Updated weights for policy 0, policy_version 469808 (0.0014) [2024-06-15 17:09:03,438][1651274] Signal inference workers to stop experience collection... (24600 times) [2024-06-15 17:09:03,495][1651669] InferenceWorker_p0-w0: stopping experience collection (24600 times) [2024-06-15 17:09:03,703][1651274] Signal inference workers to resume experience collection... 
(24600 times) [2024-06-15 17:09:03,704][1651669] InferenceWorker_p0-w0: resuming experience collection (24600 times) [2024-06-15 17:09:03,706][1651669] Updated weights for policy 0, policy_version 469904 (0.0086) [2024-06-15 17:09:05,494][1651669] Updated weights for policy 0, policy_version 469969 (0.0012) [2024-06-15 17:09:05,778][1648981] Fps is (10 sec: 52365.6, 60 sec: 50780.1, 300 sec: 48650.2). Total num frames: 962527232. Throughput: 0: 11772.8. Samples: 240680960. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:05,779][1648981] Avg episode reward: [(0, '474.950')] [2024-06-15 17:09:10,767][1648981] Fps is (10 sec: 52426.7, 60 sec: 44796.8, 300 sec: 48207.8). Total num frames: 962592768. Throughput: 0: 12196.9. Samples: 240722432. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:10,767][1648981] Avg episode reward: [(0, '453.740')] [2024-06-15 17:09:10,970][1651669] Updated weights for policy 0, policy_version 470032 (0.0017) [2024-06-15 17:09:12,712][1651669] Updated weights for policy 0, policy_version 470112 (0.0011) [2024-06-15 17:09:14,161][1651669] Updated weights for policy 0, policy_version 470161 (0.0011) [2024-06-15 17:09:15,581][1651669] Updated weights for policy 0, policy_version 470224 (0.0013) [2024-06-15 17:09:15,766][1648981] Fps is (10 sec: 49212.1, 60 sec: 50800.5, 300 sec: 48652.2). Total num frames: 963018752. Throughput: 0: 11980.8. Samples: 240796672. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:15,767][1648981] Avg episode reward: [(0, '461.500')] [2024-06-15 17:09:16,690][1651669] Updated weights for policy 0, policy_version 470271 (0.0012) [2024-06-15 17:09:20,766][1648981] Fps is (10 sec: 52431.2, 60 sec: 46967.5, 300 sec: 48430.1). Total num frames: 963117056. Throughput: 0: 12754.5. Samples: 240879104. 
Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:20,767][1648981] Avg episode reward: [(0, '463.040')] [2024-06-15 17:09:23,218][1651669] Updated weights for policy 0, policy_version 470322 (0.0011) [2024-06-15 17:09:24,838][1651669] Updated weights for policy 0, policy_version 470385 (0.0013) [2024-06-15 17:09:25,774][1648981] Fps is (10 sec: 39290.7, 60 sec: 48599.6, 300 sec: 48429.3). Total num frames: 963411968. Throughput: 0: 12560.0. Samples: 240916992. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:25,775][1648981] Avg episode reward: [(0, '457.940')] [2024-06-15 17:09:26,732][1651669] Updated weights for policy 0, policy_version 470465 (0.0218) [2024-06-15 17:09:27,838][1651669] Updated weights for policy 0, policy_version 470522 (0.0012) [2024-06-15 17:09:30,770][1648981] Fps is (10 sec: 52408.6, 60 sec: 49694.9, 300 sec: 48429.4). Total num frames: 963641344. Throughput: 0: 12582.7. Samples: 240981504. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:30,771][1648981] Avg episode reward: [(0, '473.110')] [2024-06-15 17:09:33,461][1651669] Updated weights for policy 0, policy_version 470576 (0.0011) [2024-06-15 17:09:34,675][1651669] Updated weights for policy 0, policy_version 470624 (0.0012) [2024-06-15 17:09:35,766][1648981] Fps is (10 sec: 49190.2, 60 sec: 48059.7, 300 sec: 48430.6). Total num frames: 963903488. Throughput: 0: 12350.3. Samples: 241054720. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:35,767][1648981] Avg episode reward: [(0, '463.740')] [2024-06-15 17:09:36,568][1651669] Updated weights for policy 0, policy_version 470691 (0.0085) [2024-06-15 17:09:37,550][1651669] Updated weights for policy 0, policy_version 470736 (0.0013) [2024-06-15 17:09:38,614][1651669] Updated weights for policy 0, policy_version 470784 (0.0150) [2024-06-15 17:09:40,780][1648981] Fps is (10 sec: 52376.8, 60 sec: 50782.0, 300 sec: 48427.7). Total num frames: 964165632. 
Throughput: 0: 12193.3. Samples: 241080832. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:40,781][1648981] Avg episode reward: [(0, '479.350')] [2024-06-15 17:09:43,847][1651274] Signal inference workers to stop experience collection... (24650 times) [2024-06-15 17:09:43,909][1651669] InferenceWorker_p0-w0: stopping experience collection (24650 times) [2024-06-15 17:09:44,066][1651274] Signal inference workers to resume experience collection... (24650 times) [2024-06-15 17:09:44,067][1651669] InferenceWorker_p0-w0: resuming experience collection (24650 times) [2024-06-15 17:09:45,549][1651669] Updated weights for policy 0, policy_version 470850 (0.0111) [2024-06-15 17:09:45,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 46967.3, 300 sec: 48208.4). Total num frames: 964329472. Throughput: 0: 12344.9. Samples: 241165312. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:45,767][1648981] Avg episode reward: [(0, '487.890')] [2024-06-15 17:09:47,561][1651669] Updated weights for policy 0, policy_version 470929 (0.0127) [2024-06-15 17:09:49,529][1651669] Updated weights for policy 0, policy_version 470992 (0.0030) [2024-06-15 17:09:50,561][1651669] Updated weights for policy 0, policy_version 471040 (0.0013) [2024-06-15 17:09:50,766][1648981] Fps is (10 sec: 52501.0, 60 sec: 51353.5, 300 sec: 48430.0). Total num frames: 964689920. Throughput: 0: 11949.9. Samples: 241218560. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:50,767][1648981] Avg episode reward: [(0, '487.030')] [2024-06-15 17:09:55,767][1648981] Fps is (10 sec: 42594.3, 60 sec: 45874.4, 300 sec: 47874.4). Total num frames: 964755456. Throughput: 0: 12003.4. Samples: 241262592. 
Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:09:55,768][1648981] Avg episode reward: [(0, '484.400')] [2024-06-15 17:09:56,371][1651669] Updated weights for policy 0, policy_version 471107 (0.0013) [2024-06-15 17:09:58,017][1651669] Updated weights for policy 0, policy_version 471171 (0.0011) [2024-06-15 17:09:59,424][1651669] Updated weights for policy 0, policy_version 471232 (0.0055) [2024-06-15 17:10:00,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 50244.3, 300 sec: 48318.9). Total num frames: 965083136. Throughput: 0: 11719.1. Samples: 241324032. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:10:00,767][1648981] Avg episode reward: [(0, '478.260')] [2024-06-15 17:10:02,270][1651669] Updated weights for policy 0, policy_version 471291 (0.0011) [2024-06-15 17:10:05,799][1648981] Fps is (10 sec: 45732.6, 60 sec: 44767.9, 300 sec: 47980.5). Total num frames: 965214208. Throughput: 0: 11676.6. Samples: 241404928. Policy #0 lag: (min: 11.0, avg: 70.3, max: 267.0) [2024-06-15 17:10:05,799][1648981] Avg episode reward: [(0, '471.960')] [2024-06-15 17:10:07,725][1651669] Updated weights for policy 0, policy_version 471377 (0.0012) [2024-06-15 17:10:09,706][1651669] Updated weights for policy 0, policy_version 471441 (0.0013) [2024-06-15 17:10:10,767][1648981] Fps is (10 sec: 52426.2, 60 sec: 50244.2, 300 sec: 48429.9). Total num frames: 965607424. Throughput: 0: 11493.4. Samples: 241434112. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:10,768][1648981] Avg episode reward: [(0, '472.980')] [2024-06-15 17:10:13,253][1651669] Updated weights for policy 0, policy_version 471505 (0.0012) [2024-06-15 17:10:15,766][1648981] Fps is (10 sec: 52598.8, 60 sec: 45329.0, 300 sec: 48208.5). Total num frames: 965738496. Throughput: 0: 11435.6. Samples: 241496064. 
Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:15,767][1648981] Avg episode reward: [(0, '470.850')] [2024-06-15 17:10:17,768][1651669] Updated weights for policy 0, policy_version 471569 (0.0013) [2024-06-15 17:10:19,044][1651669] Updated weights for policy 0, policy_version 471632 (0.0010) [2024-06-15 17:10:20,328][1651669] Updated weights for policy 0, policy_version 471673 (0.0048) [2024-06-15 17:10:20,607][1651274] Signal inference workers to stop experience collection... (24700 times) [2024-06-15 17:10:20,724][1651669] InferenceWorker_p0-w0: stopping experience collection (24700 times) [2024-06-15 17:10:20,766][1648981] Fps is (10 sec: 39323.5, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 966000640. Throughput: 0: 11468.8. Samples: 241570816. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:20,767][1648981] Avg episode reward: [(0, '482.140')] [2024-06-15 17:10:20,925][1651274] Signal inference workers to resume experience collection... (24700 times) [2024-06-15 17:10:20,927][1651669] InferenceWorker_p0-w0: resuming experience collection (24700 times) [2024-06-15 17:10:21,605][1651669] Updated weights for policy 0, policy_version 471720 (0.0012) [2024-06-15 17:10:24,087][1651669] Updated weights for policy 0, policy_version 471760 (0.0016) [2024-06-15 17:10:25,077][1651669] Updated weights for policy 0, policy_version 471802 (0.0012) [2024-06-15 17:10:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 47519.7, 300 sec: 48318.9). Total num frames: 966262784. Throughput: 0: 11677.2. Samples: 241606144. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:25,767][1648981] Avg episode reward: [(0, '461.450')] [2024-06-15 17:10:28,285][1651669] Updated weights for policy 0, policy_version 471840 (0.0068) [2024-06-15 17:10:30,696][1651669] Updated weights for policy 0, policy_version 471909 (0.0037) [2024-06-15 17:10:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46970.4, 300 sec: 48207.8). 
Total num frames: 966459392. Throughput: 0: 11491.6. Samples: 241682432. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:30,767][1648981] Avg episode reward: [(0, '444.290')] [2024-06-15 17:10:32,393][1651669] Updated weights for policy 0, policy_version 471984 (0.0013) [2024-06-15 17:10:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 48207.9). Total num frames: 966721536. Throughput: 0: 11912.5. Samples: 241754624. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:35,767][1648981] Avg episode reward: [(0, '445.660')] [2024-06-15 17:10:36,008][1651669] Updated weights for policy 0, policy_version 472048 (0.0011) [2024-06-15 17:10:39,296][1651669] Updated weights for policy 0, policy_version 472098 (0.0012) [2024-06-15 17:10:40,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 46431.7, 300 sec: 48100.0). Total num frames: 966950912. Throughput: 0: 11821.7. Samples: 241794560. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:40,768][1648981] Avg episode reward: [(0, '465.920')] [2024-06-15 17:10:41,398][1651669] Updated weights for policy 0, policy_version 472176 (0.0013) [2024-06-15 17:10:42,746][1651669] Updated weights for policy 0, policy_version 472228 (0.0011) [2024-06-15 17:10:45,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 967180288. Throughput: 0: 12003.6. Samples: 241864192. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:45,767][1648981] Avg episode reward: [(0, '455.350')] [2024-06-15 17:10:46,132][1651669] Updated weights for policy 0, policy_version 472272 (0.0012) [2024-06-15 17:10:47,165][1651669] Updated weights for policy 0, policy_version 472318 (0.0012) [2024-06-15 17:10:50,465][1651669] Updated weights for policy 0, policy_version 472371 (0.0011) [2024-06-15 17:10:50,767][1648981] Fps is (10 sec: 49152.6, 60 sec: 45875.0, 300 sec: 48096.7). Total num frames: 967442432. Throughput: 0: 12012.1. 
Samples: 241945088. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:50,767][1648981] Avg episode reward: [(0, '459.320')] [2024-06-15 17:10:51,964][1651669] Updated weights for policy 0, policy_version 472433 (0.0011) [2024-06-15 17:10:53,508][1651669] Updated weights for policy 0, policy_version 472501 (0.0013) [2024-06-15 17:10:55,768][1648981] Fps is (10 sec: 52422.1, 60 sec: 49151.9, 300 sec: 47985.5). Total num frames: 967704576. Throughput: 0: 11923.7. Samples: 241970688. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:10:55,768][1648981] Avg episode reward: [(0, '476.780')] [2024-06-15 17:10:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000472512_967704576.pth... [2024-06-15 17:10:55,855][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000466944_956301312.pth [2024-06-15 17:10:55,859][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000472512_967704576.pth [2024-06-15 17:10:58,011][1651669] Updated weights for policy 0, policy_version 472545 (0.0016) [2024-06-15 17:10:59,744][1651669] Updated weights for policy 0, policy_version 472577 (0.0014) [2024-06-15 17:11:00,774][1648981] Fps is (10 sec: 49115.4, 60 sec: 47507.5, 300 sec: 48095.5). Total num frames: 967933952. Throughput: 0: 12490.7. Samples: 242058240. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:11:00,775][1648981] Avg episode reward: [(0, '471.760')] [2024-06-15 17:11:01,072][1651274] Signal inference workers to stop experience collection... (24750 times) [2024-06-15 17:11:01,099][1651669] Updated weights for policy 0, policy_version 472643 (0.0010) [2024-06-15 17:11:01,117][1651669] InferenceWorker_p0-w0: stopping experience collection (24750 times) [2024-06-15 17:11:01,272][1651274] Signal inference workers to resume experience collection... 
(24750 times) [2024-06-15 17:11:01,273][1651669] InferenceWorker_p0-w0: resuming experience collection (24750 times) [2024-06-15 17:11:02,326][1651669] Updated weights for policy 0, policy_version 472706 (0.0011) [2024-06-15 17:11:03,594][1651669] Updated weights for policy 0, policy_version 472767 (0.0016) [2024-06-15 17:11:05,766][1648981] Fps is (10 sec: 52435.1, 60 sec: 50271.3, 300 sec: 47985.7). Total num frames: 968228864. Throughput: 0: 12424.5. Samples: 242129920. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:11:05,767][1648981] Avg episode reward: [(0, '485.690')] [2024-06-15 17:11:08,343][1651669] Updated weights for policy 0, policy_version 472826 (0.0013) [2024-06-15 17:11:10,766][1648981] Fps is (10 sec: 42631.5, 60 sec: 45875.6, 300 sec: 47874.6). Total num frames: 968359936. Throughput: 0: 12424.5. Samples: 242165248. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:11:10,767][1648981] Avg episode reward: [(0, '488.770')] [2024-06-15 17:11:11,714][1651669] Updated weights for policy 0, policy_version 472865 (0.0014) [2024-06-15 17:11:13,603][1651669] Updated weights for policy 0, policy_version 472948 (0.0098) [2024-06-15 17:11:14,895][1651669] Updated weights for policy 0, policy_version 473013 (0.0012) [2024-06-15 17:11:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.3, 300 sec: 48207.9). Total num frames: 968753152. Throughput: 0: 12231.1. Samples: 242232832. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:11:15,767][1648981] Avg episode reward: [(0, '474.180')] [2024-06-15 17:11:18,385][1651669] Updated weights for policy 0, policy_version 473060 (0.0011) [2024-06-15 17:11:20,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 968884224. Throughput: 0: 12515.5. Samples: 242317824. 
Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:11:20,767][1648981] Avg episode reward: [(0, '450.650')] [2024-06-15 17:11:21,493][1651669] Updated weights for policy 0, policy_version 473108 (0.0014) [2024-06-15 17:11:22,939][1651669] Updated weights for policy 0, policy_version 473168 (0.0016) [2024-06-15 17:11:24,914][1651669] Updated weights for policy 0, policy_version 473254 (0.0174) [2024-06-15 17:11:25,778][1648981] Fps is (10 sec: 52366.9, 60 sec: 50234.4, 300 sec: 48317.0). Total num frames: 969277440. Throughput: 0: 12182.5. Samples: 242342912. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:11:25,779][1648981] Avg episode reward: [(0, '453.480')] [2024-06-15 17:11:29,894][1651669] Updated weights for policy 0, policy_version 473322 (0.0013) [2024-06-15 17:11:30,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 49151.9, 300 sec: 47987.6). Total num frames: 969408512. Throughput: 0: 12356.2. Samples: 242420224. Policy #0 lag: (min: 15.0, avg: 85.5, max: 271.0) [2024-06-15 17:11:30,768][1648981] Avg episode reward: [(0, '453.370')] [2024-06-15 17:11:32,511][1651669] Updated weights for policy 0, policy_version 473363 (0.0013) [2024-06-15 17:11:33,817][1651669] Updated weights for policy 0, policy_version 473410 (0.0126) [2024-06-15 17:11:35,590][1651669] Updated weights for policy 0, policy_version 473473 (0.0012) [2024-06-15 17:11:35,766][1648981] Fps is (10 sec: 39367.9, 60 sec: 49152.0, 300 sec: 48318.9). Total num frames: 969670656. Throughput: 0: 12003.6. Samples: 242485248. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:11:35,767][1648981] Avg episode reward: [(0, '454.660')] [2024-06-15 17:11:36,789][1651669] Updated weights for policy 0, policy_version 473531 (0.0010) [2024-06-15 17:11:40,677][1651274] Signal inference workers to stop experience collection... (24800 times) [2024-06-15 17:11:40,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 48060.0, 300 sec: 47652.5). Total num frames: 969834496. 
Throughput: 0: 12265.6. Samples: 242522624. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:11:40,767][1648981] Avg episode reward: [(0, '467.700')] [2024-06-15 17:11:40,771][1651669] InferenceWorker_p0-w0: stopping experience collection (24800 times) [2024-06-15 17:11:40,941][1651274] Signal inference workers to resume experience collection... (24800 times) [2024-06-15 17:11:40,942][1651669] InferenceWorker_p0-w0: resuming experience collection (24800 times) [2024-06-15 17:11:41,264][1651669] Updated weights for policy 0, policy_version 473584 (0.0040) [2024-06-15 17:11:43,806][1651669] Updated weights for policy 0, policy_version 473626 (0.0012) [2024-06-15 17:11:45,496][1651669] Updated weights for policy 0, policy_version 473683 (0.0017) [2024-06-15 17:11:45,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 48210.5). Total num frames: 970129408. Throughput: 0: 11994.2. Samples: 242597888. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:11:45,767][1648981] Avg episode reward: [(0, '464.730')] [2024-06-15 17:11:47,121][1651669] Updated weights for policy 0, policy_version 473745 (0.0029) [2024-06-15 17:11:48,120][1651669] Updated weights for policy 0, policy_version 473786 (0.0012) [2024-06-15 17:11:50,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.9, 300 sec: 47652.5). Total num frames: 970326016. Throughput: 0: 11980.8. Samples: 242669056. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:11:50,767][1648981] Avg episode reward: [(0, '474.250')] [2024-06-15 17:11:52,163][1651669] Updated weights for policy 0, policy_version 473840 (0.0013) [2024-06-15 17:11:55,129][1651669] Updated weights for policy 0, policy_version 473888 (0.0014) [2024-06-15 17:11:55,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 47514.5, 300 sec: 48096.7). Total num frames: 970555392. Throughput: 0: 12037.7. Samples: 242706944. 
Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:11:55,767][1648981] Avg episode reward: [(0, '460.110')] [2024-06-15 17:11:56,799][1651669] Updated weights for policy 0, policy_version 473952 (0.0018) [2024-06-15 17:11:58,453][1651669] Updated weights for policy 0, policy_version 474005 (0.0011) [2024-06-15 17:11:59,500][1651669] Updated weights for policy 0, policy_version 474045 (0.0024) [2024-06-15 17:12:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48612.2, 300 sec: 47763.5). Total num frames: 970850304. Throughput: 0: 11810.1. Samples: 242764288. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:00,767][1648981] Avg episode reward: [(0, '465.060')] [2024-06-15 17:12:03,337][1651669] Updated weights for policy 0, policy_version 474096 (0.0014) [2024-06-15 17:12:05,767][1648981] Fps is (10 sec: 42593.9, 60 sec: 45874.4, 300 sec: 47985.5). Total num frames: 970981376. Throughput: 0: 11753.0. Samples: 242846720. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:05,768][1648981] Avg episode reward: [(0, '477.190')] [2024-06-15 17:12:06,875][1651669] Updated weights for policy 0, policy_version 474145 (0.0012) [2024-06-15 17:12:08,799][1651669] Updated weights for policy 0, policy_version 474224 (0.0171) [2024-06-15 17:12:10,406][1651669] Updated weights for policy 0, policy_version 474288 (0.0012) [2024-06-15 17:12:10,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49698.1, 300 sec: 47763.5). Total num frames: 971341824. Throughput: 0: 11870.1. Samples: 242876928. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:10,767][1648981] Avg episode reward: [(0, '490.450')] [2024-06-15 17:12:14,253][1651669] Updated weights for policy 0, policy_version 474320 (0.0013) [2024-06-15 17:12:15,796][1648981] Fps is (10 sec: 52279.6, 60 sec: 45852.5, 300 sec: 47984.7). Total num frames: 971505664. Throughput: 0: 11688.7. Samples: 242946560. 
Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:15,797][1648981] Avg episode reward: [(0, '501.630')] [2024-06-15 17:12:17,948][1651669] Updated weights for policy 0, policy_version 474384 (0.0206) [2024-06-15 17:12:19,794][1651669] Updated weights for policy 0, policy_version 474464 (0.0012) [2024-06-15 17:12:19,885][1651274] Signal inference workers to stop experience collection... (24850 times) [2024-06-15 17:12:19,938][1651669] InferenceWorker_p0-w0: stopping experience collection (24850 times) [2024-06-15 17:12:20,193][1651274] Signal inference workers to resume experience collection... (24850 times) [2024-06-15 17:12:20,208][1651669] InferenceWorker_p0-w0: resuming experience collection (24850 times) [2024-06-15 17:12:20,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.8, 300 sec: 47652.6). Total num frames: 971767808. Throughput: 0: 11571.2. Samples: 243005952. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:20,767][1648981] Avg episode reward: [(0, '517.710')] [2024-06-15 17:12:22,185][1651669] Updated weights for policy 0, policy_version 474550 (0.0013) [2024-06-15 17:12:25,766][1648981] Fps is (10 sec: 39438.6, 60 sec: 43699.2, 300 sec: 47541.5). Total num frames: 971898880. Throughput: 0: 11468.8. Samples: 243038720. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:25,767][1648981] Avg episode reward: [(0, '520.460')] [2024-06-15 17:12:26,782][1651669] Updated weights for policy 0, policy_version 474612 (0.0013) [2024-06-15 17:12:30,413][1651669] Updated weights for policy 0, policy_version 474688 (0.0014) [2024-06-15 17:12:30,773][1648981] Fps is (10 sec: 42569.0, 60 sec: 46416.2, 300 sec: 47429.2). Total num frames: 972193792. Throughput: 0: 11489.8. Samples: 243115008. 
Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:30,774][1648981] Avg episode reward: [(0, '518.310')] [2024-06-15 17:12:31,637][1651669] Updated weights for policy 0, policy_version 474737 (0.0031) [2024-06-15 17:12:33,401][1651669] Updated weights for policy 0, policy_version 474803 (0.0019) [2024-06-15 17:12:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 972423168. Throughput: 0: 11366.4. Samples: 243180544. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:35,767][1648981] Avg episode reward: [(0, '505.470')] [2024-06-15 17:12:37,848][1651669] Updated weights for policy 0, policy_version 474848 (0.0013) [2024-06-15 17:12:39,885][1651669] Updated weights for policy 0, policy_version 474884 (0.0012) [2024-06-15 17:12:40,767][1648981] Fps is (10 sec: 42627.4, 60 sec: 46421.3, 300 sec: 47547.1). Total num frames: 972619776. Throughput: 0: 11446.0. Samples: 243222016. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:40,767][1648981] Avg episode reward: [(0, '513.280')] [2024-06-15 17:12:41,435][1651669] Updated weights for policy 0, policy_version 474942 (0.0011) [2024-06-15 17:12:42,810][1651669] Updated weights for policy 0, policy_version 474998 (0.0011) [2024-06-15 17:12:43,922][1651669] Updated weights for policy 0, policy_version 475056 (0.0011) [2024-06-15 17:12:45,769][1648981] Fps is (10 sec: 52415.3, 60 sec: 46965.4, 300 sec: 47541.0). Total num frames: 972947456. Throughput: 0: 11661.5. Samples: 243289088. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:45,770][1648981] Avg episode reward: [(0, '502.560')] [2024-06-15 17:12:49,029][1651669] Updated weights for policy 0, policy_version 475124 (0.0018) [2024-06-15 17:12:50,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 973078528. Throughput: 0: 11503.2. Samples: 243364352. 
Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:50,767][1648981] Avg episode reward: [(0, '489.610')] [2024-06-15 17:12:51,736][1651669] Updated weights for policy 0, policy_version 475168 (0.0013) [2024-06-15 17:12:53,360][1651669] Updated weights for policy 0, policy_version 475232 (0.0015) [2024-06-15 17:12:55,375][1651669] Updated weights for policy 0, policy_version 475322 (0.0012) [2024-06-15 17:12:55,767][1648981] Fps is (10 sec: 52440.8, 60 sec: 48605.7, 300 sec: 47541.9). Total num frames: 973471744. Throughput: 0: 11468.7. Samples: 243393024. Policy #0 lag: (min: 15.0, avg: 94.7, max: 271.0) [2024-06-15 17:12:55,767][1648981] Avg episode reward: [(0, '473.280')] [2024-06-15 17:12:55,782][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000475328_973471744.pth... [2024-06-15 17:12:55,832][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000469760_962068480.pth [2024-06-15 17:13:00,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 44783.0, 300 sec: 47652.5). Total num frames: 973537280. Throughput: 0: 11578.9. Samples: 243467264. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:00,767][1648981] Avg episode reward: [(0, '479.670')] [2024-06-15 17:13:01,265][1651669] Updated weights for policy 0, policy_version 475386 (0.0025) [2024-06-15 17:13:02,614][1651274] Signal inference workers to stop experience collection... (24900 times) [2024-06-15 17:13:02,647][1651669] InferenceWorker_p0-w0: stopping experience collection (24900 times) [2024-06-15 17:13:02,893][1651274] Signal inference workers to resume experience collection... 
(24900 times) [2024-06-15 17:13:02,894][1651669] InferenceWorker_p0-w0: resuming experience collection (24900 times) [2024-06-15 17:13:03,503][1651669] Updated weights for policy 0, policy_version 475442 (0.0014) [2024-06-15 17:13:05,007][1651669] Updated weights for policy 0, policy_version 475506 (0.0014) [2024-06-15 17:13:05,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 48606.8, 300 sec: 47433.3). Total num frames: 973897728. Throughput: 0: 11650.9. Samples: 243530240. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:05,767][1648981] Avg episode reward: [(0, '453.560')] [2024-06-15 17:13:06,736][1651669] Updated weights for policy 0, policy_version 475582 (0.0014) [2024-06-15 17:13:10,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 44236.9, 300 sec: 47543.3). Total num frames: 973996032. Throughput: 0: 11605.4. Samples: 243560960. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:10,767][1648981] Avg episode reward: [(0, '438.930')] [2024-06-15 17:13:13,020][1651669] Updated weights for policy 0, policy_version 475643 (0.0012) [2024-06-15 17:13:15,391][1651669] Updated weights for policy 0, policy_version 475703 (0.0011) [2024-06-15 17:13:15,766][1648981] Fps is (10 sec: 36044.5, 60 sec: 45897.9, 300 sec: 47319.2). Total num frames: 974258176. Throughput: 0: 11538.8. Samples: 243634176. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:15,767][1648981] Avg episode reward: [(0, '429.640')] [2024-06-15 17:13:17,060][1651669] Updated weights for policy 0, policy_version 475777 (0.0011) [2024-06-15 17:13:18,579][1651669] Updated weights for policy 0, policy_version 475839 (0.0012) [2024-06-15 17:13:20,766][1648981] Fps is (10 sec: 52427.8, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 974520320. Throughput: 0: 11594.0. Samples: 243702272. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:20,767][1648981] Avg episode reward: [(0, '408.990')] [2024-06-15 17:13:24,645][1651669] Updated weights for policy 0, policy_version 475888 (0.0011) [2024-06-15 17:13:25,786][1648981] Fps is (10 sec: 42514.6, 60 sec: 46406.1, 300 sec: 47538.2). Total num frames: 974684160. Throughput: 0: 11577.5. Samples: 243743232. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:25,787][1648981] Avg episode reward: [(0, '421.030')] [2024-06-15 17:13:26,274][1651669] Updated weights for policy 0, policy_version 475952 (0.0011) [2024-06-15 17:13:27,213][1651669] Updated weights for policy 0, policy_version 476000 (0.0012) [2024-06-15 17:13:28,665][1651669] Updated weights for policy 0, policy_version 476051 (0.0098) [2024-06-15 17:13:30,805][1648981] Fps is (10 sec: 52229.6, 60 sec: 47488.8, 300 sec: 47535.2). Total num frames: 975044608. Throughput: 0: 11334.7. Samples: 243799552. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:30,805][1648981] Avg episode reward: [(0, '427.110')] [2024-06-15 17:13:35,616][1651669] Updated weights for policy 0, policy_version 476128 (0.0016) [2024-06-15 17:13:35,766][1648981] Fps is (10 sec: 42682.6, 60 sec: 44782.9, 300 sec: 47430.9). Total num frames: 975110144. Throughput: 0: 11525.7. Samples: 243883008. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:35,767][1648981] Avg episode reward: [(0, '417.110')] [2024-06-15 17:13:37,453][1651669] Updated weights for policy 0, policy_version 476193 (0.0011) [2024-06-15 17:13:38,169][1651669] Updated weights for policy 0, policy_version 476228 (0.0013) [2024-06-15 17:13:38,823][1651274] Signal inference workers to stop experience collection... (24950 times) [2024-06-15 17:13:38,874][1651669] InferenceWorker_p0-w0: stopping experience collection (24950 times) [2024-06-15 17:13:39,055][1651274] Signal inference workers to resume experience collection... 
(24950 times) [2024-06-15 17:13:39,056][1651669] InferenceWorker_p0-w0: resuming experience collection (24950 times) [2024-06-15 17:13:39,783][1651669] Updated weights for policy 0, policy_version 476304 (0.0017) [2024-06-15 17:13:40,789][1648981] Fps is (10 sec: 49230.4, 60 sec: 48587.9, 300 sec: 47537.8). Total num frames: 975536128. Throughput: 0: 11520.1. Samples: 243911680. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:40,790][1648981] Avg episode reward: [(0, '418.440')] [2024-06-15 17:13:45,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 43692.6, 300 sec: 47322.4). Total num frames: 975568896. Throughput: 0: 11628.1. Samples: 243990528. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:45,767][1648981] Avg episode reward: [(0, '419.040')] [2024-06-15 17:13:45,774][1651669] Updated weights for policy 0, policy_version 476355 (0.0012) [2024-06-15 17:13:47,394][1651669] Updated weights for policy 0, policy_version 476418 (0.0024) [2024-06-15 17:13:48,719][1651669] Updated weights for policy 0, policy_version 476476 (0.0018) [2024-06-15 17:13:49,959][1651669] Updated weights for policy 0, policy_version 476532 (0.0010) [2024-06-15 17:13:50,766][1648981] Fps is (10 sec: 49261.6, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 976027648. Throughput: 0: 11855.6. Samples: 244063744. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:50,767][1648981] Avg episode reward: [(0, '423.340')] [2024-06-15 17:13:50,958][1651669] Updated weights for policy 0, policy_version 476592 (0.0013) [2024-06-15 17:13:55,379][1651669] Updated weights for policy 0, policy_version 476627 (0.0013) [2024-06-15 17:13:55,766][1648981] Fps is (10 sec: 58981.9, 60 sec: 44783.2, 300 sec: 47763.5). Total num frames: 976158720. Throughput: 0: 12344.8. Samples: 244116480. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:13:55,767][1648981] Avg episode reward: [(0, '418.380')] [2024-06-15 17:13:57,902][1651669] Updated weights for policy 0, policy_version 476720 (0.0013) [2024-06-15 17:13:59,820][1651669] Updated weights for policy 0, policy_version 476784 (0.0013) [2024-06-15 17:14:00,767][1648981] Fps is (10 sec: 49151.6, 60 sec: 49698.0, 300 sec: 47432.2). Total num frames: 976519168. Throughput: 0: 12003.5. Samples: 244174336. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:14:00,767][1648981] Avg episode reward: [(0, '436.310')] [2024-06-15 17:14:01,313][1651669] Updated weights for policy 0, policy_version 476848 (0.0013) [2024-06-15 17:14:05,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45329.0, 300 sec: 47541.4). Total num frames: 976617472. Throughput: 0: 12333.5. Samples: 244257280. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:14:05,767][1648981] Avg episode reward: [(0, '424.460')] [2024-06-15 17:14:07,513][1651669] Updated weights for policy 0, policy_version 476917 (0.0011) [2024-06-15 17:14:08,552][1651669] Updated weights for policy 0, policy_version 476963 (0.0013) [2024-06-15 17:14:10,097][1651669] Updated weights for policy 0, policy_version 477024 (0.0013) [2024-06-15 17:14:10,724][1651669] Updated weights for policy 0, policy_version 477047 (0.0008) [2024-06-15 17:14:10,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49698.0, 300 sec: 47319.2). Total num frames: 976977920. Throughput: 0: 12043.0. Samples: 244284928. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:14:10,767][1648981] Avg episode reward: [(0, '428.810')] [2024-06-15 17:14:12,464][1651669] Updated weights for policy 0, policy_version 477113 (0.0013) [2024-06-15 17:14:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 977141760. Throughput: 0: 12674.2. Samples: 244369408. 
Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:14:15,767][1648981] Avg episode reward: [(0, '441.550')] [2024-06-15 17:14:17,266][1651274] Signal inference workers to stop experience collection... (25000 times) [2024-06-15 17:14:17,340][1651669] InferenceWorker_p0-w0: stopping experience collection (25000 times) [2024-06-15 17:14:17,604][1651274] Signal inference workers to resume experience collection... (25000 times) [2024-06-15 17:14:17,604][1651669] InferenceWorker_p0-w0: resuming experience collection (25000 times) [2024-06-15 17:14:18,004][1651669] Updated weights for policy 0, policy_version 477184 (0.0011) [2024-06-15 17:14:19,175][1651669] Updated weights for policy 0, policy_version 477233 (0.0011) [2024-06-15 17:14:20,703][1651669] Updated weights for policy 0, policy_version 477296 (0.0011) [2024-06-15 17:14:20,768][1648981] Fps is (10 sec: 52419.1, 60 sec: 49696.6, 300 sec: 47764.5). Total num frames: 977502208. Throughput: 0: 12219.2. Samples: 244432896. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:14:20,769][1648981] Avg episode reward: [(0, '435.230')] [2024-06-15 17:14:22,624][1651669] Updated weights for policy 0, policy_version 477344 (0.0039) [2024-06-15 17:14:25,769][1648981] Fps is (10 sec: 52413.0, 60 sec: 49712.0, 300 sec: 47541.5). Total num frames: 977666048. Throughput: 0: 12384.3. Samples: 244468736. Policy #0 lag: (min: 15.0, avg: 99.8, max: 271.0) [2024-06-15 17:14:25,770][1648981] Avg episode reward: [(0, '424.700')] [2024-06-15 17:14:27,809][1651669] Updated weights for policy 0, policy_version 477408 (0.0123) [2024-06-15 17:14:28,957][1651669] Updated weights for policy 0, policy_version 477456 (0.0012) [2024-06-15 17:14:30,388][1651669] Updated weights for policy 0, policy_version 477505 (0.0020) [2024-06-15 17:14:30,770][1648981] Fps is (10 sec: 45866.4, 60 sec: 48633.7, 300 sec: 47651.8). Total num frames: 977960960. Throughput: 0: 12423.5. Samples: 244549632. 
Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:14:30,771][1648981] Avg episode reward: [(0, '413.540')] [2024-06-15 17:14:31,659][1651669] Updated weights for policy 0, policy_version 477568 (0.0016) [2024-06-15 17:14:33,789][1651669] Updated weights for policy 0, policy_version 477618 (0.0011) [2024-06-15 17:14:35,782][1648981] Fps is (10 sec: 52362.0, 60 sec: 51323.1, 300 sec: 47541.0). Total num frames: 978190336. Throughput: 0: 12488.4. Samples: 244625920. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:14:35,783][1648981] Avg episode reward: [(0, '416.600')] [2024-06-15 17:14:37,642][1651669] Updated weights for policy 0, policy_version 477649 (0.0014) [2024-06-15 17:14:39,463][1651669] Updated weights for policy 0, policy_version 477738 (0.0134) [2024-06-15 17:14:40,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 49170.3, 300 sec: 47985.7). Total num frames: 978485248. Throughput: 0: 12344.9. Samples: 244672000. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:14:40,767][1648981] Avg episode reward: [(0, '417.420')] [2024-06-15 17:14:41,389][1651669] Updated weights for policy 0, policy_version 477814 (0.0015) [2024-06-15 17:14:43,754][1651669] Updated weights for policy 0, policy_version 477859 (0.0011) [2024-06-15 17:14:45,770][1648981] Fps is (10 sec: 52494.0, 60 sec: 52425.8, 300 sec: 47540.8). Total num frames: 978714624. Throughput: 0: 12355.4. Samples: 244730368. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:14:45,770][1648981] Avg episode reward: [(0, '417.910')] [2024-06-15 17:14:48,347][1651669] Updated weights for policy 0, policy_version 477906 (0.0013) [2024-06-15 17:14:49,844][1651669] Updated weights for policy 0, policy_version 477971 (0.0012) [2024-06-15 17:14:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49152.0, 300 sec: 48208.0). Total num frames: 978976768. Throughput: 0: 12424.5. Samples: 244816384. 
Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:14:50,767][1648981] Avg episode reward: [(0, '406.300')] [2024-06-15 17:14:51,251][1651669] Updated weights for policy 0, policy_version 478033 (0.0011) [2024-06-15 17:14:51,576][1651274] Signal inference workers to stop experience collection... (25050 times) [2024-06-15 17:14:51,625][1651669] InferenceWorker_p0-w0: stopping experience collection (25050 times) [2024-06-15 17:14:51,816][1651274] Signal inference workers to resume experience collection... (25050 times) [2024-06-15 17:14:51,817][1651669] InferenceWorker_p0-w0: resuming experience collection (25050 times) [2024-06-15 17:14:53,911][1651669] Updated weights for policy 0, policy_version 478112 (0.0013) [2024-06-15 17:14:55,766][1648981] Fps is (10 sec: 52446.1, 60 sec: 51336.5, 300 sec: 47985.7). Total num frames: 979238912. Throughput: 0: 12583.8. Samples: 244851200. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:14:55,767][1648981] Avg episode reward: [(0, '395.760')] [2024-06-15 17:14:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000478144_979238912.pth... [2024-06-15 17:14:55,813][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000472512_967704576.pth [2024-06-15 17:14:58,497][1651669] Updated weights for policy 0, policy_version 478148 (0.0022) [2024-06-15 17:15:00,081][1651669] Updated weights for policy 0, policy_version 478208 (0.0016) [2024-06-15 17:15:00,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48606.0, 300 sec: 48213.1). Total num frames: 979435520. Throughput: 0: 12413.2. Samples: 244928000. 
Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:00,767][1648981] Avg episode reward: [(0, '420.250')] [2024-06-15 17:15:01,391][1651669] Updated weights for policy 0, policy_version 478272 (0.0010) [2024-06-15 17:15:02,867][1651669] Updated weights for policy 0, policy_version 478336 (0.0011) [2024-06-15 17:15:05,794][1648981] Fps is (10 sec: 52283.3, 60 sec: 52404.5, 300 sec: 47981.2). Total num frames: 979763200. Throughput: 0: 12417.4. Samples: 244992000. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:05,795][1648981] Avg episode reward: [(0, '415.590')] [2024-06-15 17:15:09,914][1651669] Updated weights for policy 0, policy_version 478416 (0.0018) [2024-06-15 17:15:10,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 48059.6, 300 sec: 47874.6). Total num frames: 979861504. Throughput: 0: 12516.3. Samples: 245031936. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:10,767][1648981] Avg episode reward: [(0, '405.430')] [2024-06-15 17:15:11,301][1651669] Updated weights for policy 0, policy_version 478466 (0.0012) [2024-06-15 17:15:12,937][1651669] Updated weights for policy 0, policy_version 478544 (0.0011) [2024-06-15 17:15:15,762][1651669] Updated weights for policy 0, policy_version 478608 (0.0016) [2024-06-15 17:15:15,766][1648981] Fps is (10 sec: 42717.4, 60 sec: 50790.4, 300 sec: 48096.8). Total num frames: 980189184. Throughput: 0: 12186.6. Samples: 245097984. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:15,767][1648981] Avg episode reward: [(0, '404.930')] [2024-06-15 17:15:20,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 46422.8, 300 sec: 47541.4). Total num frames: 980287488. Throughput: 0: 12201.2. Samples: 245174784. 
Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:20,767][1648981] Avg episode reward: [(0, '408.490')] [2024-06-15 17:15:21,001][1651669] Updated weights for policy 0, policy_version 478661 (0.0016) [2024-06-15 17:15:22,504][1651669] Updated weights for policy 0, policy_version 478722 (0.0010) [2024-06-15 17:15:24,188][1651669] Updated weights for policy 0, policy_version 478801 (0.0012) [2024-06-15 17:15:25,793][1648981] Fps is (10 sec: 49021.3, 60 sec: 50224.5, 300 sec: 48203.5). Total num frames: 980680704. Throughput: 0: 11916.9. Samples: 245208576. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:25,794][1648981] Avg episode reward: [(0, '405.720')] [2024-06-15 17:15:26,775][1651669] Updated weights for policy 0, policy_version 478864 (0.0053) [2024-06-15 17:15:30,778][1648981] Fps is (10 sec: 52367.3, 60 sec: 47507.3, 300 sec: 47761.6). Total num frames: 980811776. Throughput: 0: 12149.2. Samples: 245277184. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:30,779][1648981] Avg episode reward: [(0, '413.420')] [2024-06-15 17:15:32,407][1651669] Updated weights for policy 0, policy_version 478913 (0.0016) [2024-06-15 17:15:33,970][1651274] Signal inference workers to stop experience collection... (25100 times) [2024-06-15 17:15:34,108][1651669] InferenceWorker_p0-w0: stopping experience collection (25100 times) [2024-06-15 17:15:34,114][1651669] Updated weights for policy 0, policy_version 478981 (0.0030) [2024-06-15 17:15:34,232][1651274] Signal inference workers to resume experience collection... (25100 times) [2024-06-15 17:15:34,233][1651669] InferenceWorker_p0-w0: resuming experience collection (25100 times) [2024-06-15 17:15:35,766][1648981] Fps is (10 sec: 42711.9, 60 sec: 48618.6, 300 sec: 47985.7). Total num frames: 981106688. Throughput: 0: 11753.2. Samples: 245345280. 
Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:35,778][1648981] Avg episode reward: [(0, '409.190')] [2024-06-15 17:15:35,812][1651669] Updated weights for policy 0, policy_version 479060 (0.0012) [2024-06-15 17:15:38,062][1651669] Updated weights for policy 0, policy_version 479120 (0.0109) [2024-06-15 17:15:40,766][1648981] Fps is (10 sec: 52490.4, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 981336064. Throughput: 0: 11685.0. Samples: 245377024. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:40,767][1648981] Avg episode reward: [(0, '425.060')] [2024-06-15 17:15:43,540][1651669] Updated weights for policy 0, policy_version 479170 (0.0012) [2024-06-15 17:15:44,928][1651669] Updated weights for policy 0, policy_version 479230 (0.0014) [2024-06-15 17:15:45,767][1648981] Fps is (10 sec: 42598.2, 60 sec: 46970.0, 300 sec: 47763.6). Total num frames: 981532672. Throughput: 0: 11639.4. Samples: 245451776. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:45,767][1648981] Avg episode reward: [(0, '452.240')] [2024-06-15 17:15:46,250][1651669] Updated weights for policy 0, policy_version 479284 (0.0012) [2024-06-15 17:15:47,628][1651669] Updated weights for policy 0, policy_version 479348 (0.0010) [2024-06-15 17:15:48,490][1651669] Updated weights for policy 0, policy_version 479361 (0.0011) [2024-06-15 17:15:50,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47985.9). Total num frames: 981860352. Throughput: 0: 11749.1. Samples: 245520384. Policy #0 lag: (min: 15.0, avg: 80.2, max: 271.0) [2024-06-15 17:15:50,767][1648981] Avg episode reward: [(0, '444.830')] [2024-06-15 17:15:54,689][1651669] Updated weights for policy 0, policy_version 479440 (0.0016) [2024-06-15 17:15:55,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 47653.7). Total num frames: 981991424. Throughput: 0: 11741.9. Samples: 245560320. 
Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:15:55,767][1648981] Avg episode reward: [(0, '477.830')] [2024-06-15 17:15:56,724][1651669] Updated weights for policy 0, policy_version 479522 (0.0012) [2024-06-15 17:15:58,062][1651669] Updated weights for policy 0, policy_version 479584 (0.0036) [2024-06-15 17:15:59,836][1651669] Updated weights for policy 0, policy_version 479617 (0.0027) [2024-06-15 17:16:00,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 982351872. Throughput: 0: 11764.6. Samples: 245627392. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:00,767][1648981] Avg episode reward: [(0, '482.840')] [2024-06-15 17:16:00,888][1651669] Updated weights for policy 0, policy_version 479674 (0.0012) [2024-06-15 17:16:05,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 43710.9, 300 sec: 47541.4). Total num frames: 982384640. Throughput: 0: 11844.3. Samples: 245707776. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:05,767][1648981] Avg episode reward: [(0, '485.780')] [2024-06-15 17:16:06,947][1651669] Updated weights for policy 0, policy_version 479731 (0.0021) [2024-06-15 17:16:08,351][1651669] Updated weights for policy 0, policy_version 479796 (0.0012) [2024-06-15 17:16:10,123][1651669] Updated weights for policy 0, policy_version 479865 (0.0011) [2024-06-15 17:16:10,767][1648981] Fps is (10 sec: 42597.3, 60 sec: 48605.8, 300 sec: 47541.3). Total num frames: 982777856. Throughput: 0: 11771.5. Samples: 245737984. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:10,768][1648981] Avg episode reward: [(0, '483.680')] [2024-06-15 17:16:11,121][1651274] Signal inference workers to stop experience collection... (25150 times) [2024-06-15 17:16:11,169][1651669] InferenceWorker_p0-w0: stopping experience collection (25150 times) [2024-06-15 17:16:11,323][1651274] Signal inference workers to resume experience collection... 
(25150 times) [2024-06-15 17:16:11,323][1651669] InferenceWorker_p0-w0: resuming experience collection (25150 times) [2024-06-15 17:16:11,904][1651669] Updated weights for policy 0, policy_version 479930 (0.0097) [2024-06-15 17:16:15,778][1648981] Fps is (10 sec: 52367.2, 60 sec: 45320.1, 300 sec: 47539.5). Total num frames: 982908928. Throughput: 0: 11639.5. Samples: 245800960. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:15,779][1648981] Avg episode reward: [(0, '484.690')] [2024-06-15 17:16:18,068][1651669] Updated weights for policy 0, policy_version 479977 (0.0093) [2024-06-15 17:16:19,435][1651669] Updated weights for policy 0, policy_version 480033 (0.0092) [2024-06-15 17:16:20,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 48605.9, 300 sec: 47210.0). Total num frames: 983203840. Throughput: 0: 11832.9. Samples: 245877760. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:20,767][1648981] Avg episode reward: [(0, '492.450')] [2024-06-15 17:16:21,018][1651669] Updated weights for policy 0, policy_version 480096 (0.0106) [2024-06-15 17:16:23,087][1651669] Updated weights for policy 0, policy_version 480184 (0.0014) [2024-06-15 17:16:25,766][1648981] Fps is (10 sec: 52490.8, 60 sec: 45895.6, 300 sec: 47541.4). Total num frames: 983433216. Throughput: 0: 11662.2. Samples: 245901824. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:25,767][1648981] Avg episode reward: [(0, '485.060')] [2024-06-15 17:16:29,476][1651669] Updated weights for policy 0, policy_version 480243 (0.0012) [2024-06-15 17:16:30,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47522.9, 300 sec: 47430.3). Total num frames: 983662592. Throughput: 0: 11844.3. Samples: 245984768. 
Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:30,767][1648981] Avg episode reward: [(0, '513.130')] [2024-06-15 17:16:30,794][1651669] Updated weights for policy 0, policy_version 480311 (0.0012) [2024-06-15 17:16:32,599][1651669] Updated weights for policy 0, policy_version 480375 (0.0016) [2024-06-15 17:16:34,144][1651669] Updated weights for policy 0, policy_version 480442 (0.0037) [2024-06-15 17:16:35,770][1648981] Fps is (10 sec: 52408.5, 60 sec: 47510.6, 300 sec: 47874.0). Total num frames: 983957504. Throughput: 0: 11672.6. Samples: 246045696. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:35,771][1648981] Avg episode reward: [(0, '521.320')] [2024-06-15 17:16:40,488][1651669] Updated weights for policy 0, policy_version 480486 (0.0012) [2024-06-15 17:16:40,770][1648981] Fps is (10 sec: 39306.3, 60 sec: 45326.2, 300 sec: 47207.5). Total num frames: 984055808. Throughput: 0: 11729.5. Samples: 246088192. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:40,771][1648981] Avg episode reward: [(0, '494.470')] [2024-06-15 17:16:41,927][1651669] Updated weights for policy 0, policy_version 480560 (0.0010) [2024-06-15 17:16:43,772][1651669] Updated weights for policy 0, policy_version 480631 (0.0012) [2024-06-15 17:16:44,862][1651669] Updated weights for policy 0, policy_version 480657 (0.0011) [2024-06-15 17:16:45,694][1651669] Updated weights for policy 0, policy_version 480704 (0.0011) [2024-06-15 17:16:45,777][1648981] Fps is (10 sec: 52392.2, 60 sec: 49143.2, 300 sec: 47983.9). Total num frames: 984481792. Throughput: 0: 11625.3. Samples: 246150656. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:45,778][1648981] Avg episode reward: [(0, '496.740')] [2024-06-15 17:16:50,766][1648981] Fps is (10 sec: 42614.7, 60 sec: 43690.6, 300 sec: 47208.1). Total num frames: 984481792. Throughput: 0: 11594.0. Samples: 246229504. 
Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:50,767][1648981] Avg episode reward: [(0, '467.990')] [2024-06-15 17:16:52,096][1651274] Signal inference workers to stop experience collection... (25200 times) [2024-06-15 17:16:52,101][1651669] Updated weights for policy 0, policy_version 480770 (0.0102) [2024-06-15 17:16:52,152][1651669] InferenceWorker_p0-w0: stopping experience collection (25200 times) [2024-06-15 17:16:52,338][1651274] Signal inference workers to resume experience collection... (25200 times) [2024-06-15 17:16:52,339][1651669] InferenceWorker_p0-w0: resuming experience collection (25200 times) [2024-06-15 17:16:53,201][1651669] Updated weights for policy 0, policy_version 480817 (0.0011) [2024-06-15 17:16:54,830][1651669] Updated weights for policy 0, policy_version 480883 (0.0014) [2024-06-15 17:16:55,766][1648981] Fps is (10 sec: 42644.1, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 984907776. Throughput: 0: 11594.0. Samples: 246259712. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:16:55,767][1648981] Avg episode reward: [(0, '488.760')] [2024-06-15 17:16:56,095][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000480928_984940544.pth... [2024-06-15 17:16:56,211][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000475328_973471744.pth [2024-06-15 17:16:56,620][1651669] Updated weights for policy 0, policy_version 480956 (0.0110) [2024-06-15 17:17:00,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 44236.5, 300 sec: 47541.5). Total num frames: 985006080. Throughput: 0: 11676.6. Samples: 246326272. 
Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:17:00,767][1648981] Avg episode reward: [(0, '478.620')] [2024-06-15 17:17:03,591][1651669] Updated weights for policy 0, policy_version 481024 (0.0010) [2024-06-15 17:17:05,309][1651669] Updated weights for policy 0, policy_version 481089 (0.0089) [2024-06-15 17:17:05,766][1648981] Fps is (10 sec: 39322.1, 60 sec: 48605.9, 300 sec: 47319.2). Total num frames: 985300992. Throughput: 0: 11491.5. Samples: 246394880. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:17:05,767][1648981] Avg episode reward: [(0, '459.070')] [2024-06-15 17:17:06,831][1651669] Updated weights for policy 0, policy_version 481148 (0.0011) [2024-06-15 17:17:08,443][1651669] Updated weights for policy 0, policy_version 481216 (0.0011) [2024-06-15 17:17:10,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 45875.3, 300 sec: 47546.1). Total num frames: 985530368. Throughput: 0: 11616.7. Samples: 246424576. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:17:10,767][1648981] Avg episode reward: [(0, '456.740')] [2024-06-15 17:17:15,677][1651669] Updated weights for policy 0, policy_version 481312 (0.0012) [2024-06-15 17:17:15,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 46976.7, 300 sec: 47319.2). Total num frames: 985726976. Throughput: 0: 11639.5. Samples: 246508544. Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:17:15,767][1648981] Avg episode reward: [(0, '457.220')] [2024-06-15 17:17:17,303][1651669] Updated weights for policy 0, policy_version 481380 (0.0012) [2024-06-15 17:17:18,350][1651669] Updated weights for policy 0, policy_version 481424 (0.0011) [2024-06-15 17:17:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 986054656. Throughput: 0: 11606.3. Samples: 246567936. 
Policy #0 lag: (min: 0.0, avg: 72.3, max: 256.0) [2024-06-15 17:17:20,767][1648981] Avg episode reward: [(0, '466.600')] [2024-06-15 17:17:25,021][1651669] Updated weights for policy 0, policy_version 481488 (0.0035) [2024-06-15 17:17:25,778][1648981] Fps is (10 sec: 39275.0, 60 sec: 44774.1, 300 sec: 47207.3). Total num frames: 986120192. Throughput: 0: 11603.3. Samples: 246610432. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:17:25,779][1648981] Avg episode reward: [(0, '475.400')] [2024-06-15 17:17:26,372][1651669] Updated weights for policy 0, policy_version 481540 (0.0012) [2024-06-15 17:17:28,091][1651669] Updated weights for policy 0, policy_version 481632 (0.0012) [2024-06-15 17:17:29,549][1651669] Updated weights for policy 0, policy_version 481680 (0.0011) [2024-06-15 17:17:29,702][1651274] Signal inference workers to stop experience collection... (25250 times) [2024-06-15 17:17:29,763][1651669] InferenceWorker_p0-w0: stopping experience collection (25250 times) [2024-06-15 17:17:29,927][1651274] Signal inference workers to resume experience collection... (25250 times) [2024-06-15 17:17:29,928][1651669] InferenceWorker_p0-w0: resuming experience collection (25250 times) [2024-06-15 17:17:30,452][1651669] Updated weights for policy 0, policy_version 481724 (0.0018) [2024-06-15 17:17:30,767][1648981] Fps is (10 sec: 52423.3, 60 sec: 48604.9, 300 sec: 47985.5). Total num frames: 986578944. Throughput: 0: 11721.6. Samples: 246678016. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:17:30,768][1648981] Avg episode reward: [(0, '480.340')] [2024-06-15 17:17:35,775][1648981] Fps is (10 sec: 49168.4, 60 sec: 44233.4, 300 sec: 47428.9). Total num frames: 986611712. Throughput: 0: 11705.5. Samples: 246756352. 
Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:17:35,775][1648981] Avg episode reward: [(0, '464.640')] [2024-06-15 17:17:36,456][1651669] Updated weights for policy 0, policy_version 481787 (0.0011) [2024-06-15 17:17:37,747][1651669] Updated weights for policy 0, policy_version 481833 (0.0012) [2024-06-15 17:17:39,307][1651669] Updated weights for policy 0, policy_version 481904 (0.0096) [2024-06-15 17:17:40,766][1648981] Fps is (10 sec: 39325.9, 60 sec: 48609.0, 300 sec: 47541.8). Total num frames: 986972160. Throughput: 0: 11673.6. Samples: 246785024. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:17:40,767][1648981] Avg episode reward: [(0, '470.620')] [2024-06-15 17:17:40,967][1651669] Updated weights for policy 0, policy_version 481938 (0.0011) [2024-06-15 17:17:41,773][1651669] Updated weights for policy 0, policy_version 481979 (0.0013) [2024-06-15 17:17:45,767][1648981] Fps is (10 sec: 49193.0, 60 sec: 43698.5, 300 sec: 47541.3). Total num frames: 987103232. Throughput: 0: 11844.3. Samples: 246859264. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:17:45,767][1648981] Avg episode reward: [(0, '473.900')] [2024-06-15 17:17:46,866][1651669] Updated weights for policy 0, policy_version 482018 (0.0010) [2024-06-15 17:17:48,609][1651669] Updated weights for policy 0, policy_version 482105 (0.0011) [2024-06-15 17:17:50,148][1651669] Updated weights for policy 0, policy_version 482146 (0.0011) [2024-06-15 17:17:50,705][1651669] Updated weights for policy 0, policy_version 482176 (0.0012) [2024-06-15 17:17:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 987496448. Throughput: 0: 11878.4. Samples: 246929408. 
Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:17:50,767][1648981] Avg episode reward: [(0, '481.090')] [2024-06-15 17:17:52,398][1651669] Updated weights for policy 0, policy_version 482227 (0.0011) [2024-06-15 17:17:55,773][1648981] Fps is (10 sec: 52400.3, 60 sec: 45324.9, 300 sec: 47762.6). Total num frames: 987627520. Throughput: 0: 12115.8. Samples: 246969856. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:17:55,777][1648981] Avg episode reward: [(0, '475.630')] [2024-06-15 17:17:57,148][1651669] Updated weights for policy 0, policy_version 482261 (0.0011) [2024-06-15 17:17:59,292][1651669] Updated weights for policy 0, policy_version 482339 (0.0132) [2024-06-15 17:18:00,786][1648981] Fps is (10 sec: 42514.4, 60 sec: 48590.1, 300 sec: 47538.2). Total num frames: 987922432. Throughput: 0: 11839.0. Samples: 247041536. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:00,787][1648981] Avg episode reward: [(0, '471.490')] [2024-06-15 17:18:00,795][1651669] Updated weights for policy 0, policy_version 482385 (0.0126) [2024-06-15 17:18:01,829][1651669] Updated weights for policy 0, policy_version 482425 (0.0013) [2024-06-15 17:18:03,310][1651669] Updated weights for policy 0, policy_version 482480 (0.0011) [2024-06-15 17:18:05,766][1648981] Fps is (10 sec: 52458.2, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 988151808. Throughput: 0: 12049.1. Samples: 247110144. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:05,767][1648981] Avg episode reward: [(0, '471.470')] [2024-06-15 17:18:08,400][1651669] Updated weights for policy 0, policy_version 482530 (0.0013) [2024-06-15 17:18:10,680][1651669] Updated weights for policy 0, policy_version 482608 (0.0013) [2024-06-15 17:18:10,766][1648981] Fps is (10 sec: 45966.2, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 988381184. Throughput: 0: 12075.0. Samples: 247153664. 
Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:10,767][1648981] Avg episode reward: [(0, '466.390')] [2024-06-15 17:18:12,411][1651274] Signal inference workers to stop experience collection... (25300 times) [2024-06-15 17:18:12,495][1651669] InferenceWorker_p0-w0: stopping experience collection (25300 times) [2024-06-15 17:18:12,721][1651274] Signal inference workers to resume experience collection... (25300 times) [2024-06-15 17:18:12,730][1651669] InferenceWorker_p0-w0: resuming experience collection (25300 times) [2024-06-15 17:18:12,732][1651669] Updated weights for policy 0, policy_version 482640 (0.0010) [2024-06-15 17:18:14,847][1651669] Updated weights for policy 0, policy_version 482720 (0.0013) [2024-06-15 17:18:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 988676096. Throughput: 0: 11833.2. Samples: 247210496. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:15,767][1648981] Avg episode reward: [(0, '466.560')] [2024-06-15 17:18:19,419][1651669] Updated weights for policy 0, policy_version 482800 (0.0011) [2024-06-15 17:18:20,710][1651669] Updated weights for policy 0, policy_version 482850 (0.0024) [2024-06-15 17:18:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46967.5, 300 sec: 48100.0). Total num frames: 988872704. Throughput: 0: 11835.1. Samples: 247288832. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:20,767][1648981] Avg episode reward: [(0, '463.040')] [2024-06-15 17:18:23,235][1651669] Updated weights for policy 0, policy_version 482896 (0.0010) [2024-06-15 17:18:25,235][1651669] Updated weights for policy 0, policy_version 482977 (0.0087) [2024-06-15 17:18:25,767][1648981] Fps is (10 sec: 49148.0, 60 sec: 50799.7, 300 sec: 47880.7). Total num frames: 989167616. Throughput: 0: 12071.6. Samples: 247328256. 
Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:25,768][1648981] Avg episode reward: [(0, '450.490')] [2024-06-15 17:18:29,508][1651669] Updated weights for policy 0, policy_version 483014 (0.0027) [2024-06-15 17:18:30,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 45329.9, 300 sec: 48096.8). Total num frames: 989298688. Throughput: 0: 12071.9. Samples: 247402496. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:30,767][1648981] Avg episode reward: [(0, '454.470')] [2024-06-15 17:18:31,181][1651669] Updated weights for policy 0, policy_version 483088 (0.0012) [2024-06-15 17:18:32,465][1651669] Updated weights for policy 0, policy_version 483135 (0.0011) [2024-06-15 17:18:35,589][1651669] Updated weights for policy 0, policy_version 483199 (0.0016) [2024-06-15 17:18:35,766][1648981] Fps is (10 sec: 42601.9, 60 sec: 49705.2, 300 sec: 47656.0). Total num frames: 989593600. Throughput: 0: 11901.2. Samples: 247464960. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:35,767][1648981] Avg episode reward: [(0, '476.810')] [2024-06-15 17:18:36,833][1651669] Updated weights for policy 0, policy_version 483251 (0.0013) [2024-06-15 17:18:40,636][1651669] Updated weights for policy 0, policy_version 483282 (0.0011) [2024-06-15 17:18:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 48096.7). Total num frames: 989757440. Throughput: 0: 11811.6. Samples: 247501312. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:40,767][1648981] Avg episode reward: [(0, '468.820')] [2024-06-15 17:18:42,214][1651669] Updated weights for policy 0, policy_version 483347 (0.0013) [2024-06-15 17:18:43,186][1651669] Updated weights for policy 0, policy_version 483389 (0.0013) [2024-06-15 17:18:45,782][1648981] Fps is (10 sec: 42531.2, 60 sec: 48593.2, 300 sec: 47427.8). Total num frames: 990019584. Throughput: 0: 11970.5. Samples: 247580160. 
Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:45,783][1648981] Avg episode reward: [(0, '463.320')] [2024-06-15 17:18:47,408][1651669] Updated weights for policy 0, policy_version 483472 (0.0012) [2024-06-15 17:18:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 47763.5). Total num frames: 990248960. Throughput: 0: 11878.4. Samples: 247644672. Policy #0 lag: (min: 15.0, avg: 71.4, max: 271.0) [2024-06-15 17:18:50,767][1648981] Avg episode reward: [(0, '462.400')] [2024-06-15 17:18:51,192][1651669] Updated weights for policy 0, policy_version 483523 (0.0011) [2024-06-15 17:18:51,995][1651274] Signal inference workers to stop experience collection... (25350 times) [2024-06-15 17:18:52,053][1651669] InferenceWorker_p0-w0: stopping experience collection (25350 times) [2024-06-15 17:18:52,274][1651274] Signal inference workers to resume experience collection... (25350 times) [2024-06-15 17:18:52,275][1651669] InferenceWorker_p0-w0: resuming experience collection (25350 times) [2024-06-15 17:18:52,627][1651669] Updated weights for policy 0, policy_version 483584 (0.0011) [2024-06-15 17:18:53,925][1651669] Updated weights for policy 0, policy_version 483635 (0.0012) [2024-06-15 17:18:55,766][1648981] Fps is (10 sec: 49229.2, 60 sec: 48064.2, 300 sec: 47430.3). Total num frames: 990511104. Throughput: 0: 11741.8. Samples: 247682048. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:18:55,767][1648981] Avg episode reward: [(0, '463.170')] [2024-06-15 17:18:55,777][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000483648_990511104.pth... 
[2024-06-15 17:18:55,810][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000478144_979238912.pth [2024-06-15 17:18:57,053][1651669] Updated weights for policy 0, policy_version 483664 (0.0011) [2024-06-15 17:18:58,664][1651669] Updated weights for policy 0, policy_version 483728 (0.0035) [2024-06-15 17:19:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47529.3, 300 sec: 47985.7). Total num frames: 990773248. Throughput: 0: 12014.9. Samples: 247751168. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:00,767][1648981] Avg episode reward: [(0, '458.200')] [2024-06-15 17:19:01,990][1651669] Updated weights for policy 0, policy_version 483792 (0.0011) [2024-06-15 17:19:03,350][1651669] Updated weights for policy 0, policy_version 483842 (0.0015) [2024-06-15 17:19:04,611][1651669] Updated weights for policy 0, policy_version 483903 (0.0012) [2024-06-15 17:19:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 991035392. Throughput: 0: 11958.0. Samples: 247826944. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:05,767][1648981] Avg episode reward: [(0, '450.230')] [2024-06-15 17:19:08,845][1651669] Updated weights for policy 0, policy_version 483953 (0.0011) [2024-06-15 17:19:10,450][1651669] Updated weights for policy 0, policy_version 484021 (0.0101) [2024-06-15 17:19:10,768][1648981] Fps is (10 sec: 52419.6, 60 sec: 48604.5, 300 sec: 47985.4). Total num frames: 991297536. Throughput: 0: 12094.3. Samples: 247872512. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:10,769][1648981] Avg episode reward: [(0, '463.690')] [2024-06-15 17:19:13,568][1651669] Updated weights for policy 0, policy_version 484080 (0.0015) [2024-06-15 17:19:15,161][1651669] Updated weights for policy 0, policy_version 484129 (0.0013) [2024-06-15 17:19:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47652.8). Total num frames: 991559680. 
Throughput: 0: 11867.0. Samples: 247936512. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:15,767][1648981] Avg episode reward: [(0, '477.310')] [2024-06-15 17:19:18,903][1651669] Updated weights for policy 0, policy_version 484181 (0.0012) [2024-06-15 17:19:20,436][1651669] Updated weights for policy 0, policy_version 484242 (0.0124) [2024-06-15 17:19:20,767][1648981] Fps is (10 sec: 45882.7, 60 sec: 48059.7, 300 sec: 47764.0). Total num frames: 991756288. Throughput: 0: 12060.4. Samples: 248007680. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:20,767][1648981] Avg episode reward: [(0, '470.370')] [2024-06-15 17:19:23,823][1651669] Updated weights for policy 0, policy_version 484291 (0.0013) [2024-06-15 17:19:25,764][1651669] Updated weights for policy 0, policy_version 484370 (0.0010) [2024-06-15 17:19:25,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46968.2, 300 sec: 47542.0). Total num frames: 991985664. Throughput: 0: 12049.1. Samples: 248043520. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:25,767][1648981] Avg episode reward: [(0, '460.300')] [2024-06-15 17:19:29,215][1651669] Updated weights for policy 0, policy_version 484418 (0.0013) [2024-06-15 17:19:30,450][1651669] Updated weights for policy 0, policy_version 484473 (0.0011) [2024-06-15 17:19:30,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 48605.9, 300 sec: 47543.9). Total num frames: 992215040. Throughput: 0: 11928.1. Samples: 248116736. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:30,767][1648981] Avg episode reward: [(0, '470.080')] [2024-06-15 17:19:31,225][1651274] Signal inference workers to stop experience collection... (25400 times) [2024-06-15 17:19:31,320][1651669] InferenceWorker_p0-w0: stopping experience collection (25400 times) [2024-06-15 17:19:31,425][1651274] Signal inference workers to resume experience collection... 
(25400 times) [2024-06-15 17:19:31,426][1651669] InferenceWorker_p0-w0: resuming experience collection (25400 times) [2024-06-15 17:19:31,838][1651669] Updated weights for policy 0, policy_version 484528 (0.0020) [2024-06-15 17:19:35,205][1651669] Updated weights for policy 0, policy_version 484545 (0.0013) [2024-06-15 17:19:35,771][1648981] Fps is (10 sec: 42577.0, 60 sec: 46963.6, 300 sec: 47207.4). Total num frames: 992411648. Throughput: 0: 12070.5. Samples: 248187904. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:35,772][1648981] Avg episode reward: [(0, '449.000')] [2024-06-15 17:19:36,680][1651669] Updated weights for policy 0, policy_version 484610 (0.0010) [2024-06-15 17:19:40,246][1651669] Updated weights for policy 0, policy_version 484674 (0.0011) [2024-06-15 17:19:40,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48059.7, 300 sec: 47208.7). Total num frames: 992641024. Throughput: 0: 11923.9. Samples: 248218624. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:40,767][1648981] Avg episode reward: [(0, '454.740')] [2024-06-15 17:19:41,684][1651669] Updated weights for policy 0, policy_version 484728 (0.0010) [2024-06-15 17:19:42,927][1651669] Updated weights for policy 0, policy_version 484768 (0.0104) [2024-06-15 17:19:43,679][1651669] Updated weights for policy 0, policy_version 484800 (0.0012) [2024-06-15 17:19:45,786][1648981] Fps is (10 sec: 45806.8, 60 sec: 47510.4, 300 sec: 47093.9). Total num frames: 992870400. Throughput: 0: 12055.1. Samples: 248293888. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:45,787][1648981] Avg episode reward: [(0, '447.590')] [2024-06-15 17:19:47,325][1651669] Updated weights for policy 0, policy_version 484865 (0.0011) [2024-06-15 17:19:48,723][1651669] Updated weights for policy 0, policy_version 484919 (0.0011) [2024-06-15 17:19:50,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 993132544. 
Throughput: 0: 12037.7. Samples: 248368640. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:50,767][1648981] Avg episode reward: [(0, '441.930')] [2024-06-15 17:19:50,793][1651669] Updated weights for policy 0, policy_version 484944 (0.0013) [2024-06-15 17:19:51,951][1651669] Updated weights for policy 0, policy_version 484990 (0.0016) [2024-06-15 17:19:54,284][1651669] Updated weights for policy 0, policy_version 485045 (0.0011) [2024-06-15 17:19:55,766][1648981] Fps is (10 sec: 52532.6, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 993394688. Throughput: 0: 11890.2. Samples: 248407552. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:19:55,767][1648981] Avg episode reward: [(0, '436.350')] [2024-06-15 17:19:57,317][1651669] Updated weights for policy 0, policy_version 485105 (0.0012) [2024-06-15 17:19:58,522][1651669] Updated weights for policy 0, policy_version 485168 (0.0091) [2024-06-15 17:20:00,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 48059.5, 300 sec: 47101.5). Total num frames: 993656832. Throughput: 0: 12049.0. Samples: 248478720. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:20:00,767][1648981] Avg episode reward: [(0, '435.220')] [2024-06-15 17:20:02,150][1651669] Updated weights for policy 0, policy_version 485238 (0.0012) [2024-06-15 17:20:04,906][1651669] Updated weights for policy 0, policy_version 485296 (0.0012) [2024-06-15 17:20:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 47652.5). Total num frames: 993918976. Throughput: 0: 12094.6. Samples: 248551936. 
Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:20:05,767][1648981] Avg episode reward: [(0, '431.060')] [2024-06-15 17:20:07,138][1651669] Updated weights for policy 0, policy_version 485332 (0.0017) [2024-06-15 17:20:08,331][1651669] Updated weights for policy 0, policy_version 485392 (0.0010) [2024-06-15 17:20:09,404][1651669] Updated weights for policy 0, policy_version 485437 (0.0011) [2024-06-15 17:20:10,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 48061.2, 300 sec: 47430.3). Total num frames: 994181120. Throughput: 0: 12105.9. Samples: 248588288. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:20:10,767][1648981] Avg episode reward: [(0, '455.610')] [2024-06-15 17:20:13,163][1651274] Signal inference workers to stop experience collection... (25450 times) [2024-06-15 17:20:13,205][1651669] InferenceWorker_p0-w0: stopping experience collection (25450 times) [2024-06-15 17:20:13,375][1651274] Signal inference workers to resume experience collection... (25450 times) [2024-06-15 17:20:13,375][1651669] InferenceWorker_p0-w0: resuming experience collection (25450 times) [2024-06-15 17:20:13,665][1651669] Updated weights for policy 0, policy_version 485499 (0.0014) [2024-06-15 17:20:15,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 994377728. Throughput: 0: 12162.8. Samples: 248664064. Policy #0 lag: (min: 15.0, avg: 110.1, max: 271.0) [2024-06-15 17:20:15,767][1648981] Avg episode reward: [(0, '438.700')] [2024-06-15 17:20:15,875][1651669] Updated weights for policy 0, policy_version 485552 (0.0013) [2024-06-15 17:20:17,758][1651669] Updated weights for policy 0, policy_version 485600 (0.0013) [2024-06-15 17:20:19,453][1651669] Updated weights for policy 0, policy_version 485668 (0.0015) [2024-06-15 17:20:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 47545.7). Total num frames: 994705408. Throughput: 0: 12095.9. Samples: 248732160. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:20,767][1648981] Avg episode reward: [(0, '456.340')] [2024-06-15 17:20:23,977][1651669] Updated weights for policy 0, policy_version 485728 (0.0014) [2024-06-15 17:20:24,788][1651669] Updated weights for policy 0, policy_version 485759 (0.0012) [2024-06-15 17:20:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.4, 300 sec: 47543.3). Total num frames: 994836480. Throughput: 0: 12413.2. Samples: 248777216. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:25,767][1648981] Avg episode reward: [(0, '449.560')] [2024-06-15 17:20:26,580][1651669] Updated weights for policy 0, policy_version 485795 (0.0035) [2024-06-15 17:20:28,096][1651669] Updated weights for policy 0, policy_version 485856 (0.0021) [2024-06-15 17:20:29,064][1651669] Updated weights for policy 0, policy_version 485889 (0.0012) [2024-06-15 17:20:30,199][1651669] Updated weights for policy 0, policy_version 485952 (0.0094) [2024-06-15 17:20:30,768][1648981] Fps is (10 sec: 52417.8, 60 sec: 50242.5, 300 sec: 47874.3). Total num frames: 995229696. Throughput: 0: 12281.5. Samples: 248846336. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:30,769][1648981] Avg episode reward: [(0, '463.580')] [2024-06-15 17:20:35,250][1651669] Updated weights for policy 0, policy_version 486005 (0.0118) [2024-06-15 17:20:35,782][1648981] Fps is (10 sec: 52346.7, 60 sec: 49143.1, 300 sec: 47538.8). Total num frames: 995360768. Throughput: 0: 12329.2. Samples: 248923648. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:35,783][1648981] Avg episode reward: [(0, '464.750')] [2024-06-15 17:20:37,287][1651669] Updated weights for policy 0, policy_version 486050 (0.0013) [2024-06-15 17:20:38,649][1651669] Updated weights for policy 0, policy_version 486112 (0.0012) [2024-06-15 17:20:39,368][1651669] Updated weights for policy 0, policy_version 486143 (0.0021) [2024-06-15 17:20:40,772][1648981] Fps is (10 sec: 49133.0, 60 sec: 51331.5, 300 sec: 48095.8). Total num frames: 995721216. Throughput: 0: 12275.0. Samples: 248960000. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:40,773][1648981] Avg episode reward: [(0, '479.290')] [2024-06-15 17:20:40,859][1651669] Updated weights for policy 0, policy_version 486199 (0.0013) [2024-06-15 17:20:44,520][1651669] Updated weights for policy 0, policy_version 486228 (0.0031) [2024-06-15 17:20:45,773][1648981] Fps is (10 sec: 52478.4, 60 sec: 50255.6, 300 sec: 47540.3). Total num frames: 995885056. Throughput: 0: 12468.4. Samples: 249039872. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:45,773][1648981] Avg episode reward: [(0, '472.070')] [2024-06-15 17:20:46,813][1651669] Updated weights for policy 0, policy_version 486288 (0.0013) [2024-06-15 17:20:48,340][1651669] Updated weights for policy 0, policy_version 486338 (0.0025) [2024-06-15 17:20:49,603][1651669] Updated weights for policy 0, policy_version 486398 (0.0021) [2024-06-15 17:20:50,772][1648981] Fps is (10 sec: 49152.8, 60 sec: 51331.6, 300 sec: 48206.9). Total num frames: 996212736. Throughput: 0: 12331.9. Samples: 249106944. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:50,773][1648981] Avg episode reward: [(0, '482.190')] [2024-06-15 17:20:51,018][1651669] Updated weights for policy 0, policy_version 486448 (0.0011) [2024-06-15 17:20:54,670][1651669] Updated weights for policy 0, policy_version 486465 (0.0018) [2024-06-15 17:20:55,339][1651274] Signal inference workers to stop experience collection... (25500 times) [2024-06-15 17:20:55,420][1651669] InferenceWorker_p0-w0: stopping experience collection (25500 times) [2024-06-15 17:20:55,654][1651274] Signal inference workers to resume experience collection... (25500 times) [2024-06-15 17:20:55,655][1651669] InferenceWorker_p0-w0: resuming experience collection (25500 times) [2024-06-15 17:20:55,766][1648981] Fps is (10 sec: 49182.9, 60 sec: 49698.1, 300 sec: 47541.4). Total num frames: 996376576. Throughput: 0: 12390.4. Samples: 249145856. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:20:55,767][1648981] Avg episode reward: [(0, '506.820')] [2024-06-15 17:20:55,916][1651669] Updated weights for policy 0, policy_version 486526 (0.0158) [2024-06-15 17:20:55,935][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000486528_996409344.pth... [2024-06-15 17:20:55,972][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000480928_984940544.pth [2024-06-15 17:20:59,043][1651669] Updated weights for policy 0, policy_version 486592 (0.0014) [2024-06-15 17:21:00,306][1651669] Updated weights for policy 0, policy_version 486643 (0.0014) [2024-06-15 17:21:00,766][1648981] Fps is (10 sec: 45901.7, 60 sec: 50244.5, 300 sec: 48430.0). Total num frames: 996671488. Throughput: 0: 12310.8. Samples: 249218048. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:00,767][1648981] Avg episode reward: [(0, '494.860')] [2024-06-15 17:21:01,725][1651669] Updated weights for policy 0, policy_version 486694 (0.0011) [2024-06-15 17:21:05,537][1651669] Updated weights for policy 0, policy_version 486742 (0.0012) [2024-06-15 17:21:05,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49152.0, 300 sec: 47763.6). Total num frames: 996868096. Throughput: 0: 12492.8. Samples: 249294336. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:05,767][1648981] Avg episode reward: [(0, '481.860')] [2024-06-15 17:21:08,638][1651669] Updated weights for policy 0, policy_version 486785 (0.0021) [2024-06-15 17:21:10,678][1651669] Updated weights for policy 0, policy_version 486864 (0.0093) [2024-06-15 17:21:10,767][1648981] Fps is (10 sec: 42595.7, 60 sec: 48605.3, 300 sec: 48098.6). Total num frames: 997097472. Throughput: 0: 12344.7. Samples: 249332736. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:10,768][1648981] Avg episode reward: [(0, '495.010')] [2024-06-15 17:21:13,482][1651669] Updated weights for policy 0, policy_version 486966 (0.0110) [2024-06-15 17:21:15,767][1648981] Fps is (10 sec: 45873.1, 60 sec: 49151.7, 300 sec: 47874.5). Total num frames: 997326848. Throughput: 0: 12060.9. Samples: 249389056. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:15,767][1648981] Avg episode reward: [(0, '494.370')] [2024-06-15 17:21:17,115][1651669] Updated weights for policy 0, policy_version 486997 (0.0015) [2024-06-15 17:21:19,936][1651669] Updated weights for policy 0, policy_version 487056 (0.0052) [2024-06-15 17:21:20,766][1648981] Fps is (10 sec: 45878.2, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 997556224. Throughput: 0: 12178.5. Samples: 249471488. 
Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:20,767][1648981] Avg episode reward: [(0, '497.890')] [2024-06-15 17:21:21,505][1651669] Updated weights for policy 0, policy_version 487109 (0.0011) [2024-06-15 17:21:22,697][1651669] Updated weights for policy 0, policy_version 487168 (0.0015) [2024-06-15 17:21:25,389][1651669] Updated weights for policy 0, policy_version 487224 (0.0018) [2024-06-15 17:21:25,768][1648981] Fps is (10 sec: 52423.8, 60 sec: 50243.2, 300 sec: 48096.5). Total num frames: 997851136. Throughput: 0: 12016.2. Samples: 249500672. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:25,768][1648981] Avg episode reward: [(0, '501.840')] [2024-06-15 17:21:28,383][1651669] Updated weights for policy 0, policy_version 487271 (0.0017) [2024-06-15 17:21:30,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46969.1, 300 sec: 47764.1). Total num frames: 998047744. Throughput: 0: 11948.3. Samples: 249577472. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:30,767][1648981] Avg episode reward: [(0, '496.490')] [2024-06-15 17:21:31,128][1651669] Updated weights for policy 0, policy_version 487344 (0.0135) [2024-06-15 17:21:32,570][1651669] Updated weights for policy 0, policy_version 487409 (0.0012) [2024-06-15 17:21:35,766][1648981] Fps is (10 sec: 42604.3, 60 sec: 48618.7, 300 sec: 48208.5). Total num frames: 998277120. Throughput: 0: 11993.7. Samples: 249646592. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:35,767][1648981] Avg episode reward: [(0, '498.060')] [2024-06-15 17:21:35,935][1651669] Updated weights for policy 0, policy_version 487456 (0.0016) [2024-06-15 17:21:38,920][1651274] Signal inference workers to stop experience collection... (25550 times) [2024-06-15 17:21:38,960][1651669] InferenceWorker_p0-w0: stopping experience collection (25550 times) [2024-06-15 17:21:39,182][1651274] Signal inference workers to resume experience collection... 
(25550 times) [2024-06-15 17:21:39,189][1651669] InferenceWorker_p0-w0: resuming experience collection (25550 times) [2024-06-15 17:21:39,191][1651669] Updated weights for policy 0, policy_version 487520 (0.0012) [2024-06-15 17:21:40,769][1648981] Fps is (10 sec: 45864.4, 60 sec: 46424.1, 300 sec: 47542.7). Total num frames: 998506496. Throughput: 0: 11957.4. Samples: 249683968. Policy #0 lag: (min: 15.0, avg: 125.5, max: 271.0) [2024-06-15 17:21:40,769][1648981] Avg episode reward: [(0, '500.210')] [2024-06-15 17:21:42,034][1651669] Updated weights for policy 0, policy_version 487584 (0.0012) [2024-06-15 17:21:43,085][1651669] Updated weights for policy 0, policy_version 487632 (0.0012) [2024-06-15 17:21:43,940][1651669] Updated weights for policy 0, policy_version 487680 (0.0091) [2024-06-15 17:21:45,780][1648981] Fps is (10 sec: 49082.8, 60 sec: 48053.5, 300 sec: 48427.7). Total num frames: 998768640. Throughput: 0: 11908.8. Samples: 249754112. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:21:45,781][1648981] Avg episode reward: [(0, '476.560')] [2024-06-15 17:21:47,655][1651669] Updated weights for policy 0, policy_version 487733 (0.0014) [2024-06-15 17:21:50,406][1651669] Updated weights for policy 0, policy_version 487792 (0.0012) [2024-06-15 17:21:50,766][1648981] Fps is (10 sec: 52441.3, 60 sec: 46972.0, 300 sec: 47874.6). Total num frames: 999030784. Throughput: 0: 11719.1. Samples: 249821696. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:21:50,767][1648981] Avg episode reward: [(0, '465.090')] [2024-06-15 17:21:52,962][1651669] Updated weights for policy 0, policy_version 487840 (0.0012) [2024-06-15 17:21:54,401][1651669] Updated weights for policy 0, policy_version 487904 (0.0015) [2024-06-15 17:21:55,767][1648981] Fps is (10 sec: 52502.0, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 999292928. Throughput: 0: 11742.0. Samples: 249861120. 
Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:21:55,767][1648981] Avg episode reward: [(0, '459.980')] [2024-06-15 17:21:57,431][1651669] Updated weights for policy 0, policy_version 487952 (0.0013) [2024-06-15 17:22:00,308][1651669] Updated weights for policy 0, policy_version 488007 (0.0012) [2024-06-15 17:22:00,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 999456768. Throughput: 0: 12242.6. Samples: 249939968. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:00,767][1648981] Avg episode reward: [(0, '429.660')] [2024-06-15 17:22:01,459][1651669] Updated weights for policy 0, policy_version 488060 (0.0015) [2024-06-15 17:22:03,692][1651669] Updated weights for policy 0, policy_version 488121 (0.0025) [2024-06-15 17:22:05,310][1651669] Updated weights for policy 0, policy_version 488176 (0.0011) [2024-06-15 17:22:05,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49151.9, 300 sec: 48430.0). Total num frames: 999817216. Throughput: 0: 11889.8. Samples: 250006528. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:05,768][1648981] Avg episode reward: [(0, '432.770')] [2024-06-15 17:22:08,841][1651669] Updated weights for policy 0, policy_version 488228 (0.0016) [2024-06-15 17:22:10,379][1651669] Updated weights for policy 0, policy_version 488260 (0.0011) [2024-06-15 17:22:10,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 48060.1, 300 sec: 48318.9). Total num frames: 999981056. Throughput: 0: 12174.6. Samples: 250048512. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:10,767][1648981] Avg episode reward: [(0, '442.640')] [2024-06-15 17:22:11,700][1651669] Updated weights for policy 0, policy_version 488317 (0.0148) [2024-06-15 17:22:14,499][1651669] Updated weights for policy 0, policy_version 488359 (0.0015) [2024-06-15 17:22:15,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48606.2, 300 sec: 48096.8). Total num frames: 1000243200. 
Throughput: 0: 12140.1. Samples: 250123776. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:15,767][1648981] Avg episode reward: [(0, '441.980')] [2024-06-15 17:22:15,933][1651669] Updated weights for policy 0, policy_version 488402 (0.0013) [2024-06-15 17:22:16,685][1651669] Updated weights for policy 0, policy_version 488446 (0.0027) [2024-06-15 17:22:20,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 48605.8, 300 sec: 48654.1). Total num frames: 1000472576. Throughput: 0: 12219.7. Samples: 250196480. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:20,767][1648981] Avg episode reward: [(0, '423.400')] [2024-06-15 17:22:20,952][1651274] Signal inference workers to stop experience collection... (25600 times) [2024-06-15 17:22:20,959][1651669] Updated weights for policy 0, policy_version 488513 (0.0012) [2024-06-15 17:22:21,011][1651669] InferenceWorker_p0-w0: stopping experience collection (25600 times) [2024-06-15 17:22:21,186][1651274] Signal inference workers to resume experience collection... (25600 times) [2024-06-15 17:22:21,187][1651669] InferenceWorker_p0-w0: resuming experience collection (25600 times) [2024-06-15 17:22:22,378][1651669] Updated weights for policy 0, policy_version 488576 (0.0011) [2024-06-15 17:22:25,778][1648981] Fps is (10 sec: 45821.2, 60 sec: 47505.3, 300 sec: 47872.9). Total num frames: 1000701952. Throughput: 0: 12194.4. Samples: 250232832. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:25,779][1648981] Avg episode reward: [(0, '418.880')] [2024-06-15 17:22:26,208][1651669] Updated weights for policy 0, policy_version 488656 (0.0190) [2024-06-15 17:22:30,162][1651669] Updated weights for policy 0, policy_version 488705 (0.0011) [2024-06-15 17:22:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 48431.4). Total num frames: 1000898560. Throughput: 0: 12189.4. Samples: 250302464. 
Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:30,767][1648981] Avg episode reward: [(0, '413.610')] [2024-06-15 17:22:31,324][1651669] Updated weights for policy 0, policy_version 488760 (0.0011) [2024-06-15 17:22:32,204][1651669] Updated weights for policy 0, policy_version 488787 (0.0011) [2024-06-15 17:22:35,766][1648981] Fps is (10 sec: 49210.0, 60 sec: 48605.8, 300 sec: 48207.8). Total num frames: 1001193472. Throughput: 0: 12356.3. Samples: 250377728. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:35,767][1648981] Avg episode reward: [(0, '396.640')] [2024-06-15 17:22:36,050][1651669] Updated weights for policy 0, policy_version 488880 (0.0011) [2024-06-15 17:22:37,905][1651669] Updated weights for policy 0, policy_version 488949 (0.0012) [2024-06-15 17:22:40,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48061.6, 300 sec: 48430.0). Total num frames: 1001390080. Throughput: 0: 12083.2. Samples: 250404864. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:40,767][1648981] Avg episode reward: [(0, '409.070')] [2024-06-15 17:22:42,495][1651669] Updated weights for policy 0, policy_version 489019 (0.0013) [2024-06-15 17:22:44,786][1651669] Updated weights for policy 0, policy_version 489080 (0.0036) [2024-06-15 17:22:45,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48071.0, 300 sec: 47985.7). Total num frames: 1001652224. Throughput: 0: 11992.2. Samples: 250479616. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:45,767][1648981] Avg episode reward: [(0, '400.140')] [2024-06-15 17:22:47,162][1651669] Updated weights for policy 0, policy_version 489120 (0.0012) [2024-06-15 17:22:48,632][1651669] Updated weights for policy 0, policy_version 489172 (0.0011) [2024-06-15 17:22:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 48430.9). Total num frames: 1001914368. Throughput: 0: 12174.2. Samples: 250554368. 
Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:50,767][1648981] Avg episode reward: [(0, '401.230')] [2024-06-15 17:22:52,005][1651669] Updated weights for policy 0, policy_version 489232 (0.0022) [2024-06-15 17:22:53,078][1651669] Updated weights for policy 0, policy_version 489279 (0.0012) [2024-06-15 17:22:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 48322.2). Total num frames: 1002176512. Throughput: 0: 12060.5. Samples: 250591232. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:22:55,767][1648981] Avg episode reward: [(0, '420.310')] [2024-06-15 17:22:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000489344_1002176512.pth... [2024-06-15 17:22:55,822][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000483648_990511104.pth [2024-06-15 17:22:56,818][1651669] Updated weights for policy 0, policy_version 489347 (0.0093) [2024-06-15 17:22:58,505][1651669] Updated weights for policy 0, policy_version 489408 (0.0013) [2024-06-15 17:23:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49698.1, 300 sec: 48430.0). Total num frames: 1002438656. Throughput: 0: 11798.8. Samples: 250654720. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:23:00,767][1648981] Avg episode reward: [(0, '424.990')] [2024-06-15 17:23:02,707][1651669] Updated weights for policy 0, policy_version 489477 (0.0021) [2024-06-15 17:23:03,044][1651274] Signal inference workers to stop experience collection... (25650 times) [2024-06-15 17:23:03,111][1651669] InferenceWorker_p0-w0: stopping experience collection (25650 times) [2024-06-15 17:23:03,361][1651274] Signal inference workers to resume experience collection... 
(25650 times) [2024-06-15 17:23:03,381][1651669] InferenceWorker_p0-w0: resuming experience collection (25650 times) [2024-06-15 17:23:04,038][1651669] Updated weights for policy 0, policy_version 489534 (0.0073) [2024-06-15 17:23:05,767][1648981] Fps is (10 sec: 39320.4, 60 sec: 45875.0, 300 sec: 48096.7). Total num frames: 1002569728. Throughput: 0: 11889.7. Samples: 250731520. Policy #0 lag: (min: 55.0, avg: 182.8, max: 303.0) [2024-06-15 17:23:05,768][1648981] Avg episode reward: [(0, '434.940')] [2024-06-15 17:23:06,750][1651669] Updated weights for policy 0, policy_version 489585 (0.0011) [2024-06-15 17:23:08,240][1651669] Updated weights for policy 0, policy_version 489634 (0.0033) [2024-06-15 17:23:10,183][1651669] Updated weights for policy 0, policy_version 489722 (0.0012) [2024-06-15 17:23:10,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 48430.0). Total num frames: 1002962944. Throughput: 0: 11983.9. Samples: 250771968. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:10,767][1648981] Avg episode reward: [(0, '420.510')] [2024-06-15 17:23:14,095][1651669] Updated weights for policy 0, policy_version 489764 (0.0012) [2024-06-15 17:23:15,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 1003094016. Throughput: 0: 12003.5. Samples: 250842624. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:15,767][1648981] Avg episode reward: [(0, '437.610')] [2024-06-15 17:23:17,060][1651669] Updated weights for policy 0, policy_version 489841 (0.0096) [2024-06-15 17:23:18,963][1651669] Updated weights for policy 0, policy_version 489872 (0.0011) [2024-06-15 17:23:20,778][1648981] Fps is (10 sec: 42548.6, 60 sec: 48596.4, 300 sec: 48206.0). Total num frames: 1003388928. Throughput: 0: 11932.2. Samples: 250914816. 
Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:20,779][1648981] Avg episode reward: [(0, '441.770')] [2024-06-15 17:23:20,958][1651669] Updated weights for policy 0, policy_version 489953 (0.0011) [2024-06-15 17:23:21,722][1651669] Updated weights for policy 0, policy_version 489984 (0.0013) [2024-06-15 17:23:25,235][1651669] Updated weights for policy 0, policy_version 490037 (0.0010) [2024-06-15 17:23:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48615.5, 300 sec: 48541.1). Total num frames: 1003618304. Throughput: 0: 12140.1. Samples: 250951168. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:25,767][1648981] Avg episode reward: [(0, '432.180')] [2024-06-15 17:23:28,096][1651669] Updated weights for policy 0, policy_version 490112 (0.0011) [2024-06-15 17:23:30,770][1648981] Fps is (10 sec: 42632.1, 60 sec: 48602.7, 300 sec: 48207.2). Total num frames: 1003814912. Throughput: 0: 12173.2. Samples: 251027456. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:30,771][1648981] Avg episode reward: [(0, '454.510')] [2024-06-15 17:23:31,502][1651669] Updated weights for policy 0, policy_version 490192 (0.0014) [2024-06-15 17:23:34,749][1651669] Updated weights for policy 0, policy_version 490241 (0.0015) [2024-06-15 17:23:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 1004077056. Throughput: 0: 12049.1. Samples: 251096576. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:35,767][1648981] Avg episode reward: [(0, '434.310')] [2024-06-15 17:23:36,120][1651669] Updated weights for policy 0, policy_version 490298 (0.0011) [2024-06-15 17:23:38,746][1651669] Updated weights for policy 0, policy_version 490352 (0.0019) [2024-06-15 17:23:40,766][1648981] Fps is (10 sec: 45892.7, 60 sec: 48059.8, 300 sec: 48321.5). Total num frames: 1004273664. Throughput: 0: 12094.6. Samples: 251135488. 
Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:40,767][1648981] Avg episode reward: [(0, '444.460')] [2024-06-15 17:23:41,265][1651669] Updated weights for policy 0, policy_version 490389 (0.0010) [2024-06-15 17:23:42,217][1651669] Updated weights for policy 0, policy_version 490435 (0.0012) [2024-06-15 17:23:42,935][1651274] Signal inference workers to stop experience collection... (25700 times) [2024-06-15 17:23:42,962][1651669] InferenceWorker_p0-w0: stopping experience collection (25700 times) [2024-06-15 17:23:43,096][1651274] Signal inference workers to resume experience collection... (25700 times) [2024-06-15 17:23:43,097][1651669] InferenceWorker_p0-w0: resuming experience collection (25700 times) [2024-06-15 17:23:43,327][1651669] Updated weights for policy 0, policy_version 490496 (0.0010) [2024-06-15 17:23:45,768][1648981] Fps is (10 sec: 52418.7, 60 sec: 49150.5, 300 sec: 48651.9). Total num frames: 1004601344. Throughput: 0: 12310.3. Samples: 251208704. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:45,769][1648981] Avg episode reward: [(0, '435.290')] [2024-06-15 17:23:46,225][1651669] Updated weights for policy 0, policy_version 490556 (0.0012) [2024-06-15 17:23:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1004797952. Throughput: 0: 12254.0. Samples: 251282944. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:50,767][1648981] Avg episode reward: [(0, '434.210')] [2024-06-15 17:23:51,782][1651669] Updated weights for policy 0, policy_version 490626 (0.0011) [2024-06-15 17:23:53,513][1651669] Updated weights for policy 0, policy_version 490708 (0.0014) [2024-06-15 17:23:55,769][1648981] Fps is (10 sec: 45869.4, 60 sec: 48057.3, 300 sec: 48429.5). Total num frames: 1005060096. Throughput: 0: 12059.6. Samples: 251314688. 
Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:23:55,770][1648981] Avg episode reward: [(0, '422.150')] [2024-06-15 17:23:56,365][1651669] Updated weights for policy 0, policy_version 490769 (0.0012) [2024-06-15 17:23:57,321][1651669] Updated weights for policy 0, policy_version 490816 (0.0012) [2024-06-15 17:24:00,305][1651669] Updated weights for policy 0, policy_version 490874 (0.0013) [2024-06-15 17:24:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1005322240. Throughput: 0: 12106.0. Samples: 251387392. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:00,767][1648981] Avg episode reward: [(0, '423.900')] [2024-06-15 17:24:03,103][1651669] Updated weights for policy 0, policy_version 490934 (0.0012) [2024-06-15 17:24:04,278][1651669] Updated weights for policy 0, policy_version 490976 (0.0161) [2024-06-15 17:24:05,769][1648981] Fps is (10 sec: 52431.4, 60 sec: 50242.4, 300 sec: 48429.9). Total num frames: 1005584384. Throughput: 0: 12153.9. Samples: 251461632. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:05,770][1648981] Avg episode reward: [(0, '399.740')] [2024-06-15 17:24:07,364][1651669] Updated weights for policy 0, policy_version 491036 (0.0144) [2024-06-15 17:24:10,348][1651669] Updated weights for policy 0, policy_version 491088 (0.0046) [2024-06-15 17:24:10,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46967.4, 300 sec: 48207.8). Total num frames: 1005780992. Throughput: 0: 12140.1. Samples: 251497472. 
Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:10,767][1648981] Avg episode reward: [(0, '404.050')] [2024-06-15 17:24:11,512][1651669] Updated weights for policy 0, policy_version 491135 (0.0011) [2024-06-15 17:24:13,099][1651669] Updated weights for policy 0, policy_version 491173 (0.0013) [2024-06-15 17:24:15,603][1651669] Updated weights for policy 0, policy_version 491234 (0.0011) [2024-06-15 17:24:15,766][1648981] Fps is (10 sec: 49165.2, 60 sec: 49698.2, 300 sec: 48541.1). Total num frames: 1006075904. Throughput: 0: 12118.4. Samples: 251572736. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:15,767][1648981] Avg episode reward: [(0, '407.200')] [2024-06-15 17:24:18,042][1651669] Updated weights for policy 0, policy_version 491296 (0.0011) [2024-06-15 17:24:20,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 47522.8, 300 sec: 48318.9). Total num frames: 1006239744. Throughput: 0: 12356.2. Samples: 251652608. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:20,767][1648981] Avg episode reward: [(0, '411.790')] [2024-06-15 17:24:21,053][1651669] Updated weights for policy 0, policy_version 491344 (0.0011) [2024-06-15 17:24:22,777][1651669] Updated weights for policy 0, policy_version 491408 (0.0010) [2024-06-15 17:24:25,701][1651669] Updated weights for policy 0, policy_version 491459 (0.0013) [2024-06-15 17:24:25,769][1648981] Fps is (10 sec: 42585.2, 60 sec: 48057.3, 300 sec: 48429.5). Total num frames: 1006501888. Throughput: 0: 12105.1. Samples: 251680256. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:25,770][1648981] Avg episode reward: [(0, '405.980')] [2024-06-15 17:24:26,453][1651274] Signal inference workers to stop experience collection... (25750 times) [2024-06-15 17:24:26,502][1651669] InferenceWorker_p0-w0: stopping experience collection (25750 times) [2024-06-15 17:24:26,850][1651274] Signal inference workers to resume experience collection... 
(25750 times) [2024-06-15 17:24:26,851][1651669] InferenceWorker_p0-w0: resuming experience collection (25750 times) [2024-06-15 17:24:27,162][1651669] Updated weights for policy 0, policy_version 491515 (0.0023) [2024-06-15 17:24:28,887][1651669] Updated weights for policy 0, policy_version 491556 (0.0031) [2024-06-15 17:24:30,774][1648981] Fps is (10 sec: 52387.3, 60 sec: 49148.5, 300 sec: 48651.6). Total num frames: 1006764032. Throughput: 0: 12070.2. Samples: 251751936. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:30,775][1648981] Avg episode reward: [(0, '406.910')] [2024-06-15 17:24:32,671][1651669] Updated weights for policy 0, policy_version 491616 (0.0014) [2024-06-15 17:24:34,795][1651669] Updated weights for policy 0, policy_version 491707 (0.0014) [2024-06-15 17:24:35,766][1648981] Fps is (10 sec: 52444.8, 60 sec: 49152.0, 300 sec: 48763.2). Total num frames: 1007026176. Throughput: 0: 12026.3. Samples: 251824128. Policy #0 lag: (min: 108.0, avg: 213.2, max: 348.0) [2024-06-15 17:24:35,767][1648981] Avg episode reward: [(0, '401.510')] [2024-06-15 17:24:37,809][1651669] Updated weights for policy 0, policy_version 491760 (0.0011) [2024-06-15 17:24:38,427][1651669] Updated weights for policy 0, policy_version 491780 (0.0108) [2024-06-15 17:24:39,951][1651669] Updated weights for policy 0, policy_version 491831 (0.0066) [2024-06-15 17:24:40,766][1648981] Fps is (10 sec: 52471.1, 60 sec: 50244.3, 300 sec: 48877.6). Total num frames: 1007288320. Throughput: 0: 12197.8. Samples: 251863552. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:24:40,767][1648981] Avg episode reward: [(0, '394.940')] [2024-06-15 17:24:44,202][1651669] Updated weights for policy 0, policy_version 491877 (0.0034) [2024-06-15 17:24:45,778][1648981] Fps is (10 sec: 45820.9, 60 sec: 48051.7, 300 sec: 48650.2). Total num frames: 1007484928. Throughput: 0: 12080.0. Samples: 251931136. 
Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:24:45,779][1648981] Avg episode reward: [(0, '409.660')] [2024-06-15 17:24:45,963][1651669] Updated weights for policy 0, policy_version 491952 (0.0012) [2024-06-15 17:24:49,298][1651669] Updated weights for policy 0, policy_version 492016 (0.0100) [2024-06-15 17:24:50,347][1651669] Updated weights for policy 0, policy_version 492036 (0.0010) [2024-06-15 17:24:50,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1007714304. Throughput: 0: 11936.0. Samples: 251998720. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:24:50,767][1648981] Avg episode reward: [(0, '396.160')] [2024-06-15 17:24:55,024][1651669] Updated weights for policy 0, policy_version 492098 (0.0012) [2024-06-15 17:24:55,766][1648981] Fps is (10 sec: 39368.1, 60 sec: 46969.9, 300 sec: 48207.9). Total num frames: 1007878144. Throughput: 0: 11878.4. Samples: 252032000. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:24:55,767][1648981] Avg episode reward: [(0, '438.230')] [2024-06-15 17:24:56,366][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000492160_1007943680.pth... [2024-06-15 17:24:56,475][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000486528_996409344.pth [2024-06-15 17:24:56,946][1651669] Updated weights for policy 0, policy_version 492179 (0.0013) [2024-06-15 17:24:59,571][1651669] Updated weights for policy 0, policy_version 492240 (0.0012) [2024-06-15 17:25:00,666][1651669] Updated weights for policy 0, policy_version 492285 (0.0018) [2024-06-15 17:25:00,794][1648981] Fps is (10 sec: 49014.9, 60 sec: 48037.4, 300 sec: 48425.4). Total num frames: 1008205824. Throughput: 0: 11893.8. Samples: 252108288. 
Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:00,795][1648981] Avg episode reward: [(0, '452.010')] [2024-06-15 17:25:02,130][1651669] Updated weights for policy 0, policy_version 492336 (0.0012) [2024-06-15 17:25:05,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46423.3, 300 sec: 48096.8). Total num frames: 1008369664. Throughput: 0: 11821.5. Samples: 252184576. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:05,767][1648981] Avg episode reward: [(0, '446.860')] [2024-06-15 17:25:06,054][1651669] Updated weights for policy 0, policy_version 492384 (0.0013) [2024-06-15 17:25:07,492][1651274] Signal inference workers to stop experience collection... (25800 times) [2024-06-15 17:25:07,527][1651669] InferenceWorker_p0-w0: stopping experience collection (25800 times) [2024-06-15 17:25:07,676][1651274] Signal inference workers to resume experience collection... (25800 times) [2024-06-15 17:25:07,677][1651669] InferenceWorker_p0-w0: resuming experience collection (25800 times) [2024-06-15 17:25:07,910][1651669] Updated weights for policy 0, policy_version 492477 (0.0058) [2024-06-15 17:25:10,767][1648981] Fps is (10 sec: 39429.1, 60 sec: 46967.0, 300 sec: 48207.7). Total num frames: 1008599040. Throughput: 0: 11822.1. Samples: 252212224. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:10,768][1648981] Avg episode reward: [(0, '426.290')] [2024-06-15 17:25:12,194][1651669] Updated weights for policy 0, policy_version 492529 (0.0012) [2024-06-15 17:25:13,983][1651669] Updated weights for policy 0, policy_version 492603 (0.0013) [2024-06-15 17:25:15,768][1648981] Fps is (10 sec: 49144.4, 60 sec: 46420.1, 300 sec: 47985.4). Total num frames: 1008861184. Throughput: 0: 11607.0. Samples: 252274176. 
Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:15,768][1648981] Avg episode reward: [(0, '432.910')] [2024-06-15 17:25:18,184][1651669] Updated weights for policy 0, policy_version 492657 (0.0027) [2024-06-15 17:25:19,158][1651669] Updated weights for policy 0, policy_version 492711 (0.0140) [2024-06-15 17:25:20,766][1648981] Fps is (10 sec: 52432.3, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1009123328. Throughput: 0: 11901.1. Samples: 252359680. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:20,767][1648981] Avg episode reward: [(0, '441.490')] [2024-06-15 17:25:21,516][1651669] Updated weights for policy 0, policy_version 492737 (0.0012) [2024-06-15 17:25:23,753][1651669] Updated weights for policy 0, policy_version 492832 (0.0034) [2024-06-15 17:25:25,766][1648981] Fps is (10 sec: 52437.1, 60 sec: 48062.2, 300 sec: 47986.0). Total num frames: 1009385472. Throughput: 0: 11730.5. Samples: 252391424. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:25,767][1648981] Avg episode reward: [(0, '468.730')] [2024-06-15 17:25:27,301][1651669] Updated weights for policy 0, policy_version 492868 (0.0011) [2024-06-15 17:25:28,538][1651669] Updated weights for policy 0, policy_version 492928 (0.0012) [2024-06-15 17:25:29,933][1651669] Updated weights for policy 0, policy_version 492992 (0.0012) [2024-06-15 17:25:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48066.2, 300 sec: 48432.6). Total num frames: 1009647616. Throughput: 0: 11892.9. Samples: 252466176. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:30,767][1648981] Avg episode reward: [(0, '459.430')] [2024-06-15 17:25:33,428][1651669] Updated weights for policy 0, policy_version 493042 (0.0012) [2024-06-15 17:25:34,986][1651669] Updated weights for policy 0, policy_version 493108 (0.0010) [2024-06-15 17:25:35,774][1648981] Fps is (10 sec: 52387.7, 60 sec: 48053.4, 300 sec: 48096.4). Total num frames: 1009909760. 
Throughput: 0: 11921.8. Samples: 252535296. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:35,775][1648981] Avg episode reward: [(0, '472.120')] [2024-06-15 17:25:38,664][1651669] Updated weights for policy 0, policy_version 493138 (0.0011) [2024-06-15 17:25:40,163][1651669] Updated weights for policy 0, policy_version 493202 (0.0024) [2024-06-15 17:25:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 48320.0). Total num frames: 1010139136. Throughput: 0: 12151.5. Samples: 252578816. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:40,767][1648981] Avg episode reward: [(0, '472.030')] [2024-06-15 17:25:43,908][1651669] Updated weights for policy 0, policy_version 493265 (0.0012) [2024-06-15 17:25:45,766][1648981] Fps is (10 sec: 42631.8, 60 sec: 47523.0, 300 sec: 47875.5). Total num frames: 1010335744. Throughput: 0: 12056.6. Samples: 252650496. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:45,767][1648981] Avg episode reward: [(0, '455.570')] [2024-06-15 17:25:46,192][1651669] Updated weights for policy 0, policy_version 493364 (0.0015) [2024-06-15 17:25:49,829][1651274] Signal inference workers to stop experience collection... (25850 times) [2024-06-15 17:25:49,855][1651669] Updated weights for policy 0, policy_version 493394 (0.0012) [2024-06-15 17:25:49,879][1651669] InferenceWorker_p0-w0: stopping experience collection (25850 times) [2024-06-15 17:25:50,044][1651274] Signal inference workers to resume experience collection... (25850 times) [2024-06-15 17:25:50,044][1651669] InferenceWorker_p0-w0: resuming experience collection (25850 times) [2024-06-15 17:25:50,781][1648981] Fps is (10 sec: 42535.2, 60 sec: 47501.8, 300 sec: 48094.3). Total num frames: 1010565120. Throughput: 0: 11885.9. Samples: 252719616. 
Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:50,782][1648981] Avg episode reward: [(0, '441.390')] [2024-06-15 17:25:51,726][1651669] Updated weights for policy 0, policy_version 493488 (0.0012) [2024-06-15 17:25:54,604][1651669] Updated weights for policy 0, policy_version 493528 (0.0011) [2024-06-15 17:25:55,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 1010827264. Throughput: 0: 12083.4. Samples: 252755968. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:25:55,767][1648981] Avg episode reward: [(0, '437.850')] [2024-06-15 17:25:56,450][1651669] Updated weights for policy 0, policy_version 493600 (0.0012) [2024-06-15 17:25:59,933][1651669] Updated weights for policy 0, policy_version 493637 (0.0014) [2024-06-15 17:26:00,766][1648981] Fps is (10 sec: 45943.4, 60 sec: 46989.4, 300 sec: 47985.7). Total num frames: 1011023872. Throughput: 0: 12481.9. Samples: 252835840. Policy #0 lag: (min: 15.0, avg: 140.3, max: 271.0) [2024-06-15 17:26:00,767][1648981] Avg episode reward: [(0, '454.440')] [2024-06-15 17:26:01,082][1651669] Updated weights for policy 0, policy_version 493683 (0.0022) [2024-06-15 17:26:02,519][1651669] Updated weights for policy 0, policy_version 493747 (0.0013) [2024-06-15 17:26:05,309][1651669] Updated weights for policy 0, policy_version 493794 (0.0013) [2024-06-15 17:26:05,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49152.1, 300 sec: 48207.9). Total num frames: 1011318784. Throughput: 0: 12140.1. Samples: 252905984. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:05,767][1648981] Avg episode reward: [(0, '455.510')] [2024-06-15 17:26:06,530][1651669] Updated weights for policy 0, policy_version 493843 (0.0012) [2024-06-15 17:26:07,658][1651669] Updated weights for policy 0, policy_version 493888 (0.0012) [2024-06-15 17:26:10,774][1648981] Fps is (10 sec: 45839.3, 60 sec: 48054.0, 300 sec: 47984.5). Total num frames: 1011482624. 
Throughput: 0: 12206.2. Samples: 252940800. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:10,775][1648981] Avg episode reward: [(0, '444.140')] [2024-06-15 17:26:12,157][1651669] Updated weights for policy 0, policy_version 493954 (0.0015) [2024-06-15 17:26:15,774][1648981] Fps is (10 sec: 42565.2, 60 sec: 48054.8, 300 sec: 48095.5). Total num frames: 1011744768. Throughput: 0: 12138.0. Samples: 253012480. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:15,775][1648981] Avg episode reward: [(0, '445.630')] [2024-06-15 17:26:16,392][1651669] Updated weights for policy 0, policy_version 494032 (0.0012) [2024-06-15 17:26:18,707][1651669] Updated weights for policy 0, policy_version 494115 (0.0121) [2024-06-15 17:26:20,766][1648981] Fps is (10 sec: 52470.1, 60 sec: 48059.8, 300 sec: 47985.9). Total num frames: 1012006912. Throughput: 0: 12142.2. Samples: 253081600. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:20,767][1648981] Avg episode reward: [(0, '456.480')] [2024-06-15 17:26:22,464][1651669] Updated weights for policy 0, policy_version 494192 (0.0129) [2024-06-15 17:26:24,062][1651669] Updated weights for policy 0, policy_version 494263 (0.0012) [2024-06-15 17:26:25,783][1648981] Fps is (10 sec: 52384.2, 60 sec: 48046.7, 300 sec: 48205.2). Total num frames: 1012269056. Throughput: 0: 11976.5. Samples: 253117952. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:25,784][1648981] Avg episode reward: [(0, '483.070')] [2024-06-15 17:26:27,349][1651669] Updated weights for policy 0, policy_version 494298 (0.0021) [2024-06-15 17:26:27,532][1651274] Signal inference workers to stop experience collection... (25900 times) [2024-06-15 17:26:27,574][1651669] InferenceWorker_p0-w0: stopping experience collection (25900 times) [2024-06-15 17:26:27,783][1651274] Signal inference workers to resume experience collection... 
(25900 times) [2024-06-15 17:26:27,784][1651669] InferenceWorker_p0-w0: resuming experience collection (25900 times) [2024-06-15 17:26:28,644][1651669] Updated weights for policy 0, policy_version 494353 (0.0014) [2024-06-15 17:26:29,614][1651669] Updated weights for policy 0, policy_version 494398 (0.0021) [2024-06-15 17:26:30,774][1648981] Fps is (10 sec: 52387.5, 60 sec: 48053.4, 300 sec: 48317.6). Total num frames: 1012531200. Throughput: 0: 11967.3. Samples: 253189120. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:30,775][1648981] Avg episode reward: [(0, '484.810')] [2024-06-15 17:26:32,902][1651669] Updated weights for policy 0, policy_version 494463 (0.0015) [2024-06-15 17:26:34,083][1651669] Updated weights for policy 0, policy_version 494512 (0.0012) [2024-06-15 17:26:35,767][1648981] Fps is (10 sec: 52510.4, 60 sec: 48065.4, 300 sec: 48430.3). Total num frames: 1012793344. Throughput: 0: 12337.4. Samples: 253274624. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:35,768][1648981] Avg episode reward: [(0, '491.470')] [2024-06-15 17:26:37,317][1651669] Updated weights for policy 0, policy_version 494550 (0.0014) [2024-06-15 17:26:39,347][1651669] Updated weights for policy 0, policy_version 494624 (0.0054) [2024-06-15 17:26:40,766][1648981] Fps is (10 sec: 52469.5, 60 sec: 48605.8, 300 sec: 48432.3). Total num frames: 1013055488. Throughput: 0: 12276.6. Samples: 253308416. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:40,767][1648981] Avg episode reward: [(0, '518.140')] [2024-06-15 17:26:42,779][1651669] Updated weights for policy 0, policy_version 494664 (0.0026) [2024-06-15 17:26:43,796][1651669] Updated weights for policy 0, policy_version 494721 (0.0011) [2024-06-15 17:26:44,781][1651669] Updated weights for policy 0, policy_version 494776 (0.0012) [2024-06-15 17:26:45,766][1648981] Fps is (10 sec: 52432.9, 60 sec: 49698.2, 300 sec: 48430.0). Total num frames: 1013317632. 
Throughput: 0: 12185.6. Samples: 253384192. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:45,767][1648981] Avg episode reward: [(0, '545.250')] [2024-06-15 17:26:45,768][1651274] Saving new best policy, reward=545.250! [2024-06-15 17:26:47,562][1651669] Updated weights for policy 0, policy_version 494816 (0.0012) [2024-06-15 17:26:49,487][1651669] Updated weights for policy 0, policy_version 494880 (0.0011) [2024-06-15 17:26:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50256.6, 300 sec: 48430.0). Total num frames: 1013579776. Throughput: 0: 12185.6. Samples: 253454336. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:50,767][1648981] Avg episode reward: [(0, '541.070')] [2024-06-15 17:26:53,660][1651669] Updated weights for policy 0, policy_version 494937 (0.0012) [2024-06-15 17:26:55,026][1651669] Updated weights for policy 0, policy_version 494996 (0.0017) [2024-06-15 17:26:55,766][1648981] Fps is (10 sec: 52428.0, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 1013841920. Throughput: 0: 12290.1. Samples: 253493760. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:26:55,767][1648981] Avg episode reward: [(0, '549.390')] [2024-06-15 17:26:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000495040_1013841920.pth... [2024-06-15 17:26:55,878][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000489344_1002176512.pth [2024-06-15 17:26:55,883][1651274] Saving new best policy, reward=549.390! [2024-06-15 17:26:58,382][1651669] Updated weights for policy 0, policy_version 495078 (0.0012) [2024-06-15 17:27:00,369][1651669] Updated weights for policy 0, policy_version 495152 (0.0010) [2024-06-15 17:27:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 51336.5, 300 sec: 48430.0). Total num frames: 1014104064. Throughput: 0: 12517.7. Samples: 253575680. 
Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:27:00,767][1648981] Avg episode reward: [(0, '545.260')] [2024-06-15 17:27:02,814][1651669] Updated weights for policy 0, policy_version 495173 (0.0012) [2024-06-15 17:27:04,345][1651669] Updated weights for policy 0, policy_version 495230 (0.0013) [2024-06-15 17:27:05,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1014235136. Throughput: 0: 12492.8. Samples: 253643776. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:27:05,767][1648981] Avg episode reward: [(0, '557.830')] [2024-06-15 17:27:05,891][1651274] Signal inference workers to stop experience collection... (25950 times) [2024-06-15 17:27:05,958][1651669] InferenceWorker_p0-w0: stopping experience collection (25950 times) [2024-06-15 17:27:06,103][1651274] Saving new best policy, reward=557.830! [2024-06-15 17:27:06,103][1651274] Signal inference workers to resume experience collection... (25950 times) [2024-06-15 17:27:06,104][1651669] InferenceWorker_p0-w0: resuming experience collection (25950 times) [2024-06-15 17:27:06,689][1651669] Updated weights for policy 0, policy_version 495296 (0.0012) [2024-06-15 17:27:09,847][1651669] Updated weights for policy 0, policy_version 495360 (0.0012) [2024-06-15 17:27:10,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 51343.2, 300 sec: 48541.1). Total num frames: 1014562816. Throughput: 0: 12679.4. Samples: 253688320. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:27:10,767][1648981] Avg episode reward: [(0, '545.460')] [2024-06-15 17:27:11,314][1651669] Updated weights for policy 0, policy_version 495419 (0.0103) [2024-06-15 17:27:14,729][1651669] Updated weights for policy 0, policy_version 495483 (0.0140) [2024-06-15 17:27:15,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 50250.6, 300 sec: 48430.0). Total num frames: 1014759424. Throughput: 0: 12586.0. Samples: 253755392. 
Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:27:15,769][1648981] Avg episode reward: [(0, '557.590')] [2024-06-15 17:27:17,527][1651669] Updated weights for policy 0, policy_version 495547 (0.0118) [2024-06-15 17:27:20,033][1651669] Updated weights for policy 0, policy_version 495600 (0.0013) [2024-06-15 17:27:20,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 50790.2, 300 sec: 48654.1). Total num frames: 1015054336. Throughput: 0: 12413.3. Samples: 253833216. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:27:20,767][1648981] Avg episode reward: [(0, '543.010')] [2024-06-15 17:27:21,591][1651669] Updated weights for policy 0, policy_version 495678 (0.0011) [2024-06-15 17:27:25,335][1651669] Updated weights for policy 0, policy_version 495742 (0.0015) [2024-06-15 17:27:25,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50257.9, 300 sec: 48763.2). Total num frames: 1015283712. Throughput: 0: 12435.9. Samples: 253868032. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:27:25,767][1648981] Avg episode reward: [(0, '549.570')] [2024-06-15 17:27:28,872][1651669] Updated weights for policy 0, policy_version 495806 (0.0054) [2024-06-15 17:27:30,782][1648981] Fps is (10 sec: 42532.2, 60 sec: 49145.5, 300 sec: 48427.4). Total num frames: 1015480320. Throughput: 0: 12306.4. Samples: 253938176. Policy #0 lag: (min: 7.0, avg: 111.9, max: 247.0) [2024-06-15 17:27:30,783][1648981] Avg episode reward: [(0, '503.650')] [2024-06-15 17:27:31,470][1651669] Updated weights for policy 0, policy_version 495872 (0.0010) [2024-06-15 17:27:35,120][1651669] Updated weights for policy 0, policy_version 495937 (0.0012) [2024-06-15 17:27:35,794][1648981] Fps is (10 sec: 42480.2, 60 sec: 48583.9, 300 sec: 48536.5). Total num frames: 1015709696. Throughput: 0: 12280.4. Samples: 254007296. 
Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:27:35,796][1648981] Avg episode reward: [(0, '492.030')] [2024-06-15 17:27:36,555][1651669] Updated weights for policy 0, policy_version 495993 (0.0012) [2024-06-15 17:27:40,507][1651669] Updated weights for policy 0, policy_version 496048 (0.0011) [2024-06-15 17:27:40,766][1648981] Fps is (10 sec: 42665.6, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1015906304. Throughput: 0: 12265.3. Samples: 254045696. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:27:40,767][1648981] Avg episode reward: [(0, '495.060')] [2024-06-15 17:27:42,639][1651669] Updated weights for policy 0, policy_version 496128 (0.0012) [2024-06-15 17:27:43,809][1651669] Updated weights for policy 0, policy_version 496184 (0.0012) [2024-06-15 17:27:45,767][1648981] Fps is (10 sec: 49285.4, 60 sec: 48059.1, 300 sec: 48429.9). Total num frames: 1016201216. Throughput: 0: 11798.6. Samples: 254106624. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:27:45,768][1648981] Avg episode reward: [(0, '476.040')] [2024-06-15 17:27:46,140][1651274] Signal inference workers to stop experience collection... (26000 times) [2024-06-15 17:27:46,186][1651669] InferenceWorker_p0-w0: stopping experience collection (26000 times) [2024-06-15 17:27:46,411][1651274] Signal inference workers to resume experience collection... (26000 times) [2024-06-15 17:27:46,412][1651669] InferenceWorker_p0-w0: resuming experience collection (26000 times) [2024-06-15 17:27:46,908][1651669] Updated weights for policy 0, policy_version 496240 (0.0011) [2024-06-15 17:27:50,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 45875.3, 300 sec: 47985.7). Total num frames: 1016332288. Throughput: 0: 12106.0. Samples: 254188544. 
Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:27:50,767][1648981] Avg episode reward: [(0, '467.790')] [2024-06-15 17:27:51,123][1651669] Updated weights for policy 0, policy_version 496261 (0.0032) [2024-06-15 17:27:52,801][1651669] Updated weights for policy 0, policy_version 496324 (0.0013) [2024-06-15 17:27:54,714][1651669] Updated weights for policy 0, policy_version 496400 (0.0017) [2024-06-15 17:27:55,791][1648981] Fps is (10 sec: 52308.3, 60 sec: 48040.8, 300 sec: 48426.1). Total num frames: 1016725504. Throughput: 0: 11826.7. Samples: 254220800. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:27:55,793][1648981] Avg episode reward: [(0, '456.690')] [2024-06-15 17:27:56,517][1651669] Updated weights for policy 0, policy_version 496452 (0.0015) [2024-06-15 17:28:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45875.2, 300 sec: 48430.1). Total num frames: 1016856576. Throughput: 0: 11969.5. Samples: 254294016. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:00,767][1648981] Avg episode reward: [(0, '445.830')] [2024-06-15 17:28:01,741][1651669] Updated weights for policy 0, policy_version 496513 (0.0012) [2024-06-15 17:28:03,548][1651669] Updated weights for policy 0, policy_version 496592 (0.0023) [2024-06-15 17:28:04,638][1651669] Updated weights for policy 0, policy_version 496636 (0.0035) [2024-06-15 17:28:05,766][1648981] Fps is (10 sec: 45984.6, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1017184256. Throughput: 0: 11798.8. Samples: 254364160. 
Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:05,767][1648981] Avg episode reward: [(0, '443.190')] [2024-06-15 17:28:05,905][1651669] Updated weights for policy 0, policy_version 496678 (0.0013) [2024-06-15 17:28:07,181][1651669] Updated weights for policy 0, policy_version 496722 (0.0075) [2024-06-15 17:28:07,858][1651669] Updated weights for policy 0, policy_version 496766 (0.0075) [2024-06-15 17:28:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1017380864. Throughput: 0: 11764.6. Samples: 254397440. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:10,767][1648981] Avg episode reward: [(0, '438.440')] [2024-06-15 17:28:13,782][1651669] Updated weights for policy 0, policy_version 496832 (0.0010) [2024-06-15 17:28:15,160][1651669] Updated weights for policy 0, policy_version 496892 (0.0014) [2024-06-15 17:28:15,767][1648981] Fps is (10 sec: 49150.0, 60 sec: 48605.7, 300 sec: 48431.9). Total num frames: 1017675776. Throughput: 0: 11996.3. Samples: 254477824. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:15,768][1648981] Avg episode reward: [(0, '448.810')] [2024-06-15 17:28:17,016][1651669] Updated weights for policy 0, policy_version 496953 (0.0011) [2024-06-15 17:28:20,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 47513.8, 300 sec: 48430.0). Total num frames: 1017905152. Throughput: 0: 11794.7. Samples: 254537728. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:20,767][1648981] Avg episode reward: [(0, '431.840')] [2024-06-15 17:28:24,903][1651669] Updated weights for policy 0, policy_version 497056 (0.0013) [2024-06-15 17:28:25,767][1648981] Fps is (10 sec: 32768.8, 60 sec: 45328.9, 300 sec: 48097.4). Total num frames: 1018003456. Throughput: 0: 11844.2. Samples: 254578688. 
Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:25,767][1648981] Avg episode reward: [(0, '449.210')] [2024-06-15 17:28:26,903][1651274] Signal inference workers to stop experience collection... (26050 times) [2024-06-15 17:28:26,998][1651669] InferenceWorker_p0-w0: stopping experience collection (26050 times) [2024-06-15 17:28:27,212][1651274] Signal inference workers to resume experience collection... (26050 times) [2024-06-15 17:28:27,226][1651669] InferenceWorker_p0-w0: resuming experience collection (26050 times) [2024-06-15 17:28:27,363][1651669] Updated weights for policy 0, policy_version 497138 (0.0010) [2024-06-15 17:28:28,569][1651669] Updated weights for policy 0, policy_version 497200 (0.0157) [2024-06-15 17:28:29,822][1651669] Updated weights for policy 0, policy_version 497233 (0.0012) [2024-06-15 17:28:30,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48618.6, 300 sec: 48541.1). Total num frames: 1018396672. Throughput: 0: 11821.7. Samples: 254638592. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:30,767][1648981] Avg episode reward: [(0, '458.940')] [2024-06-15 17:28:35,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 45350.1, 300 sec: 47985.7). Total num frames: 1018429440. Throughput: 0: 11719.1. Samples: 254715904. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:35,767][1648981] Avg episode reward: [(0, '456.120')] [2024-06-15 17:28:36,829][1651669] Updated weights for policy 0, policy_version 497297 (0.0012) [2024-06-15 17:28:38,145][1651669] Updated weights for policy 0, policy_version 497347 (0.0112) [2024-06-15 17:28:40,028][1651669] Updated weights for policy 0, policy_version 497429 (0.0117) [2024-06-15 17:28:40,767][1648981] Fps is (10 sec: 39317.1, 60 sec: 48058.8, 300 sec: 48096.9). Total num frames: 1018789888. Throughput: 0: 11622.6. Samples: 254743552. 
Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:40,768][1648981] Avg episode reward: [(0, '453.580')] [2024-06-15 17:28:42,465][1651669] Updated weights for policy 0, policy_version 497520 (0.0012) [2024-06-15 17:28:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 45875.8, 300 sec: 47985.7). Total num frames: 1018953728. Throughput: 0: 11229.9. Samples: 254799360. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:45,767][1648981] Avg episode reward: [(0, '452.070')] [2024-06-15 17:28:48,767][1651669] Updated weights for policy 0, policy_version 497568 (0.0012) [2024-06-15 17:28:50,326][1651669] Updated weights for policy 0, policy_version 497619 (0.0012) [2024-06-15 17:28:50,766][1648981] Fps is (10 sec: 36049.0, 60 sec: 46967.5, 300 sec: 47764.0). Total num frames: 1019150336. Throughput: 0: 11400.5. Samples: 254877184. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:50,767][1648981] Avg episode reward: [(0, '452.520')] [2024-06-15 17:28:51,716][1651669] Updated weights for policy 0, policy_version 497680 (0.0011) [2024-06-15 17:28:53,453][1651669] Updated weights for policy 0, policy_version 497746 (0.0013) [2024-06-15 17:28:54,502][1651669] Updated weights for policy 0, policy_version 497791 (0.0026) [2024-06-15 17:28:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45893.4, 300 sec: 47985.7). Total num frames: 1019478016. Throughput: 0: 11275.4. Samples: 254904832. Policy #0 lag: (min: 48.0, avg: 147.3, max: 304.0) [2024-06-15 17:28:55,767][1648981] Avg episode reward: [(0, '453.560')] [2024-06-15 17:28:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000497792_1019478016.pth... [2024-06-15 17:28:55,814][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000492160_1007943680.pth [2024-06-15 17:29:00,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 47541.8). Total num frames: 1019609088. Throughput: 0: 11298.2. 
Samples: 254986240. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:00,767][1648981] Avg episode reward: [(0, '456.500')] [2024-06-15 17:29:00,859][1651669] Updated weights for policy 0, policy_version 497859 (0.0012) [2024-06-15 17:29:02,457][1651669] Updated weights for policy 0, policy_version 497925 (0.0013) [2024-06-15 17:29:04,251][1651669] Updated weights for policy 0, policy_version 498000 (0.0125) [2024-06-15 17:29:04,284][1651274] Signal inference workers to stop experience collection... (26100 times) [2024-06-15 17:29:04,372][1651669] InferenceWorker_p0-w0: stopping experience collection (26100 times) [2024-06-15 17:29:04,508][1651274] Signal inference workers to resume experience collection... (26100 times) [2024-06-15 17:29:04,519][1651669] InferenceWorker_p0-w0: resuming experience collection (26100 times) [2024-06-15 17:29:05,769][1648981] Fps is (10 sec: 52413.8, 60 sec: 46965.2, 300 sec: 48207.4). Total num frames: 1020002304. Throughput: 0: 11263.3. Samples: 255044608. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:05,770][1648981] Avg episode reward: [(0, '436.320')] [2024-06-15 17:29:10,515][1651669] Updated weights for policy 0, policy_version 498052 (0.0015) [2024-06-15 17:29:10,806][1648981] Fps is (10 sec: 42429.6, 60 sec: 44207.5, 300 sec: 47312.8). Total num frames: 1020035072. Throughput: 0: 11208.6. Samples: 255083520. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:10,807][1648981] Avg episode reward: [(0, '462.690')] [2024-06-15 17:29:11,328][1651669] Updated weights for policy 0, policy_version 498105 (0.0013) [2024-06-15 17:29:12,873][1651669] Updated weights for policy 0, policy_version 498160 (0.0011) [2024-06-15 17:29:14,409][1651669] Updated weights for policy 0, policy_version 498210 (0.0125) [2024-06-15 17:29:15,766][1648981] Fps is (10 sec: 42611.0, 60 sec: 45875.6, 300 sec: 48096.8). Total num frames: 1020428288. Throughput: 0: 11423.3. Samples: 255152640. 
Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:15,767][1648981] Avg episode reward: [(0, '452.790')] [2024-06-15 17:29:16,174][1651669] Updated weights for policy 0, policy_version 498288 (0.0147) [2024-06-15 17:29:20,766][1648981] Fps is (10 sec: 49347.9, 60 sec: 43690.6, 300 sec: 47541.8). Total num frames: 1020526592. Throughput: 0: 11320.9. Samples: 255225344. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:20,767][1648981] Avg episode reward: [(0, '443.690')] [2024-06-15 17:29:22,164][1651669] Updated weights for policy 0, policy_version 498336 (0.0012) [2024-06-15 17:29:22,793][1651669] Updated weights for policy 0, policy_version 498361 (0.0011) [2024-06-15 17:29:24,145][1651669] Updated weights for policy 0, policy_version 498424 (0.0011) [2024-06-15 17:29:25,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.8, 300 sec: 47875.9). Total num frames: 1020887040. Throughput: 0: 11605.6. Samples: 255265792. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:25,767][1648981] Avg episode reward: [(0, '439.790')] [2024-06-15 17:29:26,538][1651669] Updated weights for policy 0, policy_version 498512 (0.0081) [2024-06-15 17:29:27,617][1651669] Updated weights for policy 0, policy_version 498559 (0.0011) [2024-06-15 17:29:30,767][1648981] Fps is (10 sec: 52428.7, 60 sec: 44236.7, 300 sec: 47541.4). Total num frames: 1021050880. Throughput: 0: 11650.8. Samples: 255323648. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:30,767][1648981] Avg episode reward: [(0, '437.040')] [2024-06-15 17:29:33,335][1651669] Updated weights for policy 0, policy_version 498615 (0.0012) [2024-06-15 17:29:35,610][1651669] Updated weights for policy 0, policy_version 498672 (0.0026) [2024-06-15 17:29:35,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1021280256. Throughput: 0: 11707.7. Samples: 255404032. 
Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:35,767][1648981] Avg episode reward: [(0, '437.180')] [2024-06-15 17:29:37,574][1651669] Updated weights for policy 0, policy_version 498738 (0.0013) [2024-06-15 17:29:39,377][1651669] Updated weights for policy 0, policy_version 498805 (0.0015) [2024-06-15 17:29:40,767][1648981] Fps is (10 sec: 52427.1, 60 sec: 46421.9, 300 sec: 47765.4). Total num frames: 1021575168. Throughput: 0: 11650.7. Samples: 255429120. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:40,767][1648981] Avg episode reward: [(0, '467.180')] [2024-06-15 17:29:44,773][1651669] Updated weights for policy 0, policy_version 498876 (0.0033) [2024-06-15 17:29:45,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 45875.1, 300 sec: 47430.3). Total num frames: 1021706240. Throughput: 0: 11525.7. Samples: 255504896. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:45,767][1648981] Avg episode reward: [(0, '487.330')] [2024-06-15 17:29:46,262][1651274] Signal inference workers to stop experience collection... (26150 times) [2024-06-15 17:29:46,364][1651669] InferenceWorker_p0-w0: stopping experience collection (26150 times) [2024-06-15 17:29:46,462][1651274] Signal inference workers to resume experience collection... (26150 times) [2024-06-15 17:29:46,463][1651669] InferenceWorker_p0-w0: resuming experience collection (26150 times) [2024-06-15 17:29:46,666][1651669] Updated weights for policy 0, policy_version 498913 (0.0012) [2024-06-15 17:29:47,847][1651669] Updated weights for policy 0, policy_version 498967 (0.0010) [2024-06-15 17:29:49,314][1651669] Updated weights for policy 0, policy_version 499026 (0.0011) [2024-06-15 17:29:50,394][1651669] Updated weights for policy 0, policy_version 499071 (0.0010) [2024-06-15 17:29:50,766][1648981] Fps is (10 sec: 52431.0, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1022099456. Throughput: 0: 11742.6. Samples: 255572992. 
Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:50,767][1648981] Avg episode reward: [(0, '495.870')] [2024-06-15 17:29:55,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45329.0, 300 sec: 47434.8). Total num frames: 1022197760. Throughput: 0: 11797.8. Samples: 255613952. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:29:55,767][1648981] Avg episode reward: [(0, '477.110')] [2024-06-15 17:29:55,782][1651669] Updated weights for policy 0, policy_version 499128 (0.0010) [2024-06-15 17:29:58,442][1651669] Updated weights for policy 0, policy_version 499216 (0.0193) [2024-06-15 17:30:00,137][1651669] Updated weights for policy 0, policy_version 499281 (0.0098) [2024-06-15 17:30:00,768][1648981] Fps is (10 sec: 49142.8, 60 sec: 49696.6, 300 sec: 48207.5). Total num frames: 1022590976. Throughput: 0: 11866.5. Samples: 255686656. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:30:00,769][1648981] Avg episode reward: [(0, '483.640')] [2024-06-15 17:30:05,290][1651669] Updated weights for policy 0, policy_version 499344 (0.0012) [2024-06-15 17:30:05,768][1648981] Fps is (10 sec: 49144.4, 60 sec: 44783.8, 300 sec: 47763.4). Total num frames: 1022689280. Throughput: 0: 11934.9. Samples: 255762432. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:30:05,769][1648981] Avg episode reward: [(0, '472.720')] [2024-06-15 17:30:06,293][1651669] Updated weights for policy 0, policy_version 499384 (0.0012) [2024-06-15 17:30:08,196][1651669] Updated weights for policy 0, policy_version 499424 (0.0012) [2024-06-15 17:30:09,598][1651669] Updated weights for policy 0, policy_version 499473 (0.0032) [2024-06-15 17:30:10,774][1648981] Fps is (10 sec: 42572.8, 60 sec: 49724.6, 300 sec: 47984.7). Total num frames: 1023016960. Throughput: 0: 11853.6. Samples: 255799296. 
Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:30:10,775][1648981] Avg episode reward: [(0, '494.680')] [2024-06-15 17:30:11,039][1651669] Updated weights for policy 0, policy_version 499538 (0.0013) [2024-06-15 17:30:11,714][1651669] Updated weights for policy 0, policy_version 499584 (0.0010) [2024-06-15 17:30:15,770][1648981] Fps is (10 sec: 45864.5, 60 sec: 45326.0, 300 sec: 47540.7). Total num frames: 1023148032. Throughput: 0: 12093.5. Samples: 255867904. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:30:15,771][1648981] Avg episode reward: [(0, '487.990')] [2024-06-15 17:30:16,656][1651669] Updated weights for policy 0, policy_version 499645 (0.0022) [2024-06-15 17:30:19,465][1651669] Updated weights for policy 0, policy_version 499703 (0.0010) [2024-06-15 17:30:20,766][1648981] Fps is (10 sec: 45911.1, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 1023475712. Throughput: 0: 12014.9. Samples: 255944704. Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:30:20,767][1648981] Avg episode reward: [(0, '475.120')] [2024-06-15 17:30:21,296][1651669] Updated weights for policy 0, policy_version 499761 (0.0011) [2024-06-15 17:30:22,347][1651274] Signal inference workers to stop experience collection... (26200 times) [2024-06-15 17:30:22,413][1651669] InferenceWorker_p0-w0: stopping experience collection (26200 times) [2024-06-15 17:30:22,637][1651274] Signal inference workers to resume experience collection... (26200 times) [2024-06-15 17:30:22,660][1651669] InferenceWorker_p0-w0: resuming experience collection (26200 times) [2024-06-15 17:30:22,848][1651669] Updated weights for policy 0, policy_version 499840 (0.0013) [2024-06-15 17:30:25,774][1648981] Fps is (10 sec: 52408.9, 60 sec: 46415.3, 300 sec: 47540.1). Total num frames: 1023672320. Throughput: 0: 12092.6. Samples: 255973376. 
Policy #0 lag: (min: 15.0, avg: 81.8, max: 271.0) [2024-06-15 17:30:25,775][1648981] Avg episode reward: [(0, '481.600')] [2024-06-15 17:30:29,150][1651669] Updated weights for policy 0, policy_version 499908 (0.0011) [2024-06-15 17:30:30,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 47542.6). Total num frames: 1023934464. Throughput: 0: 12151.5. Samples: 256051712. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:30:30,767][1648981] Avg episode reward: [(0, '481.600')] [2024-06-15 17:30:30,876][1651669] Updated weights for policy 0, policy_version 499970 (0.0011) [2024-06-15 17:30:32,664][1651669] Updated weights for policy 0, policy_version 500034 (0.0011) [2024-06-15 17:30:33,488][1651669] Updated weights for policy 0, policy_version 500081 (0.0012) [2024-06-15 17:30:35,766][1648981] Fps is (10 sec: 52469.6, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1024196608. Throughput: 0: 12242.5. Samples: 256123904. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:30:35,767][1648981] Avg episode reward: [(0, '466.100')] [2024-06-15 17:30:38,434][1651669] Updated weights for policy 0, policy_version 500159 (0.0134) [2024-06-15 17:30:40,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46967.8, 300 sec: 47652.4). Total num frames: 1024393216. Throughput: 0: 12106.0. Samples: 256158720. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:30:40,767][1648981] Avg episode reward: [(0, '439.870')] [2024-06-15 17:30:41,864][1651669] Updated weights for policy 0, policy_version 500240 (0.0012) [2024-06-15 17:30:43,338][1651669] Updated weights for policy 0, policy_version 500293 (0.0014) [2024-06-15 17:30:44,847][1651669] Updated weights for policy 0, policy_version 500352 (0.0014) [2024-06-15 17:30:45,768][1648981] Fps is (10 sec: 52419.8, 60 sec: 50242.9, 300 sec: 47987.8). Total num frames: 1024720896. Throughput: 0: 11889.8. Samples: 256221696. 
Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:30:45,769][1648981] Avg episode reward: [(0, '434.970')] [2024-06-15 17:30:49,429][1651669] Updated weights for policy 0, policy_version 500414 (0.0013) [2024-06-15 17:30:50,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 45875.0, 300 sec: 47541.3). Total num frames: 1024851968. Throughput: 0: 11924.3. Samples: 256299008. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:30:50,767][1648981] Avg episode reward: [(0, '438.270')] [2024-06-15 17:30:53,393][1651669] Updated weights for policy 0, policy_version 500486 (0.0012) [2024-06-15 17:30:54,941][1651669] Updated weights for policy 0, policy_version 500549 (0.0130) [2024-06-15 17:30:55,767][1648981] Fps is (10 sec: 45879.2, 60 sec: 49697.5, 300 sec: 47985.5). Total num frames: 1025179648. Throughput: 0: 11857.5. Samples: 256332800. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:30:55,768][1648981] Avg episode reward: [(0, '437.870')] [2024-06-15 17:30:56,370][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000500608_1025245184.pth... [2024-06-15 17:30:56,409][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000495040_1013841920.pth [2024-06-15 17:30:56,414][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000500608_1025245184.pth [2024-06-15 17:31:00,124][1651669] Updated weights for policy 0, policy_version 500609 (0.0013) [2024-06-15 17:31:00,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 45330.5, 300 sec: 47430.3). Total num frames: 1025310720. Throughput: 0: 11833.9. Samples: 256400384. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:00,767][1648981] Avg episode reward: [(0, '428.300')] [2024-06-15 17:31:01,351][1651669] Updated weights for policy 0, policy_version 500670 (0.0123) [2024-06-15 17:31:05,147][1651274] Signal inference workers to stop experience collection... 
(26250 times) [2024-06-15 17:31:05,206][1651669] InferenceWorker_p0-w0: stopping experience collection (26250 times) [2024-06-15 17:31:05,484][1651274] Signal inference workers to resume experience collection... (26250 times) [2024-06-15 17:31:05,486][1651669] InferenceWorker_p0-w0: resuming experience collection (26250 times) [2024-06-15 17:31:05,620][1651669] Updated weights for policy 0, policy_version 500754 (0.0013) [2024-06-15 17:31:05,766][1648981] Fps is (10 sec: 36048.0, 60 sec: 47514.9, 300 sec: 47653.7). Total num frames: 1025540096. Throughput: 0: 11548.5. Samples: 256464384. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:05,767][1648981] Avg episode reward: [(0, '434.890')] [2024-06-15 17:31:07,037][1651669] Updated weights for policy 0, policy_version 500819 (0.0042) [2024-06-15 17:31:07,975][1651669] Updated weights for policy 0, policy_version 500862 (0.0012) [2024-06-15 17:31:10,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 45881.1, 300 sec: 47542.6). Total num frames: 1025769472. Throughput: 0: 11527.6. Samples: 256492032. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:10,767][1648981] Avg episode reward: [(0, '423.850')] [2024-06-15 17:31:12,853][1651669] Updated weights for policy 0, policy_version 500928 (0.0084) [2024-06-15 17:31:15,791][1648981] Fps is (10 sec: 42493.0, 60 sec: 46951.2, 300 sec: 47315.2). Total num frames: 1025966080. Throughput: 0: 11621.7. Samples: 256574976. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:15,792][1648981] Avg episode reward: [(0, '447.590')] [2024-06-15 17:31:16,775][1651669] Updated weights for policy 0, policy_version 501011 (0.0014) [2024-06-15 17:31:19,117][1651669] Updated weights for policy 0, policy_version 501115 (0.0012) [2024-06-15 17:31:20,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 46967.5, 300 sec: 47544.0). Total num frames: 1026293760. Throughput: 0: 11366.4. Samples: 256635392. 
Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:20,767][1648981] Avg episode reward: [(0, '432.960')] [2024-06-15 17:31:23,244][1651669] Updated weights for policy 0, policy_version 501154 (0.0015) [2024-06-15 17:31:23,852][1651669] Updated weights for policy 0, policy_version 501183 (0.0037) [2024-06-15 17:31:25,766][1648981] Fps is (10 sec: 45989.2, 60 sec: 45881.2, 300 sec: 47098.3). Total num frames: 1026424832. Throughput: 0: 11548.5. Samples: 256678400. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:25,767][1648981] Avg episode reward: [(0, '420.840')] [2024-06-15 17:31:27,711][1651669] Updated weights for policy 0, policy_version 501248 (0.0094) [2024-06-15 17:31:28,804][1651669] Updated weights for policy 0, policy_version 501299 (0.0185) [2024-06-15 17:31:30,213][1651669] Updated weights for policy 0, policy_version 501364 (0.0011) [2024-06-15 17:31:30,786][1648981] Fps is (10 sec: 52324.3, 60 sec: 48043.7, 300 sec: 47538.3). Total num frames: 1026818048. Throughput: 0: 11782.6. Samples: 256752128. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:30,787][1648981] Avg episode reward: [(0, '423.310')] [2024-06-15 17:31:34,304][1651669] Updated weights for policy 0, policy_version 501409 (0.0017) [2024-06-15 17:31:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1026949120. Throughput: 0: 11719.2. Samples: 256826368. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:35,767][1648981] Avg episode reward: [(0, '444.640')] [2024-06-15 17:31:36,811][1651669] Updated weights for policy 0, policy_version 501443 (0.0011) [2024-06-15 17:31:38,861][1651669] Updated weights for policy 0, policy_version 501536 (0.0015) [2024-06-15 17:31:40,296][1651669] Updated weights for policy 0, policy_version 501600 (0.0011) [2024-06-15 17:31:40,767][1648981] Fps is (10 sec: 49249.5, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 1027309568. 
Throughput: 0: 11651.0. Samples: 256857088. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:40,767][1648981] Avg episode reward: [(0, '450.120')] [2024-06-15 17:31:45,337][1651274] Signal inference workers to stop experience collection... (26300 times) [2024-06-15 17:31:45,405][1651669] InferenceWorker_p0-w0: stopping experience collection (26300 times) [2024-06-15 17:31:45,408][1651669] Updated weights for policy 0, policy_version 501653 (0.0014) [2024-06-15 17:31:45,610][1651274] Signal inference workers to resume experience collection... (26300 times) [2024-06-15 17:31:45,611][1651669] InferenceWorker_p0-w0: resuming experience collection (26300 times) [2024-06-15 17:31:45,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 44784.3, 300 sec: 46874.9). Total num frames: 1027407872. Throughput: 0: 11821.5. Samples: 256932352. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:45,767][1648981] Avg episode reward: [(0, '458.700')] [2024-06-15 17:31:46,382][1651669] Updated weights for policy 0, policy_version 501695 (0.0015) [2024-06-15 17:31:48,452][1651669] Updated weights for policy 0, policy_version 501744 (0.0050) [2024-06-15 17:31:49,773][1651669] Updated weights for policy 0, policy_version 501808 (0.0014) [2024-06-15 17:31:50,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 49152.3, 300 sec: 47319.3). Total num frames: 1027801088. Throughput: 0: 11980.8. Samples: 257003520. Policy #0 lag: (min: 15.0, avg: 113.3, max: 271.0) [2024-06-15 17:31:50,767][1648981] Avg episode reward: [(0, '464.460')] [2024-06-15 17:31:51,254][1651669] Updated weights for policy 0, policy_version 501879 (0.0012) [2024-06-15 17:31:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 45875.8, 300 sec: 46874.9). Total num frames: 1027932160. Throughput: 0: 12231.1. Samples: 257042432. 
Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:31:55,767][1648981] Avg episode reward: [(0, '477.080')] [2024-06-15 17:31:55,971][1651669] Updated weights for policy 0, policy_version 501923 (0.0033) [2024-06-15 17:31:58,064][1651669] Updated weights for policy 0, policy_version 501969 (0.0017) [2024-06-15 17:31:59,720][1651669] Updated weights for policy 0, policy_version 502039 (0.0015) [2024-06-15 17:32:00,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1028259840. Throughput: 0: 12021.6. Samples: 257115648. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:00,767][1648981] Avg episode reward: [(0, '478.000')] [2024-06-15 17:32:00,843][1651669] Updated weights for policy 0, policy_version 502096 (0.0016) [2024-06-15 17:32:01,798][1651669] Updated weights for policy 0, policy_version 502144 (0.0011) [2024-06-15 17:32:05,782][1648981] Fps is (10 sec: 45802.8, 60 sec: 47501.0, 300 sec: 46872.4). Total num frames: 1028390912. Throughput: 0: 12386.0. Samples: 257192960. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:05,783][1648981] Avg episode reward: [(0, '468.080')] [2024-06-15 17:32:07,723][1651669] Updated weights for policy 0, policy_version 502208 (0.0012) [2024-06-15 17:32:10,187][1651669] Updated weights for policy 0, policy_version 502257 (0.0011) [2024-06-15 17:32:10,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1028653056. Throughput: 0: 12197.0. Samples: 257227264. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:10,767][1648981] Avg episode reward: [(0, '458.570')] [2024-06-15 17:32:12,034][1651669] Updated weights for policy 0, policy_version 502336 (0.0011) [2024-06-15 17:32:15,766][1648981] Fps is (10 sec: 52511.9, 60 sec: 49172.3, 300 sec: 46986.0). Total num frames: 1028915200. Throughput: 0: 11804.0. Samples: 257283072. 
Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:15,767][1648981] Avg episode reward: [(0, '462.410')] [2024-06-15 17:32:18,633][1651669] Updated weights for policy 0, policy_version 502416 (0.0011) [2024-06-15 17:32:20,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 45875.2, 300 sec: 46652.7). Total num frames: 1029046272. Throughput: 0: 11855.7. Samples: 257359872. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:20,767][1648981] Avg episode reward: [(0, '496.280')] [2024-06-15 17:32:21,095][1651669] Updated weights for policy 0, policy_version 502480 (0.0150) [2024-06-15 17:32:22,792][1651669] Updated weights for policy 0, policy_version 502547 (0.0012) [2024-06-15 17:32:23,171][1651274] Signal inference workers to stop experience collection... (26350 times) [2024-06-15 17:32:23,248][1651669] InferenceWorker_p0-w0: stopping experience collection (26350 times) [2024-06-15 17:32:23,460][1651274] Signal inference workers to resume experience collection... (26350 times) [2024-06-15 17:32:23,462][1651669] InferenceWorker_p0-w0: resuming experience collection (26350 times) [2024-06-15 17:32:24,555][1651669] Updated weights for policy 0, policy_version 502624 (0.0010) [2024-06-15 17:32:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 47321.7). Total num frames: 1029439488. Throughput: 0: 11810.2. Samples: 257388544. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:25,767][1648981] Avg episode reward: [(0, '502.760')] [2024-06-15 17:32:30,571][1651669] Updated weights for policy 0, policy_version 502714 (0.0014) [2024-06-15 17:32:30,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 45890.4, 300 sec: 46990.4). Total num frames: 1029570560. Throughput: 0: 11844.2. Samples: 257465344. 
Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:30,767][1648981] Avg episode reward: [(0, '481.040')] [2024-06-15 17:32:33,378][1651669] Updated weights for policy 0, policy_version 502755 (0.0027) [2024-06-15 17:32:35,569][1651669] Updated weights for policy 0, policy_version 502833 (0.0012) [2024-06-15 17:32:35,766][1648981] Fps is (10 sec: 36045.2, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1029799936. Throughput: 0: 11514.3. Samples: 257521664. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:35,767][1648981] Avg episode reward: [(0, '490.030')] [2024-06-15 17:32:36,989][1651669] Updated weights for policy 0, policy_version 502888 (0.0010) [2024-06-15 17:32:40,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 44236.9, 300 sec: 46652.9). Total num frames: 1029963776. Throughput: 0: 11468.8. Samples: 257558528. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:40,767][1648981] Avg episode reward: [(0, '483.140')] [2024-06-15 17:32:41,102][1651669] Updated weights for policy 0, policy_version 502933 (0.0013) [2024-06-15 17:32:44,360][1651669] Updated weights for policy 0, policy_version 502992 (0.0020) [2024-06-15 17:32:45,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 46967.4, 300 sec: 47097.0). Total num frames: 1030225920. Throughput: 0: 11525.7. Samples: 257634304. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:45,767][1648981] Avg episode reward: [(0, '459.160')] [2024-06-15 17:32:46,051][1651669] Updated weights for policy 0, policy_version 503056 (0.0014) [2024-06-15 17:32:47,869][1651669] Updated weights for policy 0, policy_version 503121 (0.0013) [2024-06-15 17:32:50,768][1648981] Fps is (10 sec: 52420.4, 60 sec: 44781.6, 300 sec: 46656.3). Total num frames: 1030488064. Throughput: 0: 11244.8. Samples: 257698816. 
Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:50,769][1648981] Avg episode reward: [(0, '468.440')] [2024-06-15 17:32:52,145][1651669] Updated weights for policy 0, policy_version 503173 (0.0010) [2024-06-15 17:32:55,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 44782.7, 300 sec: 46652.7). Total num frames: 1030619136. Throughput: 0: 11263.9. Samples: 257734144. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:32:55,767][1648981] Avg episode reward: [(0, '444.650')] [2024-06-15 17:32:56,281][1651669] Updated weights for policy 0, policy_version 503250 (0.0012) [2024-06-15 17:32:56,555][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000503264_1030684672.pth... [2024-06-15 17:32:56,769][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000497792_1019478016.pth [2024-06-15 17:32:59,163][1651669] Updated weights for policy 0, policy_version 503345 (0.0011) [2024-06-15 17:33:00,319][1651669] Updated weights for policy 0, policy_version 503392 (0.0011) [2024-06-15 17:33:00,766][1648981] Fps is (10 sec: 45882.7, 60 sec: 44783.0, 300 sec: 46652.8). Total num frames: 1030946816. Throughput: 0: 11423.3. Samples: 257797120. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:33:00,767][1648981] Avg episode reward: [(0, '447.210')] [2024-06-15 17:33:04,435][1651669] Updated weights for policy 0, policy_version 503440 (0.0011) [2024-06-15 17:33:05,423][1651669] Updated weights for policy 0, policy_version 503488 (0.0126) [2024-06-15 17:33:05,778][1648981] Fps is (10 sec: 52368.6, 60 sec: 45878.2, 300 sec: 46650.9). Total num frames: 1031143424. Throughput: 0: 11340.6. Samples: 257870336. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:33:05,779][1648981] Avg episode reward: [(0, '447.740')] [2024-06-15 17:33:07,043][1651274] Signal inference workers to stop experience collection... 
(26400 times) [2024-06-15 17:33:07,094][1651669] InferenceWorker_p0-w0: stopping experience collection (26400 times) [2024-06-15 17:33:07,235][1651274] Signal inference workers to resume experience collection... (26400 times) [2024-06-15 17:33:07,236][1651669] InferenceWorker_p0-w0: resuming experience collection (26400 times) [2024-06-15 17:33:08,036][1651669] Updated weights for policy 0, policy_version 503536 (0.0040) [2024-06-15 17:33:09,805][1651669] Updated weights for policy 0, policy_version 503601 (0.0011) [2024-06-15 17:33:10,770][1648981] Fps is (10 sec: 49133.3, 60 sec: 46418.4, 300 sec: 46652.2). Total num frames: 1031438336. Throughput: 0: 11513.4. Samples: 257906688. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:33:10,771][1648981] Avg episode reward: [(0, '454.860')] [2024-06-15 17:33:11,460][1651669] Updated weights for policy 0, policy_version 503672 (0.0117) [2024-06-15 17:33:15,732][1651669] Updated weights for policy 0, policy_version 503728 (0.0013) [2024-06-15 17:33:15,766][1648981] Fps is (10 sec: 49210.7, 60 sec: 45329.1, 300 sec: 46541.7). Total num frames: 1031634944. Throughput: 0: 11457.4. Samples: 257980928. Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:33:15,767][1648981] Avg episode reward: [(0, '454.480')] [2024-06-15 17:33:18,562][1651669] Updated weights for policy 0, policy_version 503780 (0.0010) [2024-06-15 17:33:19,665][1651669] Updated weights for policy 0, policy_version 503831 (0.0012) [2024-06-15 17:33:20,766][1648981] Fps is (10 sec: 49170.6, 60 sec: 48059.7, 300 sec: 47208.2). Total num frames: 1031929856. Throughput: 0: 11650.8. Samples: 258045952. 
Policy #0 lag: (min: 15.0, avg: 107.7, max: 271.0) [2024-06-15 17:33:20,767][1648981] Avg episode reward: [(0, '435.940')] [2024-06-15 17:33:21,449][1651669] Updated weights for policy 0, policy_version 503906 (0.0012) [2024-06-15 17:33:25,329][1651669] Updated weights for policy 0, policy_version 503952 (0.0011) [2024-06-15 17:33:25,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 44783.0, 300 sec: 46541.7). Total num frames: 1032126464. Throughput: 0: 11741.9. Samples: 258086912. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:33:25,767][1648981] Avg episode reward: [(0, '426.920')] [2024-06-15 17:33:28,058][1651669] Updated weights for policy 0, policy_version 504003 (0.0136) [2024-06-15 17:33:29,116][1651669] Updated weights for policy 0, policy_version 504062 (0.0021) [2024-06-15 17:33:30,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 46964.5, 300 sec: 47318.6). Total num frames: 1032388608. Throughput: 0: 11729.5. Samples: 258162176. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:33:30,771][1648981] Avg episode reward: [(0, '428.580')] [2024-06-15 17:33:31,674][1651669] Updated weights for policy 0, policy_version 504144 (0.0109) [2024-06-15 17:33:32,820][1651669] Updated weights for policy 0, policy_version 504192 (0.0171) [2024-06-15 17:33:35,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46421.2, 300 sec: 46764.0). Total num frames: 1032585216. Throughput: 0: 11810.5. Samples: 258230272. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:33:35,767][1648981] Avg episode reward: [(0, '461.850')] [2024-06-15 17:33:37,123][1651669] Updated weights for policy 0, policy_version 504246 (0.0105) [2024-06-15 17:33:40,766][1648981] Fps is (10 sec: 42614.3, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1032814592. Throughput: 0: 11912.6. Samples: 258270208. 
Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:33:40,767][1648981] Avg episode reward: [(0, '452.780')] [2024-06-15 17:33:40,880][1651669] Updated weights for policy 0, policy_version 504306 (0.0014) [2024-06-15 17:33:42,013][1651669] Updated weights for policy 0, policy_version 504352 (0.0011) [2024-06-15 17:33:43,325][1651274] Signal inference workers to stop experience collection... (26450 times) [2024-06-15 17:33:43,363][1651669] InferenceWorker_p0-w0: stopping experience collection (26450 times) [2024-06-15 17:33:43,526][1651274] Signal inference workers to resume experience collection... (26450 times) [2024-06-15 17:33:43,527][1651669] InferenceWorker_p0-w0: resuming experience collection (26450 times) [2024-06-15 17:33:43,906][1651669] Updated weights for policy 0, policy_version 504432 (0.0088) [2024-06-15 17:33:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1033109504. Throughput: 0: 11810.1. Samples: 258328576. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:33:45,767][1648981] Avg episode reward: [(0, '461.700')] [2024-06-15 17:33:47,712][1651669] Updated weights for policy 0, policy_version 504467 (0.0019) [2024-06-15 17:33:48,685][1651669] Updated weights for policy 0, policy_version 504508 (0.0011) [2024-06-15 17:33:50,770][1648981] Fps is (10 sec: 42582.4, 60 sec: 45873.5, 300 sec: 46652.1). Total num frames: 1033240576. Throughput: 0: 12074.0. Samples: 258413568. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:33:50,771][1648981] Avg episode reward: [(0, '459.450')] [2024-06-15 17:33:51,433][1651669] Updated weights for policy 0, policy_version 504548 (0.0013) [2024-06-15 17:33:52,896][1651669] Updated weights for policy 0, policy_version 504615 (0.0012) [2024-06-15 17:33:54,082][1651669] Updated weights for policy 0, policy_version 504672 (0.0015) [2024-06-15 17:33:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.5, 300 sec: 47541.3). 
Total num frames: 1033633792. Throughput: 0: 11913.5. Samples: 258442752. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:33:55,767][1648981] Avg episode reward: [(0, '444.100')] [2024-06-15 17:33:57,724][1651669] Updated weights for policy 0, policy_version 504726 (0.0014) [2024-06-15 17:34:00,774][1648981] Fps is (10 sec: 52408.2, 60 sec: 46961.4, 300 sec: 46652.0). Total num frames: 1033764864. Throughput: 0: 12183.5. Samples: 258529280. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:00,775][1648981] Avg episode reward: [(0, '458.430')] [2024-06-15 17:34:01,761][1651669] Updated weights for policy 0, policy_version 504800 (0.0012) [2024-06-15 17:34:04,519][1651669] Updated weights for policy 0, policy_version 504912 (0.0147) [2024-06-15 17:34:05,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 50254.2, 300 sec: 47881.1). Total num frames: 1034158080. Throughput: 0: 11878.4. Samples: 258580480. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:05,767][1648981] Avg episode reward: [(0, '460.890')] [2024-06-15 17:34:09,400][1651669] Updated weights for policy 0, policy_version 504992 (0.0014) [2024-06-15 17:34:10,310][1651669] Updated weights for policy 0, policy_version 505024 (0.0011) [2024-06-15 17:34:10,799][1648981] Fps is (10 sec: 52300.0, 60 sec: 47490.9, 300 sec: 46980.8). Total num frames: 1034289152. Throughput: 0: 12097.3. Samples: 258631680. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:10,800][1648981] Avg episode reward: [(0, '475.650')] [2024-06-15 17:34:14,758][1651669] Updated weights for policy 0, policy_version 505107 (0.0089) [2024-06-15 17:34:15,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 1034551296. Throughput: 0: 11856.6. Samples: 258695680. 
Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:15,767][1648981] Avg episode reward: [(0, '464.710')] [2024-06-15 17:34:16,091][1651669] Updated weights for policy 0, policy_version 505168 (0.0015) [2024-06-15 17:34:17,202][1651669] Updated weights for policy 0, policy_version 505213 (0.0013) [2024-06-15 17:34:20,766][1648981] Fps is (10 sec: 46024.6, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1034747904. Throughput: 0: 12037.7. Samples: 258771968. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:20,767][1648981] Avg episode reward: [(0, '468.360')] [2024-06-15 17:34:21,397][1651669] Updated weights for policy 0, policy_version 505280 (0.0012) [2024-06-15 17:34:24,416][1651274] Signal inference workers to stop experience collection... (26500 times) [2024-06-15 17:34:24,485][1651669] InferenceWorker_p0-w0: stopping experience collection (26500 times) [2024-06-15 17:34:24,745][1651274] Signal inference workers to resume experience collection... (26500 times) [2024-06-15 17:34:24,746][1651669] InferenceWorker_p0-w0: resuming experience collection (26500 times) [2024-06-15 17:34:24,747][1651669] Updated weights for policy 0, policy_version 505344 (0.0016) [2024-06-15 17:34:25,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 47319.2). Total num frames: 1035010048. Throughput: 0: 11969.4. Samples: 258808832. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:25,767][1648981] Avg episode reward: [(0, '471.950')] [2024-06-15 17:34:26,173][1651669] Updated weights for policy 0, policy_version 505398 (0.0011) [2024-06-15 17:34:27,893][1651669] Updated weights for policy 0, policy_version 505469 (0.0013) [2024-06-15 17:34:30,788][1648981] Fps is (10 sec: 45776.8, 60 sec: 46953.7, 300 sec: 47204.7). Total num frames: 1035206656. Throughput: 0: 11952.4. Samples: 258866688. 
Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:30,788][1648981] Avg episode reward: [(0, '470.220')] [2024-06-15 17:34:32,882][1651669] Updated weights for policy 0, policy_version 505530 (0.0100) [2024-06-15 17:34:34,991][1651669] Updated weights for policy 0, policy_version 505568 (0.0030) [2024-06-15 17:34:35,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 47513.5, 300 sec: 46986.0). Total num frames: 1035436032. Throughput: 0: 11788.3. Samples: 258944000. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:35,767][1648981] Avg episode reward: [(0, '433.160')] [2024-06-15 17:34:37,087][1651669] Updated weights for policy 0, policy_version 505634 (0.0019) [2024-06-15 17:34:39,167][1651669] Updated weights for policy 0, policy_version 505727 (0.0015) [2024-06-15 17:34:40,766][1648981] Fps is (10 sec: 52541.4, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 1035730944. Throughput: 0: 11571.2. Samples: 258963456. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:40,767][1648981] Avg episode reward: [(0, '435.270')] [2024-06-15 17:34:44,228][1651669] Updated weights for policy 0, policy_version 505777 (0.0014) [2024-06-15 17:34:45,782][1648981] Fps is (10 sec: 42531.7, 60 sec: 45863.1, 300 sec: 46650.2). Total num frames: 1035862016. Throughput: 0: 11455.4. Samples: 259044864. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:45,783][1648981] Avg episode reward: [(0, '434.460')] [2024-06-15 17:34:46,359][1651669] Updated weights for policy 0, policy_version 505813 (0.0012) [2024-06-15 17:34:47,419][1651669] Updated weights for policy 0, policy_version 505856 (0.0013) [2024-06-15 17:34:48,702][1651669] Updated weights for policy 0, policy_version 505904 (0.0103) [2024-06-15 17:34:50,056][1651669] Updated weights for policy 0, policy_version 505953 (0.0011) [2024-06-15 17:34:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50247.5, 300 sec: 47652.5). Total num frames: 1036255232. 
Throughput: 0: 11764.6. Samples: 259109888. Policy #0 lag: (min: 6.0, avg: 100.8, max: 262.0) [2024-06-15 17:34:50,767][1648981] Avg episode reward: [(0, '435.230')] [2024-06-15 17:34:54,003][1651669] Updated weights for policy 0, policy_version 506001 (0.0011) [2024-06-15 17:34:55,767][1648981] Fps is (10 sec: 52510.3, 60 sec: 45875.0, 300 sec: 46764.1). Total num frames: 1036386304. Throughput: 0: 11727.5. Samples: 259159040. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:34:55,767][1648981] Avg episode reward: [(0, '425.120')] [2024-06-15 17:34:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000506048_1036386304.pth... [2024-06-15 17:34:55,855][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000500608_1025245184.pth [2024-06-15 17:34:57,688][1651669] Updated weights for policy 0, policy_version 506082 (0.0013) [2024-06-15 17:34:59,427][1651669] Updated weights for policy 0, policy_version 506145 (0.0033) [2024-06-15 17:35:00,767][1648981] Fps is (10 sec: 42596.0, 60 sec: 48611.7, 300 sec: 47430.5). Total num frames: 1036681216. Throughput: 0: 11650.7. Samples: 259219968. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:00,767][1648981] Avg episode reward: [(0, '429.800')] [2024-06-15 17:35:01,117][1651669] Updated weights for policy 0, policy_version 506210 (0.0032) [2024-06-15 17:35:04,878][1651274] Signal inference workers to stop experience collection... (26550 times) [2024-06-15 17:35:04,958][1651669] Updated weights for policy 0, policy_version 506248 (0.0011) [2024-06-15 17:35:04,987][1651669] InferenceWorker_p0-w0: stopping experience collection (26550 times) [2024-06-15 17:35:05,031][1651274] Signal inference workers to resume experience collection... 
(26550 times) [2024-06-15 17:35:05,032][1651669] InferenceWorker_p0-w0: resuming experience collection (26550 times) [2024-06-15 17:35:05,766][1648981] Fps is (10 sec: 49153.7, 60 sec: 45329.1, 300 sec: 46987.2). Total num frames: 1036877824. Throughput: 0: 11662.2. Samples: 259296768. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:05,767][1648981] Avg episode reward: [(0, '424.270')] [2024-06-15 17:35:05,898][1651669] Updated weights for policy 0, policy_version 506302 (0.0012) [2024-06-15 17:35:10,237][1651669] Updated weights for policy 0, policy_version 506370 (0.0015) [2024-06-15 17:35:10,766][1648981] Fps is (10 sec: 39323.8, 60 sec: 46446.4, 300 sec: 47208.8). Total num frames: 1037074432. Throughput: 0: 11798.8. Samples: 259339776. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:10,767][1648981] Avg episode reward: [(0, '418.160')] [2024-06-15 17:35:12,113][1651669] Updated weights for policy 0, policy_version 506436 (0.0012) [2024-06-15 17:35:13,514][1651669] Updated weights for policy 0, policy_version 506496 (0.0013) [2024-06-15 17:35:15,767][1648981] Fps is (10 sec: 42596.8, 60 sec: 45875.0, 300 sec: 46874.9). Total num frames: 1037303808. Throughput: 0: 11690.4. Samples: 259392512. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:15,768][1648981] Avg episode reward: [(0, '425.590')] [2024-06-15 17:35:17,520][1651669] Updated weights for policy 0, policy_version 506547 (0.0013) [2024-06-15 17:35:20,207][1651669] Updated weights for policy 0, policy_version 506596 (0.0032) [2024-06-15 17:35:20,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46421.2, 300 sec: 46987.2). Total num frames: 1037533184. Throughput: 0: 11696.4. Samples: 259470336. 
Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:20,767][1648981] Avg episode reward: [(0, '427.580')] [2024-06-15 17:35:22,310][1651669] Updated weights for policy 0, policy_version 506673 (0.0019) [2024-06-15 17:35:25,766][1648981] Fps is (10 sec: 52430.7, 60 sec: 46967.5, 300 sec: 47097.1). Total num frames: 1037828096. Throughput: 0: 11753.2. Samples: 259492352. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:25,767][1648981] Avg episode reward: [(0, '442.650')] [2024-06-15 17:35:28,468][1651669] Updated weights for policy 0, policy_version 506768 (0.0212) [2024-06-15 17:35:29,414][1651669] Updated weights for policy 0, policy_version 506816 (0.0016) [2024-06-15 17:35:30,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 45891.4, 300 sec: 46652.7). Total num frames: 1037959168. Throughput: 0: 11632.1. Samples: 259568128. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:30,768][1648981] Avg episode reward: [(0, '458.160')] [2024-06-15 17:35:33,100][1651669] Updated weights for policy 0, policy_version 506880 (0.0041) [2024-06-15 17:35:34,724][1651669] Updated weights for policy 0, policy_version 506945 (0.0013) [2024-06-15 17:35:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.9, 300 sec: 47208.1). Total num frames: 1038319616. Throughput: 0: 11525.7. Samples: 259628544. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:35,767][1648981] Avg episode reward: [(0, '464.620')] [2024-06-15 17:35:36,124][1651669] Updated weights for policy 0, policy_version 507008 (0.0115) [2024-06-15 17:35:40,767][1648981] Fps is (10 sec: 45875.2, 60 sec: 44782.7, 300 sec: 46430.8). Total num frames: 1038417920. Throughput: 0: 11389.2. Samples: 259671552. 
Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:40,767][1648981] Avg episode reward: [(0, '474.210')] [2024-06-15 17:35:41,172][1651669] Updated weights for policy 0, policy_version 507070 (0.0022) [2024-06-15 17:35:44,384][1651669] Updated weights for policy 0, policy_version 507140 (0.0014) [2024-06-15 17:35:44,674][1651274] Signal inference workers to stop experience collection... (26600 times) [2024-06-15 17:35:44,724][1651669] InferenceWorker_p0-w0: stopping experience collection (26600 times) [2024-06-15 17:35:44,903][1651274] Signal inference workers to resume experience collection... (26600 times) [2024-06-15 17:35:44,908][1651669] InferenceWorker_p0-w0: resuming experience collection (26600 times) [2024-06-15 17:35:45,499][1651669] Updated weights for policy 0, policy_version 507200 (0.0020) [2024-06-15 17:35:45,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48072.4, 300 sec: 47097.1). Total num frames: 1038745600. Throughput: 0: 11400.7. Samples: 259732992. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:45,767][1648981] Avg episode reward: [(0, '491.500')] [2024-06-15 17:35:47,146][1651669] Updated weights for policy 0, policy_version 507261 (0.0014) [2024-06-15 17:35:50,770][1648981] Fps is (10 sec: 45861.1, 60 sec: 43688.2, 300 sec: 46430.2). Total num frames: 1038876672. Throughput: 0: 11342.8. Samples: 259807232. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:50,770][1648981] Avg episode reward: [(0, '497.890')] [2024-06-15 17:35:52,740][1651669] Updated weights for policy 0, policy_version 507324 (0.0019) [2024-06-15 17:35:54,930][1651669] Updated weights for policy 0, policy_version 507376 (0.0011) [2024-06-15 17:35:55,771][1648981] Fps is (10 sec: 39303.0, 60 sec: 45871.8, 300 sec: 46874.1). Total num frames: 1039138816. Throughput: 0: 11183.2. Samples: 259843072. 
Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:35:55,772][1648981] Avg episode reward: [(0, '503.620')] [2024-06-15 17:35:56,564][1651669] Updated weights for policy 0, policy_version 507429 (0.0011) [2024-06-15 17:35:57,542][1651669] Updated weights for policy 0, policy_version 507472 (0.0273) [2024-06-15 17:35:58,774][1651669] Updated weights for policy 0, policy_version 507520 (0.0026) [2024-06-15 17:36:00,766][1648981] Fps is (10 sec: 52446.6, 60 sec: 45329.5, 300 sec: 46986.0). Total num frames: 1039400960. Throughput: 0: 11503.0. Samples: 259910144. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:36:00,767][1648981] Avg episode reward: [(0, '492.830')] [2024-06-15 17:36:03,584][1651669] Updated weights for policy 0, policy_version 507578 (0.0106) [2024-06-15 17:36:05,647][1651669] Updated weights for policy 0, policy_version 507628 (0.0011) [2024-06-15 17:36:05,766][1648981] Fps is (10 sec: 49175.6, 60 sec: 45875.2, 300 sec: 46986.0). Total num frames: 1039630336. Throughput: 0: 11514.3. Samples: 259988480. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:36:05,767][1648981] Avg episode reward: [(0, '485.080')] [2024-06-15 17:36:07,265][1651669] Updated weights for policy 0, policy_version 507707 (0.0015) [2024-06-15 17:36:08,499][1651669] Updated weights for policy 0, policy_version 507748 (0.0013) [2024-06-15 17:36:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 47513.6, 300 sec: 47323.2). Total num frames: 1039925248. Throughput: 0: 11696.4. Samples: 260018688. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:36:10,767][1648981] Avg episode reward: [(0, '506.760')] [2024-06-15 17:36:13,272][1651669] Updated weights for policy 0, policy_version 507779 (0.0012) [2024-06-15 17:36:15,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 45875.5, 300 sec: 46652.7). Total num frames: 1040056320. Throughput: 0: 11833.0. Samples: 260100608. 
Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:36:15,767][1648981] Avg episode reward: [(0, '513.770')] [2024-06-15 17:36:15,799][1651669] Updated weights for policy 0, policy_version 507856 (0.0012) [2024-06-15 17:36:17,854][1651669] Updated weights for policy 0, policy_version 507952 (0.0015) [2024-06-15 17:36:19,337][1651669] Updated weights for policy 0, policy_version 508024 (0.0014) [2024-06-15 17:36:20,778][1648981] Fps is (10 sec: 52366.3, 60 sec: 48596.3, 300 sec: 47539.4). Total num frames: 1040449536. Throughput: 0: 11943.5. Samples: 260166144. Policy #0 lag: (min: 15.0, avg: 112.1, max: 271.0) [2024-06-15 17:36:20,779][1648981] Avg episode reward: [(0, '525.730')] [2024-06-15 17:36:24,514][1651669] Updated weights for policy 0, policy_version 508069 (0.0138) [2024-06-15 17:36:25,782][1648981] Fps is (10 sec: 52346.2, 60 sec: 45863.1, 300 sec: 46653.4). Total num frames: 1040580608. Throughput: 0: 12101.8. Samples: 260216320. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:36:25,783][1648981] Avg episode reward: [(0, '510.600')] [2024-06-15 17:36:26,912][1651274] Signal inference workers to stop experience collection... (26650 times) [2024-06-15 17:36:26,978][1651669] InferenceWorker_p0-w0: stopping experience collection (26650 times) [2024-06-15 17:36:26,980][1651669] Updated weights for policy 0, policy_version 508119 (0.0014) [2024-06-15 17:36:27,070][1651274] Signal inference workers to resume experience collection... (26650 times) [2024-06-15 17:36:27,070][1651669] InferenceWorker_p0-w0: resuming experience collection (26650 times) [2024-06-15 17:36:28,479][1651669] Updated weights for policy 0, policy_version 508192 (0.0104) [2024-06-15 17:36:29,888][1651669] Updated weights for policy 0, policy_version 508257 (0.0012) [2024-06-15 17:36:30,766][1648981] Fps is (10 sec: 52491.6, 60 sec: 50244.5, 300 sec: 47541.4). Total num frames: 1040973824. Throughput: 0: 12083.2. Samples: 260276736. 
Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:36:30,767][1648981] Avg episode reward: [(0, '486.080')] [2024-06-15 17:36:35,011][1651669] Updated weights for policy 0, policy_version 508304 (0.0013) [2024-06-15 17:36:35,766][1648981] Fps is (10 sec: 49229.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1041072128. Throughput: 0: 12288.9. Samples: 260360192. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:36:35,767][1648981] Avg episode reward: [(0, '468.620')] [2024-06-15 17:36:35,882][1651669] Updated weights for policy 0, policy_version 508345 (0.0012) [2024-06-15 17:36:37,919][1651669] Updated weights for policy 0, policy_version 508400 (0.0014) [2024-06-15 17:36:39,681][1651669] Updated weights for policy 0, policy_version 508469 (0.0012) [2024-06-15 17:36:40,790][1648981] Fps is (10 sec: 45766.0, 60 sec: 50224.5, 300 sec: 47537.5). Total num frames: 1041432576. Throughput: 0: 12248.7. Samples: 260394496. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:36:40,791][1648981] Avg episode reward: [(0, '464.900')] [2024-06-15 17:36:41,000][1651669] Updated weights for policy 0, policy_version 508534 (0.0011) [2024-06-15 17:36:45,656][1651669] Updated weights for policy 0, policy_version 508580 (0.0011) [2024-06-15 17:36:45,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46967.5, 300 sec: 46652.7). Total num frames: 1041563648. Throughput: 0: 12640.7. Samples: 260478976. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:36:45,767][1648981] Avg episode reward: [(0, '474.850')] [2024-06-15 17:36:47,916][1651669] Updated weights for policy 0, policy_version 508640 (0.0012) [2024-06-15 17:36:49,240][1651669] Updated weights for policy 0, policy_version 508691 (0.0012) [2024-06-15 17:36:50,766][1648981] Fps is (10 sec: 49269.4, 60 sec: 50793.2, 300 sec: 47430.3). Total num frames: 1041924096. Throughput: 0: 12265.2. Samples: 260540416. 
Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:36:50,767][1648981] Avg episode reward: [(0, '477.160')] [2024-06-15 17:36:51,012][1651669] Updated weights for policy 0, policy_version 508769 (0.0101) [2024-06-15 17:36:55,681][1651669] Updated weights for policy 0, policy_version 508826 (0.0012) [2024-06-15 17:36:55,782][1648981] Fps is (10 sec: 52345.1, 60 sec: 49142.8, 300 sec: 46872.4). Total num frames: 1042087936. Throughput: 0: 12522.5. Samples: 260582400. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:36:55,783][1648981] Avg episode reward: [(0, '504.300')] [2024-06-15 17:36:56,023][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000508848_1042120704.pth... [2024-06-15 17:36:56,057][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000503264_1030684672.pth [2024-06-15 17:36:56,432][1651669] Updated weights for policy 0, policy_version 508864 (0.0011) [2024-06-15 17:36:59,318][1651669] Updated weights for policy 0, policy_version 508944 (0.0013) [2024-06-15 17:37:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50244.2, 300 sec: 47543.9). Total num frames: 1042415616. Throughput: 0: 12447.3. Samples: 260660736. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:00,767][1648981] Avg episode reward: [(0, '505.150')] [2024-06-15 17:37:00,881][1651669] Updated weights for policy 0, policy_version 509008 (0.0128) [2024-06-15 17:37:01,006][1651274] Signal inference workers to stop experience collection... (26700 times) [2024-06-15 17:37:01,067][1651669] InferenceWorker_p0-w0: stopping experience collection (26700 times) [2024-06-15 17:37:01,231][1651274] Signal inference workers to resume experience collection... 
(26700 times) [2024-06-15 17:37:01,232][1651669] InferenceWorker_p0-w0: resuming experience collection (26700 times) [2024-06-15 17:37:05,713][1651669] Updated weights for policy 0, policy_version 509057 (0.0014) [2024-06-15 17:37:05,767][1648981] Fps is (10 sec: 45948.1, 60 sec: 48605.8, 300 sec: 47097.0). Total num frames: 1042546688. Throughput: 0: 12575.8. Samples: 260731904. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:05,767][1648981] Avg episode reward: [(0, '482.490')] [2024-06-15 17:37:07,031][1651669] Updated weights for policy 0, policy_version 509118 (0.0014) [2024-06-15 17:37:10,297][1651669] Updated weights for policy 0, policy_version 509168 (0.0013) [2024-06-15 17:37:10,774][1648981] Fps is (10 sec: 39291.0, 60 sec: 48053.5, 300 sec: 47095.8). Total num frames: 1042808832. Throughput: 0: 12267.4. Samples: 260768256. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:10,775][1648981] Avg episode reward: [(0, '472.340')] [2024-06-15 17:37:12,464][1651669] Updated weights for policy 0, policy_version 509249 (0.0012) [2024-06-15 17:37:13,670][1651669] Updated weights for policy 0, policy_version 509309 (0.0014) [2024-06-15 17:37:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 1043070976. Throughput: 0: 12276.6. Samples: 260829184. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:15,767][1648981] Avg episode reward: [(0, '449.090')] [2024-06-15 17:37:18,097][1651669] Updated weights for policy 0, policy_version 509350 (0.0013) [2024-06-15 17:37:20,772][1648981] Fps is (10 sec: 42606.0, 60 sec: 46425.9, 300 sec: 46762.9). Total num frames: 1043234816. Throughput: 0: 12149.8. Samples: 260907008. 
Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:20,773][1648981] Avg episode reward: [(0, '466.920')] [2024-06-15 17:37:21,102][1651669] Updated weights for policy 0, policy_version 509408 (0.0130) [2024-06-15 17:37:22,485][1651669] Updated weights for policy 0, policy_version 509472 (0.0012) [2024-06-15 17:37:24,675][1651669] Updated weights for policy 0, policy_version 509559 (0.0012) [2024-06-15 17:37:25,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 50257.3, 300 sec: 47541.3). Total num frames: 1043595264. Throughput: 0: 11896.0. Samples: 260929536. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:25,768][1648981] Avg episode reward: [(0, '478.300')] [2024-06-15 17:37:29,087][1651669] Updated weights for policy 0, policy_version 509602 (0.0011) [2024-06-15 17:37:30,770][1648981] Fps is (10 sec: 49163.0, 60 sec: 45872.3, 300 sec: 47207.5). Total num frames: 1043726336. Throughput: 0: 11604.4. Samples: 261001216. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:30,771][1648981] Avg episode reward: [(0, '486.040')] [2024-06-15 17:37:31,861][1651669] Updated weights for policy 0, policy_version 509648 (0.0012) [2024-06-15 17:37:33,925][1651669] Updated weights for policy 0, policy_version 509698 (0.0012) [2024-06-15 17:37:35,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 1043988480. Throughput: 0: 11810.1. Samples: 261071872. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:35,767][1648981] Avg episode reward: [(0, '483.380')] [2024-06-15 17:37:36,169][1651669] Updated weights for policy 0, policy_version 509777 (0.0014) [2024-06-15 17:37:39,998][1651669] Updated weights for policy 0, policy_version 509843 (0.0014) [2024-06-15 17:37:40,766][1648981] Fps is (10 sec: 49170.5, 60 sec: 46439.8, 300 sec: 47430.3). Total num frames: 1044217856. Throughput: 0: 11586.7. Samples: 261103616. 
Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:40,767][1648981] Avg episode reward: [(0, '516.350')] [2024-06-15 17:37:42,999][1651669] Updated weights for policy 0, policy_version 509890 (0.0012) [2024-06-15 17:37:45,391][1651669] Updated weights for policy 0, policy_version 509953 (0.0012) [2024-06-15 17:37:45,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 47513.6, 300 sec: 47208.4). Total num frames: 1044414464. Throughput: 0: 11423.3. Samples: 261174784. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:45,767][1648981] Avg episode reward: [(0, '511.390')] [2024-06-15 17:37:45,791][1651274] Signal inference workers to stop experience collection... (26750 times) [2024-06-15 17:37:45,870][1651669] InferenceWorker_p0-w0: stopping experience collection (26750 times) [2024-06-15 17:37:46,130][1651274] Signal inference workers to resume experience collection... (26750 times) [2024-06-15 17:37:46,146][1651669] InferenceWorker_p0-w0: resuming experience collection (26750 times) [2024-06-15 17:37:47,098][1651669] Updated weights for policy 0, policy_version 510016 (0.0013) [2024-06-15 17:37:50,772][1648981] Fps is (10 sec: 42573.2, 60 sec: 45324.6, 300 sec: 47540.5). Total num frames: 1044643840. Throughput: 0: 11296.7. Samples: 261240320. Policy #0 lag: (min: 15.0, avg: 90.4, max: 271.0) [2024-06-15 17:37:50,773][1648981] Avg episode reward: [(0, '485.020')] [2024-06-15 17:37:51,139][1651669] Updated weights for policy 0, policy_version 510084 (0.0012) [2024-06-15 17:37:54,978][1651669] Updated weights for policy 0, policy_version 510160 (0.0074) [2024-06-15 17:37:55,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46433.7, 300 sec: 47208.1). Total num frames: 1044873216. Throughput: 0: 11277.3. Samples: 261275648. 
Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:37:55,767][1648981] Avg episode reward: [(0, '473.820')] [2024-06-15 17:37:55,886][1651669] Updated weights for policy 0, policy_version 510204 (0.0011) [2024-06-15 17:37:57,887][1651669] Updated weights for policy 0, policy_version 510243 (0.0010) [2024-06-15 17:37:59,442][1651669] Updated weights for policy 0, policy_version 510304 (0.0012) [2024-06-15 17:38:00,421][1651669] Updated weights for policy 0, policy_version 510335 (0.0012) [2024-06-15 17:38:00,766][1648981] Fps is (10 sec: 52460.1, 60 sec: 45875.2, 300 sec: 47543.3). Total num frames: 1045168128. Throughput: 0: 11457.4. Samples: 261344768. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:00,767][1648981] Avg episode reward: [(0, '496.450')] [2024-06-15 17:38:02,700][1651669] Updated weights for policy 0, policy_version 510392 (0.0016) [2024-06-15 17:38:05,770][1648981] Fps is (10 sec: 52409.0, 60 sec: 47510.7, 300 sec: 47319.2). Total num frames: 1045397504. Throughput: 0: 11583.1. Samples: 261428224. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:05,771][1648981] Avg episode reward: [(0, '505.530')] [2024-06-15 17:38:05,911][1651669] Updated weights for policy 0, policy_version 510460 (0.0011) [2024-06-15 17:38:08,902][1651669] Updated weights for policy 0, policy_version 510531 (0.0018) [2024-06-15 17:38:10,284][1651669] Updated weights for policy 0, policy_version 510586 (0.0014) [2024-06-15 17:38:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48066.0, 300 sec: 47652.4). Total num frames: 1045692416. Throughput: 0: 11878.5. Samples: 261464064. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:10,767][1648981] Avg episode reward: [(0, '492.850')] [2024-06-15 17:38:12,990][1651669] Updated weights for policy 0, policy_version 510640 (0.0012) [2024-06-15 17:38:15,766][1648981] Fps is (10 sec: 42614.4, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1045823488. 
Throughput: 0: 11856.6. Samples: 261534720. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:15,767][1648981] Avg episode reward: [(0, '495.400')] [2024-06-15 17:38:16,646][1651669] Updated weights for policy 0, policy_version 510711 (0.0012) [2024-06-15 17:38:19,311][1651669] Updated weights for policy 0, policy_version 510768 (0.0012) [2024-06-15 17:38:20,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 48610.9, 300 sec: 47541.4). Total num frames: 1046151168. Throughput: 0: 11832.9. Samples: 261604352. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:20,767][1648981] Avg episode reward: [(0, '488.540')] [2024-06-15 17:38:21,126][1651669] Updated weights for policy 0, policy_version 510835 (0.0013) [2024-06-15 17:38:23,609][1651669] Updated weights for policy 0, policy_version 510880 (0.0010) [2024-06-15 17:38:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 45875.3, 300 sec: 47319.8). Total num frames: 1046347776. Throughput: 0: 11992.2. Samples: 261643264. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:25,767][1648981] Avg episode reward: [(0, '502.180')] [2024-06-15 17:38:26,471][1651669] Updated weights for policy 0, policy_version 510916 (0.0012) [2024-06-15 17:38:27,063][1651274] Signal inference workers to stop experience collection... (26800 times) [2024-06-15 17:38:27,117][1651669] InferenceWorker_p0-w0: stopping experience collection (26800 times) [2024-06-15 17:38:27,234][1651274] Signal inference workers to resume experience collection... (26800 times) [2024-06-15 17:38:27,242][1651669] InferenceWorker_p0-w0: resuming experience collection (26800 times) [2024-06-15 17:38:27,423][1651669] Updated weights for policy 0, policy_version 510976 (0.0011) [2024-06-15 17:38:29,756][1651669] Updated weights for policy 0, policy_version 511027 (0.0012) [2024-06-15 17:38:30,767][1648981] Fps is (10 sec: 52426.2, 60 sec: 49154.8, 300 sec: 47763.5). Total num frames: 1046675456. Throughput: 0: 12162.7. 
Samples: 261722112. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:30,767][1648981] Avg episode reward: [(0, '503.490')] [2024-06-15 17:38:31,274][1651669] Updated weights for policy 0, policy_version 511088 (0.0012) [2024-06-15 17:38:34,012][1651669] Updated weights for policy 0, policy_version 511121 (0.0014) [2024-06-15 17:38:34,733][1651669] Updated weights for policy 0, policy_version 511162 (0.0138) [2024-06-15 17:38:35,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1046872064. Throughput: 0: 12210.0. Samples: 261789696. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:35,767][1648981] Avg episode reward: [(0, '469.390')] [2024-06-15 17:38:38,655][1651669] Updated weights for policy 0, policy_version 511223 (0.0011) [2024-06-15 17:38:40,263][1651669] Updated weights for policy 0, policy_version 511280 (0.0011) [2024-06-15 17:38:40,766][1648981] Fps is (10 sec: 45876.8, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 1047134208. Throughput: 0: 12299.4. Samples: 261829120. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:40,767][1648981] Avg episode reward: [(0, '463.420')] [2024-06-15 17:38:42,160][1651669] Updated weights for policy 0, policy_version 511360 (0.0110) [2024-06-15 17:38:45,685][1651669] Updated weights for policy 0, policy_version 511419 (0.0013) [2024-06-15 17:38:45,770][1648981] Fps is (10 sec: 52411.0, 60 sec: 49695.3, 300 sec: 47985.8). Total num frames: 1047396352. Throughput: 0: 12332.6. Samples: 261899776. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:45,770][1648981] Avg episode reward: [(0, '474.260')] [2024-06-15 17:38:49,280][1651669] Updated weights for policy 0, policy_version 511487 (0.0033) [2024-06-15 17:38:50,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48064.5, 300 sec: 47097.1). Total num frames: 1047527424. Throughput: 0: 12118.3. Samples: 261973504. 
Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:50,767][1648981] Avg episode reward: [(0, '470.750')] [2024-06-15 17:38:51,969][1651669] Updated weights for policy 0, policy_version 511552 (0.0011) [2024-06-15 17:38:53,440][1651669] Updated weights for policy 0, policy_version 511614 (0.0012) [2024-06-15 17:38:55,780][1648981] Fps is (10 sec: 39283.6, 60 sec: 48595.3, 300 sec: 47540.5). Total num frames: 1047789568. Throughput: 0: 11920.5. Samples: 262000640. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:38:55,781][1648981] Avg episode reward: [(0, '477.340')] [2024-06-15 17:38:55,788][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000511616_1047789568.pth... [2024-06-15 17:38:55,970][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000506048_1036386304.pth [2024-06-15 17:38:56,974][1651669] Updated weights for policy 0, policy_version 511671 (0.0022) [2024-06-15 17:38:59,463][1651669] Updated weights for policy 0, policy_version 511728 (0.0012) [2024-06-15 17:39:00,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1048051712. Throughput: 0: 12071.8. Samples: 262077952. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:39:00,767][1648981] Avg episode reward: [(0, '473.090')] [2024-06-15 17:39:01,960][1651669] Updated weights for policy 0, policy_version 511760 (0.0010) [2024-06-15 17:39:02,936][1651669] Updated weights for policy 0, policy_version 511808 (0.0010) [2024-06-15 17:39:05,770][1648981] Fps is (10 sec: 52477.5, 60 sec: 48605.9, 300 sec: 47546.0). Total num frames: 1048313856. Throughput: 0: 12184.5. Samples: 262152704. 
Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:39:05,771][1648981] Avg episode reward: [(0, '472.780')] [2024-06-15 17:39:06,681][1651669] Updated weights for policy 0, policy_version 511875 (0.0012) [2024-06-15 17:39:07,970][1651669] Updated weights for policy 0, policy_version 511935 (0.0012) [2024-06-15 17:39:08,615][1651274] Signal inference workers to stop experience collection... (26850 times) [2024-06-15 17:39:08,649][1651669] InferenceWorker_p0-w0: stopping experience collection (26850 times) [2024-06-15 17:39:08,870][1651274] Signal inference workers to resume experience collection... (26850 times) [2024-06-15 17:39:08,871][1651669] InferenceWorker_p0-w0: resuming experience collection (26850 times) [2024-06-15 17:39:10,090][1651669] Updated weights for policy 0, policy_version 511991 (0.0011) [2024-06-15 17:39:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1048576000. Throughput: 0: 12106.0. Samples: 262188032. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:39:10,767][1648981] Avg episode reward: [(0, '489.040')] [2024-06-15 17:39:13,640][1651669] Updated weights for policy 0, policy_version 512038 (0.0011) [2024-06-15 17:39:15,339][1651669] Updated weights for policy 0, policy_version 512112 (0.0129) [2024-06-15 17:39:15,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 50244.3, 300 sec: 47763.5). Total num frames: 1048838144. Throughput: 0: 11935.4. Samples: 262259200. Policy #0 lag: (min: 31.0, avg: 144.6, max: 287.0) [2024-06-15 17:39:15,767][1648981] Avg episode reward: [(0, '500.900')] [2024-06-15 17:39:19,027][1651669] Updated weights for policy 0, policy_version 512185 (0.0013) [2024-06-15 17:39:20,259][1651669] Updated weights for policy 0, policy_version 512227 (0.0012) [2024-06-15 17:39:20,770][1648981] Fps is (10 sec: 49133.2, 60 sec: 48602.6, 300 sec: 47651.8). Total num frames: 1049067520. Throughput: 0: 11877.4. Samples: 262324224. 
Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:20,771][1648981] Avg episode reward: [(0, '497.200')] [2024-06-15 17:39:24,602][1651669] Updated weights for policy 0, policy_version 512288 (0.0015) [2024-06-15 17:39:25,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48605.9, 300 sec: 47655.9). Total num frames: 1049264128. Throughput: 0: 12049.0. Samples: 262371328. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:25,767][1648981] Avg episode reward: [(0, '469.240')] [2024-06-15 17:39:26,061][1651669] Updated weights for policy 0, policy_version 512352 (0.0013) [2024-06-15 17:39:28,775][1651669] Updated weights for policy 0, policy_version 512400 (0.0011) [2024-06-15 17:39:30,766][1648981] Fps is (10 sec: 42614.7, 60 sec: 46967.7, 300 sec: 47652.5). Total num frames: 1049493504. Throughput: 0: 11833.8. Samples: 262432256. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:30,767][1648981] Avg episode reward: [(0, '477.320')] [2024-06-15 17:39:32,092][1651669] Updated weights for policy 0, policy_version 512482 (0.0014) [2024-06-15 17:39:35,307][1651669] Updated weights for policy 0, policy_version 512528 (0.0011) [2024-06-15 17:39:35,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1049690112. Throughput: 0: 11878.4. Samples: 262508032. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:35,767][1648981] Avg episode reward: [(0, '468.090')] [2024-06-15 17:39:36,857][1651669] Updated weights for policy 0, policy_version 512593 (0.0025) [2024-06-15 17:39:39,618][1651669] Updated weights for policy 0, policy_version 512643 (0.0015) [2024-06-15 17:39:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 47513.5, 300 sec: 47877.2). Total num frames: 1049985024. Throughput: 0: 12041.2. Samples: 262542336. 
Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:40,767][1648981] Avg episode reward: [(0, '489.870')] [2024-06-15 17:39:40,888][1651669] Updated weights for policy 0, policy_version 512700 (0.0017) [2024-06-15 17:39:43,035][1651669] Updated weights for policy 0, policy_version 512752 (0.0012) [2024-06-15 17:39:45,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 45877.6, 300 sec: 47097.0). Total num frames: 1050148864. Throughput: 0: 11889.7. Samples: 262612992. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:45,774][1648981] Avg episode reward: [(0, '488.160')] [2024-06-15 17:39:46,319][1651669] Updated weights for policy 0, policy_version 512785 (0.0011) [2024-06-15 17:39:47,605][1651669] Updated weights for policy 0, policy_version 512848 (0.0012) [2024-06-15 17:39:48,682][1651669] Updated weights for policy 0, policy_version 512895 (0.0014) [2024-06-15 17:39:50,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 1050443776. Throughput: 0: 11993.2. Samples: 262692352. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:50,767][1648981] Avg episode reward: [(0, '493.540')] [2024-06-15 17:39:50,833][1651274] Signal inference workers to stop experience collection... (26900 times) [2024-06-15 17:39:50,904][1651669] InferenceWorker_p0-w0: stopping experience collection (26900 times) [2024-06-15 17:39:51,118][1651274] Signal inference workers to resume experience collection... (26900 times) [2024-06-15 17:39:51,118][1651669] InferenceWorker_p0-w0: resuming experience collection (26900 times) [2024-06-15 17:39:51,557][1651669] Updated weights for policy 0, policy_version 512949 (0.0013) [2024-06-15 17:39:53,691][1651669] Updated weights for policy 0, policy_version 513013 (0.0013) [2024-06-15 17:39:55,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48070.1, 300 sec: 47430.4). Total num frames: 1050673152. Throughput: 0: 11867.0. Samples: 262722048. 
Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:39:55,767][1648981] Avg episode reward: [(0, '496.210')] [2024-06-15 17:39:58,191][1651669] Updated weights for policy 0, policy_version 513088 (0.0014) [2024-06-15 17:39:59,319][1651669] Updated weights for policy 0, policy_version 513145 (0.0013) [2024-06-15 17:40:00,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 48059.6, 300 sec: 47652.4). Total num frames: 1050935296. Throughput: 0: 11867.0. Samples: 262793216. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:00,767][1648981] Avg episode reward: [(0, '497.700')] [2024-06-15 17:40:01,471][1651669] Updated weights for policy 0, policy_version 513184 (0.0011) [2024-06-15 17:40:03,295][1651669] Updated weights for policy 0, policy_version 513233 (0.0012) [2024-06-15 17:40:05,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48062.7, 300 sec: 47874.6). Total num frames: 1051197440. Throughput: 0: 12141.1. Samples: 262870528. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:05,767][1648981] Avg episode reward: [(0, '471.190')] [2024-06-15 17:40:07,775][1651669] Updated weights for policy 0, policy_version 513296 (0.0014) [2024-06-15 17:40:08,898][1651669] Updated weights for policy 0, policy_version 513348 (0.0013) [2024-06-15 17:40:10,110][1651669] Updated weights for policy 0, policy_version 513402 (0.0012) [2024-06-15 17:40:10,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 48059.5, 300 sec: 47985.7). Total num frames: 1051459584. Throughput: 0: 11969.4. Samples: 262909952. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:10,767][1648981] Avg episode reward: [(0, '463.850')] [2024-06-15 17:40:11,712][1651669] Updated weights for policy 0, policy_version 513442 (0.0012) [2024-06-15 17:40:13,788][1651669] Updated weights for policy 0, policy_version 513504 (0.0012) [2024-06-15 17:40:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 1051721728. 
Throughput: 0: 12276.6. Samples: 262984704. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:15,767][1648981] Avg episode reward: [(0, '454.720')] [2024-06-15 17:40:18,116][1651669] Updated weights for policy 0, policy_version 513538 (0.0014) [2024-06-15 17:40:19,829][1651669] Updated weights for policy 0, policy_version 513616 (0.0012) [2024-06-15 17:40:20,774][1648981] Fps is (10 sec: 49115.1, 60 sec: 48056.6, 300 sec: 47873.3). Total num frames: 1051951104. Throughput: 0: 12251.7. Samples: 263059456. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:20,775][1648981] Avg episode reward: [(0, '453.600')] [2024-06-15 17:40:22,104][1651669] Updated weights for policy 0, policy_version 513669 (0.0013) [2024-06-15 17:40:23,123][1651669] Updated weights for policy 0, policy_version 513723 (0.0013) [2024-06-15 17:40:24,780][1651669] Updated weights for policy 0, policy_version 513776 (0.0040) [2024-06-15 17:40:25,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 49698.2, 300 sec: 48430.1). Total num frames: 1052246016. Throughput: 0: 12253.9. Samples: 263093760. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:25,767][1648981] Avg episode reward: [(0, '465.320')] [2024-06-15 17:40:29,943][1651669] Updated weights for policy 0, policy_version 513826 (0.0013) [2024-06-15 17:40:30,778][1648981] Fps is (10 sec: 42581.3, 60 sec: 48050.3, 300 sec: 47650.5). Total num frames: 1052377088. Throughput: 0: 12455.4. Samples: 263173632. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:30,779][1648981] Avg episode reward: [(0, '459.590')] [2024-06-15 17:40:31,612][1651274] Signal inference workers to stop experience collection... (26950 times) [2024-06-15 17:40:31,648][1651669] InferenceWorker_p0-w0: stopping experience collection (26950 times) [2024-06-15 17:40:31,875][1651274] Signal inference workers to resume experience collection... 
(26950 times) [2024-06-15 17:40:31,876][1651669] InferenceWorker_p0-w0: resuming experience collection (26950 times) [2024-06-15 17:40:31,878][1651669] Updated weights for policy 0, policy_version 513904 (0.0012) [2024-06-15 17:40:34,169][1651669] Updated weights for policy 0, policy_version 513984 (0.0012) [2024-06-15 17:40:35,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 49698.1, 300 sec: 48319.0). Total num frames: 1052672000. Throughput: 0: 12071.8. Samples: 263235584. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:35,767][1648981] Avg episode reward: [(0, '452.340')] [2024-06-15 17:40:36,395][1651669] Updated weights for policy 0, policy_version 514044 (0.0012) [2024-06-15 17:40:40,747][1651669] Updated weights for policy 0, policy_version 514083 (0.0011) [2024-06-15 17:40:40,785][1648981] Fps is (10 sec: 45845.0, 60 sec: 47499.1, 300 sec: 47760.6). Total num frames: 1052835840. Throughput: 0: 12237.5. Samples: 263272960. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:40,785][1648981] Avg episode reward: [(0, '442.660')] [2024-06-15 17:40:42,345][1651669] Updated weights for policy 0, policy_version 514145 (0.0011) [2024-06-15 17:40:43,022][1651669] Updated weights for policy 0, policy_version 514176 (0.0010) [2024-06-15 17:40:44,798][1651669] Updated weights for policy 0, policy_version 514240 (0.0015) [2024-06-15 17:40:45,774][1648981] Fps is (10 sec: 49113.2, 60 sec: 50237.8, 300 sec: 48429.2). Total num frames: 1053163520. Throughput: 0: 12285.9. Samples: 263346176. Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:45,775][1648981] Avg episode reward: [(0, '459.370')] [2024-06-15 17:40:46,748][1651669] Updated weights for policy 0, policy_version 514300 (0.0012) [2024-06-15 17:40:50,766][1648981] Fps is (10 sec: 45959.5, 60 sec: 47513.6, 300 sec: 47986.4). Total num frames: 1053294592. Throughput: 0: 12344.9. Samples: 263426048. 
Policy #0 lag: (min: 15.0, avg: 152.5, max: 271.0) [2024-06-15 17:40:50,767][1648981] Avg episode reward: [(0, '436.570')] [2024-06-15 17:40:52,837][1651669] Updated weights for policy 0, policy_version 514384 (0.0011) [2024-06-15 17:40:54,989][1651669] Updated weights for policy 0, policy_version 514434 (0.0011) [2024-06-15 17:40:55,767][1648981] Fps is (10 sec: 45910.4, 60 sec: 49151.9, 300 sec: 48207.8). Total num frames: 1053622272. Throughput: 0: 11980.8. Samples: 263449088. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:40:55,767][1648981] Avg episode reward: [(0, '438.870')] [2024-06-15 17:40:56,029][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000514480_1053655040.pth... [2024-06-15 17:40:56,091][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000508848_1042120704.pth [2024-06-15 17:40:56,268][1651669] Updated weights for policy 0, policy_version 514485 (0.0018) [2024-06-15 17:40:57,208][1651669] Updated weights for policy 0, policy_version 514512 (0.0011) [2024-06-15 17:40:58,018][1651669] Updated weights for policy 0, policy_version 514559 (0.0015) [2024-06-15 17:41:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 48096.7). Total num frames: 1053818880. Throughput: 0: 11992.2. Samples: 263524352. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:00,767][1648981] Avg episode reward: [(0, '449.520')] [2024-06-15 17:41:03,689][1651669] Updated weights for policy 0, policy_version 514629 (0.0020) [2024-06-15 17:41:05,192][1651669] Updated weights for policy 0, policy_version 514688 (0.0012) [2024-06-15 17:41:05,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1054081024. Throughput: 0: 11834.9. Samples: 263591936. 
Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:05,767][1648981] Avg episode reward: [(0, '452.050')] [2024-06-15 17:41:07,025][1651669] Updated weights for policy 0, policy_version 514749 (0.0011) [2024-06-15 17:41:09,514][1651669] Updated weights for policy 0, policy_version 514811 (0.0012) [2024-06-15 17:41:10,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.9, 300 sec: 48430.0). Total num frames: 1054343168. Throughput: 0: 11832.9. Samples: 263626240. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:10,767][1648981] Avg episode reward: [(0, '446.570')] [2024-06-15 17:41:14,592][1651669] Updated weights for policy 0, policy_version 514864 (0.0012) [2024-06-15 17:41:15,203][1651274] Signal inference workers to stop experience collection... (27000 times) [2024-06-15 17:41:15,235][1651669] InferenceWorker_p0-w0: stopping experience collection (27000 times) [2024-06-15 17:41:15,391][1651274] Signal inference workers to resume experience collection... (27000 times) [2024-06-15 17:41:15,393][1651669] InferenceWorker_p0-w0: resuming experience collection (27000 times) [2024-06-15 17:41:15,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 46421.1, 300 sec: 47654.3). Total num frames: 1054507008. Throughput: 0: 11665.2. Samples: 263698432. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:15,768][1648981] Avg episode reward: [(0, '435.200')] [2024-06-15 17:41:15,955][1651669] Updated weights for policy 0, policy_version 514912 (0.0011) [2024-06-15 17:41:17,695][1651669] Updated weights for policy 0, policy_version 514963 (0.0013) [2024-06-15 17:41:18,589][1651669] Updated weights for policy 0, policy_version 515008 (0.0016) [2024-06-15 17:41:20,769][1648981] Fps is (10 sec: 45865.5, 60 sec: 47518.1, 300 sec: 48210.1). Total num frames: 1054801920. Throughput: 0: 11798.2. Samples: 263766528. 
Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:20,769][1648981] Avg episode reward: [(0, '440.500')] [2024-06-15 17:41:21,263][1651669] Updated weights for policy 0, policy_version 515071 (0.0014) [2024-06-15 17:41:25,766][1648981] Fps is (10 sec: 49153.9, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1054998528. Throughput: 0: 11735.3. Samples: 263800832. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:25,767][1648981] Avg episode reward: [(0, '457.940')] [2024-06-15 17:41:26,608][1651669] Updated weights for policy 0, policy_version 515137 (0.0025) [2024-06-15 17:41:27,961][1651669] Updated weights for policy 0, policy_version 515199 (0.0011) [2024-06-15 17:41:29,707][1651669] Updated weights for policy 0, policy_version 515264 (0.0063) [2024-06-15 17:41:30,766][1648981] Fps is (10 sec: 45885.2, 60 sec: 48069.2, 300 sec: 48096.8). Total num frames: 1055260672. Throughput: 0: 11573.2. Samples: 263866880. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:30,767][1648981] Avg episode reward: [(0, '470.400')] [2024-06-15 17:41:32,700][1651669] Updated weights for policy 0, policy_version 515316 (0.0011) [2024-06-15 17:41:35,190][1651669] Updated weights for policy 0, policy_version 515360 (0.0012) [2024-06-15 17:41:35,767][1648981] Fps is (10 sec: 49146.4, 60 sec: 46966.6, 300 sec: 47656.1). Total num frames: 1055490048. Throughput: 0: 11764.3. Samples: 263955456. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:35,768][1648981] Avg episode reward: [(0, '464.910')] [2024-06-15 17:41:37,004][1651669] Updated weights for policy 0, policy_version 515428 (0.0015) [2024-06-15 17:41:38,732][1651669] Updated weights for policy 0, policy_version 515474 (0.0011) [2024-06-15 17:41:40,781][1648981] Fps is (10 sec: 52350.5, 60 sec: 49154.8, 300 sec: 48205.4). Total num frames: 1055784960. Throughput: 0: 12102.0. Samples: 263993856. 
Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:40,782][1648981] Avg episode reward: [(0, '466.050')] [2024-06-15 17:41:41,857][1651669] Updated weights for policy 0, policy_version 515543 (0.0014) [2024-06-15 17:41:45,495][1651669] Updated weights for policy 0, policy_version 515588 (0.0011) [2024-06-15 17:41:45,807][1648981] Fps is (10 sec: 45693.4, 60 sec: 46395.8, 300 sec: 47534.8). Total num frames: 1055948800. Throughput: 0: 11947.2. Samples: 264062464. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:45,808][1648981] Avg episode reward: [(0, '449.630')] [2024-06-15 17:41:46,561][1651669] Updated weights for policy 0, policy_version 515640 (0.0011) [2024-06-15 17:41:47,578][1651669] Updated weights for policy 0, policy_version 515690 (0.0012) [2024-06-15 17:41:48,849][1651669] Updated weights for policy 0, policy_version 515716 (0.0017) [2024-06-15 17:41:49,905][1651669] Updated weights for policy 0, policy_version 515769 (0.0011) [2024-06-15 17:41:50,777][1648981] Fps is (10 sec: 52453.2, 60 sec: 50235.7, 300 sec: 48208.8). Total num frames: 1056309248. Throughput: 0: 12285.2. Samples: 264144896. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:50,777][1648981] Avg episode reward: [(0, '466.110')] [2024-06-15 17:41:51,951][1651669] Updated weights for policy 0, policy_version 515808 (0.0054) [2024-06-15 17:41:55,398][1651669] Updated weights for policy 0, policy_version 515856 (0.0022) [2024-06-15 17:41:55,767][1648981] Fps is (10 sec: 52642.7, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 1056473088. Throughput: 0: 12390.3. Samples: 264183808. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:41:55,767][1648981] Avg episode reward: [(0, '485.000')] [2024-06-15 17:41:57,746][1651669] Updated weights for policy 0, policy_version 515914 (0.0012) [2024-06-15 17:41:58,323][1651274] Signal inference workers to stop experience collection... 
(27050 times) [2024-06-15 17:41:58,366][1651669] InferenceWorker_p0-w0: stopping experience collection (27050 times) [2024-06-15 17:41:58,506][1651274] Signal inference workers to resume experience collection... (27050 times) [2024-06-15 17:41:58,507][1651669] InferenceWorker_p0-w0: resuming experience collection (27050 times) [2024-06-15 17:42:00,174][1651669] Updated weights for policy 0, policy_version 516007 (0.0014) [2024-06-15 17:42:00,766][1648981] Fps is (10 sec: 52483.3, 60 sec: 50244.4, 300 sec: 48430.0). Total num frames: 1056833536. Throughput: 0: 12367.8. Samples: 264254976. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:42:00,767][1648981] Avg episode reward: [(0, '460.240')] [2024-06-15 17:42:02,482][1651669] Updated weights for policy 0, policy_version 516065 (0.0011) [2024-06-15 17:42:05,766][1648981] Fps is (10 sec: 49153.2, 60 sec: 48059.7, 300 sec: 47986.9). Total num frames: 1056964608. Throughput: 0: 12652.7. Samples: 264335872. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:42:05,767][1648981] Avg episode reward: [(0, '482.370')] [2024-06-15 17:42:06,053][1651669] Updated weights for policy 0, policy_version 516116 (0.0013) [2024-06-15 17:42:06,990][1651669] Updated weights for policy 0, policy_version 516160 (0.0012) [2024-06-15 17:42:09,206][1651669] Updated weights for policy 0, policy_version 516214 (0.0025) [2024-06-15 17:42:10,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1057292288. Throughput: 0: 12617.9. Samples: 264368640. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:42:10,767][1648981] Avg episode reward: [(0, '484.320')] [2024-06-15 17:42:10,804][1651669] Updated weights for policy 0, policy_version 516262 (0.0033) [2024-06-15 17:42:12,310][1651669] Updated weights for policy 0, policy_version 516307 (0.0012) [2024-06-15 17:42:15,790][1648981] Fps is (10 sec: 52304.4, 60 sec: 49678.7, 300 sec: 48316.0). Total num frames: 1057488896. 
Throughput: 0: 12952.4. Samples: 264450048. Policy #0 lag: (min: 2.0, avg: 77.6, max: 258.0) [2024-06-15 17:42:15,791][1648981] Avg episode reward: [(0, '479.410')] [2024-06-15 17:42:16,382][1651669] Updated weights for policy 0, policy_version 516368 (0.0131) [2024-06-15 17:42:17,463][1651669] Updated weights for policy 0, policy_version 516416 (0.0011) [2024-06-15 17:42:19,400][1651669] Updated weights for policy 0, policy_version 516464 (0.0028) [2024-06-15 17:42:20,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49699.9, 300 sec: 48096.8). Total num frames: 1057783808. Throughput: 0: 12527.2. Samples: 264519168. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:20,767][1648981] Avg episode reward: [(0, '478.440')] [2024-06-15 17:42:21,199][1651669] Updated weights for policy 0, policy_version 516533 (0.0015) [2024-06-15 17:42:23,353][1651669] Updated weights for policy 0, policy_version 516596 (0.0046) [2024-06-15 17:42:25,781][1648981] Fps is (10 sec: 52479.0, 60 sec: 50232.3, 300 sec: 48428.3). Total num frames: 1058013184. Throughput: 0: 12413.3. Samples: 264552448. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:25,782][1648981] Avg episode reward: [(0, '476.130')] [2024-06-15 17:42:27,435][1651669] Updated weights for policy 0, policy_version 516640 (0.0112) [2024-06-15 17:42:29,159][1651669] Updated weights for policy 0, policy_version 516691 (0.0020) [2024-06-15 17:42:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50790.4, 300 sec: 48541.1). Total num frames: 1058308096. Throughput: 0: 12697.8. Samples: 264633344. 
Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:30,768][1648981] Avg episode reward: [(0, '490.030')] [2024-06-15 17:42:30,785][1651669] Updated weights for policy 0, policy_version 516755 (0.0009) [2024-06-15 17:42:31,624][1651669] Updated weights for policy 0, policy_version 516798 (0.0011) [2024-06-15 17:42:33,732][1651669] Updated weights for policy 0, policy_version 516855 (0.0011) [2024-06-15 17:42:35,766][1648981] Fps is (10 sec: 52503.6, 60 sec: 50791.3, 300 sec: 48541.1). Total num frames: 1058537472. Throughput: 0: 12609.5. Samples: 264712192. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:35,767][1648981] Avg episode reward: [(0, '481.330')] [2024-06-15 17:42:38,138][1651669] Updated weights for policy 0, policy_version 516896 (0.0025) [2024-06-15 17:42:40,436][1651669] Updated weights for policy 0, policy_version 516976 (0.0013) [2024-06-15 17:42:40,594][1651274] Signal inference workers to stop experience collection... (27100 times) [2024-06-15 17:42:40,658][1651669] InferenceWorker_p0-w0: stopping experience collection (27100 times) [2024-06-15 17:42:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49710.5, 300 sec: 48652.1). Total num frames: 1058766848. Throughput: 0: 12754.6. Samples: 264757760. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:40,767][1648981] Avg episode reward: [(0, '477.800')] [2024-06-15 17:42:40,805][1651274] Signal inference workers to resume experience collection... (27100 times) [2024-06-15 17:42:40,807][1651669] InferenceWorker_p0-w0: resuming experience collection (27100 times) [2024-06-15 17:42:41,696][1651669] Updated weights for policy 0, policy_version 517027 (0.0012) [2024-06-15 17:42:43,498][1651669] Updated weights for policy 0, policy_version 517072 (0.0109) [2024-06-15 17:42:44,490][1651669] Updated weights for policy 0, policy_version 517112 (0.0012) [2024-06-15 17:42:45,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 51917.9, 300 sec: 48875.3). 
Total num frames: 1059061760. Throughput: 0: 12515.5. Samples: 264818176. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:45,767][1648981] Avg episode reward: [(0, '456.950')] [2024-06-15 17:42:48,232][1651669] Updated weights for policy 0, policy_version 517153 (0.0011) [2024-06-15 17:42:49,378][1651669] Updated weights for policy 0, policy_version 517216 (0.0011) [2024-06-15 17:42:50,778][1648981] Fps is (10 sec: 58913.3, 60 sec: 50789.2, 300 sec: 49094.5). Total num frames: 1059356672. Throughput: 0: 12591.9. Samples: 264902656. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:50,779][1648981] Avg episode reward: [(0, '477.360')] [2024-06-15 17:42:51,011][1651669] Updated weights for policy 0, policy_version 517281 (0.0012) [2024-06-15 17:42:54,090][1651669] Updated weights for policy 0, policy_version 517360 (0.0015) [2024-06-15 17:42:55,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 51882.7, 300 sec: 48874.3). Total num frames: 1059586048. Throughput: 0: 12606.5. Samples: 264935936. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:42:55,767][1648981] Avg episode reward: [(0, '465.550')] [2024-06-15 17:42:55,788][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000517376_1059586048.pth... [2024-06-15 17:42:55,851][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000511616_1047789568.pth [2024-06-15 17:42:59,223][1651669] Updated weights for policy 0, policy_version 517424 (0.0021) [2024-06-15 17:43:00,602][1651669] Updated weights for policy 0, policy_version 517488 (0.0010) [2024-06-15 17:43:00,772][1648981] Fps is (10 sec: 45904.8, 60 sec: 49693.7, 300 sec: 48874.1). Total num frames: 1059815424. Throughput: 0: 12714.2. Samples: 265021952. 
Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:00,772][1648981] Avg episode reward: [(0, '447.190')] [2024-06-15 17:43:02,309][1651669] Updated weights for policy 0, policy_version 517563 (0.0011) [2024-06-15 17:43:04,491][1651669] Updated weights for policy 0, policy_version 517604 (0.0012) [2024-06-15 17:43:05,781][1648981] Fps is (10 sec: 52353.3, 60 sec: 52416.0, 300 sec: 48871.9). Total num frames: 1060110336. Throughput: 0: 12477.4. Samples: 265080832. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:05,781][1648981] Avg episode reward: [(0, '437.810')] [2024-06-15 17:43:09,203][1651669] Updated weights for policy 0, policy_version 517635 (0.0014) [2024-06-15 17:43:10,767][1648981] Fps is (10 sec: 42620.4, 60 sec: 49151.9, 300 sec: 48874.3). Total num frames: 1060241408. Throughput: 0: 12804.0. Samples: 265128448. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:10,767][1648981] Avg episode reward: [(0, '439.770')] [2024-06-15 17:43:10,836][1651669] Updated weights for policy 0, policy_version 517712 (0.0014) [2024-06-15 17:43:12,377][1651669] Updated weights for policy 0, policy_version 517776 (0.0012) [2024-06-15 17:43:14,994][1651669] Updated weights for policy 0, policy_version 517856 (0.0013) [2024-06-15 17:43:15,809][1648981] Fps is (10 sec: 52281.0, 60 sec: 52412.1, 300 sec: 49089.3). Total num frames: 1060634624. Throughput: 0: 12446.8. Samples: 265193984. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:15,823][1648981] Avg episode reward: [(0, '425.500')] [2024-06-15 17:43:19,973][1651669] Updated weights for policy 0, policy_version 517905 (0.0013) [2024-06-15 17:43:20,317][1651274] Signal inference workers to stop experience collection... (27150 times) [2024-06-15 17:43:20,373][1651669] InferenceWorker_p0-w0: stopping experience collection (27150 times) [2024-06-15 17:43:20,511][1651274] Signal inference workers to resume experience collection... 
(27150 times) [2024-06-15 17:43:20,512][1651669] InferenceWorker_p0-w0: resuming experience collection (27150 times) [2024-06-15 17:43:20,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 49698.2, 300 sec: 48874.3). Total num frames: 1060765696. Throughput: 0: 12481.4. Samples: 265273856. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:20,767][1648981] Avg episode reward: [(0, '427.130')] [2024-06-15 17:43:21,729][1651669] Updated weights for policy 0, policy_version 517973 (0.0017) [2024-06-15 17:43:23,372][1651669] Updated weights for policy 0, policy_version 518048 (0.0011) [2024-06-15 17:43:24,683][1651669] Updated weights for policy 0, policy_version 518086 (0.0012) [2024-06-15 17:43:25,706][1651669] Updated weights for policy 0, policy_version 518144 (0.0012) [2024-06-15 17:43:25,782][1648981] Fps is (10 sec: 52571.3, 60 sec: 52427.4, 300 sec: 49093.9). Total num frames: 1061158912. Throughput: 0: 12181.3. Samples: 265306112. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:25,783][1648981] Avg episode reward: [(0, '413.580')] [2024-06-15 17:43:30,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 1061224448. Throughput: 0: 12811.4. Samples: 265394688. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:30,767][1648981] Avg episode reward: [(0, '398.480')] [2024-06-15 17:43:31,459][1651669] Updated weights for policy 0, policy_version 518208 (0.0014) [2024-06-15 17:43:32,553][1651669] Updated weights for policy 0, policy_version 518256 (0.0010) [2024-06-15 17:43:33,733][1651669] Updated weights for policy 0, policy_version 518306 (0.0010) [2024-06-15 17:43:35,322][1651669] Updated weights for policy 0, policy_version 518384 (0.0039) [2024-06-15 17:43:35,766][1648981] Fps is (10 sec: 52511.6, 60 sec: 52428.8, 300 sec: 49318.6). Total num frames: 1061683200. Throughput: 0: 12416.4. Samples: 265461248. 
Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:35,767][1648981] Avg episode reward: [(0, '438.230')] [2024-06-15 17:43:40,787][1648981] Fps is (10 sec: 45778.7, 60 sec: 48588.8, 300 sec: 48427.1). Total num frames: 1061683200. Throughput: 0: 12566.6. Samples: 265501696. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:40,788][1648981] Avg episode reward: [(0, '414.940')] [2024-06-15 17:43:41,443][1651669] Updated weights for policy 0, policy_version 518417 (0.0012) [2024-06-15 17:43:42,906][1651669] Updated weights for policy 0, policy_version 518480 (0.0015) [2024-06-15 17:43:44,465][1651669] Updated weights for policy 0, policy_version 518544 (0.0042) [2024-06-15 17:43:45,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 50790.6, 300 sec: 49429.7). Total num frames: 1062109184. Throughput: 0: 12221.2. Samples: 265571840. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:45,767][1648981] Avg episode reward: [(0, '418.330')] [2024-06-15 17:43:45,837][1651669] Updated weights for policy 0, policy_version 518614 (0.0093) [2024-06-15 17:43:50,770][1648981] Fps is (10 sec: 52519.2, 60 sec: 47519.9, 300 sec: 48875.8). Total num frames: 1062207488. Throughput: 0: 12655.1. Samples: 265650176. Policy #0 lag: (min: 31.0, avg: 140.1, max: 287.0) [2024-06-15 17:43:50,771][1648981] Avg episode reward: [(0, '397.590')] [2024-06-15 17:43:52,028][1651669] Updated weights for policy 0, policy_version 518660 (0.0012) [2024-06-15 17:43:53,655][1651669] Updated weights for policy 0, policy_version 518736 (0.0012) [2024-06-15 17:43:55,499][1651669] Updated weights for policy 0, policy_version 518802 (0.0013) [2024-06-15 17:43:55,720][1651274] Signal inference workers to stop experience collection... (27200 times) [2024-06-15 17:43:55,777][1648981] Fps is (10 sec: 42554.1, 60 sec: 49143.7, 300 sec: 49094.7). Total num frames: 1062535168. Throughput: 0: 12376.2. Samples: 265685504. 
Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:43:55,777][1648981] Avg episode reward: [(0, '414.490')] [2024-06-15 17:43:55,815][1651669] InferenceWorker_p0-w0: stopping experience collection (27200 times) [2024-06-15 17:43:55,915][1651274] Signal inference workers to resume experience collection... (27200 times) [2024-06-15 17:43:55,922][1651669] InferenceWorker_p0-w0: resuming experience collection (27200 times) [2024-06-15 17:43:56,443][1651669] Updated weights for policy 0, policy_version 518864 (0.0120) [2024-06-15 17:43:57,423][1651669] Updated weights for policy 0, policy_version 518907 (0.0047) [2024-06-15 17:44:00,766][1648981] Fps is (10 sec: 52449.1, 60 sec: 48610.2, 300 sec: 48874.9). Total num frames: 1062731776. Throughput: 0: 12482.0. Samples: 265755136. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:00,767][1648981] Avg episode reward: [(0, '419.110')] [2024-06-15 17:44:04,378][1651669] Updated weights for policy 0, policy_version 518978 (0.0161) [2024-06-15 17:44:05,766][1648981] Fps is (10 sec: 42642.5, 60 sec: 47525.2, 300 sec: 48763.2). Total num frames: 1062961152. Throughput: 0: 12299.4. Samples: 265827328. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:05,767][1648981] Avg episode reward: [(0, '436.350')] [2024-06-15 17:44:05,986][1651669] Updated weights for policy 0, policy_version 519040 (0.0014) [2024-06-15 17:44:07,713][1651669] Updated weights for policy 0, policy_version 519107 (0.0013) [2024-06-15 17:44:08,898][1651669] Updated weights for policy 0, policy_version 519168 (0.0100) [2024-06-15 17:44:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.4, 300 sec: 48874.3). Total num frames: 1063256064. Throughput: 0: 12144.3. Samples: 265852416. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:10,767][1648981] Avg episode reward: [(0, '439.430')] [2024-06-15 17:44:15,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 45908.0, 300 sec: 48541.7). 
Total num frames: 1063387136. Throughput: 0: 12037.7. Samples: 265936384. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:15,767][1648981] Avg episode reward: [(0, '433.330')] [2024-06-15 17:44:16,149][1651669] Updated weights for policy 0, policy_version 519248 (0.0012) [2024-06-15 17:44:18,711][1651669] Updated weights for policy 0, policy_version 519344 (0.0013) [2024-06-15 17:44:20,031][1651669] Updated weights for policy 0, policy_version 519377 (0.0012) [2024-06-15 17:44:20,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 49096.5). Total num frames: 1063747584. Throughput: 0: 11605.3. Samples: 265983488. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:20,767][1648981] Avg episode reward: [(0, '442.210')] [2024-06-15 17:44:25,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 43702.2, 300 sec: 48430.0). Total num frames: 1063780352. Throughput: 0: 11610.8. Samples: 266023936. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:25,767][1648981] Avg episode reward: [(0, '444.000')] [2024-06-15 17:44:26,036][1651669] Updated weights for policy 0, policy_version 519440 (0.0013) [2024-06-15 17:44:27,811][1651669] Updated weights for policy 0, policy_version 519520 (0.0015) [2024-06-15 17:44:29,619][1651669] Updated weights for policy 0, policy_version 519584 (0.0013) [2024-06-15 17:44:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 49152.0, 300 sec: 49096.5). Total num frames: 1064173568. Throughput: 0: 11605.3. Samples: 266094080. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:30,767][1648981] Avg episode reward: [(0, '442.670')] [2024-06-15 17:44:31,399][1651669] Updated weights for policy 0, policy_version 519640 (0.0102) [2024-06-15 17:44:32,247][1651669] Updated weights for policy 0, policy_version 519678 (0.0014) [2024-06-15 17:44:35,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 43690.6, 300 sec: 48541.1). Total num frames: 1064304640. Throughput: 0: 11503.9. 
Samples: 266167808. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:35,767][1648981] Avg episode reward: [(0, '413.390')] [2024-06-15 17:44:37,752][1651274] Signal inference workers to stop experience collection... (27250 times) [2024-06-15 17:44:37,790][1651669] InferenceWorker_p0-w0: stopping experience collection (27250 times) [2024-06-15 17:44:38,018][1651274] Signal inference workers to resume experience collection... (27250 times) [2024-06-15 17:44:38,018][1651669] InferenceWorker_p0-w0: resuming experience collection (27250 times) [2024-06-15 17:44:39,042][1651669] Updated weights for policy 0, policy_version 519766 (0.0013) [2024-06-15 17:44:40,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48622.9, 300 sec: 48985.4). Total num frames: 1064599552. Throughput: 0: 11505.6. Samples: 266203136. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:40,767][1648981] Avg episode reward: [(0, '425.270')] [2024-06-15 17:44:41,241][1651669] Updated weights for policy 0, policy_version 519856 (0.0035) [2024-06-15 17:44:43,846][1651669] Updated weights for policy 0, policy_version 519905 (0.0013) [2024-06-15 17:44:45,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 45329.0, 300 sec: 48763.2). Total num frames: 1064828928. Throughput: 0: 11138.8. Samples: 266256384. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:45,767][1648981] Avg episode reward: [(0, '454.660')] [2024-06-15 17:44:49,524][1651669] Updated weights for policy 0, policy_version 519968 (0.0012) [2024-06-15 17:44:50,767][1648981] Fps is (10 sec: 39319.0, 60 sec: 46423.8, 300 sec: 48541.0). Total num frames: 1064992768. Throughput: 0: 11229.7. Samples: 266332672. 
Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:50,768][1648981] Avg episode reward: [(0, '476.300')] [2024-06-15 17:44:51,304][1651669] Updated weights for policy 0, policy_version 520040 (0.0012) [2024-06-15 17:44:52,484][1651669] Updated weights for policy 0, policy_version 520097 (0.0013) [2024-06-15 17:44:53,754][1651669] Updated weights for policy 0, policy_version 520131 (0.0032) [2024-06-15 17:44:55,028][1651669] Updated weights for policy 0, policy_version 520182 (0.0011) [2024-06-15 17:44:55,770][1648981] Fps is (10 sec: 52408.6, 60 sec: 46972.6, 300 sec: 48873.7). Total num frames: 1065353216. Throughput: 0: 11433.7. Samples: 266366976. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:44:55,771][1648981] Avg episode reward: [(0, '500.230')] [2024-06-15 17:44:55,776][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000520192_1065353216.pth... [2024-06-15 17:44:55,819][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000514480_1053655040.pth [2024-06-15 17:45:00,160][1651669] Updated weights for policy 0, policy_version 520224 (0.0146) [2024-06-15 17:45:00,766][1648981] Fps is (10 sec: 45878.1, 60 sec: 45329.0, 300 sec: 48318.9). Total num frames: 1065451520. Throughput: 0: 11457.4. Samples: 266451968. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:45:00,767][1648981] Avg episode reward: [(0, '485.020')] [2024-06-15 17:45:01,556][1651669] Updated weights for policy 0, policy_version 520272 (0.0012) [2024-06-15 17:45:03,131][1651669] Updated weights for policy 0, policy_version 520336 (0.0012) [2024-06-15 17:45:04,709][1651669] Updated weights for policy 0, policy_version 520402 (0.0012) [2024-06-15 17:45:05,766][1648981] Fps is (10 sec: 52448.8, 60 sec: 48605.8, 300 sec: 48874.3). Total num frames: 1065877504. Throughput: 0: 11741.9. Samples: 266511872. 
Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:45:05,767][1648981] Avg episode reward: [(0, '486.690')] [2024-06-15 17:45:10,405][1651669] Updated weights for policy 0, policy_version 520450 (0.0010) [2024-06-15 17:45:10,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 44236.8, 300 sec: 48096.8). Total num frames: 1065910272. Throughput: 0: 11707.7. Samples: 266550784. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:45:10,767][1648981] Avg episode reward: [(0, '458.090')] [2024-06-15 17:45:11,606][1651669] Updated weights for policy 0, policy_version 520505 (0.0140) [2024-06-15 17:45:13,372][1651669] Updated weights for policy 0, policy_version 520563 (0.0012) [2024-06-15 17:45:14,223][1651274] Signal inference workers to stop experience collection... (27300 times) [2024-06-15 17:45:14,292][1651669] InferenceWorker_p0-w0: stopping experience collection (27300 times) [2024-06-15 17:45:14,430][1651274] Signal inference workers to resume experience collection... (27300 times) [2024-06-15 17:45:14,433][1651669] InferenceWorker_p0-w0: resuming experience collection (27300 times) [2024-06-15 17:45:14,435][1651669] Updated weights for policy 0, policy_version 520626 (0.0012) [2024-06-15 17:45:15,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 49151.9, 300 sec: 48764.5). Total num frames: 1066336256. Throughput: 0: 11776.0. Samples: 266624000. Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:45:15,767][1648981] Avg episode reward: [(0, '459.740')] [2024-06-15 17:45:16,125][1651669] Updated weights for policy 0, policy_version 520690 (0.0012) [2024-06-15 17:45:20,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 44236.7, 300 sec: 47985.6). Total num frames: 1066401792. Throughput: 0: 11855.6. Samples: 266701312. 
Policy #0 lag: (min: 15.0, avg: 75.2, max: 271.0) [2024-06-15 17:45:20,767][1648981] Avg episode reward: [(0, '472.690')] [2024-06-15 17:45:21,549][1651669] Updated weights for policy 0, policy_version 520736 (0.0012) [2024-06-15 17:45:24,131][1651669] Updated weights for policy 0, policy_version 520802 (0.0012) [2024-06-15 17:45:25,701][1651669] Updated weights for policy 0, policy_version 520893 (0.0087) [2024-06-15 17:45:25,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 50244.3, 300 sec: 48876.3). Total num frames: 1066795008. Throughput: 0: 11912.5. Samples: 266739200. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:45:25,767][1648981] Avg episode reward: [(0, '464.740')] [2024-06-15 17:45:27,296][1651669] Updated weights for policy 0, policy_version 520950 (0.0016) [2024-06-15 17:45:30,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 45875.2, 300 sec: 48318.9). Total num frames: 1066926080. Throughput: 0: 12083.2. Samples: 266800128. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:45:30,767][1648981] Avg episode reward: [(0, '444.420')] [2024-06-15 17:45:32,794][1651669] Updated weights for policy 0, policy_version 521008 (0.0011) [2024-06-15 17:45:35,723][1651669] Updated weights for policy 0, policy_version 521088 (0.0011) [2024-06-15 17:45:35,774][1648981] Fps is (10 sec: 39290.2, 60 sec: 48053.4, 300 sec: 48653.9). Total num frames: 1067188224. Throughput: 0: 11956.1. Samples: 266870784. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:45:35,775][1648981] Avg episode reward: [(0, '438.580')] [2024-06-15 17:45:36,836][1651669] Updated weights for policy 0, policy_version 521152 (0.0011) [2024-06-15 17:45:40,767][1648981] Fps is (10 sec: 52423.3, 60 sec: 47512.8, 300 sec: 48431.1). Total num frames: 1067450368. Throughput: 0: 11879.1. Samples: 266901504. 
Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:45:40,768][1648981] Avg episode reward: [(0, '442.100')] [2024-06-15 17:45:42,935][1651669] Updated weights for policy 0, policy_version 521218 (0.0015) [2024-06-15 17:45:44,260][1651669] Updated weights for policy 0, policy_version 521277 (0.0173) [2024-06-15 17:45:45,790][1648981] Fps is (10 sec: 42532.5, 60 sec: 46403.2, 300 sec: 48537.2). Total num frames: 1067614208. Throughput: 0: 11735.8. Samples: 266980352. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:45:45,791][1648981] Avg episode reward: [(0, '401.430')] [2024-06-15 17:45:46,511][1651669] Updated weights for policy 0, policy_version 521331 (0.0020) [2024-06-15 17:45:47,544][1651669] Updated weights for policy 0, policy_version 521392 (0.0012) [2024-06-15 17:45:49,431][1651669] Updated weights for policy 0, policy_version 521444 (0.0011) [2024-06-15 17:45:50,786][1648981] Fps is (10 sec: 52330.5, 60 sec: 49682.3, 300 sec: 48648.9). Total num frames: 1067974656. Throughput: 0: 12021.0. Samples: 267053056. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:45:50,787][1648981] Avg episode reward: [(0, '387.700')] [2024-06-15 17:45:53,709][1651669] Updated weights for policy 0, policy_version 521492 (0.0032) [2024-06-15 17:45:55,767][1648981] Fps is (10 sec: 49266.3, 60 sec: 45877.9, 300 sec: 48430.0). Total num frames: 1068105728. Throughput: 0: 12140.0. Samples: 267097088. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:45:55,767][1648981] Avg episode reward: [(0, '375.710')] [2024-06-15 17:45:56,130][1651669] Updated weights for policy 0, policy_version 521538 (0.0010) [2024-06-15 17:45:57,253][1651274] Signal inference workers to stop experience collection... (27350 times) [2024-06-15 17:45:57,307][1651669] InferenceWorker_p0-w0: stopping experience collection (27350 times) [2024-06-15 17:45:57,426][1651274] Signal inference workers to resume experience collection... 
(27350 times) [2024-06-15 17:45:57,427][1651669] InferenceWorker_p0-w0: resuming experience collection (27350 times) [2024-06-15 17:45:57,908][1651669] Updated weights for policy 0, policy_version 521621 (0.0100) [2024-06-15 17:45:58,694][1651669] Updated weights for policy 0, policy_version 521664 (0.0011) [2024-06-15 17:46:00,767][1648981] Fps is (10 sec: 52532.5, 60 sec: 50790.4, 300 sec: 48874.3). Total num frames: 1068498944. Throughput: 0: 11969.4. Samples: 267162624. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:00,767][1648981] Avg episode reward: [(0, '385.660')] [2024-06-15 17:46:04,981][1651669] Updated weights for policy 0, policy_version 521764 (0.0021) [2024-06-15 17:46:05,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1068630016. Throughput: 0: 11855.7. Samples: 267234816. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:05,767][1648981] Avg episode reward: [(0, '391.730')] [2024-06-15 17:46:07,255][1651669] Updated weights for policy 0, policy_version 521797 (0.0012) [2024-06-15 17:46:09,319][1651669] Updated weights for policy 0, policy_version 521888 (0.0128) [2024-06-15 17:46:10,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 49698.1, 300 sec: 48763.3). Total num frames: 1068892160. Throughput: 0: 11764.6. Samples: 267268608. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:10,767][1648981] Avg episode reward: [(0, '396.140')] [2024-06-15 17:46:11,206][1651669] Updated weights for policy 0, policy_version 521952 (0.0013) [2024-06-15 17:46:15,585][1651669] Updated weights for policy 0, policy_version 522003 (0.0013) [2024-06-15 17:46:15,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45329.2, 300 sec: 48319.3). Total num frames: 1069056000. Throughput: 0: 11958.0. Samples: 267338240. 
Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:15,767][1648981] Avg episode reward: [(0, '390.660')] [2024-06-15 17:46:18,551][1651669] Updated weights for policy 0, policy_version 522064 (0.0011) [2024-06-15 17:46:20,217][1651669] Updated weights for policy 0, policy_version 522132 (0.0014) [2024-06-15 17:46:20,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.1, 300 sec: 48652.1). Total num frames: 1069350912. Throughput: 0: 12051.2. Samples: 267412992. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:20,767][1648981] Avg episode reward: [(0, '412.010')] [2024-06-15 17:46:22,161][1651669] Updated weights for policy 0, policy_version 522208 (0.0034) [2024-06-15 17:46:25,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1069547520. Throughput: 0: 12129.0. Samples: 267447296. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:25,767][1648981] Avg episode reward: [(0, '436.330')] [2024-06-15 17:46:26,431][1651669] Updated weights for policy 0, policy_version 522243 (0.0017) [2024-06-15 17:46:28,842][1651669] Updated weights for policy 0, policy_version 522305 (0.0013) [2024-06-15 17:46:30,434][1651669] Updated weights for policy 0, policy_version 522371 (0.0013) [2024-06-15 17:46:30,779][1648981] Fps is (10 sec: 49088.3, 60 sec: 48595.3, 300 sec: 48650.2). Total num frames: 1069842432. Throughput: 0: 12108.8. Samples: 267525120. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:30,780][1648981] Avg episode reward: [(0, '439.000')] [2024-06-15 17:46:31,520][1651669] Updated weights for policy 0, policy_version 522427 (0.0013) [2024-06-15 17:46:33,176][1651669] Updated weights for policy 0, policy_version 522480 (0.0023) [2024-06-15 17:46:35,792][1648981] Fps is (10 sec: 52295.5, 60 sec: 48045.7, 300 sec: 48428.3). Total num frames: 1070071808. Throughput: 0: 12127.2. Samples: 267598848. 
Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:35,792][1648981] Avg episode reward: [(0, '443.450')] [2024-06-15 17:46:37,340][1651669] Updated weights for policy 0, policy_version 522523 (0.0012) [2024-06-15 17:46:39,411][1651669] Updated weights for policy 0, policy_version 522564 (0.0014) [2024-06-15 17:46:39,685][1651274] Signal inference workers to stop experience collection... (27400 times) [2024-06-15 17:46:39,745][1651669] InferenceWorker_p0-w0: stopping experience collection (27400 times) [2024-06-15 17:46:39,953][1651274] Signal inference workers to resume experience collection... (27400 times) [2024-06-15 17:46:39,962][1651669] InferenceWorker_p0-w0: resuming experience collection (27400 times) [2024-06-15 17:46:40,699][1651669] Updated weights for policy 0, policy_version 522624 (0.0011) [2024-06-15 17:46:40,776][1648981] Fps is (10 sec: 49167.3, 60 sec: 48052.7, 300 sec: 48768.4). Total num frames: 1070333952. Throughput: 0: 11944.1. Samples: 267634688. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:40,777][1648981] Avg episode reward: [(0, '428.050')] [2024-06-15 17:46:42,103][1651669] Updated weights for policy 0, policy_version 522682 (0.0012) [2024-06-15 17:46:44,240][1651669] Updated weights for policy 0, policy_version 522740 (0.0023) [2024-06-15 17:46:45,775][1648981] Fps is (10 sec: 52516.1, 60 sec: 49710.2, 300 sec: 48430.2). Total num frames: 1070596096. Throughput: 0: 12058.1. Samples: 267705344. Policy #0 lag: (min: 15.0, avg: 79.1, max: 271.0) [2024-06-15 17:46:45,776][1648981] Avg episode reward: [(0, '431.930')] [2024-06-15 17:46:48,351][1651669] Updated weights for policy 0, policy_version 522784 (0.0013) [2024-06-15 17:46:50,538][1651669] Updated weights for policy 0, policy_version 522835 (0.0015) [2024-06-15 17:46:50,768][1648981] Fps is (10 sec: 45913.0, 60 sec: 46981.7, 300 sec: 48540.8). Total num frames: 1070792704. Throughput: 0: 12071.4. Samples: 267778048. 
Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:46:50,769][1648981] Avg episode reward: [(0, '441.380')] [2024-06-15 17:46:52,149][1651669] Updated weights for policy 0, policy_version 522912 (0.0012) [2024-06-15 17:46:54,604][1651669] Updated weights for policy 0, policy_version 522945 (0.0019) [2024-06-15 17:46:55,689][1651669] Updated weights for policy 0, policy_version 522998 (0.0012) [2024-06-15 17:46:55,766][1648981] Fps is (10 sec: 49196.1, 60 sec: 49698.4, 300 sec: 48318.9). Total num frames: 1071087616. Throughput: 0: 12003.6. Samples: 267808768. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:46:55,767][1648981] Avg episode reward: [(0, '437.770')] [2024-06-15 17:46:55,860][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000523008_1071120384.pth... [2024-06-15 17:46:55,895][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000517376_1059586048.pth [2024-06-15 17:46:59,461][1651669] Updated weights for policy 0, policy_version 523029 (0.0012) [2024-06-15 17:47:00,770][1648981] Fps is (10 sec: 45865.1, 60 sec: 45872.3, 300 sec: 48429.4). Total num frames: 1071251456. Throughput: 0: 12241.4. Samples: 267889152. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:00,771][1648981] Avg episode reward: [(0, '427.370')] [2024-06-15 17:47:01,118][1651669] Updated weights for policy 0, policy_version 523075 (0.0011) [2024-06-15 17:47:03,686][1651669] Updated weights for policy 0, policy_version 523184 (0.0085) [2024-06-15 17:47:05,770][1648981] Fps is (10 sec: 42582.0, 60 sec: 48056.7, 300 sec: 48207.2). Total num frames: 1071513600. Throughput: 0: 12013.9. Samples: 267953664. 
Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:05,771][1648981] Avg episode reward: [(0, '444.260')] [2024-06-15 17:47:06,290][1651669] Updated weights for policy 0, policy_version 523219 (0.0011) [2024-06-15 17:47:07,284][1651669] Updated weights for policy 0, policy_version 523264 (0.0014) [2024-06-15 17:47:10,774][1648981] Fps is (10 sec: 42581.5, 60 sec: 46415.3, 300 sec: 48099.4). Total num frames: 1071677440. Throughput: 0: 12035.6. Samples: 267988992. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:10,775][1648981] Avg episode reward: [(0, '451.800')] [2024-06-15 17:47:12,055][1651669] Updated weights for policy 0, policy_version 523351 (0.0015) [2024-06-15 17:47:14,015][1651669] Updated weights for policy 0, policy_version 523408 (0.0039) [2024-06-15 17:47:15,228][1651669] Updated weights for policy 0, policy_version 523452 (0.0010) [2024-06-15 17:47:15,767][1648981] Fps is (10 sec: 52447.6, 60 sec: 49698.0, 300 sec: 48318.9). Total num frames: 1072037888. Throughput: 0: 11813.5. Samples: 268056576. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:15,767][1648981] Avg episode reward: [(0, '440.470')] [2024-06-15 17:47:17,345][1651669] Updated weights for policy 0, policy_version 523491 (0.0013) [2024-06-15 17:47:20,798][1648981] Fps is (10 sec: 49034.5, 60 sec: 46942.6, 300 sec: 47982.8). Total num frames: 1072168960. Throughput: 0: 11990.5. Samples: 268138496. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:20,799][1648981] Avg episode reward: [(0, '430.420')] [2024-06-15 17:47:20,958][1651669] Updated weights for policy 0, policy_version 523536 (0.0011) [2024-06-15 17:47:22,385][1651274] Signal inference workers to stop experience collection... (27450 times) [2024-06-15 17:47:22,467][1651669] InferenceWorker_p0-w0: stopping experience collection (27450 times) [2024-06-15 17:47:22,586][1651274] Signal inference workers to resume experience collection... 
(27450 times) [2024-06-15 17:47:22,587][1651669] InferenceWorker_p0-w0: resuming experience collection (27450 times) [2024-06-15 17:47:22,589][1651669] Updated weights for policy 0, policy_version 523600 (0.0011) [2024-06-15 17:47:24,506][1651669] Updated weights for policy 0, policy_version 523664 (0.0011) [2024-06-15 17:47:25,267][1651669] Updated weights for policy 0, policy_version 523702 (0.0012) [2024-06-15 17:47:25,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50244.2, 300 sec: 48318.9). Total num frames: 1072562176. Throughput: 0: 11892.4. Samples: 268169728. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:25,767][1648981] Avg episode reward: [(0, '419.870')] [2024-06-15 17:47:28,201][1651669] Updated weights for policy 0, policy_version 523745 (0.0015) [2024-06-15 17:47:30,766][1648981] Fps is (10 sec: 52595.8, 60 sec: 47523.9, 300 sec: 47985.7). Total num frames: 1072693248. Throughput: 0: 12051.4. Samples: 268247552. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:30,767][1648981] Avg episode reward: [(0, '428.950')] [2024-06-15 17:47:31,775][1651669] Updated weights for policy 0, policy_version 523792 (0.0011) [2024-06-15 17:47:33,470][1651669] Updated weights for policy 0, policy_version 523872 (0.0121) [2024-06-15 17:47:34,975][1651669] Updated weights for policy 0, policy_version 523907 (0.0036) [2024-06-15 17:47:35,769][1648981] Fps is (10 sec: 45861.6, 60 sec: 49170.4, 300 sec: 48318.4). Total num frames: 1073020928. Throughput: 0: 12060.1. Samples: 268320768. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:35,770][1648981] Avg episode reward: [(0, '450.740')] [2024-06-15 17:47:36,132][1651669] Updated weights for policy 0, policy_version 523965 (0.0012) [2024-06-15 17:47:38,545][1651669] Updated weights for policy 0, policy_version 524029 (0.0014) [2024-06-15 17:47:40,771][1648981] Fps is (10 sec: 52403.5, 60 sec: 48063.8, 300 sec: 47984.9). Total num frames: 1073217536. 
Throughput: 0: 12104.6. Samples: 268353536. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:40,772][1648981] Avg episode reward: [(0, '448.760')] [2024-06-15 17:47:43,502][1651669] Updated weights for policy 0, policy_version 524081 (0.0012) [2024-06-15 17:47:44,503][1651669] Updated weights for policy 0, policy_version 524144 (0.0011) [2024-06-15 17:47:45,766][1648981] Fps is (10 sec: 45889.2, 60 sec: 48066.9, 300 sec: 47876.5). Total num frames: 1073479680. Throughput: 0: 12095.6. Samples: 268433408. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:45,767][1648981] Avg episode reward: [(0, '468.800')] [2024-06-15 17:47:46,044][1651669] Updated weights for policy 0, policy_version 524180 (0.0011) [2024-06-15 17:47:46,799][1651669] Updated weights for policy 0, policy_version 524218 (0.0012) [2024-06-15 17:47:48,359][1651669] Updated weights for policy 0, policy_version 524264 (0.0029) [2024-06-15 17:47:50,766][1648981] Fps is (10 sec: 52454.1, 60 sec: 49153.3, 300 sec: 47985.7). Total num frames: 1073741824. Throughput: 0: 12334.5. Samples: 268508672. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:50,767][1648981] Avg episode reward: [(0, '458.010')] [2024-06-15 17:47:52,319][1651669] Updated weights for policy 0, policy_version 524292 (0.0039) [2024-06-15 17:47:53,673][1651669] Updated weights for policy 0, policy_version 524347 (0.0010) [2024-06-15 17:47:55,390][1651669] Updated weights for policy 0, policy_version 524400 (0.0012) [2024-06-15 17:47:55,778][1648981] Fps is (10 sec: 52366.8, 60 sec: 48596.2, 300 sec: 48095.7). Total num frames: 1074003968. Throughput: 0: 12412.0. Samples: 268547584. 
Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:47:55,779][1648981] Avg episode reward: [(0, '453.840')] [2024-06-15 17:47:56,538][1651669] Updated weights for policy 0, policy_version 524450 (0.0012) [2024-06-15 17:47:57,777][1651669] Updated weights for policy 0, policy_version 524496 (0.0013) [2024-06-15 17:47:58,720][1651669] Updated weights for policy 0, policy_version 524539 (0.0012) [2024-06-15 17:48:00,768][1648981] Fps is (10 sec: 52422.7, 60 sec: 50246.5, 300 sec: 47987.9). Total num frames: 1074266112. Throughput: 0: 12594.9. Samples: 268623360. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:48:00,768][1648981] Avg episode reward: [(0, '430.760')] [2024-06-15 17:48:03,619][1651669] Updated weights for policy 0, policy_version 524597 (0.0012) [2024-06-15 17:48:04,533][1651274] Signal inference workers to stop experience collection... (27500 times) [2024-06-15 17:48:04,605][1651669] InferenceWorker_p0-w0: stopping experience collection (27500 times) [2024-06-15 17:48:04,810][1651274] Signal inference workers to resume experience collection... (27500 times) [2024-06-15 17:48:04,811][1651669] InferenceWorker_p0-w0: resuming experience collection (27500 times) [2024-06-15 17:48:05,343][1651669] Updated weights for policy 0, policy_version 524642 (0.0097) [2024-06-15 17:48:05,766][1648981] Fps is (10 sec: 49210.3, 60 sec: 49701.3, 300 sec: 48318.9). Total num frames: 1074495488. Throughput: 0: 12558.6. Samples: 268703232. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:48:05,767][1648981] Avg episode reward: [(0, '431.350')] [2024-06-15 17:48:06,851][1651669] Updated weights for policy 0, policy_version 524707 (0.0013) [2024-06-15 17:48:07,659][1651669] Updated weights for policy 0, policy_version 524739 (0.0012) [2024-06-15 17:48:10,770][1648981] Fps is (10 sec: 52415.0, 60 sec: 51886.1, 300 sec: 47992.0). Total num frames: 1074790400. Throughput: 0: 12537.3. Samples: 268733952. 
Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:48:10,771][1648981] Avg episode reward: [(0, '407.810')] [2024-06-15 17:48:13,323][1651669] Updated weights for policy 0, policy_version 524816 (0.0012) [2024-06-15 17:48:14,265][1651669] Updated weights for policy 0, policy_version 524855 (0.0012) [2024-06-15 17:48:15,771][1648981] Fps is (10 sec: 42580.3, 60 sec: 48056.5, 300 sec: 47985.0). Total num frames: 1074921472. Throughput: 0: 12559.9. Samples: 268812800. Policy #0 lag: (min: 5.0, avg: 111.3, max: 261.0) [2024-06-15 17:48:15,771][1648981] Avg episode reward: [(0, '416.330')] [2024-06-15 17:48:16,671][1651669] Updated weights for policy 0, policy_version 524899 (0.0050) [2024-06-15 17:48:19,172][1651669] Updated weights for policy 0, policy_version 524993 (0.0012) [2024-06-15 17:48:20,359][1651669] Updated weights for policy 0, policy_version 525056 (0.0014) [2024-06-15 17:48:20,778][1648981] Fps is (10 sec: 52387.1, 60 sec: 52446.3, 300 sec: 47986.3). Total num frames: 1075314688. Throughput: 0: 12149.1. Samples: 268867584. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:20,779][1648981] Avg episode reward: [(0, '414.170')] [2024-06-15 17:48:25,766][1648981] Fps is (10 sec: 45894.3, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 1075380224. Throughput: 0: 12346.2. Samples: 268909056. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:25,767][1648981] Avg episode reward: [(0, '426.390')] [2024-06-15 17:48:25,866][1651669] Updated weights for policy 0, policy_version 525104 (0.0012) [2024-06-15 17:48:27,736][1651669] Updated weights for policy 0, policy_version 525136 (0.0010) [2024-06-15 17:48:28,821][1651669] Updated weights for policy 0, policy_version 525189 (0.0012) [2024-06-15 17:48:30,352][1651669] Updated weights for policy 0, policy_version 525255 (0.0011) [2024-06-15 17:48:30,766][1648981] Fps is (10 sec: 42648.7, 60 sec: 50790.4, 300 sec: 47652.5). Total num frames: 1075740672. 
Throughput: 0: 12344.9. Samples: 268988928. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:30,767][1648981] Avg episode reward: [(0, '435.260')] [2024-06-15 17:48:31,542][1651669] Updated weights for policy 0, policy_version 525306 (0.0129) [2024-06-15 17:48:35,775][1648981] Fps is (10 sec: 52388.9, 60 sec: 48056.0, 300 sec: 48210.0). Total num frames: 1075904512. Throughput: 0: 12263.2. Samples: 269060608. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:35,779][1648981] Avg episode reward: [(0, '450.140')] [2024-06-15 17:48:35,957][1651669] Updated weights for policy 0, policy_version 525360 (0.0012) [2024-06-15 17:48:38,199][1651669] Updated weights for policy 0, policy_version 525396 (0.0011) [2024-06-15 17:48:39,220][1651669] Updated weights for policy 0, policy_version 525437 (0.0013) [2024-06-15 17:48:40,443][1651669] Updated weights for policy 0, policy_version 525475 (0.0011) [2024-06-15 17:48:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49702.1, 300 sec: 47763.5). Total num frames: 1076199424. Throughput: 0: 12188.8. Samples: 269095936. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:40,767][1648981] Avg episode reward: [(0, '424.190')] [2024-06-15 17:48:42,065][1651274] Signal inference workers to stop experience collection... (27550 times) [2024-06-15 17:48:42,108][1651669] InferenceWorker_p0-w0: stopping experience collection (27550 times) [2024-06-15 17:48:42,301][1651274] Signal inference workers to resume experience collection... (27550 times) [2024-06-15 17:48:42,302][1651669] InferenceWorker_p0-w0: resuming experience collection (27550 times) [2024-06-15 17:48:42,303][1651669] Updated weights for policy 0, policy_version 525552 (0.0143) [2024-06-15 17:48:45,766][1648981] Fps is (10 sec: 45910.7, 60 sec: 48059.8, 300 sec: 47986.3). Total num frames: 1076363264. Throughput: 0: 11981.1. Samples: 269162496. 
Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:45,767][1648981] Avg episode reward: [(0, '414.020')] [2024-06-15 17:48:46,098][1651669] Updated weights for policy 0, policy_version 525586 (0.0016) [2024-06-15 17:48:49,163][1651669] Updated weights for policy 0, policy_version 525652 (0.0012) [2024-06-15 17:48:49,773][1651669] Updated weights for policy 0, policy_version 525688 (0.0025) [2024-06-15 17:48:50,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48605.8, 300 sec: 47876.3). Total num frames: 1076658176. Throughput: 0: 11969.4. Samples: 269241856. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:50,767][1648981] Avg episode reward: [(0, '410.490')] [2024-06-15 17:48:51,277][1651669] Updated weights for policy 0, policy_version 525744 (0.0015) [2024-06-15 17:48:52,143][1651669] Updated weights for policy 0, policy_version 525779 (0.0011) [2024-06-15 17:48:52,818][1651669] Updated weights for policy 0, policy_version 525821 (0.0012) [2024-06-15 17:48:55,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 48069.0, 300 sec: 47985.6). Total num frames: 1076887552. Throughput: 0: 12118.3. Samples: 269279232. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:48:55,767][1648981] Avg episode reward: [(0, '420.370')] [2024-06-15 17:48:56,606][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000525856_1076953088.pth... [2024-06-15 17:48:56,752][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000520192_1065353216.pth [2024-06-15 17:48:57,290][1651669] Updated weights for policy 0, policy_version 525887 (0.0141) [2024-06-15 17:48:59,769][1651669] Updated weights for policy 0, policy_version 525943 (0.0086) [2024-06-15 17:49:00,773][1648981] Fps is (10 sec: 49118.4, 60 sec: 48055.1, 300 sec: 48095.6). Total num frames: 1077149696. Throughput: 0: 11968.7. Samples: 269351424. 
Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:00,774][1648981] Avg episode reward: [(0, '459.690')] [2024-06-15 17:49:02,173][1651669] Updated weights for policy 0, policy_version 525987 (0.0014) [2024-06-15 17:49:03,879][1651669] Updated weights for policy 0, policy_version 526064 (0.0012) [2024-06-15 17:49:05,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1077411840. Throughput: 0: 12279.8. Samples: 269420032. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:05,767][1648981] Avg episode reward: [(0, '478.040')] [2024-06-15 17:49:08,152][1651669] Updated weights for policy 0, policy_version 526113 (0.0011) [2024-06-15 17:49:10,766][1648981] Fps is (10 sec: 39348.7, 60 sec: 45878.1, 300 sec: 47985.7). Total num frames: 1077542912. Throughput: 0: 12060.5. Samples: 269451776. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:10,767][1648981] Avg episode reward: [(0, '467.530')] [2024-06-15 17:49:11,222][1651669] Updated weights for policy 0, policy_version 526176 (0.0028) [2024-06-15 17:49:13,407][1651669] Updated weights for policy 0, policy_version 526240 (0.0092) [2024-06-15 17:49:15,393][1651669] Updated weights for policy 0, policy_version 526306 (0.0011) [2024-06-15 17:49:15,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49701.7, 300 sec: 47985.7). Total num frames: 1077903360. Throughput: 0: 11832.9. Samples: 269521408. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:15,767][1648981] Avg episode reward: [(0, '467.890')] [2024-06-15 17:49:19,300][1651669] Updated weights for policy 0, policy_version 526340 (0.0011) [2024-06-15 17:49:20,503][1651669] Updated weights for policy 0, policy_version 526398 (0.0014) [2024-06-15 17:49:20,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 45884.3, 300 sec: 48430.0). Total num frames: 1078067200. Throughput: 0: 11834.9. Samples: 269593088. 
Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:20,767][1648981] Avg episode reward: [(0, '436.520')] [2024-06-15 17:49:22,365][1651669] Updated weights for policy 0, policy_version 526448 (0.0016) [2024-06-15 17:49:24,086][1651669] Updated weights for policy 0, policy_version 526496 (0.0079) [2024-06-15 17:49:25,200][1651274] Signal inference workers to stop experience collection... (27600 times) [2024-06-15 17:49:25,239][1651669] InferenceWorker_p0-w0: stopping experience collection (27600 times) [2024-06-15 17:49:25,514][1651274] Signal inference workers to resume experience collection... (27600 times) [2024-06-15 17:49:25,514][1651669] InferenceWorker_p0-w0: resuming experience collection (27600 times) [2024-06-15 17:49:25,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 50244.3, 300 sec: 48207.8). Total num frames: 1078394880. Throughput: 0: 11935.3. Samples: 269633024. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:25,767][1648981] Avg episode reward: [(0, '445.090')] [2024-06-15 17:49:25,875][1651669] Updated weights for policy 0, policy_version 526576 (0.0012) [2024-06-15 17:49:30,019][1651669] Updated weights for policy 0, policy_version 526597 (0.0015) [2024-06-15 17:49:30,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 46421.3, 300 sec: 48207.9). Total num frames: 1078525952. Throughput: 0: 12117.3. Samples: 269707776. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:30,767][1648981] Avg episode reward: [(0, '443.300')] [2024-06-15 17:49:32,551][1651669] Updated weights for policy 0, policy_version 526672 (0.0012) [2024-06-15 17:49:33,431][1651669] Updated weights for policy 0, policy_version 526720 (0.0015) [2024-06-15 17:49:35,553][1651669] Updated weights for policy 0, policy_version 526769 (0.0014) [2024-06-15 17:49:35,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49158.2, 300 sec: 48318.9). Total num frames: 1078853632. Throughput: 0: 11958.0. Samples: 269779968. 
Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:35,767][1648981] Avg episode reward: [(0, '439.870')] [2024-06-15 17:49:36,832][1651669] Updated weights for policy 0, policy_version 526832 (0.0112) [2024-06-15 17:49:40,547][1651669] Updated weights for policy 0, policy_version 526877 (0.0072) [2024-06-15 17:49:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 47513.5, 300 sec: 48207.8). Total num frames: 1079050240. Throughput: 0: 12037.8. Samples: 269820928. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:40,767][1648981] Avg episode reward: [(0, '448.980')] [2024-06-15 17:49:43,165][1651669] Updated weights for policy 0, policy_version 526930 (0.0013) [2024-06-15 17:49:44,712][1651669] Updated weights for policy 0, policy_version 526983 (0.0012) [2024-06-15 17:49:45,773][1648981] Fps is (10 sec: 49122.5, 60 sec: 49693.1, 300 sec: 48651.3). Total num frames: 1079345152. Throughput: 0: 12083.4. Samples: 269895168. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:45,773][1648981] Avg episode reward: [(0, '455.640')] [2024-06-15 17:49:46,992][1651669] Updated weights for policy 0, policy_version 527088 (0.0014) [2024-06-15 17:49:50,790][1648981] Fps is (10 sec: 45766.4, 60 sec: 47494.8, 300 sec: 47982.4). Total num frames: 1079508992. Throughput: 0: 12270.1. Samples: 269972480. Policy #0 lag: (min: 104.0, avg: 187.0, max: 317.0) [2024-06-15 17:49:50,791][1648981] Avg episode reward: [(0, '464.530')] [2024-06-15 17:49:51,809][1651669] Updated weights for policy 0, policy_version 527140 (0.0020) [2024-06-15 17:49:54,423][1651669] Updated weights for policy 0, policy_version 527207 (0.0012) [2024-06-15 17:49:55,766][1648981] Fps is (10 sec: 42624.3, 60 sec: 48060.0, 300 sec: 48541.1). Total num frames: 1079771136. Throughput: 0: 12367.7. Samples: 270008320. 
Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:49:55,767][1648981] Avg episode reward: [(0, '453.560')] [2024-06-15 17:49:56,487][1651669] Updated weights for policy 0, policy_version 527268 (0.0012) [2024-06-15 17:49:57,809][1651669] Updated weights for policy 0, policy_version 527330 (0.0012) [2024-06-15 17:50:00,766][1648981] Fps is (10 sec: 52554.1, 60 sec: 48065.3, 300 sec: 47985.7). Total num frames: 1080033280. Throughput: 0: 12299.4. Samples: 270074880. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:00,767][1648981] Avg episode reward: [(0, '435.200')] [2024-06-15 17:50:01,832][1651669] Updated weights for policy 0, policy_version 527378 (0.0012) [2024-06-15 17:50:04,199][1651669] Updated weights for policy 0, policy_version 527429 (0.0012) [2024-06-15 17:50:05,328][1651669] Updated weights for policy 0, policy_version 527488 (0.0013) [2024-06-15 17:50:05,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 48763.2). Total num frames: 1080295424. Throughput: 0: 12470.0. Samples: 270154240. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:05,767][1648981] Avg episode reward: [(0, '421.190')] [2024-06-15 17:50:06,935][1651274] Signal inference workers to stop experience collection... (27650 times) [2024-06-15 17:50:07,028][1651669] InferenceWorker_p0-w0: stopping experience collection (27650 times) [2024-06-15 17:50:07,300][1651274] Signal inference workers to resume experience collection... (27650 times) [2024-06-15 17:50:07,300][1651669] InferenceWorker_p0-w0: resuming experience collection (27650 times) [2024-06-15 17:50:07,302][1651669] Updated weights for policy 0, policy_version 527552 (0.0097) [2024-06-15 17:50:08,444][1651669] Updated weights for policy 0, policy_version 527609 (0.0011) [2024-06-15 17:50:10,767][1648981] Fps is (10 sec: 52423.0, 60 sec: 50243.4, 300 sec: 48207.7). Total num frames: 1080557568. Throughput: 0: 12185.3. Samples: 270181376. 
Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:10,768][1648981] Avg episode reward: [(0, '421.580')] [2024-06-15 17:50:12,855][1651669] Updated weights for policy 0, policy_version 527651 (0.0016) [2024-06-15 17:50:15,782][1648981] Fps is (10 sec: 39259.0, 60 sec: 46408.9, 300 sec: 48427.4). Total num frames: 1080688640. Throughput: 0: 12351.9. Samples: 270263808. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:15,783][1648981] Avg episode reward: [(0, '459.610')] [2024-06-15 17:50:15,972][1651669] Updated weights for policy 0, policy_version 527696 (0.0012) [2024-06-15 17:50:17,607][1651669] Updated weights for policy 0, policy_version 527761 (0.0011) [2024-06-15 17:50:19,371][1651669] Updated weights for policy 0, policy_version 527826 (0.0013) [2024-06-15 17:50:20,339][1651669] Updated weights for policy 0, policy_version 527872 (0.0011) [2024-06-15 17:50:20,766][1648981] Fps is (10 sec: 52434.4, 60 sec: 50244.1, 300 sec: 48430.0). Total num frames: 1081081856. Throughput: 0: 12140.1. Samples: 270326272. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:20,767][1648981] Avg episode reward: [(0, '470.750')] [2024-06-15 17:50:25,766][1648981] Fps is (10 sec: 52512.7, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1081212928. Throughput: 0: 12174.2. Samples: 270368768. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:25,767][1648981] Avg episode reward: [(0, '474.420')] [2024-06-15 17:50:27,149][1651669] Updated weights for policy 0, policy_version 527938 (0.0013) [2024-06-15 17:50:28,897][1651669] Updated weights for policy 0, policy_version 528032 (0.0013) [2024-06-15 17:50:29,972][1651669] Updated weights for policy 0, policy_version 528069 (0.0016) [2024-06-15 17:50:30,780][1648981] Fps is (10 sec: 45812.1, 60 sec: 50232.7, 300 sec: 48651.2). Total num frames: 1081540608. Throughput: 0: 12047.0. Samples: 270437376. 
Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:30,781][1648981] Avg episode reward: [(0, '473.970')] [2024-06-15 17:50:31,275][1651669] Updated weights for policy 0, policy_version 528122 (0.0010) [2024-06-15 17:50:34,540][1651669] Updated weights for policy 0, policy_version 528176 (0.0013) [2024-06-15 17:50:35,797][1648981] Fps is (10 sec: 52269.3, 60 sec: 48035.4, 300 sec: 48425.2). Total num frames: 1081737216. Throughput: 0: 12070.0. Samples: 270515712. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:35,797][1648981] Avg episode reward: [(0, '469.970')] [2024-06-15 17:50:37,781][1651669] Updated weights for policy 0, policy_version 528224 (0.0143) [2024-06-15 17:50:39,954][1651669] Updated weights for policy 0, policy_version 528320 (0.0015) [2024-06-15 17:50:40,768][1648981] Fps is (10 sec: 52493.1, 60 sec: 50243.0, 300 sec: 48989.0). Total num frames: 1082064896. Throughput: 0: 12105.5. Samples: 270553088. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:40,769][1648981] Avg episode reward: [(0, '462.160')] [2024-06-15 17:50:41,187][1651669] Updated weights for policy 0, policy_version 528375 (0.0158) [2024-06-15 17:50:45,303][1651669] Updated weights for policy 0, policy_version 528441 (0.0012) [2024-06-15 17:50:45,766][1648981] Fps is (10 sec: 52589.3, 60 sec: 48610.8, 300 sec: 48433.3). Total num frames: 1082261504. Throughput: 0: 12379.0. Samples: 270631936. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:45,767][1648981] Avg episode reward: [(0, '449.070')] [2024-06-15 17:50:48,132][1651274] Signal inference workers to stop experience collection... (27700 times) [2024-06-15 17:50:48,184][1651669] InferenceWorker_p0-w0: stopping experience collection (27700 times) [2024-06-15 17:50:48,381][1651274] Signal inference workers to resume experience collection... 
(27700 times) [2024-06-15 17:50:48,381][1651669] InferenceWorker_p0-w0: resuming experience collection (27700 times) [2024-06-15 17:50:48,802][1651669] Updated weights for policy 0, policy_version 528496 (0.0024) [2024-06-15 17:50:50,766][1648981] Fps is (10 sec: 42604.9, 60 sec: 49717.9, 300 sec: 48763.3). Total num frames: 1082490880. Throughput: 0: 11946.7. Samples: 270691840. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:50,767][1648981] Avg episode reward: [(0, '445.570')] [2024-06-15 17:50:50,890][1651669] Updated weights for policy 0, policy_version 528576 (0.0127) [2024-06-15 17:50:51,881][1651669] Updated weights for policy 0, policy_version 528628 (0.0014) [2024-06-15 17:50:55,767][1648981] Fps is (10 sec: 39318.8, 60 sec: 48059.2, 300 sec: 47985.6). Total num frames: 1082654720. Throughput: 0: 12254.0. Samples: 270732800. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:50:55,768][1648981] Avg episode reward: [(0, '459.250')] [2024-06-15 17:50:56,105][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000528672_1082720256.pth... [2024-06-15 17:50:56,213][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000523008_1071120384.pth [2024-06-15 17:50:56,218][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000528672_1082720256.pth [2024-06-15 17:50:56,668][1651669] Updated weights for policy 0, policy_version 528704 (0.0013) [2024-06-15 17:50:59,441][1651669] Updated weights for policy 0, policy_version 528775 (0.0015) [2024-06-15 17:51:00,767][1648981] Fps is (10 sec: 55704.3, 60 sec: 50244.0, 300 sec: 48874.3). Total num frames: 1083047936. Throughput: 0: 12326.4. Samples: 270818304. 
Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:51:00,767][1648981] Avg episode reward: [(0, '448.700')] [2024-06-15 17:51:00,883][1651669] Updated weights for policy 0, policy_version 528836 (0.0012) [2024-06-15 17:51:02,135][1651669] Updated weights for policy 0, policy_version 528890 (0.0015) [2024-06-15 17:51:05,766][1648981] Fps is (10 sec: 52432.6, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1083179008. Throughput: 0: 12731.7. Samples: 270899200. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:51:05,767][1648981] Avg episode reward: [(0, '453.970')] [2024-06-15 17:51:06,757][1651669] Updated weights for policy 0, policy_version 528944 (0.0011) [2024-06-15 17:51:08,133][1651669] Updated weights for policy 0, policy_version 528983 (0.0011) [2024-06-15 17:51:09,464][1651669] Updated weights for policy 0, policy_version 529042 (0.0012) [2024-06-15 17:51:10,768][1648981] Fps is (10 sec: 55700.4, 60 sec: 50790.3, 300 sec: 49318.4). Total num frames: 1083604992. Throughput: 0: 12640.4. Samples: 270937600. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:51:10,768][1648981] Avg episode reward: [(0, '443.120')] [2024-06-15 17:51:11,141][1651669] Updated weights for policy 0, policy_version 529136 (0.0122) [2024-06-15 17:51:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50257.7, 300 sec: 48652.1). Total num frames: 1083703296. Throughput: 0: 12656.0. Samples: 271006720. Policy #0 lag: (min: 3.0, avg: 90.5, max: 259.0) [2024-06-15 17:51:15,767][1648981] Avg episode reward: [(0, '409.740')] [2024-06-15 17:51:17,449][1651669] Updated weights for policy 0, policy_version 529185 (0.0011) [2024-06-15 17:51:18,520][1651669] Updated weights for policy 0, policy_version 529235 (0.0010) [2024-06-15 17:51:19,968][1651669] Updated weights for policy 0, policy_version 529312 (0.0011) [2024-06-15 17:51:20,570][1651274] Signal inference workers to stop experience collection... 
(27750 times) [2024-06-15 17:51:20,670][1651669] InferenceWorker_p0-w0: stopping experience collection (27750 times) [2024-06-15 17:51:20,753][1651274] Signal inference workers to resume experience collection... (27750 times) [2024-06-15 17:51:20,753][1651669] InferenceWorker_p0-w0: resuming experience collection (27750 times) [2024-06-15 17:51:20,766][1648981] Fps is (10 sec: 52435.0, 60 sec: 50790.4, 300 sec: 49429.7). Total num frames: 1084129280. Throughput: 0: 12603.8. Samples: 271082496. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:20,767][1648981] Avg episode reward: [(0, '413.820')] [2024-06-15 17:51:21,082][1651669] Updated weights for policy 0, policy_version 529376 (0.0023) [2024-06-15 17:51:25,774][1648981] Fps is (10 sec: 52390.5, 60 sec: 50238.2, 300 sec: 48764.2). Total num frames: 1084227584. Throughput: 0: 12639.1. Samples: 271121920. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:25,776][1648981] Avg episode reward: [(0, '426.990')] [2024-06-15 17:51:28,023][1651669] Updated weights for policy 0, policy_version 529428 (0.0014) [2024-06-15 17:51:29,202][1651669] Updated weights for policy 0, policy_version 529488 (0.0095) [2024-06-15 17:51:30,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 49709.6, 300 sec: 48989.6). Total num frames: 1084522496. Throughput: 0: 12561.1. Samples: 271197184. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:30,767][1648981] Avg episode reward: [(0, '441.260')] [2024-06-15 17:51:31,080][1651669] Updated weights for policy 0, policy_version 529568 (0.0115) [2024-06-15 17:51:32,544][1651669] Updated weights for policy 0, policy_version 529648 (0.0107) [2024-06-15 17:51:35,775][1648981] Fps is (10 sec: 52421.3, 60 sec: 50262.5, 300 sec: 48874.5). Total num frames: 1084751872. Throughput: 0: 12683.8. Samples: 271262720. 
Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:35,776][1648981] Avg episode reward: [(0, '439.530')] [2024-06-15 17:51:39,502][1651669] Updated weights for policy 0, policy_version 529682 (0.0011) [2024-06-15 17:51:40,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 47514.6, 300 sec: 48542.5). Total num frames: 1084915712. Throughput: 0: 12663.6. Samples: 271302656. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:40,768][1648981] Avg episode reward: [(0, '456.350')] [2024-06-15 17:51:40,995][1651669] Updated weights for policy 0, policy_version 529760 (0.0108) [2024-06-15 17:51:41,992][1651669] Updated weights for policy 0, policy_version 529799 (0.0012) [2024-06-15 17:51:43,284][1651669] Updated weights for policy 0, policy_version 529862 (0.0021) [2024-06-15 17:51:44,480][1651669] Updated weights for policy 0, policy_version 529912 (0.0014) [2024-06-15 17:51:45,798][1648981] Fps is (10 sec: 52308.3, 60 sec: 50217.7, 300 sec: 49091.4). Total num frames: 1085276160. Throughput: 0: 12143.0. Samples: 271365120. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:45,799][1648981] Avg episode reward: [(0, '465.850')] [2024-06-15 17:51:50,525][1651669] Updated weights for policy 0, policy_version 529982 (0.0015) [2024-06-15 17:51:50,766][1648981] Fps is (10 sec: 49153.8, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1085407232. Throughput: 0: 12197.0. Samples: 271448064. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:50,767][1648981] Avg episode reward: [(0, '489.910')] [2024-06-15 17:51:52,404][1651669] Updated weights for policy 0, policy_version 530048 (0.0012) [2024-06-15 17:51:54,448][1651669] Updated weights for policy 0, policy_version 530133 (0.0013) [2024-06-15 17:51:55,302][1651669] Updated weights for policy 0, policy_version 530176 (0.0013) [2024-06-15 17:51:55,774][1648981] Fps is (10 sec: 52553.9, 60 sec: 52422.4, 300 sec: 49317.9). Total num frames: 1085800448. 
Throughput: 0: 11956.2. Samples: 271475712. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:51:55,775][1648981] Avg episode reward: [(0, '476.720')] [2024-06-15 17:52:00,767][1648981] Fps is (10 sec: 39319.8, 60 sec: 45875.1, 300 sec: 48430.5). Total num frames: 1085800448. Throughput: 0: 12140.0. Samples: 271553024. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:00,767][1648981] Avg episode reward: [(0, '518.770')] [2024-06-15 17:52:02,305][1651274] Signal inference workers to stop experience collection... (27800 times) [2024-06-15 17:52:02,378][1651669] InferenceWorker_p0-w0: stopping experience collection (27800 times) [2024-06-15 17:52:02,550][1651274] Signal inference workers to resume experience collection... (27800 times) [2024-06-15 17:52:02,559][1651669] InferenceWorker_p0-w0: resuming experience collection (27800 times) [2024-06-15 17:52:03,317][1651669] Updated weights for policy 0, policy_version 530273 (0.0012) [2024-06-15 17:52:04,870][1651669] Updated weights for policy 0, policy_version 530337 (0.0052) [2024-06-15 17:52:05,770][1648981] Fps is (10 sec: 39337.9, 60 sec: 50241.0, 300 sec: 49208.2). Total num frames: 1086193664. Throughput: 0: 11831.9. Samples: 271614976. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:05,771][1648981] Avg episode reward: [(0, '513.100')] [2024-06-15 17:52:06,488][1651669] Updated weights for policy 0, policy_version 530401 (0.0012) [2024-06-15 17:52:06,952][1651669] Updated weights for policy 0, policy_version 530431 (0.0011) [2024-06-15 17:52:10,766][1648981] Fps is (10 sec: 52430.7, 60 sec: 45329.9, 300 sec: 48430.0). Total num frames: 1086324736. Throughput: 0: 11766.5. Samples: 271651328. 
Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:10,767][1648981] Avg episode reward: [(0, '540.050')] [2024-06-15 17:52:13,229][1651669] Updated weights for policy 0, policy_version 530496 (0.0012) [2024-06-15 17:52:14,633][1651669] Updated weights for policy 0, policy_version 530560 (0.0014) [2024-06-15 17:52:15,766][1648981] Fps is (10 sec: 49170.6, 60 sec: 49698.1, 300 sec: 49212.8). Total num frames: 1086685184. Throughput: 0: 11810.1. Samples: 271728640. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:15,767][1648981] Avg episode reward: [(0, '539.080')] [2024-06-15 17:52:16,072][1651669] Updated weights for policy 0, policy_version 530624 (0.0013) [2024-06-15 17:52:17,453][1651669] Updated weights for policy 0, policy_version 530686 (0.0012) [2024-06-15 17:52:20,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45329.1, 300 sec: 48430.0). Total num frames: 1086849024. Throughput: 0: 12017.3. Samples: 271803392. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:20,767][1648981] Avg episode reward: [(0, '553.650')] [2024-06-15 17:52:23,717][1651669] Updated weights for policy 0, policy_version 530740 (0.0013) [2024-06-15 17:52:25,701][1651669] Updated weights for policy 0, policy_version 530819 (0.0137) [2024-06-15 17:52:25,776][1648981] Fps is (10 sec: 42558.2, 60 sec: 48058.0, 300 sec: 48872.7). Total num frames: 1087111168. Throughput: 0: 12069.4. Samples: 271845888. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:25,776][1648981] Avg episode reward: [(0, '559.810')] [2024-06-15 17:52:26,195][1651274] Saving new best policy, reward=559.810! [2024-06-15 17:52:27,472][1651669] Updated weights for policy 0, policy_version 530899 (0.0013) [2024-06-15 17:52:30,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 47513.4, 300 sec: 48652.6). Total num frames: 1087373312. Throughput: 0: 11909.5. Samples: 271900672. 
Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:30,767][1648981] Avg episode reward: [(0, '584.770')] [2024-06-15 17:52:30,768][1651274] Saving new best policy, reward=584.770! [2024-06-15 17:52:33,354][1651669] Updated weights for policy 0, policy_version 530953 (0.0011) [2024-06-15 17:52:34,269][1651669] Updated weights for policy 0, policy_version 531008 (0.0011) [2024-06-15 17:52:35,766][1648981] Fps is (10 sec: 49199.0, 60 sec: 47520.6, 300 sec: 48764.0). Total num frames: 1087602688. Throughput: 0: 12060.4. Samples: 271990784. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:35,767][1648981] Avg episode reward: [(0, '583.710')] [2024-06-15 17:52:35,903][1651274] Signal inference workers to stop experience collection... (27850 times) [2024-06-15 17:52:35,976][1651669] InferenceWorker_p0-w0: stopping experience collection (27850 times) [2024-06-15 17:52:36,107][1651274] Signal inference workers to resume experience collection... (27850 times) [2024-06-15 17:52:36,107][1651669] InferenceWorker_p0-w0: resuming experience collection (27850 times) [2024-06-15 17:52:36,110][1651669] Updated weights for policy 0, policy_version 531088 (0.0090) [2024-06-15 17:52:38,055][1651669] Updated weights for policy 0, policy_version 531174 (0.0113) [2024-06-15 17:52:40,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 49698.4, 300 sec: 48874.3). Total num frames: 1087897600. Throughput: 0: 12005.7. Samples: 272015872. Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:40,767][1648981] Avg episode reward: [(0, '597.650')] [2024-06-15 17:52:40,798][1651274] Saving new best policy, reward=597.650! [2024-06-15 17:52:44,462][1651669] Updated weights for policy 0, policy_version 531216 (0.0013) [2024-06-15 17:52:45,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46445.9, 300 sec: 48541.1). Total num frames: 1088061440. Throughput: 0: 12208.5. Samples: 272102400. 
Policy #0 lag: (min: 77.0, avg: 158.8, max: 304.0) [2024-06-15 17:52:45,767][1648981] Avg episode reward: [(0, '583.900')] [2024-06-15 17:52:45,885][1651669] Updated weights for policy 0, policy_version 531283 (0.0012) [2024-06-15 17:52:47,302][1651669] Updated weights for policy 0, policy_version 531344 (0.0011) [2024-06-15 17:52:48,912][1651669] Updated weights for policy 0, policy_version 531412 (0.0011) [2024-06-15 17:52:50,004][1651669] Updated weights for policy 0, policy_version 531456 (0.0013) [2024-06-15 17:52:50,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 50243.9, 300 sec: 48876.2). Total num frames: 1088421888. Throughput: 0: 12141.0. Samples: 272161280. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:52:50,767][1648981] Avg episode reward: [(0, '581.920')] [2024-06-15 17:52:55,626][1651669] Updated weights for policy 0, policy_version 531507 (0.0010) [2024-06-15 17:52:55,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 45334.9, 300 sec: 48319.1). Total num frames: 1088520192. Throughput: 0: 12299.3. Samples: 272204800. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:52:55,767][1648981] Avg episode reward: [(0, '596.400')] [2024-06-15 17:52:55,966][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000531520_1088552960.pth... [2024-06-15 17:52:56,103][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000525856_1076953088.pth [2024-06-15 17:52:56,638][1651669] Updated weights for policy 0, policy_version 531552 (0.0011) [2024-06-15 17:52:57,811][1651669] Updated weights for policy 0, policy_version 531604 (0.0010) [2024-06-15 17:52:59,402][1651669] Updated weights for policy 0, policy_version 531667 (0.0010) [2024-06-15 17:53:00,275][1651669] Updated weights for policy 0, policy_version 531712 (0.0012) [2024-06-15 17:53:00,766][1648981] Fps is (10 sec: 52430.6, 60 sec: 52429.2, 300 sec: 48985.4). Total num frames: 1088946176. Throughput: 0: 12174.2. 
Samples: 272276480. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:00,767][1648981] Avg episode reward: [(0, '582.810')] [2024-06-15 17:53:05,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 45878.2, 300 sec: 47986.3). Total num frames: 1088946176. Throughput: 0: 12322.1. Samples: 272357888. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:05,767][1648981] Avg episode reward: [(0, '551.320')] [2024-06-15 17:53:06,600][1651669] Updated weights for policy 0, policy_version 531760 (0.0012) [2024-06-15 17:53:07,856][1651669] Updated weights for policy 0, policy_version 531824 (0.0013) [2024-06-15 17:53:09,239][1651669] Updated weights for policy 0, policy_version 531878 (0.0011) [2024-06-15 17:53:10,380][1651274] Signal inference workers to stop experience collection... (27900 times) [2024-06-15 17:53:10,428][1651669] InferenceWorker_p0-w0: stopping experience collection (27900 times) [2024-06-15 17:53:10,582][1651274] Signal inference workers to resume experience collection... (27900 times) [2024-06-15 17:53:10,583][1651669] InferenceWorker_p0-w0: resuming experience collection (27900 times) [2024-06-15 17:53:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 51336.6, 300 sec: 49097.2). Total num frames: 1089404928. Throughput: 0: 12131.3. Samples: 272391680. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:10,767][1648981] Avg episode reward: [(0, '545.280')] [2024-06-15 17:53:10,770][1651669] Updated weights for policy 0, policy_version 531937 (0.0012) [2024-06-15 17:53:15,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 46421.2, 300 sec: 47987.6). Total num frames: 1089470464. Throughput: 0: 12697.6. Samples: 272472064. 
Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:15,767][1648981] Avg episode reward: [(0, '540.820')] [2024-06-15 17:53:16,428][1651669] Updated weights for policy 0, policy_version 532000 (0.0012) [2024-06-15 17:53:17,592][1651669] Updated weights for policy 0, policy_version 532048 (0.0024) [2024-06-15 17:53:19,431][1651669] Updated weights for policy 0, policy_version 532128 (0.0013) [2024-06-15 17:53:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50790.4, 300 sec: 49207.6). Total num frames: 1089896448. Throughput: 0: 12049.1. Samples: 272532992. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:20,767][1648981] Avg episode reward: [(0, '519.420')] [2024-06-15 17:53:21,545][1651669] Updated weights for policy 0, policy_version 532208 (0.0112) [2024-06-15 17:53:25,778][1648981] Fps is (10 sec: 52368.1, 60 sec: 48057.9, 300 sec: 48317.0). Total num frames: 1089994752. Throughput: 0: 12262.0. Samples: 272567808. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:25,779][1648981] Avg episode reward: [(0, '513.170')] [2024-06-15 17:53:27,306][1651669] Updated weights for policy 0, policy_version 532256 (0.0014) [2024-06-15 17:53:28,957][1651669] Updated weights for policy 0, policy_version 532322 (0.0109) [2024-06-15 17:53:30,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 49152.1, 300 sec: 48875.6). Total num frames: 1090322432. Throughput: 0: 11992.2. Samples: 272642048. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:30,767][1648981] Avg episode reward: [(0, '493.050')] [2024-06-15 17:53:30,999][1651669] Updated weights for policy 0, policy_version 532404 (0.0013) [2024-06-15 17:53:32,179][1651669] Updated weights for policy 0, policy_version 532451 (0.0109) [2024-06-15 17:53:35,798][1648981] Fps is (10 sec: 52324.2, 60 sec: 48580.1, 300 sec: 48535.8). Total num frames: 1090519040. Throughput: 0: 12302.2. Samples: 272715264. 
Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:35,799][1648981] Avg episode reward: [(0, '473.790')] [2024-06-15 17:53:37,592][1651669] Updated weights for policy 0, policy_version 532487 (0.0025) [2024-06-15 17:53:39,830][1651669] Updated weights for policy 0, policy_version 532580 (0.0092) [2024-06-15 17:53:40,767][1648981] Fps is (10 sec: 45874.8, 60 sec: 48059.6, 300 sec: 48874.3). Total num frames: 1090781184. Throughput: 0: 12344.9. Samples: 272760320. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:40,767][1648981] Avg episode reward: [(0, '476.000')] [2024-06-15 17:53:41,957][1651669] Updated weights for policy 0, policy_version 532657 (0.0030) [2024-06-15 17:53:43,528][1651669] Updated weights for policy 0, policy_version 532726 (0.0012) [2024-06-15 17:53:45,767][1648981] Fps is (10 sec: 52595.0, 60 sec: 49698.0, 300 sec: 48763.2). Total num frames: 1091043328. Throughput: 0: 11935.2. Samples: 272813568. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:45,768][1648981] Avg episode reward: [(0, '449.730')] [2024-06-15 17:53:48,933][1651669] Updated weights for policy 0, policy_version 532768 (0.0012) [2024-06-15 17:53:49,706][1651274] Signal inference workers to stop experience collection... (27950 times) [2024-06-15 17:53:49,753][1651669] InferenceWorker_p0-w0: stopping experience collection (27950 times) [2024-06-15 17:53:49,949][1651274] Signal inference workers to resume experience collection... (27950 times) [2024-06-15 17:53:49,950][1651669] InferenceWorker_p0-w0: resuming experience collection (27950 times) [2024-06-15 17:53:50,454][1651669] Updated weights for policy 0, policy_version 532832 (0.0119) [2024-06-15 17:53:50,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 47513.6, 300 sec: 48763.2). Total num frames: 1091272704. Throughput: 0: 12003.5. Samples: 272898048. 
Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:50,767][1648981] Avg episode reward: [(0, '445.360')] [2024-06-15 17:53:52,356][1651669] Updated weights for policy 0, policy_version 532898 (0.0123) [2024-06-15 17:53:53,792][1651669] Updated weights for policy 0, policy_version 532962 (0.0011) [2024-06-15 17:53:54,393][1651669] Updated weights for policy 0, policy_version 532992 (0.0010) [2024-06-15 17:53:55,773][1648981] Fps is (10 sec: 52396.2, 60 sec: 50785.2, 300 sec: 48874.4). Total num frames: 1091567616. Throughput: 0: 11706.1. Samples: 272918528. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:53:55,773][1648981] Avg episode reward: [(0, '432.810')] [2024-06-15 17:53:59,923][1651669] Updated weights for policy 0, policy_version 533040 (0.0013) [2024-06-15 17:54:00,766][1648981] Fps is (10 sec: 45876.4, 60 sec: 46421.3, 300 sec: 48541.1). Total num frames: 1091731456. Throughput: 0: 11992.2. Samples: 273011712. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:54:00,767][1648981] Avg episode reward: [(0, '415.460')] [2024-06-15 17:54:01,064][1651669] Updated weights for policy 0, policy_version 533080 (0.0011) [2024-06-15 17:54:02,360][1651669] Updated weights for policy 0, policy_version 533137 (0.0011) [2024-06-15 17:54:04,090][1651669] Updated weights for policy 0, policy_version 533216 (0.0097) [2024-06-15 17:54:05,766][1648981] Fps is (10 sec: 52462.3, 60 sec: 52428.8, 300 sec: 49318.6). Total num frames: 1092091904. Throughput: 0: 12003.5. Samples: 273073152. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:54:05,767][1648981] Avg episode reward: [(0, '421.280')] [2024-06-15 17:54:09,788][1651669] Updated weights for policy 0, policy_version 533269 (0.0157) [2024-06-15 17:54:10,465][1651669] Updated weights for policy 0, policy_version 533312 (0.0012) [2024-06-15 17:54:10,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46967.4, 300 sec: 48541.1). Total num frames: 1092222976. 
Throughput: 0: 12314.0. Samples: 273121792. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:54:10,767][1648981] Avg episode reward: [(0, '429.660')] [2024-06-15 17:54:11,982][1651669] Updated weights for policy 0, policy_version 533376 (0.0012) [2024-06-15 17:54:13,151][1651669] Updated weights for policy 0, policy_version 533424 (0.0129) [2024-06-15 17:54:14,923][1651669] Updated weights for policy 0, policy_version 533492 (0.0011) [2024-06-15 17:54:15,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 52429.0, 300 sec: 49318.6). Total num frames: 1092616192. Throughput: 0: 12094.6. Samples: 273186304. Policy #0 lag: (min: 196.0, avg: 245.1, max: 435.0) [2024-06-15 17:54:15,767][1648981] Avg episode reward: [(0, '474.720')] [2024-06-15 17:54:20,235][1651669] Updated weights for policy 0, policy_version 533538 (0.0011) [2024-06-15 17:54:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 48652.2). Total num frames: 1092747264. Throughput: 0: 12262.5. Samples: 273266688. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:20,767][1648981] Avg episode reward: [(0, '471.390')] [2024-06-15 17:54:22,809][1651669] Updated weights for policy 0, policy_version 533588 (0.0012) [2024-06-15 17:54:24,821][1651669] Updated weights for policy 0, policy_version 533667 (0.0011) [2024-06-15 17:54:25,105][1651274] Signal inference workers to stop experience collection... (28000 times) [2024-06-15 17:54:25,255][1651669] InferenceWorker_p0-w0: stopping experience collection (28000 times) [2024-06-15 17:54:25,432][1651274] Signal inference workers to resume experience collection... (28000 times) [2024-06-15 17:54:25,432][1651669] InferenceWorker_p0-w0: resuming experience collection (28000 times) [2024-06-15 17:54:25,767][1648981] Fps is (10 sec: 42594.0, 60 sec: 50799.5, 300 sec: 49207.4). Total num frames: 1093042176. Throughput: 0: 12105.7. Samples: 273305088. 
Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:25,768][1648981] Avg episode reward: [(0, '488.840')] [2024-06-15 17:54:26,625][1651669] Updated weights for policy 0, policy_version 533744 (0.0102) [2024-06-15 17:54:30,781][1648981] Fps is (10 sec: 39266.6, 60 sec: 46956.6, 300 sec: 48427.7). Total num frames: 1093140480. Throughput: 0: 12341.1. Samples: 273369088. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:30,783][1648981] Avg episode reward: [(0, '461.880')] [2024-06-15 17:54:30,833][1651669] Updated weights for policy 0, policy_version 533776 (0.0012) [2024-06-15 17:54:34,772][1651669] Updated weights for policy 0, policy_version 533873 (0.0011) [2024-06-15 17:54:35,766][1648981] Fps is (10 sec: 42602.5, 60 sec: 49178.0, 300 sec: 48874.3). Total num frames: 1093468160. Throughput: 0: 12071.9. Samples: 273441280. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:35,767][1648981] Avg episode reward: [(0, '461.420')] [2024-06-15 17:54:36,160][1651669] Updated weights for policy 0, policy_version 533952 (0.0012) [2024-06-15 17:54:37,776][1651669] Updated weights for policy 0, policy_version 534008 (0.0013) [2024-06-15 17:54:40,766][1648981] Fps is (10 sec: 52502.0, 60 sec: 48059.8, 300 sec: 48542.1). Total num frames: 1093664768. Throughput: 0: 12232.8. Samples: 273468928. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:40,767][1648981] Avg episode reward: [(0, '467.780')] [2024-06-15 17:54:42,717][1651669] Updated weights for policy 0, policy_version 534077 (0.0016) [2024-06-15 17:54:45,766][1648981] Fps is (10 sec: 36044.9, 60 sec: 46421.5, 300 sec: 48545.0). Total num frames: 1093828608. Throughput: 0: 11878.4. Samples: 273546240. 
Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:45,767][1648981] Avg episode reward: [(0, '464.820')] [2024-06-15 17:54:46,393][1651669] Updated weights for policy 0, policy_version 534133 (0.0015) [2024-06-15 17:54:48,377][1651669] Updated weights for policy 0, policy_version 534211 (0.0011) [2024-06-15 17:54:49,642][1651669] Updated weights for policy 0, policy_version 534269 (0.0116) [2024-06-15 17:54:50,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 1094189056. Throughput: 0: 11912.5. Samples: 273609216. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:50,767][1648981] Avg episode reward: [(0, '449.050')] [2024-06-15 17:54:54,975][1651669] Updated weights for policy 0, policy_version 534328 (0.0013) [2024-06-15 17:54:55,774][1648981] Fps is (10 sec: 49112.4, 60 sec: 45873.9, 300 sec: 48428.7). Total num frames: 1094320128. Throughput: 0: 11842.1. Samples: 273654784. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:54:55,775][1648981] Avg episode reward: [(0, '447.210')] [2024-06-15 17:54:55,779][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000534336_1094320128.pth... [2024-06-15 17:54:55,826][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000528672_1082720256.pth [2024-06-15 17:54:56,751][1651669] Updated weights for policy 0, policy_version 534368 (0.0011) [2024-06-15 17:54:57,886][1651669] Updated weights for policy 0, policy_version 534419 (0.0012) [2024-06-15 17:54:59,694][1651669] Updated weights for policy 0, policy_version 534496 (0.0012) [2024-06-15 17:55:00,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 49698.1, 300 sec: 48874.3). Total num frames: 1094713344. Throughput: 0: 11855.6. Samples: 273719808. 
Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:00,767][1648981] Avg episode reward: [(0, '436.500')] [2024-06-15 17:55:04,382][1651669] Updated weights for policy 0, policy_version 534544 (0.0012) [2024-06-15 17:55:05,767][1648981] Fps is (10 sec: 52470.0, 60 sec: 45875.0, 300 sec: 48430.1). Total num frames: 1094844416. Throughput: 0: 11775.9. Samples: 273796608. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:05,768][1648981] Avg episode reward: [(0, '439.870')] [2024-06-15 17:55:06,872][1651669] Updated weights for policy 0, policy_version 534594 (0.0013) [2024-06-15 17:55:07,233][1651274] Signal inference workers to stop experience collection... (28050 times) [2024-06-15 17:55:07,317][1651669] InferenceWorker_p0-w0: stopping experience collection (28050 times) [2024-06-15 17:55:07,482][1651274] Signal inference workers to resume experience collection... (28050 times) [2024-06-15 17:55:07,483][1651669] InferenceWorker_p0-w0: resuming experience collection (28050 times) [2024-06-15 17:55:08,999][1651669] Updated weights for policy 0, policy_version 534692 (0.0144) [2024-06-15 17:55:10,767][1648981] Fps is (10 sec: 45872.2, 60 sec: 49151.5, 300 sec: 49099.0). Total num frames: 1095172096. Throughput: 0: 11662.3. Samples: 273829888. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:10,768][1648981] Avg episode reward: [(0, '436.410')] [2024-06-15 17:55:10,858][1651669] Updated weights for policy 0, policy_version 534768 (0.0011) [2024-06-15 17:55:15,767][1648981] Fps is (10 sec: 42599.1, 60 sec: 44236.7, 300 sec: 48096.7). Total num frames: 1095270400. Throughput: 0: 11791.0. Samples: 273899520. 
Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:15,769][1648981] Avg episode reward: [(0, '438.710')] [2024-06-15 17:55:16,348][1651669] Updated weights for policy 0, policy_version 534817 (0.0012) [2024-06-15 17:55:18,973][1651669] Updated weights for policy 0, policy_version 534896 (0.0015) [2024-06-15 17:55:20,528][1651669] Updated weights for policy 0, policy_version 534946 (0.0012) [2024-06-15 17:55:20,766][1648981] Fps is (10 sec: 42601.0, 60 sec: 47513.6, 300 sec: 48763.2). Total num frames: 1095598080. Throughput: 0: 11571.2. Samples: 273961984. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:20,767][1648981] Avg episode reward: [(0, '413.480')] [2024-06-15 17:55:21,940][1651669] Updated weights for policy 0, policy_version 535011 (0.0012) [2024-06-15 17:55:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45329.8, 300 sec: 48210.1). Total num frames: 1095761920. Throughput: 0: 11855.6. Samples: 274002432. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:25,767][1648981] Avg episode reward: [(0, '425.160')] [2024-06-15 17:55:27,145][1651669] Updated weights for policy 0, policy_version 535088 (0.0014) [2024-06-15 17:55:29,687][1651669] Updated weights for policy 0, policy_version 535128 (0.0013) [2024-06-15 17:55:30,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48070.9, 300 sec: 48435.0). Total num frames: 1096024064. Throughput: 0: 11844.3. Samples: 274079232. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:30,767][1648981] Avg episode reward: [(0, '424.110')] [2024-06-15 17:55:31,176][1651669] Updated weights for policy 0, policy_version 535187 (0.0012) [2024-06-15 17:55:32,845][1651669] Updated weights for policy 0, policy_version 535264 (0.0012) [2024-06-15 17:55:35,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 46967.6, 300 sec: 48208.1). Total num frames: 1096286208. Throughput: 0: 12117.4. Samples: 274154496. 
Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:35,767][1648981] Avg episode reward: [(0, '430.090')] [2024-06-15 17:55:37,015][1651669] Updated weights for policy 0, policy_version 535316 (0.0010) [2024-06-15 17:55:40,020][1651669] Updated weights for policy 0, policy_version 535379 (0.0115) [2024-06-15 17:55:40,767][1648981] Fps is (10 sec: 49151.8, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1096515584. Throughput: 0: 11891.9. Samples: 274189824. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:40,767][1648981] Avg episode reward: [(0, '431.870')] [2024-06-15 17:55:41,296][1651669] Updated weights for policy 0, policy_version 535430 (0.0015) [2024-06-15 17:55:42,795][1651669] Updated weights for policy 0, policy_version 535488 (0.0009) [2024-06-15 17:55:43,608][1651274] Signal inference workers to stop experience collection... (28100 times) [2024-06-15 17:55:43,651][1651669] InferenceWorker_p0-w0: stopping experience collection (28100 times) [2024-06-15 17:55:43,835][1651274] Signal inference workers to resume experience collection... (28100 times) [2024-06-15 17:55:43,836][1651669] InferenceWorker_p0-w0: resuming experience collection (28100 times) [2024-06-15 17:55:44,020][1651669] Updated weights for policy 0, policy_version 535549 (0.0012) [2024-06-15 17:55:45,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49698.2, 300 sec: 48541.1). Total num frames: 1096810496. Throughput: 0: 11878.4. Samples: 274254336. Policy #0 lag: (min: 15.0, avg: 93.0, max: 271.0) [2024-06-15 17:55:45,767][1648981] Avg episode reward: [(0, '420.660')] [2024-06-15 17:55:48,382][1651669] Updated weights for policy 0, policy_version 535609 (0.0012) [2024-06-15 17:55:50,749][1651669] Updated weights for policy 0, policy_version 535651 (0.0012) [2024-06-15 17:55:50,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 46967.6, 300 sec: 48652.3). Total num frames: 1097007104. Throughput: 0: 12026.4. Samples: 274337792. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:55:50,767][1648981] Avg episode reward: [(0, '425.500')] [2024-06-15 17:55:52,429][1651669] Updated weights for policy 0, policy_version 535700 (0.0012) [2024-06-15 17:55:54,098][1651669] Updated weights for policy 0, policy_version 535763 (0.0011) [2024-06-15 17:55:55,786][1648981] Fps is (10 sec: 52325.2, 60 sec: 50234.5, 300 sec: 48426.8). Total num frames: 1097334784. Throughput: 0: 11862.0. Samples: 274363904. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:55:55,787][1648981] Avg episode reward: [(0, '421.190')] [2024-06-15 17:55:58,243][1651669] Updated weights for policy 0, policy_version 535813 (0.0018) [2024-06-15 17:56:00,486][1651669] Updated weights for policy 0, policy_version 535876 (0.0014) [2024-06-15 17:56:00,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46421.3, 300 sec: 48541.1). Total num frames: 1097498624. Throughput: 0: 12060.5. Samples: 274442240. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:00,767][1648981] Avg episode reward: [(0, '428.480')] [2024-06-15 17:56:01,753][1651669] Updated weights for policy 0, policy_version 535935 (0.0103) [2024-06-15 17:56:04,944][1651669] Updated weights for policy 0, policy_version 535990 (0.0014) [2024-06-15 17:56:05,767][1648981] Fps is (10 sec: 42679.5, 60 sec: 48605.4, 300 sec: 47985.7). Total num frames: 1097760768. Throughput: 0: 12105.8. Samples: 274506752. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:05,768][1648981] Avg episode reward: [(0, '424.190')] [2024-06-15 17:56:06,651][1651669] Updated weights for policy 0, policy_version 536056 (0.0013) [2024-06-15 17:56:09,792][1651669] Updated weights for policy 0, policy_version 536096 (0.0011) [2024-06-15 17:56:10,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 46967.8, 300 sec: 48430.0). Total num frames: 1097990144. Throughput: 0: 12037.7. Samples: 274544128. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:10,767][1648981] Avg episode reward: [(0, '427.330')] [2024-06-15 17:56:11,231][1651669] Updated weights for policy 0, policy_version 536134 (0.0012) [2024-06-15 17:56:12,242][1651669] Updated weights for policy 0, policy_version 536190 (0.0012) [2024-06-15 17:56:15,703][1651669] Updated weights for policy 0, policy_version 536275 (0.0012) [2024-06-15 17:56:15,778][1648981] Fps is (10 sec: 52370.8, 60 sec: 50234.4, 300 sec: 47983.8). Total num frames: 1098285056. Throughput: 0: 12205.2. Samples: 274628608. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:15,779][1648981] Avg episode reward: [(0, '407.200')] [2024-06-15 17:56:16,686][1651669] Updated weights for policy 0, policy_version 536318 (0.0011) [2024-06-15 17:56:20,368][1651669] Updated weights for policy 0, policy_version 536368 (0.0013) [2024-06-15 17:56:20,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 48605.9, 300 sec: 48431.2). Total num frames: 1098514432. Throughput: 0: 12219.7. Samples: 274704384. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:20,767][1648981] Avg episode reward: [(0, '389.920')] [2024-06-15 17:56:22,872][1651669] Updated weights for policy 0, policy_version 536446 (0.0011) [2024-06-15 17:56:25,737][1651669] Updated weights for policy 0, policy_version 536512 (0.0011) [2024-06-15 17:56:25,772][1648981] Fps is (10 sec: 49181.0, 60 sec: 50239.3, 300 sec: 48317.9). Total num frames: 1098776576. Throughput: 0: 12206.8. Samples: 274739200. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:25,773][1648981] Avg episode reward: [(0, '388.550')] [2024-06-15 17:56:25,898][1651274] Signal inference workers to stop experience collection... (28150 times) [2024-06-15 17:56:25,946][1651669] InferenceWorker_p0-w0: stopping experience collection (28150 times) [2024-06-15 17:56:26,203][1651274] Signal inference workers to resume experience collection... 
(28150 times) [2024-06-15 17:56:26,214][1651669] InferenceWorker_p0-w0: resuming experience collection (28150 times) [2024-06-15 17:56:27,253][1651669] Updated weights for policy 0, policy_version 536568 (0.0023) [2024-06-15 17:56:30,767][1648981] Fps is (10 sec: 39321.1, 60 sec: 48059.7, 300 sec: 47987.1). Total num frames: 1098907648. Throughput: 0: 12276.6. Samples: 274806784. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:30,769][1648981] Avg episode reward: [(0, '381.270')] [2024-06-15 17:56:31,278][1651669] Updated weights for policy 0, policy_version 536608 (0.0011) [2024-06-15 17:56:33,159][1651669] Updated weights for policy 0, policy_version 536662 (0.0017) [2024-06-15 17:56:33,957][1651669] Updated weights for policy 0, policy_version 536704 (0.0012) [2024-06-15 17:56:35,766][1648981] Fps is (10 sec: 49181.8, 60 sec: 49698.2, 300 sec: 48652.2). Total num frames: 1099268096. Throughput: 0: 12162.9. Samples: 274885120. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:35,767][1648981] Avg episode reward: [(0, '374.410')] [2024-06-15 17:56:36,172][1651669] Updated weights for policy 0, policy_version 536768 (0.0146) [2024-06-15 17:56:37,454][1651669] Updated weights for policy 0, policy_version 536825 (0.0013) [2024-06-15 17:56:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48605.9, 300 sec: 47990.8). Total num frames: 1099431936. Throughput: 0: 12338.9. Samples: 274918912. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:40,767][1648981] Avg episode reward: [(0, '378.270')] [2024-06-15 17:56:42,304][1651669] Updated weights for policy 0, policy_version 536880 (0.0015) [2024-06-15 17:56:44,110][1651669] Updated weights for policy 0, policy_version 536944 (0.0102) [2024-06-15 17:56:45,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1099694080. Throughput: 0: 12265.3. Samples: 274994176. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:45,767][1648981] Avg episode reward: [(0, '377.810')] [2024-06-15 17:56:46,806][1651669] Updated weights for policy 0, policy_version 537010 (0.0012) [2024-06-15 17:56:48,186][1651669] Updated weights for policy 0, policy_version 537079 (0.0013) [2024-06-15 17:56:50,778][1648981] Fps is (10 sec: 52367.5, 60 sec: 49142.3, 300 sec: 47985.1). Total num frames: 1099956224. Throughput: 0: 12649.0. Samples: 275076096. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:50,779][1648981] Avg episode reward: [(0, '381.610')] [2024-06-15 17:56:52,668][1651669] Updated weights for policy 0, policy_version 537140 (0.0011) [2024-06-15 17:56:53,999][1651669] Updated weights for policy 0, policy_version 537184 (0.0011) [2024-06-15 17:56:55,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 48075.4, 300 sec: 48874.3). Total num frames: 1100218368. Throughput: 0: 12561.0. Samples: 275109376. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:56:55,767][1648981] Avg episode reward: [(0, '397.570')] [2024-06-15 17:56:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000537216_1100218368.pth... [2024-06-15 17:56:55,845][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000531520_1088552960.pth [2024-06-15 17:56:56,965][1651669] Updated weights for policy 0, policy_version 537248 (0.0044) [2024-06-15 17:56:58,890][1651669] Updated weights for policy 0, policy_version 537335 (0.0032) [2024-06-15 17:57:00,774][1648981] Fps is (10 sec: 52449.2, 60 sec: 49691.6, 300 sec: 48429.3). Total num frames: 1100480512. Throughput: 0: 12209.4. Samples: 275177984. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:57:00,775][1648981] Avg episode reward: [(0, '398.050')] [2024-06-15 17:57:03,373][1651669] Updated weights for policy 0, policy_version 537392 (0.0013) [2024-06-15 17:57:04,428][1651669] Updated weights for policy 0, policy_version 537427 (0.0013) [2024-06-15 17:57:05,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 49698.8, 300 sec: 48874.3). Total num frames: 1100742656. Throughput: 0: 12140.1. Samples: 275250688. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:57:05,767][1648981] Avg episode reward: [(0, '396.870')] [2024-06-15 17:57:08,018][1651669] Updated weights for policy 0, policy_version 537490 (0.0011) [2024-06-15 17:57:08,317][1651274] Signal inference workers to stop experience collection... (28200 times) [2024-06-15 17:57:08,398][1651669] InferenceWorker_p0-w0: stopping experience collection (28200 times) [2024-06-15 17:57:08,499][1651274] Signal inference workers to resume experience collection... (28200 times) [2024-06-15 17:57:08,500][1651669] InferenceWorker_p0-w0: resuming experience collection (28200 times) [2024-06-15 17:57:08,994][1651669] Updated weights for policy 0, policy_version 537540 (0.0011) [2024-06-15 17:57:10,106][1651669] Updated weights for policy 0, policy_version 537592 (0.0013) [2024-06-15 17:57:10,766][1648981] Fps is (10 sec: 52470.2, 60 sec: 50244.4, 300 sec: 48541.1). Total num frames: 1101004800. Throughput: 0: 12198.6. Samples: 275288064. Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:57:10,767][1648981] Avg episode reward: [(0, '393.220')] [2024-06-15 17:57:14,175][1651669] Updated weights for policy 0, policy_version 537648 (0.0012) [2024-06-15 17:57:15,430][1651669] Updated weights for policy 0, policy_version 537685 (0.0103) [2024-06-15 17:57:15,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48615.5, 300 sec: 48652.2). Total num frames: 1101201408. Throughput: 0: 12458.7. Samples: 275367424. 
Policy #0 lag: (min: 15.0, avg: 103.1, max: 271.0) [2024-06-15 17:57:15,767][1648981] Avg episode reward: [(0, '403.410')] [2024-06-15 17:57:16,352][1651669] Updated weights for policy 0, policy_version 537725 (0.0012) [2024-06-15 17:57:18,974][1651669] Updated weights for policy 0, policy_version 537776 (0.0011) [2024-06-15 17:57:20,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.0, 300 sec: 48764.8). Total num frames: 1101496320. Throughput: 0: 12014.9. Samples: 275425792. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:20,767][1648981] Avg episode reward: [(0, '394.110')] [2024-06-15 17:57:20,842][1651669] Updated weights for policy 0, policy_version 537856 (0.0012) [2024-06-15 17:57:25,767][1648981] Fps is (10 sec: 42597.3, 60 sec: 47518.2, 300 sec: 48318.9). Total num frames: 1101627392. Throughput: 0: 12310.7. Samples: 275472896. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:25,767][1648981] Avg episode reward: [(0, '427.820')] [2024-06-15 17:57:26,112][1651669] Updated weights for policy 0, policy_version 537921 (0.0011) [2024-06-15 17:57:27,365][1651669] Updated weights for policy 0, policy_version 537977 (0.0012) [2024-06-15 17:57:29,368][1651669] Updated weights for policy 0, policy_version 538032 (0.0012) [2024-06-15 17:57:30,768][1648981] Fps is (10 sec: 49144.4, 60 sec: 51335.2, 300 sec: 48763.0). Total num frames: 1101987840. Throughput: 0: 12082.8. Samples: 275537920. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:30,769][1648981] Avg episode reward: [(0, '431.600')] [2024-06-15 17:57:31,209][1651669] Updated weights for policy 0, policy_version 538104 (0.0010) [2024-06-15 17:57:35,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 46421.2, 300 sec: 47985.7). Total num frames: 1102053376. Throughput: 0: 12040.8. Samples: 275617792. 
Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:35,767][1648981] Avg episode reward: [(0, '437.410')] [2024-06-15 17:57:36,164][1651669] Updated weights for policy 0, policy_version 538129 (0.0011) [2024-06-15 17:57:37,554][1651669] Updated weights for policy 0, policy_version 538195 (0.0300) [2024-06-15 17:57:38,537][1651669] Updated weights for policy 0, policy_version 538240 (0.0012) [2024-06-15 17:57:40,406][1651669] Updated weights for policy 0, policy_version 538297 (0.0012) [2024-06-15 17:57:40,770][1648981] Fps is (10 sec: 45865.2, 60 sec: 50241.2, 300 sec: 48762.6). Total num frames: 1102446592. Throughput: 0: 12048.1. Samples: 275651584. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:40,771][1648981] Avg episode reward: [(0, '437.490')] [2024-06-15 17:57:41,900][1651669] Updated weights for policy 0, policy_version 538357 (0.0011) [2024-06-15 17:57:45,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 48059.5, 300 sec: 47985.7). Total num frames: 1102577664. Throughput: 0: 12096.6. Samples: 275722240. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:45,769][1648981] Avg episode reward: [(0, '430.490')] [2024-06-15 17:57:47,240][1651669] Updated weights for policy 0, policy_version 538407 (0.0013) [2024-06-15 17:57:48,536][1651274] Signal inference workers to stop experience collection... (28250 times) [2024-06-15 17:57:48,573][1651669] InferenceWorker_p0-w0: stopping experience collection (28250 times) [2024-06-15 17:57:48,576][1651669] Updated weights for policy 0, policy_version 538434 (0.0013) [2024-06-15 17:57:48,763][1651274] Signal inference workers to resume experience collection... (28250 times) [2024-06-15 17:57:48,764][1651669] InferenceWorker_p0-w0: resuming experience collection (28250 times) [2024-06-15 17:57:49,870][1651669] Updated weights for policy 0, policy_version 538498 (0.0047) [2024-06-15 17:57:50,769][1648981] Fps is (10 sec: 49157.2, 60 sec: 49705.7, 300 sec: 48873.9). 
Total num frames: 1102938112. Throughput: 0: 12184.9. Samples: 275799040. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:50,770][1648981] Avg episode reward: [(0, '403.480')] [2024-06-15 17:57:51,316][1651669] Updated weights for policy 0, policy_version 538562 (0.0022) [2024-06-15 17:57:55,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 1103101952. Throughput: 0: 11969.4. Samples: 275826688. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:57:55,767][1648981] Avg episode reward: [(0, '397.570')] [2024-06-15 17:57:57,030][1651669] Updated weights for policy 0, policy_version 538640 (0.0013) [2024-06-15 17:57:59,752][1651669] Updated weights for policy 0, policy_version 538690 (0.0013) [2024-06-15 17:58:00,766][1648981] Fps is (10 sec: 39332.4, 60 sec: 47519.9, 300 sec: 48763.2). Total num frames: 1103331328. Throughput: 0: 12037.7. Samples: 275909120. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:00,767][1648981] Avg episode reward: [(0, '405.510')] [2024-06-15 17:58:01,100][1651669] Updated weights for policy 0, policy_version 538755 (0.0016) [2024-06-15 17:58:02,989][1651669] Updated weights for policy 0, policy_version 538832 (0.0135) [2024-06-15 17:58:03,970][1651669] Updated weights for policy 0, policy_version 538879 (0.0037) [2024-06-15 17:58:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1103626240. Throughput: 0: 12140.1. Samples: 275972096. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:05,767][1648981] Avg episode reward: [(0, '421.480')] [2024-06-15 17:58:08,350][1651669] Updated weights for policy 0, policy_version 538928 (0.0014) [2024-06-15 17:58:10,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 45875.1, 300 sec: 48430.0). Total num frames: 1103757312. Throughput: 0: 11878.4. Samples: 276007424. 
Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:10,767][1648981] Avg episode reward: [(0, '418.870')] [2024-06-15 17:58:11,497][1651669] Updated weights for policy 0, policy_version 538979 (0.0011) [2024-06-15 17:58:13,248][1651669] Updated weights for policy 0, policy_version 539056 (0.0011) [2024-06-15 17:58:14,365][1651669] Updated weights for policy 0, policy_version 539109 (0.0118) [2024-06-15 17:58:15,794][1648981] Fps is (10 sec: 52283.3, 60 sec: 49129.1, 300 sec: 48314.3). Total num frames: 1104150528. Throughput: 0: 12144.4. Samples: 276084736. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:15,795][1648981] Avg episode reward: [(0, '411.720')] [2024-06-15 17:58:17,890][1651669] Updated weights for policy 0, policy_version 539168 (0.0012) [2024-06-15 17:58:20,779][1648981] Fps is (10 sec: 52365.1, 60 sec: 46411.9, 300 sec: 48429.9). Total num frames: 1104281600. Throughput: 0: 12250.5. Samples: 276169216. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:20,779][1648981] Avg episode reward: [(0, '408.120')] [2024-06-15 17:58:21,104][1651669] Updated weights for policy 0, policy_version 539216 (0.0013) [2024-06-15 17:58:22,992][1651669] Updated weights for policy 0, policy_version 539296 (0.0015) [2024-06-15 17:58:24,214][1651274] Signal inference workers to stop experience collection... (28300 times) [2024-06-15 17:58:24,259][1651669] Updated weights for policy 0, policy_version 539349 (0.0014) [2024-06-15 17:58:24,281][1651669] InferenceWorker_p0-w0: stopping experience collection (28300 times) [2024-06-15 17:58:24,392][1651274] Signal inference workers to resume experience collection... (28300 times) [2024-06-15 17:58:24,393][1651669] InferenceWorker_p0-w0: resuming experience collection (28300 times) [2024-06-15 17:58:25,040][1651669] Updated weights for policy 0, policy_version 539392 (0.0012) [2024-06-15 17:58:25,769][1648981] Fps is (10 sec: 52563.7, 60 sec: 50788.7, 300 sec: 48651.8). 
Total num frames: 1104674816. Throughput: 0: 12208.8. Samples: 276200960. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:25,769][1648981] Avg episode reward: [(0, '404.170')] [2024-06-15 17:58:28,284][1651669] Updated weights for policy 0, policy_version 539453 (0.0010) [2024-06-15 17:58:30,766][1648981] Fps is (10 sec: 55773.8, 60 sec: 47514.9, 300 sec: 48546.3). Total num frames: 1104838656. Throughput: 0: 12538.4. Samples: 276286464. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:30,767][1648981] Avg episode reward: [(0, '409.720')] [2024-06-15 17:58:31,489][1651669] Updated weights for policy 0, policy_version 539512 (0.0106) [2024-06-15 17:58:33,760][1651669] Updated weights for policy 0, policy_version 539570 (0.0013) [2024-06-15 17:58:35,189][1651669] Updated weights for policy 0, policy_version 539642 (0.0010) [2024-06-15 17:58:35,766][1648981] Fps is (10 sec: 52440.4, 60 sec: 52428.8, 300 sec: 48874.3). Total num frames: 1105199104. Throughput: 0: 12311.5. Samples: 276353024. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:35,767][1648981] Avg episode reward: [(0, '415.320')] [2024-06-15 17:58:38,652][1651669] Updated weights for policy 0, policy_version 539696 (0.0012) [2024-06-15 17:58:40,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 48608.8, 300 sec: 48541.1). Total num frames: 1105362944. Throughput: 0: 12595.2. Samples: 276393472. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:40,767][1648981] Avg episode reward: [(0, '420.120')] [2024-06-15 17:58:41,287][1651669] Updated weights for policy 0, policy_version 539745 (0.0028) [2024-06-15 17:58:43,081][1651669] Updated weights for policy 0, policy_version 539781 (0.0013) [2024-06-15 17:58:45,032][1651669] Updated weights for policy 0, policy_version 539856 (0.0115) [2024-06-15 17:58:45,767][1648981] Fps is (10 sec: 49149.4, 60 sec: 51882.4, 300 sec: 48874.3). Total num frames: 1105690624. Throughput: 0: 12413.0. 
Samples: 276467712. Policy #0 lag: (min: 79.0, avg: 190.8, max: 335.0) [2024-06-15 17:58:45,767][1648981] Avg episode reward: [(0, '419.000')] [2024-06-15 17:58:48,444][1651669] Updated weights for policy 0, policy_version 539920 (0.0013) [2024-06-15 17:58:50,774][1648981] Fps is (10 sec: 49113.1, 60 sec: 48601.5, 300 sec: 48429.7). Total num frames: 1105854464. Throughput: 0: 12695.3. Samples: 276543488. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:58:50,775][1648981] Avg episode reward: [(0, '456.270')] [2024-06-15 17:58:50,936][1651669] Updated weights for policy 0, policy_version 539971 (0.0011) [2024-06-15 17:58:52,274][1651669] Updated weights for policy 0, policy_version 540027 (0.0011) [2024-06-15 17:58:55,136][1651669] Updated weights for policy 0, policy_version 540086 (0.0113) [2024-06-15 17:58:55,769][1648981] Fps is (10 sec: 42590.0, 60 sec: 50242.2, 300 sec: 48762.8). Total num frames: 1106116608. Throughput: 0: 12810.7. Samples: 276583936. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:58:55,769][1648981] Avg episode reward: [(0, '436.710')] [2024-06-15 17:58:56,431][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000540128_1106182144.pth... [2024-06-15 17:58:56,649][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000534336_1094320128.pth [2024-06-15 17:58:56,876][1651669] Updated weights for policy 0, policy_version 540144 (0.0011) [2024-06-15 17:58:59,840][1651669] Updated weights for policy 0, policy_version 540195 (0.0010) [2024-06-15 17:59:00,766][1648981] Fps is (10 sec: 52471.2, 60 sec: 50790.4, 300 sec: 48430.0). Total num frames: 1106378752. Throughput: 0: 12568.9. Samples: 276649984. 
Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:00,767][1648981] Avg episode reward: [(0, '432.230')] [2024-06-15 17:59:02,029][1651669] Updated weights for policy 0, policy_version 540241 (0.0012) [2024-06-15 17:59:02,972][1651669] Updated weights for policy 0, policy_version 540284 (0.0010) [2024-06-15 17:59:05,766][1648981] Fps is (10 sec: 45887.1, 60 sec: 49152.1, 300 sec: 48652.2). Total num frames: 1106575360. Throughput: 0: 12473.5. Samples: 276730368. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:05,767][1648981] Avg episode reward: [(0, '428.150')] [2024-06-15 17:59:05,872][1651669] Updated weights for policy 0, policy_version 540336 (0.0011) [2024-06-15 17:59:06,464][1651274] Signal inference workers to stop experience collection... (28350 times) [2024-06-15 17:59:06,510][1651669] InferenceWorker_p0-w0: stopping experience collection (28350 times) [2024-06-15 17:59:06,788][1651274] Signal inference workers to resume experience collection... (28350 times) [2024-06-15 17:59:06,789][1651669] InferenceWorker_p0-w0: resuming experience collection (28350 times) [2024-06-15 17:59:07,463][1651669] Updated weights for policy 0, policy_version 540387 (0.0029) [2024-06-15 17:59:09,845][1651669] Updated weights for policy 0, policy_version 540435 (0.0025) [2024-06-15 17:59:10,769][1648981] Fps is (10 sec: 52420.4, 60 sec: 52427.5, 300 sec: 48429.7). Total num frames: 1106903040. Throughput: 0: 12424.7. Samples: 276760064. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:10,772][1648981] Avg episode reward: [(0, '429.340')] [2024-06-15 17:59:12,200][1651669] Updated weights for policy 0, policy_version 540496 (0.0013) [2024-06-15 17:59:13,264][1651669] Updated weights for policy 0, policy_version 540544 (0.0011) [2024-06-15 17:59:15,766][1648981] Fps is (10 sec: 49151.1, 60 sec: 48628.4, 300 sec: 48541.1). Total num frames: 1107066880. Throughput: 0: 12197.0. Samples: 276835328. 
Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:15,767][1648981] Avg episode reward: [(0, '411.770')] [2024-06-15 17:59:17,427][1651669] Updated weights for policy 0, policy_version 540624 (0.0013) [2024-06-15 17:59:18,488][1651669] Updated weights for policy 0, policy_version 540667 (0.0092) [2024-06-15 17:59:20,665][1651669] Updated weights for policy 0, policy_version 540726 (0.0012) [2024-06-15 17:59:20,766][1648981] Fps is (10 sec: 49159.9, 60 sec: 51893.3, 300 sec: 48652.3). Total num frames: 1107394560. Throughput: 0: 12379.0. Samples: 276910080. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:20,767][1648981] Avg episode reward: [(0, '410.850')] [2024-06-15 17:59:24,309][1651669] Updated weights for policy 0, policy_version 540792 (0.0127) [2024-06-15 17:59:25,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 48061.6, 300 sec: 48876.6). Total num frames: 1107558400. Throughput: 0: 12322.2. Samples: 276947968. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:25,767][1648981] Avg episode reward: [(0, '425.310')] [2024-06-15 17:59:27,745][1651669] Updated weights for policy 0, policy_version 540864 (0.0012) [2024-06-15 17:59:29,176][1651669] Updated weights for policy 0, policy_version 540928 (0.0013) [2024-06-15 17:59:30,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 49698.1, 300 sec: 48652.2). Total num frames: 1107820544. Throughput: 0: 12242.6. Samples: 277018624. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:30,767][1648981] Avg episode reward: [(0, '396.000')] [2024-06-15 17:59:31,938][1651669] Updated weights for policy 0, policy_version 540987 (0.0107) [2024-06-15 17:59:34,496][1651669] Updated weights for policy 0, policy_version 541029 (0.0026) [2024-06-15 17:59:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1108082688. Throughput: 0: 12176.4. Samples: 277091328. 
Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:35,767][1648981] Avg episode reward: [(0, '419.060')] [2024-06-15 17:59:38,571][1651669] Updated weights for policy 0, policy_version 541104 (0.0011) [2024-06-15 17:59:40,223][1651669] Updated weights for policy 0, policy_version 541172 (0.0012) [2024-06-15 17:59:40,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.3, 300 sec: 49207.5). Total num frames: 1108344832. Throughput: 0: 12163.5. Samples: 277131264. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:40,767][1648981] Avg episode reward: [(0, '440.770')] [2024-06-15 17:59:41,972][1651669] Updated weights for policy 0, policy_version 541205 (0.0012) [2024-06-15 17:59:45,051][1651669] Updated weights for policy 0, policy_version 541280 (0.0093) [2024-06-15 17:59:45,767][1648981] Fps is (10 sec: 52427.1, 60 sec: 48606.1, 300 sec: 48874.3). Total num frames: 1108606976. Throughput: 0: 12322.0. Samples: 277204480. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:45,767][1648981] Avg episode reward: [(0, '442.510')] [2024-06-15 17:59:48,353][1651669] Updated weights for policy 0, policy_version 541344 (0.0013) [2024-06-15 17:59:48,460][1651274] Signal inference workers to stop experience collection... (28400 times) [2024-06-15 17:59:48,516][1651669] InferenceWorker_p0-w0: stopping experience collection (28400 times) [2024-06-15 17:59:48,672][1651274] Signal inference workers to resume experience collection... (28400 times) [2024-06-15 17:59:48,673][1651669] InferenceWorker_p0-w0: resuming experience collection (28400 times) [2024-06-15 17:59:49,335][1651669] Updated weights for policy 0, policy_version 541392 (0.0012) [2024-06-15 17:59:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50251.0, 300 sec: 49320.0). Total num frames: 1108869120. Throughput: 0: 12231.1. Samples: 277280768. 
Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:50,767][1648981] Avg episode reward: [(0, '445.630')] [2024-06-15 17:59:51,669][1651669] Updated weights for policy 0, policy_version 541457 (0.0011) [2024-06-15 17:59:52,782][1651669] Updated weights for policy 0, policy_version 541504 (0.0013) [2024-06-15 17:59:55,411][1651669] Updated weights for policy 0, policy_version 541558 (0.0012) [2024-06-15 17:59:55,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 50246.3, 300 sec: 48874.3). Total num frames: 1109131264. Throughput: 0: 12481.8. Samples: 277321728. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 17:59:55,767][1648981] Avg episode reward: [(0, '422.720')] [2024-06-15 17:59:59,092][1651669] Updated weights for policy 0, policy_version 541616 (0.0020) [2024-06-15 18:00:00,366][1651669] Updated weights for policy 0, policy_version 541651 (0.0014) [2024-06-15 18:00:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49151.9, 300 sec: 49096.5). Total num frames: 1109327872. Throughput: 0: 12367.7. Samples: 277391872. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 18:00:00,767][1648981] Avg episode reward: [(0, '439.930')] [2024-06-15 18:00:02,139][1651669] Updated weights for policy 0, policy_version 541716 (0.0012) [2024-06-15 18:00:04,827][1651669] Updated weights for policy 0, policy_version 541776 (0.0011) [2024-06-15 18:00:05,775][1648981] Fps is (10 sec: 49112.3, 60 sec: 50783.4, 300 sec: 48984.1). Total num frames: 1109622784. Throughput: 0: 12592.9. Samples: 277476864. 
Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 18:00:05,775][1648981] Avg episode reward: [(0, '456.920')] [2024-06-15 18:00:05,926][1651669] Updated weights for policy 0, policy_version 541819 (0.0015) [2024-06-15 18:00:09,180][1651669] Updated weights for policy 0, policy_version 541874 (0.0013) [2024-06-15 18:00:10,150][1651669] Updated weights for policy 0, policy_version 541906 (0.0010) [2024-06-15 18:00:10,766][1648981] Fps is (10 sec: 55706.0, 60 sec: 49699.5, 300 sec: 49540.8). Total num frames: 1109884928. Throughput: 0: 12538.3. Samples: 277512192. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 18:00:10,767][1648981] Avg episode reward: [(0, '452.420')] [2024-06-15 18:00:12,441][1651669] Updated weights for policy 0, policy_version 541984 (0.0010) [2024-06-15 18:00:14,767][1651669] Updated weights for policy 0, policy_version 542032 (0.0012) [2024-06-15 18:00:15,766][1648981] Fps is (10 sec: 52471.5, 60 sec: 51336.6, 300 sec: 49318.6). Total num frames: 1110147072. Throughput: 0: 12822.8. Samples: 277595648. Policy #0 lag: (min: 63.0, avg: 180.6, max: 319.0) [2024-06-15 18:00:15,767][1648981] Avg episode reward: [(0, '461.970')] [2024-06-15 18:00:18,383][1651669] Updated weights for policy 0, policy_version 542096 (0.0011) [2024-06-15 18:00:19,273][1651669] Updated weights for policy 0, policy_version 542136 (0.0014) [2024-06-15 18:00:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 1110343680. Throughput: 0: 12811.4. Samples: 277667840. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:20,767][1648981] Avg episode reward: [(0, '464.770')] [2024-06-15 18:00:21,272][1651669] Updated weights for policy 0, policy_version 542181 (0.0051) [2024-06-15 18:00:22,880][1651669] Updated weights for policy 0, policy_version 542256 (0.0011) [2024-06-15 18:00:25,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 51336.5, 300 sec: 49540.8). Total num frames: 1110638592. 
Throughput: 0: 12686.2. Samples: 277702144. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:25,767][1648981] Avg episode reward: [(0, '452.370')] [2024-06-15 18:00:26,036][1651669] Updated weights for policy 0, policy_version 542329 (0.0011) [2024-06-15 18:00:30,101][1651669] Updated weights for policy 0, policy_version 542384 (0.0010) [2024-06-15 18:00:30,701][1651274] Signal inference workers to stop experience collection... (28450 times) [2024-06-15 18:00:30,769][1648981] Fps is (10 sec: 49142.6, 60 sec: 50242.7, 300 sec: 49318.3). Total num frames: 1110835200. Throughput: 0: 12822.3. Samples: 277781504. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:30,771][1648981] Avg episode reward: [(0, '447.260')] [2024-06-15 18:00:30,827][1651669] InferenceWorker_p0-w0: stopping experience collection (28450 times) [2024-06-15 18:00:30,829][1651669] Updated weights for policy 0, policy_version 542407 (0.0019) [2024-06-15 18:00:30,991][1651274] Signal inference workers to resume experience collection... (28450 times) [2024-06-15 18:00:30,992][1651669] InferenceWorker_p0-w0: resuming experience collection (28450 times) [2024-06-15 18:00:32,151][1651669] Updated weights for policy 0, policy_version 542462 (0.0012) [2024-06-15 18:00:35,767][1648981] Fps is (10 sec: 45871.0, 60 sec: 50243.5, 300 sec: 49429.6). Total num frames: 1111097344. Throughput: 0: 12640.5. Samples: 277849600. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:35,768][1648981] Avg episode reward: [(0, '462.390')] [2024-06-15 18:00:35,834][1651669] Updated weights for policy 0, policy_version 542529 (0.0013) [2024-06-15 18:00:36,972][1651669] Updated weights for policy 0, policy_version 542587 (0.0010) [2024-06-15 18:00:40,766][1648981] Fps is (10 sec: 45883.9, 60 sec: 49152.0, 300 sec: 49096.5). Total num frames: 1111293952. Throughput: 0: 12595.2. Samples: 277888512. 
Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:40,767][1648981] Avg episode reward: [(0, '417.740')] [2024-06-15 18:00:40,944][1651669] Updated weights for policy 0, policy_version 542628 (0.0020) [2024-06-15 18:00:42,417][1651669] Updated weights for policy 0, policy_version 542688 (0.0045) [2024-06-15 18:00:44,122][1651669] Updated weights for policy 0, policy_version 542736 (0.0011) [2024-06-15 18:00:45,018][1651669] Updated weights for policy 0, policy_version 542784 (0.0012) [2024-06-15 18:00:45,768][1648981] Fps is (10 sec: 52426.3, 60 sec: 50243.4, 300 sec: 49540.5). Total num frames: 1111621632. Throughput: 0: 12412.8. Samples: 277950464. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:45,769][1648981] Avg episode reward: [(0, '404.010')] [2024-06-15 18:00:48,375][1651669] Updated weights for policy 0, policy_version 542841 (0.0011) [2024-06-15 18:00:50,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 48877.6). Total num frames: 1111752704. Throughput: 0: 12256.1. Samples: 278028288. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:50,767][1648981] Avg episode reward: [(0, '402.430')] [2024-06-15 18:00:52,361][1651669] Updated weights for policy 0, policy_version 542902 (0.0122) [2024-06-15 18:00:53,645][1651669] Updated weights for policy 0, policy_version 542944 (0.0012) [2024-06-15 18:00:54,527][1651669] Updated weights for policy 0, policy_version 542975 (0.0010) [2024-06-15 18:00:55,786][1648981] Fps is (10 sec: 45790.3, 60 sec: 49135.8, 300 sec: 49426.4). Total num frames: 1112080384. Throughput: 0: 12237.1. Samples: 278063104. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:00:55,787][1648981] Avg episode reward: [(0, '407.390')] [2024-06-15 18:00:55,871][1651669] Updated weights for policy 0, policy_version 543024 (0.0011) [2024-06-15 18:00:56,229][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000543040_1112145920.pth... 
[2024-06-15 18:00:56,280][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000537216_1100218368.pth [2024-06-15 18:00:57,546][1651669] Updated weights for policy 0, policy_version 543041 (0.0011) [2024-06-15 18:00:58,742][1651669] Updated weights for policy 0, policy_version 543104 (0.0011) [2024-06-15 18:01:00,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 49207.7). Total num frames: 1112276992. Throughput: 0: 11992.2. Samples: 278135296. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:00,769][1648981] Avg episode reward: [(0, '389.960')] [2024-06-15 18:01:03,701][1651669] Updated weights for policy 0, policy_version 543168 (0.0026) [2024-06-15 18:01:05,423][1651669] Updated weights for policy 0, policy_version 543227 (0.0014) [2024-06-15 18:01:05,766][1648981] Fps is (10 sec: 45967.0, 60 sec: 48612.5, 300 sec: 49318.7). Total num frames: 1112539136. Throughput: 0: 11889.8. Samples: 278202880. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:05,767][1648981] Avg episode reward: [(0, '410.220')] [2024-06-15 18:01:06,879][1651669] Updated weights for policy 0, policy_version 543280 (0.0024) [2024-06-15 18:01:08,769][1651669] Updated weights for policy 0, policy_version 543332 (0.0011) [2024-06-15 18:01:09,364][1651669] Updated weights for policy 0, policy_version 543360 (0.0012) [2024-06-15 18:01:10,767][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 49209.5). Total num frames: 1112801280. Throughput: 0: 11958.0. Samples: 278240256. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:10,767][1648981] Avg episode reward: [(0, '393.180')] [2024-06-15 18:01:15,293][1651274] Signal inference workers to stop experience collection... 
(28500 times) [2024-06-15 18:01:15,321][1651669] Updated weights for policy 0, policy_version 543426 (0.0094) [2024-06-15 18:01:15,354][1651669] InferenceWorker_p0-w0: stopping experience collection (28500 times) [2024-06-15 18:01:15,571][1651274] Signal inference workers to resume experience collection... (28500 times) [2024-06-15 18:01:15,571][1651669] InferenceWorker_p0-w0: resuming experience collection (28500 times) [2024-06-15 18:01:15,808][1648981] Fps is (10 sec: 42422.0, 60 sec: 46935.0, 300 sec: 48978.5). Total num frames: 1112965120. Throughput: 0: 11890.7. Samples: 278317056. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:15,808][1648981] Avg episode reward: [(0, '407.770')] [2024-06-15 18:01:17,377][1651669] Updated weights for policy 0, policy_version 543506 (0.0015) [2024-06-15 18:01:18,179][1651669] Updated weights for policy 0, policy_version 543551 (0.0011) [2024-06-15 18:01:20,607][1651669] Updated weights for policy 0, policy_version 543605 (0.0015) [2024-06-15 18:01:20,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 49152.0, 300 sec: 49208.5). Total num frames: 1113292800. Throughput: 0: 11719.4. Samples: 278376960. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:20,767][1648981] Avg episode reward: [(0, '395.370')] [2024-06-15 18:01:25,766][1648981] Fps is (10 sec: 36194.7, 60 sec: 44782.8, 300 sec: 48874.3). Total num frames: 1113325568. Throughput: 0: 11719.1. Samples: 278415872. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:25,767][1648981] Avg episode reward: [(0, '383.510')] [2024-06-15 18:01:26,142][1651669] Updated weights for policy 0, policy_version 543648 (0.0011) [2024-06-15 18:01:28,589][1651669] Updated weights for policy 0, policy_version 543744 (0.0116) [2024-06-15 18:01:29,716][1651669] Updated weights for policy 0, policy_version 543803 (0.0011) [2024-06-15 18:01:30,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48061.3, 300 sec: 48985.4). 
Total num frames: 1113718784. Throughput: 0: 11753.6. Samples: 278479360. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:30,767][1648981] Avg episode reward: [(0, '387.200')] [2024-06-15 18:01:31,896][1651669] Updated weights for policy 0, policy_version 543865 (0.0013) [2024-06-15 18:01:35,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 45875.9, 300 sec: 48874.3). Total num frames: 1113849856. Throughput: 0: 11707.7. Samples: 278555136. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:35,767][1648981] Avg episode reward: [(0, '388.930')] [2024-06-15 18:01:37,939][1651669] Updated weights for policy 0, policy_version 543904 (0.0010) [2024-06-15 18:01:39,620][1651669] Updated weights for policy 0, policy_version 543973 (0.0013) [2024-06-15 18:01:40,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 47513.4, 300 sec: 48985.3). Total num frames: 1114144768. Throughput: 0: 11838.1. Samples: 278595584. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:40,771][1648981] Avg episode reward: [(0, '404.800')] [2024-06-15 18:01:41,621][1651669] Updated weights for policy 0, policy_version 544052 (0.0012) [2024-06-15 18:01:43,281][1651669] Updated weights for policy 0, policy_version 544117 (0.0027) [2024-06-15 18:01:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 45876.2, 300 sec: 48876.3). Total num frames: 1114374144. Throughput: 0: 11628.1. Samples: 278658560. Policy #0 lag: (min: 51.0, avg: 145.0, max: 307.0) [2024-06-15 18:01:45,767][1648981] Avg episode reward: [(0, '412.600')] [2024-06-15 18:01:48,494][1651669] Updated weights for policy 0, policy_version 544145 (0.0036) [2024-06-15 18:01:49,739][1651669] Updated weights for policy 0, policy_version 544208 (0.0042) [2024-06-15 18:01:50,766][1648981] Fps is (10 sec: 49153.3, 60 sec: 48059.8, 300 sec: 48874.4). Total num frames: 1114636288. Throughput: 0: 11719.1. Samples: 278730240. 
Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:01:50,767][1648981] Avg episode reward: [(0, '421.650')] [2024-06-15 18:01:51,100][1651669] Updated weights for policy 0, policy_version 544272 (0.0029) [2024-06-15 18:01:52,104][1651274] Signal inference workers to stop experience collection... (28550 times) [2024-06-15 18:01:52,199][1651274] Signal inference workers to resume experience collection... (28550 times) [2024-06-15 18:01:52,317][1651669] InferenceWorker_p0-w0: stopping experience collection (28550 times) [2024-06-15 18:01:52,318][1651669] InferenceWorker_p0-w0: resuming experience collection (28550 times) [2024-06-15 18:01:52,354][1651669] Updated weights for policy 0, policy_version 544323 (0.0014) [2024-06-15 18:01:53,724][1651669] Updated weights for policy 0, policy_version 544384 (0.0012) [2024-06-15 18:01:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46983.0, 300 sec: 48875.6). Total num frames: 1114898432. Throughput: 0: 11468.8. Samples: 278756352. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:01:55,767][1648981] Avg episode reward: [(0, '434.720')] [2024-06-15 18:02:00,570][1651669] Updated weights for policy 0, policy_version 544433 (0.0011) [2024-06-15 18:02:00,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1115029504. Throughput: 0: 11707.1. Samples: 278843392. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:00,767][1648981] Avg episode reward: [(0, '432.960')] [2024-06-15 18:02:01,372][1651669] Updated weights for policy 0, policy_version 544480 (0.0013) [2024-06-15 18:02:03,190][1651669] Updated weights for policy 0, policy_version 544549 (0.0039) [2024-06-15 18:02:04,909][1651669] Updated weights for policy 0, policy_version 544611 (0.0119) [2024-06-15 18:02:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1115422720. Throughput: 0: 11502.9. Samples: 278894592. 
Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:05,767][1648981] Avg episode reward: [(0, '452.430')] [2024-06-15 18:02:10,178][1651669] Updated weights for policy 0, policy_version 544643 (0.0010) [2024-06-15 18:02:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 44783.0, 300 sec: 48430.0). Total num frames: 1115488256. Throughput: 0: 11764.6. Samples: 278945280. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:10,767][1648981] Avg episode reward: [(0, '453.380')] [2024-06-15 18:02:11,215][1651669] Updated weights for policy 0, policy_version 544701 (0.0016) [2024-06-15 18:02:12,578][1651669] Updated weights for policy 0, policy_version 544739 (0.0011) [2024-06-15 18:02:13,805][1651669] Updated weights for policy 0, policy_version 544800 (0.0012) [2024-06-15 18:02:15,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48639.5, 300 sec: 48763.2). Total num frames: 1115881472. Throughput: 0: 11878.4. Samples: 279013888. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:15,767][1648981] Avg episode reward: [(0, '463.100')] [2024-06-15 18:02:16,140][1651669] Updated weights for policy 0, policy_version 544886 (0.0015) [2024-06-15 18:02:20,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 44236.8, 300 sec: 48541.1). Total num frames: 1115947008. Throughput: 0: 11992.2. Samples: 279094784. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:20,767][1648981] Avg episode reward: [(0, '452.810')] [2024-06-15 18:02:21,872][1651669] Updated weights for policy 0, policy_version 544951 (0.0017) [2024-06-15 18:02:23,832][1651669] Updated weights for policy 0, policy_version 545008 (0.0013) [2024-06-15 18:02:25,147][1651669] Updated weights for policy 0, policy_version 545072 (0.0086) [2024-06-15 18:02:25,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 50244.4, 300 sec: 48652.4). Total num frames: 1116340224. Throughput: 0: 11787.5. Samples: 279126016. 
Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:25,767][1648981] Avg episode reward: [(0, '461.650')] [2024-06-15 18:02:26,741][1651669] Updated weights for policy 0, policy_version 545137 (0.0012) [2024-06-15 18:02:30,774][1648981] Fps is (10 sec: 52387.6, 60 sec: 45869.2, 300 sec: 48873.0). Total num frames: 1116471296. Throughput: 0: 11978.7. Samples: 279197696. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:30,775][1648981] Avg episode reward: [(0, '454.460')] [2024-06-15 18:02:32,304][1651669] Updated weights for policy 0, policy_version 545169 (0.0013) [2024-06-15 18:02:33,062][1651669] Updated weights for policy 0, policy_version 545214 (0.0012) [2024-06-15 18:02:33,380][1651274] Signal inference workers to stop experience collection... (28600 times) [2024-06-15 18:02:33,430][1651669] InferenceWorker_p0-w0: stopping experience collection (28600 times) [2024-06-15 18:02:33,682][1651274] Signal inference workers to resume experience collection... (28600 times) [2024-06-15 18:02:33,688][1651669] InferenceWorker_p0-w0: resuming experience collection (28600 times) [2024-06-15 18:02:35,102][1651669] Updated weights for policy 0, policy_version 545296 (0.0012) [2024-06-15 18:02:35,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 49151.9, 300 sec: 48652.8). Total num frames: 1116798976. Throughput: 0: 11878.4. Samples: 279264768. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:35,767][1648981] Avg episode reward: [(0, '438.640')] [2024-06-15 18:02:36,576][1651669] Updated weights for policy 0, policy_version 545360 (0.0014) [2024-06-15 18:02:40,766][1648981] Fps is (10 sec: 52470.3, 60 sec: 47513.8, 300 sec: 48874.4). Total num frames: 1116995584. Throughput: 0: 12026.3. Samples: 279297536. 
Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:40,767][1648981] Avg episode reward: [(0, '445.880')] [2024-06-15 18:02:42,261][1651669] Updated weights for policy 0, policy_version 545409 (0.0011) [2024-06-15 18:02:43,509][1651669] Updated weights for policy 0, policy_version 545472 (0.0010) [2024-06-15 18:02:45,460][1651669] Updated weights for policy 0, policy_version 545552 (0.0078) [2024-06-15 18:02:45,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48605.9, 300 sec: 48652.6). Total num frames: 1117290496. Throughput: 0: 12094.6. Samples: 279387648. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:45,767][1648981] Avg episode reward: [(0, '425.940')] [2024-06-15 18:02:46,476][1651669] Updated weights for policy 0, policy_version 545602 (0.0043) [2024-06-15 18:02:47,709][1651669] Updated weights for policy 0, policy_version 545657 (0.0021) [2024-06-15 18:02:50,766][1648981] Fps is (10 sec: 52427.6, 60 sec: 48059.6, 300 sec: 48874.3). Total num frames: 1117519872. Throughput: 0: 12549.6. Samples: 279459328. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:50,767][1648981] Avg episode reward: [(0, '435.540')] [2024-06-15 18:02:54,257][1651669] Updated weights for policy 0, policy_version 545696 (0.0013) [2024-06-15 18:02:55,767][1648981] Fps is (10 sec: 42596.7, 60 sec: 46967.2, 300 sec: 48763.2). Total num frames: 1117716480. Throughput: 0: 12504.1. Samples: 279507968. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:02:55,767][1648981] Avg episode reward: [(0, '430.500')] [2024-06-15 18:02:56,439][1651669] Updated weights for policy 0, policy_version 545792 (0.0013) [2024-06-15 18:02:56,439][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000545792_1117782016.pth... 
[2024-06-15 18:02:56,595][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000540128_1106182144.pth [2024-06-15 18:02:58,372][1651669] Updated weights for policy 0, policy_version 545872 (0.0097) [2024-06-15 18:02:59,432][1651669] Updated weights for policy 0, policy_version 545919 (0.0011) [2024-06-15 18:03:00,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1118044160. Throughput: 0: 12162.8. Samples: 279561216. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:03:00,767][1648981] Avg episode reward: [(0, '425.600')] [2024-06-15 18:03:05,766][1648981] Fps is (10 sec: 39322.9, 60 sec: 44782.9, 300 sec: 48652.2). Total num frames: 1118109696. Throughput: 0: 12310.7. Samples: 279648768. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:03:05,767][1648981] Avg episode reward: [(0, '434.490')] [2024-06-15 18:03:05,901][1651669] Updated weights for policy 0, policy_version 545968 (0.0147) [2024-06-15 18:03:07,912][1651669] Updated weights for policy 0, policy_version 546064 (0.0012) [2024-06-15 18:03:08,427][1651274] Signal inference workers to stop experience collection... (28650 times) [2024-06-15 18:03:08,485][1651669] InferenceWorker_p0-w0: stopping experience collection (28650 times) [2024-06-15 18:03:08,626][1651274] Signal inference workers to resume experience collection... (28650 times) [2024-06-15 18:03:08,633][1651669] InferenceWorker_p0-w0: resuming experience collection (28650 times) [2024-06-15 18:03:08,994][1651669] Updated weights for policy 0, policy_version 546112 (0.0010) [2024-06-15 18:03:10,503][1651669] Updated weights for policy 0, policy_version 546176 (0.0024) [2024-06-15 18:03:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 51336.5, 300 sec: 48878.9). Total num frames: 1118568448. Throughput: 0: 12174.2. Samples: 279673856. 
Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:03:10,767][1648981] Avg episode reward: [(0, '439.580')] [2024-06-15 18:03:15,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 44782.9, 300 sec: 48432.0). Total num frames: 1118568448. Throughput: 0: 12233.2. Samples: 279748096. Policy #0 lag: (min: 47.0, avg: 110.8, max: 303.0) [2024-06-15 18:03:15,767][1648981] Avg episode reward: [(0, '446.280')] [2024-06-15 18:03:17,717][1651669] Updated weights for policy 0, policy_version 546256 (0.0011) [2024-06-15 18:03:19,354][1651669] Updated weights for policy 0, policy_version 546325 (0.0013) [2024-06-15 18:03:20,768][1648981] Fps is (10 sec: 42590.5, 60 sec: 50788.7, 300 sec: 48541.1). Total num frames: 1118994432. Throughput: 0: 12151.0. Samples: 279811584. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:20,769][1648981] Avg episode reward: [(0, '452.890')] [2024-06-15 18:03:20,812][1651669] Updated weights for policy 0, policy_version 546386 (0.0011) [2024-06-15 18:03:25,798][1648981] Fps is (10 sec: 52262.5, 60 sec: 45850.8, 300 sec: 48313.7). Total num frames: 1119092736. Throughput: 0: 12211.1. Samples: 279847424. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:25,799][1648981] Avg episode reward: [(0, '449.940')] [2024-06-15 18:03:28,856][1651669] Updated weights for policy 0, policy_version 546496 (0.0013) [2024-06-15 18:03:30,510][1651669] Updated weights for policy 0, policy_version 546560 (0.0016) [2024-06-15 18:03:30,766][1648981] Fps is (10 sec: 36051.8, 60 sec: 48066.0, 300 sec: 47985.7). Total num frames: 1119354880. Throughput: 0: 11696.3. Samples: 279913984. 
Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:30,767][1648981] Avg episode reward: [(0, '458.170')] [2024-06-15 18:03:31,980][1651669] Updated weights for policy 0, policy_version 546624 (0.0119) [2024-06-15 18:03:33,241][1651669] Updated weights for policy 0, policy_version 546679 (0.0018) [2024-06-15 18:03:35,766][1648981] Fps is (10 sec: 52596.1, 60 sec: 46967.5, 300 sec: 48318.9). Total num frames: 1119617024. Throughput: 0: 11650.9. Samples: 279983616. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:35,767][1648981] Avg episode reward: [(0, '459.790')] [2024-06-15 18:03:40,379][1651669] Updated weights for policy 0, policy_version 546752 (0.0095) [2024-06-15 18:03:40,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 47763.6). Total num frames: 1119780864. Throughput: 0: 11457.5. Samples: 280023552. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:40,767][1648981] Avg episode reward: [(0, '461.990')] [2024-06-15 18:03:42,256][1651669] Updated weights for policy 0, policy_version 546832 (0.0011) [2024-06-15 18:03:44,167][1651669] Updated weights for policy 0, policy_version 546928 (0.0085) [2024-06-15 18:03:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 47513.5, 300 sec: 48431.3). Total num frames: 1120141312. Throughput: 0: 11343.6. Samples: 280071680. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:45,767][1648981] Avg episode reward: [(0, '448.080')] [2024-06-15 18:03:50,207][1651274] Signal inference workers to stop experience collection... (28700 times) [2024-06-15 18:03:50,252][1651669] InferenceWorker_p0-w0: stopping experience collection (28700 times) [2024-06-15 18:03:50,437][1651274] Signal inference workers to resume experience collection... 
(28700 times) [2024-06-15 18:03:50,438][1651669] InferenceWorker_p0-w0: resuming experience collection (28700 times) [2024-06-15 18:03:50,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 44236.9, 300 sec: 47652.9). Total num frames: 1120174080. Throughput: 0: 11389.2. Samples: 280161280. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:50,767][1648981] Avg episode reward: [(0, '472.470')] [2024-06-15 18:03:51,300][1651669] Updated weights for policy 0, policy_version 546992 (0.0017) [2024-06-15 18:03:52,459][1651669] Updated weights for policy 0, policy_version 547040 (0.0196) [2024-06-15 18:03:53,901][1651669] Updated weights for policy 0, policy_version 547104 (0.0083) [2024-06-15 18:03:55,416][1651669] Updated weights for policy 0, policy_version 547172 (0.0012) [2024-06-15 18:03:55,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1120632832. Throughput: 0: 11389.1. Samples: 280186368. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:03:55,767][1648981] Avg episode reward: [(0, '475.420')] [2024-06-15 18:04:00,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 43690.6, 300 sec: 47763.5). Total num frames: 1120665600. Throughput: 0: 11411.9. Samples: 280261632. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:00,767][1648981] Avg episode reward: [(0, '467.500')] [2024-06-15 18:04:01,911][1651669] Updated weights for policy 0, policy_version 547232 (0.0012) [2024-06-15 18:04:03,536][1651669] Updated weights for policy 0, policy_version 547300 (0.0011) [2024-06-15 18:04:04,933][1651669] Updated weights for policy 0, policy_version 547360 (0.0013) [2024-06-15 18:04:05,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 49152.0, 300 sec: 47985.9). Total num frames: 1121058816. Throughput: 0: 11435.2. Samples: 280326144. 
Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:05,767][1648981] Avg episode reward: [(0, '445.810')] [2024-06-15 18:04:06,042][1651669] Updated weights for policy 0, policy_version 547417 (0.0082) [2024-06-15 18:04:10,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 43690.5, 300 sec: 47874.6). Total num frames: 1121189888. Throughput: 0: 11545.2. Samples: 280366592. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:10,768][1648981] Avg episode reward: [(0, '440.470')] [2024-06-15 18:04:11,616][1651669] Updated weights for policy 0, policy_version 547457 (0.0016) [2024-06-15 18:04:13,367][1651669] Updated weights for policy 0, policy_version 547524 (0.0020) [2024-06-15 18:04:14,414][1651669] Updated weights for policy 0, policy_version 547573 (0.0012) [2024-06-15 18:04:15,695][1651669] Updated weights for policy 0, policy_version 547633 (0.0011) [2024-06-15 18:04:15,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1121550336. Throughput: 0: 11685.0. Samples: 280439808. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:15,767][1648981] Avg episode reward: [(0, '450.860')] [2024-06-15 18:04:16,949][1651669] Updated weights for policy 0, policy_version 547707 (0.0013) [2024-06-15 18:04:20,814][1648981] Fps is (10 sec: 52180.7, 60 sec: 45294.4, 300 sec: 47977.9). Total num frames: 1121714176. Throughput: 0: 11968.1. Samples: 280522752. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:20,815][1648981] Avg episode reward: [(0, '450.730')] [2024-06-15 18:04:22,897][1651669] Updated weights for policy 0, policy_version 547760 (0.0051) [2024-06-15 18:04:23,835][1651274] Signal inference workers to stop experience collection... 
(28750 times) [2024-06-15 18:04:23,935][1651669] InferenceWorker_p0-w0: stopping experience collection (28750 times) [2024-06-15 18:04:23,936][1651669] Updated weights for policy 0, policy_version 547800 (0.0014) [2024-06-15 18:04:24,140][1651274] Signal inference workers to resume experience collection... (28750 times) [2024-06-15 18:04:24,140][1651669] InferenceWorker_p0-w0: resuming experience collection (28750 times) [2024-06-15 18:04:25,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49178.1, 300 sec: 48207.8). Total num frames: 1122041856. Throughput: 0: 11912.5. Samples: 280559616. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:25,767][1648981] Avg episode reward: [(0, '467.820')] [2024-06-15 18:04:25,898][1651669] Updated weights for policy 0, policy_version 547888 (0.0179) [2024-06-15 18:04:27,074][1651669] Updated weights for policy 0, policy_version 547940 (0.0014) [2024-06-15 18:04:30,766][1648981] Fps is (10 sec: 52680.8, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1122238464. Throughput: 0: 12208.4. Samples: 280621056. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:30,767][1648981] Avg episode reward: [(0, '457.680')] [2024-06-15 18:04:33,784][1651669] Updated weights for policy 0, policy_version 548016 (0.0051) [2024-06-15 18:04:34,760][1651669] Updated weights for policy 0, policy_version 548048 (0.0011) [2024-06-15 18:04:35,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 47513.5, 300 sec: 47874.6). Total num frames: 1122467840. Throughput: 0: 11992.1. Samples: 280700928. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:35,768][1648981] Avg episode reward: [(0, '473.950')] [2024-06-15 18:04:36,653][1651669] Updated weights for policy 0, policy_version 548115 (0.0072) [2024-06-15 18:04:38,192][1651669] Updated weights for policy 0, policy_version 548183 (0.0015) [2024-06-15 18:04:40,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49698.0, 300 sec: 47985.7). 
Total num frames: 1122762752. Throughput: 0: 11992.2. Samples: 280726016. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:40,767][1648981] Avg episode reward: [(0, '473.490')] [2024-06-15 18:04:43,653][1651669] Updated weights for policy 0, policy_version 548227 (0.0013) [2024-06-15 18:04:44,977][1651669] Updated weights for policy 0, policy_version 548289 (0.0011) [2024-06-15 18:04:45,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1122959360. Throughput: 0: 12208.4. Samples: 280811008. Policy #0 lag: (min: 127.0, avg: 181.6, max: 366.0) [2024-06-15 18:04:45,767][1648981] Avg episode reward: [(0, '458.830')] [2024-06-15 18:04:46,329][1651669] Updated weights for policy 0, policy_version 548351 (0.0012) [2024-06-15 18:04:48,345][1651669] Updated weights for policy 0, policy_version 548416 (0.0012) [2024-06-15 18:04:49,850][1651669] Updated weights for policy 0, policy_version 548471 (0.0013) [2024-06-15 18:04:50,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 51882.6, 300 sec: 47985.7). Total num frames: 1123287040. Throughput: 0: 12094.6. Samples: 280870400. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:04:50,767][1648981] Avg episode reward: [(0, '440.040')] [2024-06-15 18:04:55,242][1651669] Updated weights for policy 0, policy_version 548512 (0.0115) [2024-06-15 18:04:55,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 45875.4, 300 sec: 47652.4). Total num frames: 1123385344. Throughput: 0: 12185.7. Samples: 280914944. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:04:55,767][1648981] Avg episode reward: [(0, '436.750')] [2024-06-15 18:04:55,918][1651669] Updated weights for policy 0, policy_version 548543 (0.0012) [2024-06-15 18:04:55,925][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000548544_1123418112.pth... 
[2024-06-15 18:04:55,968][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000543040_1112145920.pth [2024-06-15 18:04:57,676][1651669] Updated weights for policy 0, policy_version 548596 (0.0011) [2024-06-15 18:04:58,447][1651669] Updated weights for policy 0, policy_version 548624 (0.0047) [2024-06-15 18:05:00,255][1651274] Signal inference workers to stop experience collection... (28800 times) [2024-06-15 18:05:00,304][1651669] InferenceWorker_p0-w0: stopping experience collection (28800 times) [2024-06-15 18:05:00,308][1651669] Updated weights for policy 0, policy_version 548694 (0.0286) [2024-06-15 18:05:00,430][1651274] Signal inference workers to resume experience collection... (28800 times) [2024-06-15 18:05:00,431][1651669] InferenceWorker_p0-w0: resuming experience collection (28800 times) [2024-06-15 18:05:00,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 51336.3, 300 sec: 47875.9). Total num frames: 1123745792. Throughput: 0: 12037.6. Samples: 280981504. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:00,768][1648981] Avg episode reward: [(0, '440.110')] [2024-06-15 18:05:05,662][1651669] Updated weights for policy 0, policy_version 548742 (0.0012) [2024-06-15 18:05:05,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 45875.2, 300 sec: 47208.1). Total num frames: 1123811328. Throughput: 0: 11913.8. Samples: 281058304. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:05,767][1648981] Avg episode reward: [(0, '430.780')] [2024-06-15 18:05:07,030][1651669] Updated weights for policy 0, policy_version 548799 (0.0013) [2024-06-15 18:05:08,954][1651669] Updated weights for policy 0, policy_version 548854 (0.0013) [2024-06-15 18:05:09,823][1651669] Updated weights for policy 0, policy_version 548883 (0.0009) [2024-06-15 18:05:10,798][1648981] Fps is (10 sec: 42463.8, 60 sec: 49671.9, 300 sec: 47536.2). Total num frames: 1124171776. Throughput: 0: 11824.5. Samples: 281092096. 
Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:10,799][1648981] Avg episode reward: [(0, '409.830')] [2024-06-15 18:05:11,297][1651669] Updated weights for policy 0, policy_version 548944 (0.0011) [2024-06-15 18:05:12,213][1651669] Updated weights for policy 0, policy_version 548986 (0.0013) [2024-06-15 18:05:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46421.3, 300 sec: 47430.3). Total num frames: 1124335616. Throughput: 0: 12128.7. Samples: 281166848. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:15,767][1648981] Avg episode reward: [(0, '421.690')] [2024-06-15 18:05:16,694][1651669] Updated weights for policy 0, policy_version 549054 (0.0013) [2024-06-15 18:05:19,755][1651669] Updated weights for policy 0, policy_version 549107 (0.0012) [2024-06-15 18:05:20,766][1648981] Fps is (10 sec: 46022.8, 60 sec: 48644.7, 300 sec: 47430.3). Total num frames: 1124630528. Throughput: 0: 11810.2. Samples: 281232384. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:20,767][1648981] Avg episode reward: [(0, '414.400')] [2024-06-15 18:05:21,700][1651669] Updated weights for policy 0, policy_version 549184 (0.0011) [2024-06-15 18:05:23,031][1651669] Updated weights for policy 0, policy_version 549238 (0.0129) [2024-06-15 18:05:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 46967.4, 300 sec: 47541.7). Total num frames: 1124859904. Throughput: 0: 11844.3. Samples: 281259008. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:25,767][1648981] Avg episode reward: [(0, '417.840')] [2024-06-15 18:05:28,308][1651669] Updated weights for policy 0, policy_version 549300 (0.0031) [2024-06-15 18:05:30,779][1648981] Fps is (10 sec: 42543.7, 60 sec: 46957.4, 300 sec: 47317.3). Total num frames: 1125056512. Throughput: 0: 11772.6. Samples: 281340928. 
Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:30,780][1648981] Avg episode reward: [(0, '394.770')] [2024-06-15 18:05:31,114][1651669] Updated weights for policy 0, policy_version 549370 (0.0013) [2024-06-15 18:05:32,801][1651669] Updated weights for policy 0, policy_version 549424 (0.0015) [2024-06-15 18:05:34,517][1651669] Updated weights for policy 0, policy_version 549488 (0.0011) [2024-06-15 18:05:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48606.0, 300 sec: 47763.5). Total num frames: 1125384192. Throughput: 0: 11810.1. Samples: 281401856. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:35,767][1648981] Avg episode reward: [(0, '399.540')] [2024-06-15 18:05:40,253][1651669] Updated weights for policy 0, policy_version 549561 (0.0014) [2024-06-15 18:05:40,766][1648981] Fps is (10 sec: 45933.9, 60 sec: 45875.3, 300 sec: 47097.3). Total num frames: 1125515264. Throughput: 0: 11798.8. Samples: 281445888. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:40,767][1648981] Avg episode reward: [(0, '396.730')] [2024-06-15 18:05:41,931][1651669] Updated weights for policy 0, policy_version 549606 (0.0012) [2024-06-15 18:05:42,529][1651669] Updated weights for policy 0, policy_version 549632 (0.0013) [2024-06-15 18:05:42,731][1651274] Signal inference workers to stop experience collection... (28850 times) [2024-06-15 18:05:42,798][1651669] InferenceWorker_p0-w0: stopping experience collection (28850 times) [2024-06-15 18:05:42,988][1651274] Signal inference workers to resume experience collection... (28850 times) [2024-06-15 18:05:42,989][1651669] InferenceWorker_p0-w0: resuming experience collection (28850 times) [2024-06-15 18:05:44,457][1651669] Updated weights for policy 0, policy_version 549697 (0.0011) [2024-06-15 18:05:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.8, 300 sec: 47874.6). Total num frames: 1125875712. Throughput: 0: 11753.3. Samples: 281510400. 
Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:45,767][1648981] Avg episode reward: [(0, '395.810')] [2024-06-15 18:05:49,754][1651669] Updated weights for policy 0, policy_version 549762 (0.0011) [2024-06-15 18:05:50,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 44782.9, 300 sec: 47100.2). Total num frames: 1125974016. Throughput: 0: 11696.3. Samples: 281584640. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:50,767][1648981] Avg episode reward: [(0, '386.510')] [2024-06-15 18:05:51,058][1651669] Updated weights for policy 0, policy_version 549813 (0.0012) [2024-06-15 18:05:52,773][1651669] Updated weights for policy 0, policy_version 549843 (0.0011) [2024-06-15 18:05:54,214][1651669] Updated weights for policy 0, policy_version 549905 (0.0027) [2024-06-15 18:05:55,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 1126301696. Throughput: 0: 11829.9. Samples: 281624064. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:05:55,767][1648981] Avg episode reward: [(0, '395.470')] [2024-06-15 18:05:56,351][1651669] Updated weights for policy 0, policy_version 549985 (0.0011) [2024-06-15 18:06:00,773][1648981] Fps is (10 sec: 49117.8, 60 sec: 45324.0, 300 sec: 47207.0). Total num frames: 1126465536. Throughput: 0: 11614.9. Samples: 281689600. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:06:00,774][1648981] Avg episode reward: [(0, '397.240')] [2024-06-15 18:06:00,893][1651669] Updated weights for policy 0, policy_version 550033 (0.0012) [2024-06-15 18:06:03,896][1651669] Updated weights for policy 0, policy_version 550096 (0.0011) [2024-06-15 18:06:05,401][1651669] Updated weights for policy 0, policy_version 550160 (0.0011) [2024-06-15 18:06:05,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 1126727680. Throughput: 0: 11787.3. Samples: 281762816. 
Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:06:05,767][1648981] Avg episode reward: [(0, '422.160')] [2024-06-15 18:06:06,520][1651669] Updated weights for policy 0, policy_version 550202 (0.0014) [2024-06-15 18:06:08,274][1651669] Updated weights for policy 0, policy_version 550265 (0.0012) [2024-06-15 18:06:10,766][1648981] Fps is (10 sec: 55745.0, 60 sec: 47539.0, 300 sec: 47659.2). Total num frames: 1127022592. Throughput: 0: 11832.9. Samples: 281791488. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:06:10,767][1648981] Avg episode reward: [(0, '444.820')] [2024-06-15 18:06:11,056][1651669] Updated weights for policy 0, policy_version 550321 (0.0089) [2024-06-15 18:06:15,175][1651669] Updated weights for policy 0, policy_version 550368 (0.0011) [2024-06-15 18:06:15,770][1648981] Fps is (10 sec: 45858.2, 60 sec: 47510.6, 300 sec: 47096.4). Total num frames: 1127186432. Throughput: 0: 11983.2. Samples: 281880064. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:06:15,771][1648981] Avg episode reward: [(0, '460.050')] [2024-06-15 18:06:16,776][1651669] Updated weights for policy 0, policy_version 550436 (0.0017) [2024-06-15 18:06:18,586][1651669] Updated weights for policy 0, policy_version 550498 (0.0035) [2024-06-15 18:06:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1127481344. Throughput: 0: 12026.3. Samples: 281943040. Policy #0 lag: (min: 100.0, avg: 217.5, max: 356.0) [2024-06-15 18:06:20,767][1648981] Avg episode reward: [(0, '473.790')] [2024-06-15 18:06:20,992][1651669] Updated weights for policy 0, policy_version 550544 (0.0011) [2024-06-15 18:06:21,072][1651274] Signal inference workers to stop experience collection... (28900 times) [2024-06-15 18:06:21,110][1651669] InferenceWorker_p0-w0: stopping experience collection (28900 times) [2024-06-15 18:06:21,275][1651274] Signal inference workers to resume experience collection... 
(28900 times) [2024-06-15 18:06:21,276][1651669] InferenceWorker_p0-w0: resuming experience collection (28900 times) [2024-06-15 18:06:25,766][1648981] Fps is (10 sec: 45892.2, 60 sec: 46421.4, 300 sec: 47208.1). Total num frames: 1127645184. Throughput: 0: 11946.6. Samples: 281983488. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:06:25,767][1648981] Avg episode reward: [(0, '486.880')] [2024-06-15 18:06:25,850][1651669] Updated weights for policy 0, policy_version 550609 (0.0014) [2024-06-15 18:06:27,432][1651669] Updated weights for policy 0, policy_version 550676 (0.0025) [2024-06-15 18:06:28,786][1651669] Updated weights for policy 0, policy_version 550721 (0.0015) [2024-06-15 18:06:29,863][1651669] Updated weights for policy 0, policy_version 550768 (0.0024) [2024-06-15 18:06:30,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49162.3, 300 sec: 47985.6). Total num frames: 1128005632. Throughput: 0: 11980.8. Samples: 282049536. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:06:30,768][1648981] Avg episode reward: [(0, '496.630')] [2024-06-15 18:06:32,321][1651669] Updated weights for policy 0, policy_version 550807 (0.0011) [2024-06-15 18:06:35,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 1128136704. Throughput: 0: 12128.7. Samples: 282130432. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:06:35,767][1648981] Avg episode reward: [(0, '514.920')] [2024-06-15 18:06:36,426][1651669] Updated weights for policy 0, policy_version 550864 (0.0013) [2024-06-15 18:06:37,931][1651669] Updated weights for policy 0, policy_version 550916 (0.0014) [2024-06-15 18:06:39,104][1651669] Updated weights for policy 0, policy_version 550972 (0.0012) [2024-06-15 18:06:40,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 1128464384. Throughput: 0: 11867.0. Samples: 282158080. 
Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:06:40,767][1648981] Avg episode reward: [(0, '514.270')] [2024-06-15 18:06:41,294][1651669] Updated weights for policy 0, policy_version 551032 (0.0013) [2024-06-15 18:06:43,162][1651669] Updated weights for policy 0, policy_version 551073 (0.0012) [2024-06-15 18:06:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46421.4, 300 sec: 47541.4). Total num frames: 1128660992. Throughput: 0: 12039.6. Samples: 282231296. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:06:45,767][1648981] Avg episode reward: [(0, '529.660')] [2024-06-15 18:06:47,159][1651669] Updated weights for policy 0, policy_version 551112 (0.0011) [2024-06-15 18:06:48,949][1651669] Updated weights for policy 0, policy_version 551172 (0.0108) [2024-06-15 18:06:50,177][1651669] Updated weights for policy 0, policy_version 551230 (0.0011) [2024-06-15 18:06:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49152.1, 300 sec: 47541.4). Total num frames: 1128923136. Throughput: 0: 12014.9. Samples: 282303488. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:06:50,767][1648981] Avg episode reward: [(0, '503.890')] [2024-06-15 18:06:51,926][1651669] Updated weights for policy 0, policy_version 551280 (0.0017) [2024-06-15 18:06:53,106][1651669] Updated weights for policy 0, policy_version 551299 (0.0022) [2024-06-15 18:06:55,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1129185280. Throughput: 0: 12253.8. Samples: 282342912. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:06:55,767][1648981] Avg episode reward: [(0, '488.720')] [2024-06-15 18:06:55,804][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000551360_1129185280.pth... 
[2024-06-15 18:06:55,868][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000545792_1117782016.pth [2024-06-15 18:06:58,317][1651669] Updated weights for policy 0, policy_version 551365 (0.0015) [2024-06-15 18:07:00,334][1651669] Updated weights for policy 0, policy_version 551440 (0.0013) [2024-06-15 18:07:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48611.5, 300 sec: 47319.2). Total num frames: 1129381888. Throughput: 0: 11981.8. Samples: 282419200. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:00,767][1648981] Avg episode reward: [(0, '491.760')] [2024-06-15 18:07:01,801][1651669] Updated weights for policy 0, policy_version 551490 (0.0012) [2024-06-15 18:07:02,171][1651274] Signal inference workers to stop experience collection... (28950 times) [2024-06-15 18:07:02,274][1651669] InferenceWorker_p0-w0: stopping experience collection (28950 times) [2024-06-15 18:07:02,460][1651274] Signal inference workers to resume experience collection... (28950 times) [2024-06-15 18:07:02,461][1651669] InferenceWorker_p0-w0: resuming experience collection (28950 times) [2024-06-15 18:07:03,142][1651669] Updated weights for policy 0, policy_version 551552 (0.0102) [2024-06-15 18:07:05,420][1651669] Updated weights for policy 0, policy_version 551609 (0.0016) [2024-06-15 18:07:05,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 49698.3, 300 sec: 48207.9). Total num frames: 1129709568. Throughput: 0: 11935.3. Samples: 282480128. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:05,767][1648981] Avg episode reward: [(0, '490.360')] [2024-06-15 18:07:09,998][1651669] Updated weights for policy 0, policy_version 551651 (0.0012) [2024-06-15 18:07:10,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1129840640. Throughput: 0: 11958.1. Samples: 282521600. 
Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:10,767][1648981] Avg episode reward: [(0, '470.920')] [2024-06-15 18:07:12,030][1651669] Updated weights for policy 0, policy_version 551728 (0.0126) [2024-06-15 18:07:14,260][1651669] Updated weights for policy 0, policy_version 551800 (0.0013) [2024-06-15 18:07:15,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 49155.2, 300 sec: 48096.8). Total num frames: 1130135552. Throughput: 0: 11889.8. Samples: 282584576. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:15,767][1648981] Avg episode reward: [(0, '448.830')] [2024-06-15 18:07:16,233][1651669] Updated weights for policy 0, policy_version 551856 (0.0015) [2024-06-15 18:07:20,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1130233856. Throughput: 0: 11832.9. Samples: 282662912. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:20,767][1648981] Avg episode reward: [(0, '422.330')] [2024-06-15 18:07:21,050][1651669] Updated weights for policy 0, policy_version 551889 (0.0011) [2024-06-15 18:07:22,849][1651669] Updated weights for policy 0, policy_version 551955 (0.0112) [2024-06-15 18:07:24,851][1651669] Updated weights for policy 0, policy_version 552001 (0.0011) [2024-06-15 18:07:25,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.1, 300 sec: 47875.9). Total num frames: 1130594304. Throughput: 0: 11730.5. Samples: 282685952. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:25,767][1648981] Avg episode reward: [(0, '407.710')] [2024-06-15 18:07:26,096][1651669] Updated weights for policy 0, policy_version 552064 (0.0012) [2024-06-15 18:07:27,888][1651669] Updated weights for policy 0, policy_version 552128 (0.0011) [2024-06-15 18:07:30,778][1648981] Fps is (10 sec: 52366.9, 60 sec: 45866.3, 300 sec: 47317.3). Total num frames: 1130758144. Throughput: 0: 11613.7. Samples: 282754048. 
Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:30,779][1648981] Avg episode reward: [(0, '400.650')] [2024-06-15 18:07:35,233][1651669] Updated weights for policy 0, policy_version 552215 (0.0012) [2024-06-15 18:07:35,766][1648981] Fps is (10 sec: 36044.8, 60 sec: 46967.5, 300 sec: 47319.2). Total num frames: 1130954752. Throughput: 0: 11537.1. Samples: 282822656. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:35,767][1648981] Avg episode reward: [(0, '405.040')] [2024-06-15 18:07:36,551][1651669] Updated weights for policy 0, policy_version 552259 (0.0012) [2024-06-15 18:07:37,707][1651669] Updated weights for policy 0, policy_version 552313 (0.0015) [2024-06-15 18:07:39,067][1651669] Updated weights for policy 0, policy_version 552382 (0.0019) [2024-06-15 18:07:40,766][1648981] Fps is (10 sec: 52490.9, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1131282432. Throughput: 0: 11400.6. Samples: 282855936. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:40,767][1648981] Avg episode reward: [(0, '396.310')] [2024-06-15 18:07:44,544][1651669] Updated weights for policy 0, policy_version 552421 (0.0013) [2024-06-15 18:07:45,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46421.4, 300 sec: 47208.2). Total num frames: 1131446272. Throughput: 0: 11548.5. Samples: 282938880. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:45,767][1648981] Avg episode reward: [(0, '398.160')] [2024-06-15 18:07:45,952][1651274] Signal inference workers to stop experience collection... (29000 times) [2024-06-15 18:07:45,980][1651669] InferenceWorker_p0-w0: stopping experience collection (29000 times) [2024-06-15 18:07:46,171][1651274] Signal inference workers to resume experience collection... 
(29000 times) [2024-06-15 18:07:46,177][1651669] InferenceWorker_p0-w0: resuming experience collection (29000 times) [2024-06-15 18:07:46,315][1651669] Updated weights for policy 0, policy_version 552501 (0.0016) [2024-06-15 18:07:47,714][1651669] Updated weights for policy 0, policy_version 552548 (0.0012) [2024-06-15 18:07:49,825][1651669] Updated weights for policy 0, policy_version 552610 (0.0024) [2024-06-15 18:07:50,790][1648981] Fps is (10 sec: 52304.0, 60 sec: 48040.7, 300 sec: 47759.7). Total num frames: 1131806720. Throughput: 0: 11599.2. Samples: 283002368. Policy #0 lag: (min: 15.0, avg: 147.8, max: 271.0) [2024-06-15 18:07:50,791][1648981] Avg episode reward: [(0, '413.100')] [2024-06-15 18:07:55,176][1651669] Updated weights for policy 0, policy_version 552674 (0.0012) [2024-06-15 18:07:55,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 45329.2, 300 sec: 46986.0). Total num frames: 1131905024. Throughput: 0: 11673.6. Samples: 283046912. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:07:55,767][1648981] Avg episode reward: [(0, '428.430')] [2024-06-15 18:07:57,101][1651669] Updated weights for policy 0, policy_version 552752 (0.0012) [2024-06-15 18:07:58,672][1651669] Updated weights for policy 0, policy_version 552816 (0.0014) [2024-06-15 18:08:00,493][1651669] Updated weights for policy 0, policy_version 552864 (0.0012) [2024-06-15 18:08:00,766][1648981] Fps is (10 sec: 45984.6, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1132265472. Throughput: 0: 11685.0. Samples: 283110400. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:00,767][1648981] Avg episode reward: [(0, '421.560')] [2024-06-15 18:08:05,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1132331008. Throughput: 0: 11764.6. Samples: 283192320. 
Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:05,767][1648981] Avg episode reward: [(0, '419.680')] [2024-06-15 18:08:06,243][1651669] Updated weights for policy 0, policy_version 552928 (0.0013) [2024-06-15 18:08:08,238][1651669] Updated weights for policy 0, policy_version 553008 (0.0013) [2024-06-15 18:08:09,463][1651669] Updated weights for policy 0, policy_version 553042 (0.0014) [2024-06-15 18:08:10,406][1651669] Updated weights for policy 0, policy_version 553088 (0.0011) [2024-06-15 18:08:10,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 48059.5, 300 sec: 47985.6). Total num frames: 1132724224. Throughput: 0: 11775.9. Samples: 283215872. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:10,768][1648981] Avg episode reward: [(0, '418.600')] [2024-06-15 18:08:12,745][1651669] Updated weights for policy 0, policy_version 553151 (0.0024) [2024-06-15 18:08:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 45329.0, 300 sec: 46986.3). Total num frames: 1132855296. Throughput: 0: 12006.7. Samples: 283294208. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:15,767][1648981] Avg episode reward: [(0, '435.020')] [2024-06-15 18:08:17,953][1651669] Updated weights for policy 0, policy_version 553207 (0.0175) [2024-06-15 18:08:19,084][1651669] Updated weights for policy 0, policy_version 553251 (0.0011) [2024-06-15 18:08:20,568][1651669] Updated weights for policy 0, policy_version 553316 (0.0013) [2024-06-15 18:08:20,774][1648981] Fps is (10 sec: 49113.6, 60 sec: 49691.5, 300 sec: 47878.5). Total num frames: 1133215744. Throughput: 0: 11762.5. Samples: 283352064. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:20,775][1648981] Avg episode reward: [(0, '439.170')] [2024-06-15 18:08:23,166][1651669] Updated weights for policy 0, policy_version 553376 (0.0114) [2024-06-15 18:08:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1133379584. 
Throughput: 0: 11901.1. Samples: 283391488. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:25,767][1648981] Avg episode reward: [(0, '448.380')] [2024-06-15 18:08:27,601][1651669] Updated weights for policy 0, policy_version 553427 (0.0013) [2024-06-15 18:08:27,920][1651274] Signal inference workers to stop experience collection... (29050 times) [2024-06-15 18:08:27,996][1651669] InferenceWorker_p0-w0: stopping experience collection (29050 times) [2024-06-15 18:08:28,182][1651274] Signal inference workers to resume experience collection... (29050 times) [2024-06-15 18:08:28,183][1651669] InferenceWorker_p0-w0: resuming experience collection (29050 times) [2024-06-15 18:08:29,272][1651669] Updated weights for policy 0, policy_version 553488 (0.0011) [2024-06-15 18:08:30,766][1648981] Fps is (10 sec: 42632.7, 60 sec: 48069.2, 300 sec: 47541.4). Total num frames: 1133641728. Throughput: 0: 11867.0. Samples: 283472896. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:30,767][1648981] Avg episode reward: [(0, '444.430')] [2024-06-15 18:08:31,369][1651669] Updated weights for policy 0, policy_version 553574 (0.0012) [2024-06-15 18:08:32,887][1651669] Updated weights for policy 0, policy_version 553617 (0.0014) [2024-06-15 18:08:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 1133903872. Throughput: 0: 12055.4. Samples: 283544576. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:35,767][1648981] Avg episode reward: [(0, '449.780')] [2024-06-15 18:08:37,887][1651669] Updated weights for policy 0, policy_version 553665 (0.0016) [2024-06-15 18:08:40,233][1651669] Updated weights for policy 0, policy_version 553733 (0.0012) [2024-06-15 18:08:40,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1134067712. Throughput: 0: 12003.6. Samples: 283587072. 
Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:40,767][1648981] Avg episode reward: [(0, '437.040')] [2024-06-15 18:08:42,123][1651669] Updated weights for policy 0, policy_version 553808 (0.0012) [2024-06-15 18:08:44,066][1651669] Updated weights for policy 0, policy_version 553863 (0.0016) [2024-06-15 18:08:45,076][1651669] Updated weights for policy 0, policy_version 553914 (0.0011) [2024-06-15 18:08:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1134428160. Throughput: 0: 11946.7. Samples: 283648000. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:45,767][1648981] Avg episode reward: [(0, '439.050')] [2024-06-15 18:08:49,627][1651669] Updated weights for policy 0, policy_version 553968 (0.0029) [2024-06-15 18:08:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45893.4, 300 sec: 47208.2). Total num frames: 1134559232. Throughput: 0: 12014.9. Samples: 283732992. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:50,767][1648981] Avg episode reward: [(0, '450.560')] [2024-06-15 18:08:51,800][1651669] Updated weights for policy 0, policy_version 554020 (0.0014) [2024-06-15 18:08:53,509][1651669] Updated weights for policy 0, policy_version 554087 (0.0134) [2024-06-15 18:08:54,801][1651669] Updated weights for policy 0, policy_version 554130 (0.0012) [2024-06-15 18:08:55,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 50790.2, 300 sec: 48430.0). Total num frames: 1134952448. Throughput: 0: 12060.5. Samples: 283758592. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:08:55,767][1648981] Avg episode reward: [(0, '446.490')] [2024-06-15 18:08:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000554176_1134952448.pth... 
[2024-06-15 18:08:55,814][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000548544_1123418112.pth [2024-06-15 18:08:59,403][1651669] Updated weights for policy 0, policy_version 554195 (0.0012) [2024-06-15 18:09:00,280][1651669] Updated weights for policy 0, policy_version 554239 (0.0012) [2024-06-15 18:09:00,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 46967.3, 300 sec: 47541.3). Total num frames: 1135083520. Throughput: 0: 12276.6. Samples: 283846656. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:09:00,768][1648981] Avg episode reward: [(0, '444.560')] [2024-06-15 18:09:02,475][1651669] Updated weights for policy 0, policy_version 554289 (0.0011) [2024-06-15 18:09:03,923][1651669] Updated weights for policy 0, policy_version 554352 (0.0110) [2024-06-15 18:09:05,111][1651669] Updated weights for policy 0, policy_version 554387 (0.0012) [2024-06-15 18:09:05,766][1648981] Fps is (10 sec: 49153.2, 60 sec: 51882.7, 300 sec: 48319.0). Total num frames: 1135443968. Throughput: 0: 12392.6. Samples: 283909632. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:09:05,767][1648981] Avg episode reward: [(0, '437.700')] [2024-06-15 18:09:06,163][1651669] Updated weights for policy 0, policy_version 554432 (0.0011) [2024-06-15 18:09:10,192][1651274] Signal inference workers to stop experience collection... (29100 times) [2024-06-15 18:09:10,244][1651669] InferenceWorker_p0-w0: stopping experience collection (29100 times) [2024-06-15 18:09:10,371][1651274] Signal inference workers to resume experience collection... (29100 times) [2024-06-15 18:09:10,372][1651669] InferenceWorker_p0-w0: resuming experience collection (29100 times) [2024-06-15 18:09:10,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 47513.8, 300 sec: 47541.4). Total num frames: 1135575040. Throughput: 0: 12447.3. Samples: 283951616. 
Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:09:10,767][1648981] Avg episode reward: [(0, '467.670')] [2024-06-15 18:09:11,564][1651669] Updated weights for policy 0, policy_version 554497 (0.0014) [2024-06-15 18:09:13,226][1651669] Updated weights for policy 0, policy_version 554563 (0.0052) [2024-06-15 18:09:15,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 50244.2, 300 sec: 47993.5). Total num frames: 1135869952. Throughput: 0: 12117.3. Samples: 284018176. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:09:15,767][1648981] Avg episode reward: [(0, '452.420')] [2024-06-15 18:09:16,154][1651669] Updated weights for policy 0, policy_version 554640 (0.0012) [2024-06-15 18:09:17,277][1651669] Updated weights for policy 0, policy_version 554686 (0.0010) [2024-06-15 18:09:20,790][1648981] Fps is (10 sec: 42497.4, 60 sec: 46409.2, 300 sec: 47315.4). Total num frames: 1136001024. Throughput: 0: 12304.3. Samples: 284098560. Policy #0 lag: (min: 0.0, avg: 67.5, max: 256.0) [2024-06-15 18:09:20,791][1648981] Avg episode reward: [(0, '456.540')] [2024-06-15 18:09:23,226][1651669] Updated weights for policy 0, policy_version 554768 (0.0137) [2024-06-15 18:09:25,232][1651669] Updated weights for policy 0, policy_version 554841 (0.0017) [2024-06-15 18:09:25,775][1648981] Fps is (10 sec: 49109.4, 60 sec: 49690.9, 300 sec: 47873.2). Total num frames: 1136361472. Throughput: 0: 12115.0. Samples: 284132352. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:09:25,776][1648981] Avg episode reward: [(0, '449.390')] [2024-06-15 18:09:26,887][1651669] Updated weights for policy 0, policy_version 554896 (0.0012) [2024-06-15 18:09:27,998][1651669] Updated weights for policy 0, policy_version 554944 (0.0013) [2024-06-15 18:09:30,766][1648981] Fps is (10 sec: 52553.9, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1136525312. Throughput: 0: 12208.4. Samples: 284197376. 
Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:09:30,767][1648981] Avg episode reward: [(0, '452.530')] [2024-06-15 18:09:34,479][1651669] Updated weights for policy 0, policy_version 555003 (0.0012) [2024-06-15 18:09:35,766][1648981] Fps is (10 sec: 39355.8, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1136754688. Throughput: 0: 11867.0. Samples: 284267008. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:09:35,767][1648981] Avg episode reward: [(0, '451.880')] [2024-06-15 18:09:35,956][1651669] Updated weights for policy 0, policy_version 555060 (0.0105) [2024-06-15 18:09:37,523][1651669] Updated weights for policy 0, policy_version 555130 (0.0013) [2024-06-15 18:09:39,257][1651669] Updated weights for policy 0, policy_version 555169 (0.0013) [2024-06-15 18:09:40,770][1648981] Fps is (10 sec: 52408.7, 60 sec: 49695.0, 300 sec: 47762.9). Total num frames: 1137049600. Throughput: 0: 11968.5. Samples: 284297216. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:09:40,771][1648981] Avg episode reward: [(0, '450.320')] [2024-06-15 18:09:44,171][1651669] Updated weights for policy 0, policy_version 555202 (0.0013) [2024-06-15 18:09:45,748][1651669] Updated weights for policy 0, policy_version 555265 (0.0014) [2024-06-15 18:09:45,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1137180672. Throughput: 0: 11730.6. Samples: 284374528. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:09:45,767][1648981] Avg episode reward: [(0, '451.630')] [2024-06-15 18:09:47,194][1651669] Updated weights for policy 0, policy_version 555328 (0.0012) [2024-06-15 18:09:48,649][1651669] Updated weights for policy 0, policy_version 555388 (0.0013) [2024-06-15 18:09:50,089][1651274] Signal inference workers to stop experience collection... 
(29150 times) [2024-06-15 18:09:50,135][1651669] InferenceWorker_p0-w0: stopping experience collection (29150 times) [2024-06-15 18:09:50,364][1651274] Signal inference workers to resume experience collection... (29150 times) [2024-06-15 18:09:50,365][1651669] InferenceWorker_p0-w0: resuming experience collection (29150 times) [2024-06-15 18:09:50,767][1648981] Fps is (10 sec: 42614.1, 60 sec: 48605.8, 300 sec: 47763.5). Total num frames: 1137475584. Throughput: 0: 11741.8. Samples: 284438016. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:09:50,767][1648981] Avg episode reward: [(0, '460.240')] [2024-06-15 18:09:51,357][1651669] Updated weights for policy 0, policy_version 555445 (0.0011) [2024-06-15 18:09:55,772][1648981] Fps is (10 sec: 45848.2, 60 sec: 44778.7, 300 sec: 47096.2). Total num frames: 1137639424. Throughput: 0: 11626.6. Samples: 284474880. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:09:55,773][1648981] Avg episode reward: [(0, '440.450')] [2024-06-15 18:09:55,830][1651669] Updated weights for policy 0, policy_version 555489 (0.0012) [2024-06-15 18:09:56,982][1651669] Updated weights for policy 0, policy_version 555536 (0.0014) [2024-06-15 18:09:58,608][1651669] Updated weights for policy 0, policy_version 555590 (0.0011) [2024-06-15 18:09:59,465][1651669] Updated weights for policy 0, policy_version 555634 (0.0011) [2024-06-15 18:10:00,770][1648981] Fps is (10 sec: 49134.1, 60 sec: 48056.9, 300 sec: 47985.1). Total num frames: 1137967104. Throughput: 0: 11581.6. Samples: 284539392. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:00,771][1648981] Avg episode reward: [(0, '449.790')] [2024-06-15 18:10:01,812][1651669] Updated weights for policy 0, policy_version 555680 (0.0013) [2024-06-15 18:10:02,693][1651669] Updated weights for policy 0, policy_version 555712 (0.0010) [2024-06-15 18:10:05,767][1648981] Fps is (10 sec: 45900.2, 60 sec: 44236.5, 300 sec: 47213.2). 
Total num frames: 1138098176. Throughput: 0: 11543.1. Samples: 284617728. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:05,768][1648981] Avg episode reward: [(0, '450.400')] [2024-06-15 18:10:08,676][1651669] Updated weights for policy 0, policy_version 555792 (0.0227) [2024-06-15 18:10:09,691][1651669] Updated weights for policy 0, policy_version 555835 (0.0016) [2024-06-15 18:10:10,766][1648981] Fps is (10 sec: 45892.4, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 1138425856. Throughput: 0: 11584.8. Samples: 284653568. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:10,767][1648981] Avg episode reward: [(0, '449.740')] [2024-06-15 18:10:11,212][1651669] Updated weights for policy 0, policy_version 555899 (0.0107) [2024-06-15 18:10:13,224][1651669] Updated weights for policy 0, policy_version 555957 (0.0074) [2024-06-15 18:10:15,769][1648981] Fps is (10 sec: 52414.8, 60 sec: 45872.9, 300 sec: 47429.8). Total num frames: 1138622464. Throughput: 0: 11661.4. Samples: 284722176. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:15,770][1648981] Avg episode reward: [(0, '428.090')] [2024-06-15 18:10:17,663][1651669] Updated weights for policy 0, policy_version 556000 (0.0033) [2024-06-15 18:10:19,710][1651669] Updated weights for policy 0, policy_version 556049 (0.0014) [2024-06-15 18:10:20,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 47532.4, 300 sec: 47430.3). Total num frames: 1138851840. Throughput: 0: 11741.9. Samples: 284795392. 
Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:20,767][1648981] Avg episode reward: [(0, '428.580')] [2024-06-15 18:10:21,313][1651669] Updated weights for policy 0, policy_version 556118 (0.0128) [2024-06-15 18:10:22,886][1651669] Updated weights for policy 0, policy_version 556161 (0.0012) [2024-06-15 18:10:24,270][1651669] Updated weights for policy 0, policy_version 556223 (0.0014) [2024-06-15 18:10:25,766][1648981] Fps is (10 sec: 52444.9, 60 sec: 46428.1, 300 sec: 47765.6). Total num frames: 1139146752. Throughput: 0: 11845.3. Samples: 284830208. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:25,767][1648981] Avg episode reward: [(0, '429.630')] [2024-06-15 18:10:28,231][1651669] Updated weights for policy 0, policy_version 556282 (0.0107) [2024-06-15 18:10:30,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1139277824. Throughput: 0: 11741.9. Samples: 284902912. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:30,767][1648981] Avg episode reward: [(0, '424.730')] [2024-06-15 18:10:31,723][1651669] Updated weights for policy 0, policy_version 556336 (0.0013) [2024-06-15 18:10:32,537][1651274] Signal inference workers to stop experience collection... (29200 times) [2024-06-15 18:10:32,664][1651669] InferenceWorker_p0-w0: stopping experience collection (29200 times) [2024-06-15 18:10:32,837][1651274] Signal inference workers to resume experience collection... (29200 times) [2024-06-15 18:10:32,837][1651669] InferenceWorker_p0-w0: resuming experience collection (29200 times) [2024-06-15 18:10:33,211][1651669] Updated weights for policy 0, policy_version 556400 (0.0013) [2024-06-15 18:10:34,801][1651669] Updated weights for policy 0, policy_version 556436 (0.0013) [2024-06-15 18:10:35,747][1651669] Updated weights for policy 0, policy_version 556478 (0.0011) [2024-06-15 18:10:35,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48606.0, 300 sec: 47985.7). 
Total num frames: 1139671040. Throughput: 0: 11924.0. Samples: 284974592. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:35,767][1648981] Avg episode reward: [(0, '414.040')] [2024-06-15 18:10:38,363][1651669] Updated weights for policy 0, policy_version 556516 (0.0013) [2024-06-15 18:10:40,802][1648981] Fps is (10 sec: 52241.9, 60 sec: 45850.8, 300 sec: 47202.4). Total num frames: 1139802112. Throughput: 0: 11927.4. Samples: 285011968. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:40,803][1648981] Avg episode reward: [(0, '422.470')] [2024-06-15 18:10:42,784][1651669] Updated weights for policy 0, policy_version 556593 (0.0012) [2024-06-15 18:10:44,117][1651669] Updated weights for policy 0, policy_version 556656 (0.0012) [2024-06-15 18:10:45,784][1648981] Fps is (10 sec: 39251.8, 60 sec: 48045.6, 300 sec: 47760.7). Total num frames: 1140064256. Throughput: 0: 11977.1. Samples: 285078528. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:45,785][1648981] Avg episode reward: [(0, '426.780')] [2024-06-15 18:10:46,300][1651669] Updated weights for policy 0, policy_version 556689 (0.0012) [2024-06-15 18:10:49,208][1651669] Updated weights for policy 0, policy_version 556768 (0.0012) [2024-06-15 18:10:50,777][1648981] Fps is (10 sec: 52561.1, 60 sec: 47505.3, 300 sec: 47539.7). Total num frames: 1140326400. Throughput: 0: 11864.3. Samples: 285151744. Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:50,778][1648981] Avg episode reward: [(0, '422.210')] [2024-06-15 18:10:52,684][1651669] Updated weights for policy 0, policy_version 556818 (0.0013) [2024-06-15 18:10:54,693][1651669] Updated weights for policy 0, policy_version 556899 (0.0055) [2024-06-15 18:10:55,776][1648981] Fps is (10 sec: 52469.4, 60 sec: 49148.7, 300 sec: 47874.1). Total num frames: 1140588544. Throughput: 0: 11887.2. Samples: 285188608. 
Policy #0 lag: (min: 95.0, avg: 177.0, max: 335.0) [2024-06-15 18:10:55,777][1648981] Avg episode reward: [(0, '411.820')] [2024-06-15 18:10:55,781][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000556928_1140588544.pth... [2024-06-15 18:10:55,845][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000551360_1129185280.pth [2024-06-15 18:10:55,850][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000556928_1140588544.pth [2024-06-15 18:10:57,267][1651669] Updated weights for policy 0, policy_version 556932 (0.0012) [2024-06-15 18:10:58,542][1651669] Updated weights for policy 0, policy_version 556992 (0.0018) [2024-06-15 18:11:00,554][1651669] Updated weights for policy 0, policy_version 557053 (0.0027) [2024-06-15 18:11:00,770][1648981] Fps is (10 sec: 52464.5, 60 sec: 48059.7, 300 sec: 47874.0). Total num frames: 1140850688. Throughput: 0: 11980.6. Samples: 285261312. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:00,771][1648981] Avg episode reward: [(0, '416.010')] [2024-06-15 18:11:04,859][1651669] Updated weights for policy 0, policy_version 557124 (0.0011) [2024-06-15 18:11:05,766][1648981] Fps is (10 sec: 49200.9, 60 sec: 49698.5, 300 sec: 47652.4). Total num frames: 1141080064. Throughput: 0: 11992.2. Samples: 285335040. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:05,767][1648981] Avg episode reward: [(0, '418.260')] [2024-06-15 18:11:05,922][1651669] Updated weights for policy 0, policy_version 557177 (0.0011) [2024-06-15 18:11:08,314][1651669] Updated weights for policy 0, policy_version 557221 (0.0012) [2024-06-15 18:11:10,767][1648981] Fps is (10 sec: 45891.9, 60 sec: 48059.6, 300 sec: 47875.2). Total num frames: 1141309440. Throughput: 0: 12083.2. Samples: 285373952. 
Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:10,767][1648981] Avg episode reward: [(0, '438.910')] [2024-06-15 18:11:10,958][1651669] Updated weights for policy 0, policy_version 557303 (0.0012) [2024-06-15 18:11:15,037][1651669] Updated weights for policy 0, policy_version 557346 (0.0011) [2024-06-15 18:11:15,432][1651274] Signal inference workers to stop experience collection... (29250 times) [2024-06-15 18:11:15,468][1651669] InferenceWorker_p0-w0: stopping experience collection (29250 times) [2024-06-15 18:11:15,708][1651274] Signal inference workers to resume experience collection... (29250 times) [2024-06-15 18:11:15,709][1651669] InferenceWorker_p0-w0: resuming experience collection (29250 times) [2024-06-15 18:11:15,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48062.2, 300 sec: 47541.4). Total num frames: 1141506048. Throughput: 0: 12117.3. Samples: 285448192. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:15,767][1648981] Avg episode reward: [(0, '442.960')] [2024-06-15 18:11:16,674][1651669] Updated weights for policy 0, policy_version 557408 (0.0011) [2024-06-15 18:11:19,290][1651669] Updated weights for policy 0, policy_version 557472 (0.0014) [2024-06-15 18:11:20,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 1141768192. Throughput: 0: 11935.3. Samples: 285511680. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:20,767][1648981] Avg episode reward: [(0, '442.100')] [2024-06-15 18:11:21,481][1651669] Updated weights for policy 0, policy_version 557521 (0.0012) [2024-06-15 18:11:22,605][1651669] Updated weights for policy 0, policy_version 557568 (0.0027) [2024-06-15 18:11:25,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 46967.4, 300 sec: 47319.2). Total num frames: 1141964800. Throughput: 0: 11956.1. Samples: 285549568. 
Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:25,767][1648981] Avg episode reward: [(0, '463.290')] [2024-06-15 18:11:26,009][1651669] Updated weights for policy 0, policy_version 557622 (0.0013) [2024-06-15 18:11:27,858][1651669] Updated weights for policy 0, policy_version 557689 (0.0012) [2024-06-15 18:11:30,021][1651669] Updated weights for policy 0, policy_version 557730 (0.0011) [2024-06-15 18:11:30,767][1648981] Fps is (10 sec: 52424.8, 60 sec: 50243.6, 300 sec: 47985.6). Total num frames: 1142292480. Throughput: 0: 12042.2. Samples: 285620224. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:30,768][1648981] Avg episode reward: [(0, '475.320')] [2024-06-15 18:11:32,545][1651669] Updated weights for policy 0, policy_version 557776 (0.0012) [2024-06-15 18:11:33,862][1651669] Updated weights for policy 0, policy_version 557821 (0.0011) [2024-06-15 18:11:35,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 46421.3, 300 sec: 47430.3). Total num frames: 1142456320. Throughput: 0: 12131.6. Samples: 285697536. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:35,767][1648981] Avg episode reward: [(0, '491.220')] [2024-06-15 18:11:36,549][1651669] Updated weights for policy 0, policy_version 557883 (0.0011) [2024-06-15 18:11:38,439][1651669] Updated weights for policy 0, policy_version 557936 (0.0011) [2024-06-15 18:11:40,637][1651669] Updated weights for policy 0, policy_version 558000 (0.0016) [2024-06-15 18:11:40,766][1648981] Fps is (10 sec: 49155.4, 60 sec: 49727.7, 300 sec: 47874.6). Total num frames: 1142784000. Throughput: 0: 12085.8. Samples: 285732352. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:40,767][1648981] Avg episode reward: [(0, '497.510')] [2024-06-15 18:11:43,301][1651669] Updated weights for policy 0, policy_version 558038 (0.0013) [2024-06-15 18:11:45,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 48073.7, 300 sec: 47541.3). Total num frames: 1142947840. 
Throughput: 0: 12106.9. Samples: 285806080. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:45,768][1648981] Avg episode reward: [(0, '527.960')] [2024-06-15 18:11:46,060][1651669] Updated weights for policy 0, policy_version 558096 (0.0013) [2024-06-15 18:11:48,608][1651669] Updated weights for policy 0, policy_version 558160 (0.0011) [2024-06-15 18:11:50,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 48068.3, 300 sec: 47541.4). Total num frames: 1143209984. Throughput: 0: 12094.6. Samples: 285879296. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:50,767][1648981] Avg episode reward: [(0, '532.730')] [2024-06-15 18:11:51,181][1651669] Updated weights for policy 0, policy_version 558240 (0.0107) [2024-06-15 18:11:54,368][1651669] Updated weights for policy 0, policy_version 558304 (0.0013) [2024-06-15 18:11:55,806][1648981] Fps is (10 sec: 52222.1, 60 sec: 48035.8, 300 sec: 47757.1). Total num frames: 1143472128. Throughput: 0: 12027.1. Samples: 285915648. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:11:55,807][1648981] Avg episode reward: [(0, '535.190')] [2024-06-15 18:11:57,130][1651669] Updated weights for policy 0, policy_version 558352 (0.0012) [2024-06-15 18:11:58,139][1651669] Updated weights for policy 0, policy_version 558400 (0.0046) [2024-06-15 18:11:58,817][1651274] Signal inference workers to stop experience collection... (29300 times) [2024-06-15 18:11:58,852][1651669] InferenceWorker_p0-w0: stopping experience collection (29300 times) [2024-06-15 18:11:59,131][1651274] Signal inference workers to resume experience collection... (29300 times) [2024-06-15 18:11:59,132][1651669] InferenceWorker_p0-w0: resuming experience collection (29300 times) [2024-06-15 18:12:00,359][1651669] Updated weights for policy 0, policy_version 558464 (0.0011) [2024-06-15 18:12:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48062.8, 300 sec: 47541.4). Total num frames: 1143734272. Throughput: 0: 11923.9. 
Samples: 285984768. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:12:00,767][1648981] Avg episode reward: [(0, '518.830')] [2024-06-15 18:12:03,230][1651669] Updated weights for policy 0, policy_version 558527 (0.0012) [2024-06-15 18:12:05,664][1651669] Updated weights for policy 0, policy_version 558586 (0.0013) [2024-06-15 18:12:05,766][1648981] Fps is (10 sec: 52637.8, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1143996416. Throughput: 0: 12208.3. Samples: 286061056. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:12:05,767][1648981] Avg episode reward: [(0, '507.180')] [2024-06-15 18:12:08,479][1651669] Updated weights for policy 0, policy_version 558654 (0.0013) [2024-06-15 18:12:10,770][1648981] Fps is (10 sec: 49133.0, 60 sec: 48602.9, 300 sec: 47762.9). Total num frames: 1144225792. Throughput: 0: 12207.3. Samples: 286098944. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:12:10,771][1648981] Avg episode reward: [(0, '504.960')] [2024-06-15 18:12:10,800][1651669] Updated weights for policy 0, policy_version 558720 (0.0012) [2024-06-15 18:12:13,769][1651669] Updated weights for policy 0, policy_version 558779 (0.0021) [2024-06-15 18:12:15,767][1648981] Fps is (10 sec: 42594.3, 60 sec: 48605.0, 300 sec: 48096.6). Total num frames: 1144422400. Throughput: 0: 12253.8. Samples: 286171648. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:12:15,768][1648981] Avg episode reward: [(0, '510.860')] [2024-06-15 18:12:16,507][1651669] Updated weights for policy 0, policy_version 558832 (0.0085) [2024-06-15 18:12:18,858][1651669] Updated weights for policy 0, policy_version 558869 (0.0011) [2024-06-15 18:12:19,477][1651669] Updated weights for policy 0, policy_version 558912 (0.0013) [2024-06-15 18:12:20,766][1648981] Fps is (10 sec: 49170.7, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 1144717312. Throughput: 0: 12128.7. Samples: 286243328. 
Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:12:20,767][1648981] Avg episode reward: [(0, '511.670')] [2024-06-15 18:12:20,990][1651669] Updated weights for policy 0, policy_version 558973 (0.0011) [2024-06-15 18:12:24,196][1651669] Updated weights for policy 0, policy_version 559034 (0.0105) [2024-06-15 18:12:25,773][1648981] Fps is (10 sec: 49125.8, 60 sec: 49146.9, 300 sec: 47986.6). Total num frames: 1144913920. Throughput: 0: 12229.4. Samples: 286282752. Policy #0 lag: (min: 15.0, avg: 129.4, max: 271.0) [2024-06-15 18:12:25,776][1648981] Avg episode reward: [(0, '492.580')] [2024-06-15 18:12:27,093][1651669] Updated weights for policy 0, policy_version 559088 (0.0011) [2024-06-15 18:12:28,794][1651669] Updated weights for policy 0, policy_version 559121 (0.0011) [2024-06-15 18:12:29,662][1651669] Updated weights for policy 0, policy_version 559168 (0.0012) [2024-06-15 18:12:30,781][1648981] Fps is (10 sec: 49081.8, 60 sec: 48594.9, 300 sec: 48316.6). Total num frames: 1145208832. Throughput: 0: 12306.9. Samples: 286360064. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:12:30,781][1648981] Avg episode reward: [(0, '507.030')] [2024-06-15 18:12:33,554][1651669] Updated weights for policy 0, policy_version 559233 (0.0013) [2024-06-15 18:12:34,734][1651669] Updated weights for policy 0, policy_version 559294 (0.0144) [2024-06-15 18:12:35,766][1648981] Fps is (10 sec: 52461.6, 60 sec: 49698.0, 300 sec: 47985.7). Total num frames: 1145438208. Throughput: 0: 12310.7. Samples: 286433280. 
Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:12:35,767][1648981] Avg episode reward: [(0, '492.290')] [2024-06-15 18:12:37,126][1651669] Updated weights for policy 0, policy_version 559344 (0.0013) [2024-06-15 18:12:39,319][1651669] Updated weights for policy 0, policy_version 559376 (0.0011) [2024-06-15 18:12:40,451][1651669] Updated weights for policy 0, policy_version 559420 (0.0010) [2024-06-15 18:12:40,789][1648981] Fps is (10 sec: 49113.0, 60 sec: 48587.9, 300 sec: 48315.3). Total num frames: 1145700352. Throughput: 0: 12497.7. Samples: 286477824. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:12:40,789][1648981] Avg episode reward: [(0, '484.760')] [2024-06-15 18:12:41,696][1651274] Signal inference workers to stop experience collection... (29350 times) [2024-06-15 18:12:41,777][1651669] InferenceWorker_p0-w0: stopping experience collection (29350 times) [2024-06-15 18:12:42,035][1651274] Signal inference workers to resume experience collection... (29350 times) [2024-06-15 18:12:42,035][1651669] InferenceWorker_p0-w0: resuming experience collection (29350 times) [2024-06-15 18:12:42,278][1651669] Updated weights for policy 0, policy_version 559487 (0.0162) [2024-06-15 18:12:45,228][1651669] Updated weights for policy 0, policy_version 559542 (0.0012) [2024-06-15 18:12:45,767][1648981] Fps is (10 sec: 52425.6, 60 sec: 50243.8, 300 sec: 47989.4). Total num frames: 1145962496. Throughput: 0: 12447.1. Samples: 286544896. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:12:45,768][1648981] Avg episode reward: [(0, '466.560')] [2024-06-15 18:12:47,452][1651669] Updated weights for policy 0, policy_version 559600 (0.0102) [2024-06-15 18:12:50,766][1648981] Fps is (10 sec: 45977.6, 60 sec: 49152.0, 300 sec: 48318.9). Total num frames: 1146159104. Throughput: 0: 12561.1. Samples: 286626304. 
Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:12:50,767][1648981] Avg episode reward: [(0, '462.480')] [2024-06-15 18:12:50,791][1651669] Updated weights for policy 0, policy_version 559649 (0.0033) [2024-06-15 18:12:52,297][1651669] Updated weights for policy 0, policy_version 559712 (0.0011) [2024-06-15 18:12:55,197][1651669] Updated weights for policy 0, policy_version 559748 (0.0013) [2024-06-15 18:12:55,767][1648981] Fps is (10 sec: 45877.4, 60 sec: 49184.4, 300 sec: 47985.6). Total num frames: 1146421248. Throughput: 0: 12311.7. Samples: 286652928. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:12:55,767][1648981] Avg episode reward: [(0, '470.160')] [2024-06-15 18:12:56,009][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000559792_1146454016.pth... [2024-06-15 18:12:56,058][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000554176_1134952448.pth [2024-06-15 18:12:56,222][1651669] Updated weights for policy 0, policy_version 559802 (0.0044) [2024-06-15 18:12:58,158][1651669] Updated weights for policy 0, policy_version 559868 (0.0013) [2024-06-15 18:13:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1146617856. Throughput: 0: 12402.1. Samples: 286729728. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:00,767][1648981] Avg episode reward: [(0, '473.630')] [2024-06-15 18:13:02,660][1651669] Updated weights for policy 0, policy_version 559942 (0.0011) [2024-06-15 18:13:04,007][1651669] Updated weights for policy 0, policy_version 560000 (0.0012) [2024-06-15 18:13:05,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 48605.5, 300 sec: 48096.7). Total num frames: 1146912768. Throughput: 0: 12390.3. Samples: 286800896. 
Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:05,768][1648981] Avg episode reward: [(0, '468.570')] [2024-06-15 18:13:08,271][1651669] Updated weights for policy 0, policy_version 560080 (0.0018) [2024-06-15 18:13:10,770][1648981] Fps is (10 sec: 52408.2, 60 sec: 48605.8, 300 sec: 48429.4). Total num frames: 1147142144. Throughput: 0: 12288.7. Samples: 286835712. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:10,771][1648981] Avg episode reward: [(0, '466.420')] [2024-06-15 18:13:12,505][1651669] Updated weights for policy 0, policy_version 560135 (0.0013) [2024-06-15 18:13:13,861][1651669] Updated weights for policy 0, policy_version 560195 (0.0028) [2024-06-15 18:13:15,254][1651669] Updated weights for policy 0, policy_version 560241 (0.0011) [2024-06-15 18:13:15,767][1648981] Fps is (10 sec: 49154.3, 60 sec: 49698.9, 300 sec: 48098.1). Total num frames: 1147404288. Throughput: 0: 12348.8. Samples: 286915584. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:15,769][1648981] Avg episode reward: [(0, '492.910')] [2024-06-15 18:13:17,049][1651669] Updated weights for policy 0, policy_version 560310 (0.0089) [2024-06-15 18:13:19,493][1651669] Updated weights for policy 0, policy_version 560355 (0.0091) [2024-06-15 18:13:20,770][1648981] Fps is (10 sec: 52429.0, 60 sec: 49148.9, 300 sec: 48429.4). Total num frames: 1147666432. Throughput: 0: 12093.6. Samples: 286977536. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:20,771][1648981] Avg episode reward: [(0, '492.930')] [2024-06-15 18:13:24,258][1651669] Updated weights for policy 0, policy_version 560416 (0.0025) [2024-06-15 18:13:24,720][1651274] Signal inference workers to stop experience collection... (29400 times) [2024-06-15 18:13:24,745][1651669] InferenceWorker_p0-w0: stopping experience collection (29400 times) [2024-06-15 18:13:24,986][1651274] Signal inference workers to resume experience collection... 
(29400 times) [2024-06-15 18:13:24,987][1651669] InferenceWorker_p0-w0: resuming experience collection (29400 times) [2024-06-15 18:13:25,782][1648981] Fps is (10 sec: 42534.2, 60 sec: 48598.7, 300 sec: 48094.3). Total num frames: 1147830272. Throughput: 0: 12233.0. Samples: 287028224. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:25,782][1648981] Avg episode reward: [(0, '507.340')] [2024-06-15 18:13:26,005][1651669] Updated weights for policy 0, policy_version 560483 (0.0095) [2024-06-15 18:13:27,827][1651669] Updated weights for policy 0, policy_version 560547 (0.0020) [2024-06-15 18:13:29,986][1651669] Updated weights for policy 0, policy_version 560594 (0.0012) [2024-06-15 18:13:30,777][1648981] Fps is (10 sec: 52392.0, 60 sec: 49701.0, 300 sec: 48428.2). Total num frames: 1148190720. Throughput: 0: 12012.2. Samples: 287085568. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:30,778][1648981] Avg episode reward: [(0, '499.450')] [2024-06-15 18:13:35,171][1651669] Updated weights for policy 0, policy_version 560659 (0.0082) [2024-06-15 18:13:35,766][1648981] Fps is (10 sec: 45945.5, 60 sec: 47513.8, 300 sec: 48207.9). Total num frames: 1148289024. Throughput: 0: 11958.1. Samples: 287164416. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:35,767][1648981] Avg episode reward: [(0, '493.580')] [2024-06-15 18:13:37,693][1651669] Updated weights for policy 0, policy_version 560763 (0.0012) [2024-06-15 18:13:39,406][1651669] Updated weights for policy 0, policy_version 560826 (0.0011) [2024-06-15 18:13:40,766][1648981] Fps is (10 sec: 39364.2, 60 sec: 48077.5, 300 sec: 47985.7). Total num frames: 1148583936. Throughput: 0: 11878.5. Samples: 287187456. 
Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:40,767][1648981] Avg episode reward: [(0, '495.470')] [2024-06-15 18:13:41,860][1651669] Updated weights for policy 0, policy_version 560868 (0.0012) [2024-06-15 18:13:45,766][1648981] Fps is (10 sec: 42597.8, 60 sec: 45875.8, 300 sec: 47985.7). Total num frames: 1148715008. Throughput: 0: 11764.6. Samples: 287259136. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:45,767][1648981] Avg episode reward: [(0, '489.780')] [2024-06-15 18:13:47,458][1651669] Updated weights for policy 0, policy_version 560949 (0.0014) [2024-06-15 18:13:49,290][1651669] Updated weights for policy 0, policy_version 561024 (0.0012) [2024-06-15 18:13:50,587][1651669] Updated weights for policy 0, policy_version 561085 (0.0036) [2024-06-15 18:13:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 1149108224. Throughput: 0: 11730.6. Samples: 287328768. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:50,767][1648981] Avg episode reward: [(0, '486.770')] [2024-06-15 18:13:53,770][1651669] Updated weights for policy 0, policy_version 561145 (0.0012) [2024-06-15 18:13:55,767][1648981] Fps is (10 sec: 52427.1, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1149239296. Throughput: 0: 11833.8. Samples: 287368192. Policy #0 lag: (min: 15.0, avg: 120.2, max: 271.0) [2024-06-15 18:13:55,768][1648981] Avg episode reward: [(0, '490.270')] [2024-06-15 18:13:57,905][1651669] Updated weights for policy 0, policy_version 561200 (0.0017) [2024-06-15 18:13:58,802][1651669] Updated weights for policy 0, policy_version 561248 (0.0012) [2024-06-15 18:14:00,301][1651669] Updated weights for policy 0, policy_version 561314 (0.0011) [2024-06-15 18:14:00,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 1149599744. Throughput: 0: 11719.1. Samples: 287442944. 
Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:00,767][1648981] Avg episode reward: [(0, '506.420')] [2024-06-15 18:14:03,293][1651669] Updated weights for policy 0, policy_version 561360 (0.0012) [2024-06-15 18:14:03,452][1651274] Signal inference workers to stop experience collection... (29450 times) [2024-06-15 18:14:03,544][1651669] InferenceWorker_p0-w0: stopping experience collection (29450 times) [2024-06-15 18:14:03,750][1651274] Signal inference workers to resume experience collection... (29450 times) [2024-06-15 18:14:03,751][1651669] InferenceWorker_p0-w0: resuming experience collection (29450 times) [2024-06-15 18:14:05,776][1648981] Fps is (10 sec: 52382.4, 60 sec: 47506.8, 300 sec: 48095.3). Total num frames: 1149763584. Throughput: 0: 11990.8. Samples: 287517184. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:05,777][1648981] Avg episode reward: [(0, '472.250')] [2024-06-15 18:14:07,676][1651669] Updated weights for policy 0, policy_version 561424 (0.0153) [2024-06-15 18:14:08,477][1651669] Updated weights for policy 0, policy_version 561465 (0.0013) [2024-06-15 18:14:09,814][1651669] Updated weights for policy 0, policy_version 561520 (0.0012) [2024-06-15 18:14:10,778][1648981] Fps is (10 sec: 45820.7, 60 sec: 48599.4, 300 sec: 48094.8). Total num frames: 1150058496. Throughput: 0: 11765.5. Samples: 287557632. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:10,779][1648981] Avg episode reward: [(0, '471.400')] [2024-06-15 18:14:11,421][1651669] Updated weights for policy 0, policy_version 561591 (0.0012) [2024-06-15 18:14:14,447][1651669] Updated weights for policy 0, policy_version 561654 (0.0087) [2024-06-15 18:14:15,778][1648981] Fps is (10 sec: 52414.9, 60 sec: 48050.3, 300 sec: 48432.0). Total num frames: 1150287872. Throughput: 0: 12071.6. Samples: 287628800. 
Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:15,779][1648981] Avg episode reward: [(0, '484.270')] [2024-06-15 18:14:18,573][1651669] Updated weights for policy 0, policy_version 561696 (0.0012) [2024-06-15 18:14:20,164][1651669] Updated weights for policy 0, policy_version 561776 (0.0014) [2024-06-15 18:14:20,769][1648981] Fps is (10 sec: 49197.5, 60 sec: 48060.7, 300 sec: 48097.8). Total num frames: 1150550016. Throughput: 0: 12025.6. Samples: 287705600. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:20,770][1648981] Avg episode reward: [(0, '462.680')] [2024-06-15 18:14:22,057][1651669] Updated weights for policy 0, policy_version 561855 (0.0013) [2024-06-15 18:14:25,548][1651669] Updated weights for policy 0, policy_version 561918 (0.0017) [2024-06-15 18:14:25,767][1648981] Fps is (10 sec: 52489.5, 60 sec: 49710.5, 300 sec: 48429.9). Total num frames: 1150812160. Throughput: 0: 12310.7. Samples: 287741440. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:25,767][1648981] Avg episode reward: [(0, '443.290')] [2024-06-15 18:14:30,154][1651669] Updated weights for policy 0, policy_version 561973 (0.0012) [2024-06-15 18:14:30,766][1648981] Fps is (10 sec: 42609.3, 60 sec: 46429.7, 300 sec: 48207.8). Total num frames: 1150976000. Throughput: 0: 12515.5. Samples: 287822336. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:30,767][1648981] Avg episode reward: [(0, '457.630')] [2024-06-15 18:14:32,232][1651669] Updated weights for policy 0, policy_version 562064 (0.0012) [2024-06-15 18:14:34,887][1651669] Updated weights for policy 0, policy_version 562114 (0.0032) [2024-06-15 18:14:35,766][1648981] Fps is (10 sec: 45876.5, 60 sec: 49698.1, 300 sec: 48208.5). Total num frames: 1151270912. Throughput: 0: 12390.4. Samples: 287886336. 
Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:35,767][1648981] Avg episode reward: [(0, '457.110')] [2024-06-15 18:14:36,185][1651669] Updated weights for policy 0, policy_version 562170 (0.0018) [2024-06-15 18:14:40,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 1151401984. Throughput: 0: 12492.9. Samples: 287930368. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:40,767][1648981] Avg episode reward: [(0, '481.870')] [2024-06-15 18:14:40,944][1651669] Updated weights for policy 0, policy_version 562224 (0.0014) [2024-06-15 18:14:42,004][1651669] Updated weights for policy 0, policy_version 562272 (0.0011) [2024-06-15 18:14:42,434][1651274] Signal inference workers to stop experience collection... (29500 times) [2024-06-15 18:14:42,475][1651669] InferenceWorker_p0-w0: stopping experience collection (29500 times) [2024-06-15 18:14:42,773][1651274] Signal inference workers to resume experience collection... (29500 times) [2024-06-15 18:14:42,778][1651669] InferenceWorker_p0-w0: resuming experience collection (29500 times) [2024-06-15 18:14:43,725][1651669] Updated weights for policy 0, policy_version 562352 (0.0012) [2024-06-15 18:14:45,524][1651669] Updated weights for policy 0, policy_version 562384 (0.0011) [2024-06-15 18:14:45,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50790.4, 300 sec: 48430.0). Total num frames: 1151762432. Throughput: 0: 12310.7. Samples: 287996928. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:45,767][1648981] Avg episode reward: [(0, '479.090')] [2024-06-15 18:14:46,627][1651669] Updated weights for policy 0, policy_version 562431 (0.0020) [2024-06-15 18:14:50,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 46421.4, 300 sec: 48319.9). Total num frames: 1151893504. Throughput: 0: 12518.1. Samples: 288080384. 
Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:50,767][1648981] Avg episode reward: [(0, '487.830')] [2024-06-15 18:14:52,670][1651669] Updated weights for policy 0, policy_version 562532 (0.0091) [2024-06-15 18:14:54,312][1651669] Updated weights for policy 0, policy_version 562596 (0.0012) [2024-06-15 18:14:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 50244.5, 300 sec: 48430.6). Total num frames: 1152253952. Throughput: 0: 12109.1. Samples: 288102400. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:14:55,767][1648981] Avg episode reward: [(0, '497.490')] [2024-06-15 18:14:55,830][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000562624_1152253952.pth... [2024-06-15 18:14:55,878][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000556928_1140588544.pth [2024-06-15 18:14:57,492][1651669] Updated weights for policy 0, policy_version 562644 (0.0013) [2024-06-15 18:15:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 46421.3, 300 sec: 48430.1). Total num frames: 1152385024. Throughput: 0: 12154.7. Samples: 288175616. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:15:00,767][1648981] Avg episode reward: [(0, '509.620')] [2024-06-15 18:15:01,407][1651669] Updated weights for policy 0, policy_version 562689 (0.0013) [2024-06-15 18:15:03,359][1651669] Updated weights for policy 0, policy_version 562772 (0.0012) [2024-06-15 18:15:05,381][1651669] Updated weights for policy 0, policy_version 562850 (0.0011) [2024-06-15 18:15:05,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49705.8, 300 sec: 48541.1). Total num frames: 1152745472. Throughput: 0: 11913.2. Samples: 288241664. 
Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:15:05,767][1648981] Avg episode reward: [(0, '515.950')] [2024-06-15 18:15:07,542][1651669] Updated weights for policy 0, policy_version 562882 (0.0020) [2024-06-15 18:15:08,520][1651669] Updated weights for policy 0, policy_version 562933 (0.0020) [2024-06-15 18:15:10,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 47522.8, 300 sec: 48430.5). Total num frames: 1152909312. Throughput: 0: 12037.7. Samples: 288283136. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:15:10,767][1648981] Avg episode reward: [(0, '508.000')] [2024-06-15 18:15:14,303][1651669] Updated weights for policy 0, policy_version 563012 (0.0109) [2024-06-15 18:15:15,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48069.3, 300 sec: 48541.1). Total num frames: 1153171456. Throughput: 0: 11810.2. Samples: 288353792. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:15:15,767][1648981] Avg episode reward: [(0, '518.180')] [2024-06-15 18:15:16,327][1651669] Updated weights for policy 0, policy_version 563091 (0.0031) [2024-06-15 18:15:19,230][1651669] Updated weights for policy 0, policy_version 563144 (0.0015) [2024-06-15 18:15:20,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48061.8, 300 sec: 48430.0). Total num frames: 1153433600. Throughput: 0: 11798.7. Samples: 288417280. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:15:20,767][1648981] Avg episode reward: [(0, '536.650')] [2024-06-15 18:15:24,317][1651669] Updated weights for policy 0, policy_version 563216 (0.0011) [2024-06-15 18:15:24,507][1651274] Signal inference workers to stop experience collection... (29550 times) [2024-06-15 18:15:24,722][1651274] Signal inference workers to resume experience collection... 
(29550 times) [2024-06-15 18:15:24,752][1651669] InferenceWorker_p0-w0: stopping experience collection (29550 times) [2024-06-15 18:15:24,787][1651669] InferenceWorker_p0-w0: resuming experience collection (29550 times) [2024-06-15 18:15:25,706][1651669] Updated weights for policy 0, policy_version 563267 (0.0017) [2024-06-15 18:15:25,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 45875.4, 300 sec: 48430.0). Total num frames: 1153564672. Throughput: 0: 11787.4. Samples: 288460800. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:15:25,767][1648981] Avg episode reward: [(0, '525.240')] [2024-06-15 18:15:27,294][1651669] Updated weights for policy 0, policy_version 563333 (0.0011) [2024-06-15 18:15:28,627][1651669] Updated weights for policy 0, policy_version 563392 (0.0020) [2024-06-15 18:15:30,769][1648981] Fps is (10 sec: 42588.9, 60 sec: 48058.0, 300 sec: 48096.4). Total num frames: 1153859584. Throughput: 0: 11695.8. Samples: 288523264. Policy #0 lag: (min: 57.0, avg: 167.2, max: 297.0) [2024-06-15 18:15:30,769][1648981] Avg episode reward: [(0, '500.730')] [2024-06-15 18:15:31,827][1651669] Updated weights for policy 0, policy_version 563456 (0.0015) [2024-06-15 18:15:35,778][1648981] Fps is (10 sec: 42550.2, 60 sec: 45320.5, 300 sec: 48100.8). Total num frames: 1153990656. Throughput: 0: 11556.9. Samples: 288600576. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:15:35,778][1648981] Avg episode reward: [(0, '514.100')] [2024-06-15 18:15:36,812][1651669] Updated weights for policy 0, policy_version 563525 (0.0013) [2024-06-15 18:15:37,855][1651669] Updated weights for policy 0, policy_version 563574 (0.0028) [2024-06-15 18:15:38,841][1651669] Updated weights for policy 0, policy_version 563616 (0.0013) [2024-06-15 18:15:40,766][1648981] Fps is (10 sec: 49163.0, 60 sec: 49152.1, 300 sec: 48432.9). Total num frames: 1154351104. Throughput: 0: 11730.5. Samples: 288630272. 
Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:15:40,767][1648981] Avg episode reward: [(0, '503.480')] [2024-06-15 18:15:41,049][1651669] Updated weights for policy 0, policy_version 563651 (0.0042) [2024-06-15 18:15:42,352][1651669] Updated weights for policy 0, policy_version 563703 (0.0013) [2024-06-15 18:15:45,770][1648981] Fps is (10 sec: 49190.4, 60 sec: 45326.4, 300 sec: 47986.8). Total num frames: 1154482176. Throughput: 0: 11740.9. Samples: 288704000. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:15:45,770][1648981] Avg episode reward: [(0, '508.910')] [2024-06-15 18:15:47,751][1651669] Updated weights for policy 0, policy_version 563776 (0.0011) [2024-06-15 18:15:48,707][1651669] Updated weights for policy 0, policy_version 563833 (0.0014) [2024-06-15 18:15:49,865][1651669] Updated weights for policy 0, policy_version 563872 (0.0010) [2024-06-15 18:15:50,757][1651669] Updated weights for policy 0, policy_version 563904 (0.0010) [2024-06-15 18:15:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49698.2, 300 sec: 48431.6). Total num frames: 1154875392. Throughput: 0: 11821.5. Samples: 288773632. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:15:50,767][1648981] Avg episode reward: [(0, '499.410')] [2024-06-15 18:15:52,792][1651669] Updated weights for policy 0, policy_version 563961 (0.0011) [2024-06-15 18:15:55,766][1648981] Fps is (10 sec: 52446.9, 60 sec: 45875.2, 300 sec: 47986.3). Total num frames: 1155006464. Throughput: 0: 11787.4. Samples: 288813568. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:15:55,767][1648981] Avg episode reward: [(0, '513.890')] [2024-06-15 18:15:58,263][1651669] Updated weights for policy 0, policy_version 564032 (0.0027) [2024-06-15 18:15:59,524][1651669] Updated weights for policy 0, policy_version 564096 (0.0021) [2024-06-15 18:15:59,874][1651274] Signal inference workers to stop experience collection... 
(29600 times) [2024-06-15 18:15:59,927][1651669] InferenceWorker_p0-w0: stopping experience collection (29600 times) [2024-06-15 18:16:00,088][1651274] Signal inference workers to resume experience collection... (29600 times) [2024-06-15 18:16:00,089][1651669] InferenceWorker_p0-w0: resuming experience collection (29600 times) [2024-06-15 18:16:00,475][1651669] Updated weights for policy 0, policy_version 564144 (0.0015) [2024-06-15 18:16:00,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 49698.1, 300 sec: 48430.0). Total num frames: 1155366912. Throughput: 0: 11776.0. Samples: 288883712. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:00,767][1648981] Avg episode reward: [(0, '506.250')] [2024-06-15 18:16:02,394][1651669] Updated weights for policy 0, policy_version 564178 (0.0009) [2024-06-15 18:16:05,772][1648981] Fps is (10 sec: 52401.0, 60 sec: 46417.1, 300 sec: 48207.0). Total num frames: 1155530752. Throughput: 0: 12286.5. Samples: 288970240. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:05,772][1648981] Avg episode reward: [(0, '505.060')] [2024-06-15 18:16:07,145][1651669] Updated weights for policy 0, policy_version 564240 (0.0018) [2024-06-15 18:16:08,695][1651669] Updated weights for policy 0, policy_version 564304 (0.0012) [2024-06-15 18:16:10,104][1651669] Updated weights for policy 0, policy_version 564368 (0.0012) [2024-06-15 18:16:10,767][1648981] Fps is (10 sec: 52425.5, 60 sec: 49697.7, 300 sec: 48763.1). Total num frames: 1155891200. Throughput: 0: 12037.5. Samples: 289002496. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:10,768][1648981] Avg episode reward: [(0, '503.000')] [2024-06-15 18:16:13,111][1651669] Updated weights for policy 0, policy_version 564432 (0.0013) [2024-06-15 18:16:14,201][1651669] Updated weights for policy 0, policy_version 564475 (0.0011) [2024-06-15 18:16:15,766][1648981] Fps is (10 sec: 52457.1, 60 sec: 48059.7, 300 sec: 48430.0). 
Total num frames: 1156055040. Throughput: 0: 12379.6. Samples: 289080320. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:15,767][1648981] Avg episode reward: [(0, '506.300')] [2024-06-15 18:16:19,019][1651669] Updated weights for policy 0, policy_version 564531 (0.0015) [2024-06-15 18:16:20,354][1651669] Updated weights for policy 0, policy_version 564594 (0.0014) [2024-06-15 18:16:20,777][1648981] Fps is (10 sec: 42554.9, 60 sec: 48051.0, 300 sec: 48650.4). Total num frames: 1156317184. Throughput: 0: 12231.2. Samples: 289150976. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:20,780][1648981] Avg episode reward: [(0, '515.500')] [2024-06-15 18:16:21,454][1651669] Updated weights for policy 0, policy_version 564656 (0.0077) [2024-06-15 18:16:24,121][1651669] Updated weights for policy 0, policy_version 564691 (0.0012) [2024-06-15 18:16:25,056][1651669] Updated weights for policy 0, policy_version 564734 (0.0014) [2024-06-15 18:16:25,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 50244.0, 300 sec: 48430.1). Total num frames: 1156579328. Throughput: 0: 12458.6. Samples: 289190912. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:25,767][1648981] Avg episode reward: [(0, '484.980')] [2024-06-15 18:16:29,447][1651669] Updated weights for policy 0, policy_version 564816 (0.0120) [2024-06-15 18:16:30,658][1651669] Updated weights for policy 0, policy_version 564864 (0.0012) [2024-06-15 18:16:30,766][1648981] Fps is (10 sec: 52486.2, 60 sec: 49700.0, 300 sec: 48763.2). Total num frames: 1156841472. Throughput: 0: 12459.6. Samples: 289264640. 
Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:30,767][1648981] Avg episode reward: [(0, '498.980')] [2024-06-15 18:16:31,838][1651669] Updated weights for policy 0, policy_version 564926 (0.0014) [2024-06-15 18:16:35,267][1651669] Updated weights for policy 0, policy_version 564983 (0.0012) [2024-06-15 18:16:35,766][1648981] Fps is (10 sec: 52430.5, 60 sec: 51892.5, 300 sec: 48541.1). Total num frames: 1157103616. Throughput: 0: 12504.2. Samples: 289336320. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:35,767][1648981] Avg episode reward: [(0, '498.530')] [2024-06-15 18:16:38,611][1651669] Updated weights for policy 0, policy_version 565011 (0.0011) [2024-06-15 18:16:39,191][1651274] Signal inference workers to stop experience collection... (29650 times) [2024-06-15 18:16:39,245][1651669] InferenceWorker_p0-w0: stopping experience collection (29650 times) [2024-06-15 18:16:39,322][1651274] Signal inference workers to resume experience collection... (29650 times) [2024-06-15 18:16:39,323][1651669] InferenceWorker_p0-w0: resuming experience collection (29650 times) [2024-06-15 18:16:39,850][1651669] Updated weights for policy 0, policy_version 565078 (0.0013) [2024-06-15 18:16:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 48763.3). Total num frames: 1157332992. Throughput: 0: 12913.8. Samples: 289394688. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:40,767][1648981] Avg episode reward: [(0, '480.710')] [2024-06-15 18:16:41,465][1651669] Updated weights for policy 0, policy_version 565140 (0.0015) [2024-06-15 18:16:44,092][1651669] Updated weights for policy 0, policy_version 565187 (0.0011) [2024-06-15 18:16:45,437][1651669] Updated weights for policy 0, policy_version 565246 (0.0010) [2024-06-15 18:16:45,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 52431.7, 300 sec: 48874.3). Total num frames: 1157627904. Throughput: 0: 12856.9. Samples: 289462272. 
Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:45,767][1648981] Avg episode reward: [(0, '470.520')] [2024-06-15 18:16:49,307][1651669] Updated weights for policy 0, policy_version 565285 (0.0010) [2024-06-15 18:16:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49151.9, 300 sec: 48658.7). Total num frames: 1157824512. Throughput: 0: 12528.4. Samples: 289533952. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:50,767][1648981] Avg episode reward: [(0, '463.320')] [2024-06-15 18:16:51,294][1651669] Updated weights for policy 0, policy_version 565363 (0.0099) [2024-06-15 18:16:52,656][1651669] Updated weights for policy 0, policy_version 565434 (0.0011) [2024-06-15 18:16:55,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 51882.7, 300 sec: 48763.2). Total num frames: 1158119424. Throughput: 0: 12527.1. Samples: 289566208. Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:16:55,767][1648981] Avg episode reward: [(0, '470.210')] [2024-06-15 18:16:55,845][1651669] Updated weights for policy 0, policy_version 565502 (0.0014) [2024-06-15 18:16:55,868][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000565504_1158152192.pth... [2024-06-15 18:16:55,913][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000559792_1146454016.pth [2024-06-15 18:17:00,708][1651669] Updated weights for policy 0, policy_version 565556 (0.0014) [2024-06-15 18:17:00,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 48059.9, 300 sec: 48318.9). Total num frames: 1158250496. Throughput: 0: 12618.0. Samples: 289648128. 
Policy #0 lag: (min: 31.0, avg: 168.4, max: 287.0) [2024-06-15 18:17:00,767][1648981] Avg episode reward: [(0, '453.990')] [2024-06-15 18:17:02,059][1651669] Updated weights for policy 0, policy_version 565616 (0.0012) [2024-06-15 18:17:03,606][1651669] Updated weights for policy 0, policy_version 565691 (0.0013) [2024-06-15 18:17:05,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 50795.0, 300 sec: 48652.8). Total num frames: 1158578176. Throughput: 0: 12495.8. Samples: 289713152. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:05,767][1648981] Avg episode reward: [(0, '435.290')] [2024-06-15 18:17:06,311][1651669] Updated weights for policy 0, policy_version 565747 (0.0013) [2024-06-15 18:17:10,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46968.0, 300 sec: 48430.2). Total num frames: 1158709248. Throughput: 0: 12447.4. Samples: 289751040. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:10,767][1648981] Avg episode reward: [(0, '435.340')] [2024-06-15 18:17:11,029][1651669] Updated weights for policy 0, policy_version 565793 (0.0021) [2024-06-15 18:17:12,444][1651669] Updated weights for policy 0, policy_version 565829 (0.0011) [2024-06-15 18:17:13,898][1651669] Updated weights for policy 0, policy_version 565904 (0.0113) [2024-06-15 18:17:15,721][1651669] Updated weights for policy 0, policy_version 565953 (0.0012) [2024-06-15 18:17:15,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 50244.1, 300 sec: 48652.1). Total num frames: 1159069696. Throughput: 0: 12344.8. Samples: 289820160. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:15,767][1648981] Avg episode reward: [(0, '449.340')] [2024-06-15 18:17:16,478][1651274] Signal inference workers to stop experience collection... (29700 times) [2024-06-15 18:17:16,514][1651669] InferenceWorker_p0-w0: stopping experience collection (29700 times) [2024-06-15 18:17:16,717][1651274] Signal inference workers to resume experience collection... 
(29700 times) [2024-06-15 18:17:16,717][1651669] InferenceWorker_p0-w0: resuming experience collection (29700 times) [2024-06-15 18:17:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48068.5, 300 sec: 48431.0). Total num frames: 1159200768. Throughput: 0: 12470.0. Samples: 289897472. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:20,767][1648981] Avg episode reward: [(0, '432.200')] [2024-06-15 18:17:21,439][1651669] Updated weights for policy 0, policy_version 566022 (0.0013) [2024-06-15 18:17:22,858][1651669] Updated weights for policy 0, policy_version 566085 (0.0013) [2024-06-15 18:17:24,695][1651669] Updated weights for policy 0, policy_version 566164 (0.0013) [2024-06-15 18:17:25,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 50244.5, 300 sec: 48765.6). Total num frames: 1159593984. Throughput: 0: 12003.6. Samples: 289934848. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:25,767][1648981] Avg episode reward: [(0, '443.650')] [2024-06-15 18:17:26,912][1651669] Updated weights for policy 0, policy_version 566224 (0.0014) [2024-06-15 18:17:30,768][1648981] Fps is (10 sec: 52420.3, 60 sec: 48058.4, 300 sec: 48429.7). Total num frames: 1159725056. Throughput: 0: 11900.8. Samples: 289997824. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:30,768][1648981] Avg episode reward: [(0, '456.070')] [2024-06-15 18:17:32,430][1651669] Updated weights for policy 0, policy_version 566288 (0.0013) [2024-06-15 18:17:33,853][1651669] Updated weights for policy 0, policy_version 566337 (0.0043) [2024-06-15 18:17:35,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 48544.7). Total num frames: 1160019968. Throughput: 0: 11980.8. Samples: 290073088. 
Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:35,767][1648981] Avg episode reward: [(0, '455.790')] [2024-06-15 18:17:35,954][1651669] Updated weights for policy 0, policy_version 566432 (0.0014) [2024-06-15 18:17:37,384][1651669] Updated weights for policy 0, policy_version 566471 (0.0014) [2024-06-15 18:17:38,377][1651669] Updated weights for policy 0, policy_version 566522 (0.0097) [2024-06-15 18:17:40,767][1648981] Fps is (10 sec: 52434.1, 60 sec: 48605.3, 300 sec: 48430.0). Total num frames: 1160249344. Throughput: 0: 12014.8. Samples: 290106880. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:40,768][1648981] Avg episode reward: [(0, '460.500')] [2024-06-15 18:17:43,759][1651669] Updated weights for policy 0, policy_version 566591 (0.0011) [2024-06-15 18:17:45,633][1651669] Updated weights for policy 0, policy_version 566658 (0.0011) [2024-06-15 18:17:45,774][1648981] Fps is (10 sec: 49116.9, 60 sec: 48054.1, 300 sec: 48651.0). Total num frames: 1160511488. Throughput: 0: 12149.5. Samples: 290194944. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:45,774][1648981] Avg episode reward: [(0, '459.870')] [2024-06-15 18:17:46,815][1651669] Updated weights for policy 0, policy_version 566709 (0.0018) [2024-06-15 18:17:48,628][1651669] Updated weights for policy 0, policy_version 566774 (0.0012) [2024-06-15 18:17:50,769][1648981] Fps is (10 sec: 52416.1, 60 sec: 49149.5, 300 sec: 48651.7). Total num frames: 1160773632. Throughput: 0: 12230.3. Samples: 290263552. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:50,770][1648981] Avg episode reward: [(0, '482.020')] [2024-06-15 18:17:53,510][1651669] Updated weights for policy 0, policy_version 566819 (0.0111) [2024-06-15 18:17:55,160][1651669] Updated weights for policy 0, policy_version 566864 (0.0030) [2024-06-15 18:17:55,793][1648981] Fps is (10 sec: 45786.3, 60 sec: 47492.6, 300 sec: 48647.8). Total num frames: 1160970240. 
Throughput: 0: 12269.4. Samples: 290303488. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:17:55,793][1648981] Avg episode reward: [(0, '511.730')] [2024-06-15 18:17:56,698][1651669] Updated weights for policy 0, policy_version 566929 (0.0011) [2024-06-15 18:17:57,094][1651274] Signal inference workers to stop experience collection... (29750 times) [2024-06-15 18:17:57,130][1651669] InferenceWorker_p0-w0: stopping experience collection (29750 times) [2024-06-15 18:17:57,398][1651274] Signal inference workers to resume experience collection... (29750 times) [2024-06-15 18:17:57,399][1651669] InferenceWorker_p0-w0: resuming experience collection (29750 times) [2024-06-15 18:17:58,632][1651669] Updated weights for policy 0, policy_version 567008 (0.0126) [2024-06-15 18:17:59,407][1651669] Updated weights for policy 0, policy_version 567039 (0.0016) [2024-06-15 18:18:00,766][1648981] Fps is (10 sec: 52444.6, 60 sec: 50790.3, 300 sec: 48763.3). Total num frames: 1161297920. Throughput: 0: 12242.5. Samples: 290371072. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:18:00,767][1648981] Avg episode reward: [(0, '509.150')] [2024-06-15 18:18:03,814][1651669] Updated weights for policy 0, policy_version 567093 (0.0013) [2024-06-15 18:18:05,766][1648981] Fps is (10 sec: 45997.1, 60 sec: 47513.5, 300 sec: 48430.6). Total num frames: 1161428992. Throughput: 0: 12413.1. Samples: 290456064. 
Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:18:05,767][1648981] Avg episode reward: [(0, '498.450')] [2024-06-15 18:18:06,761][1651669] Updated weights for policy 0, policy_version 567152 (0.0012) [2024-06-15 18:18:07,924][1651669] Updated weights for policy 0, policy_version 567202 (0.0018) [2024-06-15 18:18:09,916][1651669] Updated weights for policy 0, policy_version 567266 (0.0012) [2024-06-15 18:18:10,679][1651669] Updated weights for policy 0, policy_version 567296 (0.0011) [2024-06-15 18:18:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 51882.6, 300 sec: 48874.3). Total num frames: 1161822208. Throughput: 0: 12333.5. Samples: 290489856. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:18:10,767][1648981] Avg episode reward: [(0, '495.410')] [2024-06-15 18:18:14,911][1651669] Updated weights for policy 0, policy_version 567360 (0.0014) [2024-06-15 18:18:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 48430.6). Total num frames: 1161953280. Throughput: 0: 12368.1. Samples: 290554368. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:18:15,767][1648981] Avg episode reward: [(0, '524.120')] [2024-06-15 18:18:18,927][1651669] Updated weights for policy 0, policy_version 567430 (0.0157) [2024-06-15 18:18:20,722][1651669] Updated weights for policy 0, policy_version 567506 (0.0012) [2024-06-15 18:18:20,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 50790.5, 300 sec: 48876.8). Total num frames: 1162248192. Throughput: 0: 12197.0. Samples: 290621952. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:18:20,767][1648981] Avg episode reward: [(0, '517.210')] [2024-06-15 18:18:21,752][1651669] Updated weights for policy 0, policy_version 567552 (0.0030) [2024-06-15 18:18:25,770][1648981] Fps is (10 sec: 52411.5, 60 sec: 48057.1, 300 sec: 48431.2). Total num frames: 1162477568. Throughput: 0: 12241.8. Samples: 290657792. 
Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:18:25,770][1648981] Avg episode reward: [(0, '507.230')] [2024-06-15 18:18:28,855][1651669] Updated weights for policy 0, policy_version 567632 (0.0121) [2024-06-15 18:18:29,938][1651669] Updated weights for policy 0, policy_version 567677 (0.0012) [2024-06-15 18:18:30,766][1648981] Fps is (10 sec: 42597.8, 60 sec: 49153.3, 300 sec: 48763.2). Total num frames: 1162674176. Throughput: 0: 11925.8. Samples: 290731520. Policy #0 lag: (min: 73.0, avg: 175.0, max: 325.0) [2024-06-15 18:18:30,767][1648981] Avg episode reward: [(0, '493.820')] [2024-06-15 18:18:31,688][1651669] Updated weights for policy 0, policy_version 567744 (0.0084) [2024-06-15 18:18:33,042][1651669] Updated weights for policy 0, policy_version 567800 (0.0013) [2024-06-15 18:18:35,766][1648981] Fps is (10 sec: 42612.4, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 1162903552. Throughput: 0: 11958.9. Samples: 290801664. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:18:35,767][1648981] Avg episode reward: [(0, '472.620')] [2024-06-15 18:18:36,458][1651669] Updated weights for policy 0, policy_version 567867 (0.0012) [2024-06-15 18:18:39,533][1651274] Signal inference workers to stop experience collection... (29800 times) [2024-06-15 18:18:39,580][1651669] InferenceWorker_p0-w0: stopping experience collection (29800 times) [2024-06-15 18:18:39,832][1651274] Signal inference workers to resume experience collection... (29800 times) [2024-06-15 18:18:39,834][1651669] InferenceWorker_p0-w0: resuming experience collection (29800 times) [2024-06-15 18:18:40,699][1651669] Updated weights for policy 0, policy_version 567920 (0.0014) [2024-06-15 18:18:40,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 47514.1, 300 sec: 48763.2). Total num frames: 1163100160. Throughput: 0: 11896.8. Samples: 290838528. 
Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:18:40,767][1648981] Avg episode reward: [(0, '457.860')] [2024-06-15 18:18:41,885][1651669] Updated weights for policy 0, policy_version 567953 (0.0070) [2024-06-15 18:18:43,644][1651669] Updated weights for policy 0, policy_version 568019 (0.0011) [2024-06-15 18:18:44,411][1651669] Updated weights for policy 0, policy_version 568062 (0.0019) [2024-06-15 18:18:45,786][1648981] Fps is (10 sec: 49053.7, 60 sec: 48049.4, 300 sec: 48426.7). Total num frames: 1163395072. Throughput: 0: 11850.4. Samples: 290904576. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:18:45,787][1648981] Avg episode reward: [(0, '425.520')] [2024-06-15 18:18:47,142][1651669] Updated weights for policy 0, policy_version 568124 (0.0133) [2024-06-15 18:18:50,767][1648981] Fps is (10 sec: 45872.1, 60 sec: 46423.2, 300 sec: 48541.0). Total num frames: 1163558912. Throughput: 0: 11741.7. Samples: 290984448. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:18:50,768][1648981] Avg episode reward: [(0, '423.290')] [2024-06-15 18:18:51,718][1651669] Updated weights for policy 0, policy_version 568185 (0.0086) [2024-06-15 18:18:53,825][1651669] Updated weights for policy 0, policy_version 568243 (0.0011) [2024-06-15 18:18:55,767][1648981] Fps is (10 sec: 52532.1, 60 sec: 49173.5, 300 sec: 48541.0). Total num frames: 1163919360. Throughput: 0: 11684.9. Samples: 291015680. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:18:55,767][1648981] Avg episode reward: [(0, '417.140')] [2024-06-15 18:18:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000568320_1163919360.pth... 
[2024-06-15 18:18:55,824][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000562624_1152253952.pth [2024-06-15 18:18:57,721][1651669] Updated weights for policy 0, policy_version 568324 (0.0012) [2024-06-15 18:18:58,596][1651669] Updated weights for policy 0, policy_version 568382 (0.0093) [2024-06-15 18:19:00,778][1648981] Fps is (10 sec: 49099.5, 60 sec: 45866.5, 300 sec: 48429.6). Total num frames: 1164050432. Throughput: 0: 11738.9. Samples: 291082752. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:00,779][1648981] Avg episode reward: [(0, '413.920')] [2024-06-15 18:19:02,229][1651669] Updated weights for policy 0, policy_version 568436 (0.0012) [2024-06-15 18:19:04,352][1651669] Updated weights for policy 0, policy_version 568472 (0.0013) [2024-06-15 18:19:05,766][1648981] Fps is (10 sec: 42600.0, 60 sec: 48605.9, 300 sec: 48431.9). Total num frames: 1164345344. Throughput: 0: 11867.0. Samples: 291155968. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:05,767][1648981] Avg episode reward: [(0, '408.050')] [2024-06-15 18:19:06,211][1651669] Updated weights for policy 0, policy_version 568560 (0.0013) [2024-06-15 18:19:08,830][1651669] Updated weights for policy 0, policy_version 568595 (0.0021) [2024-06-15 18:19:10,766][1648981] Fps is (10 sec: 52488.5, 60 sec: 45875.3, 300 sec: 48431.9). Total num frames: 1164574720. Throughput: 0: 11958.9. Samples: 291195904. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:10,767][1648981] Avg episode reward: [(0, '412.090')] [2024-06-15 18:19:11,922][1651669] Updated weights for policy 0, policy_version 568656 (0.0012) [2024-06-15 18:19:12,900][1651669] Updated weights for policy 0, policy_version 568704 (0.0015) [2024-06-15 18:19:15,659][1651669] Updated weights for policy 0, policy_version 568764 (0.0011) [2024-06-15 18:19:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 48319.3). Total num frames: 1164804096. 
Throughput: 0: 11958.1. Samples: 291269632. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:15,767][1648981] Avg episode reward: [(0, '375.210')] [2024-06-15 18:19:17,118][1651669] Updated weights for policy 0, policy_version 568821 (0.0013) [2024-06-15 18:19:19,505][1651274] Signal inference workers to stop experience collection... (29850 times) [2024-06-15 18:19:19,560][1651669] InferenceWorker_p0-w0: stopping experience collection (29850 times) [2024-06-15 18:19:19,831][1651274] Signal inference workers to resume experience collection... (29850 times) [2024-06-15 18:19:19,831][1651669] InferenceWorker_p0-w0: resuming experience collection (29850 times) [2024-06-15 18:19:19,833][1651669] Updated weights for policy 0, policy_version 568864 (0.0013) [2024-06-15 18:19:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.5, 300 sec: 48430.0). Total num frames: 1165099008. Throughput: 0: 11889.8. Samples: 291336704. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:20,767][1648981] Avg episode reward: [(0, '376.440')] [2024-06-15 18:19:22,488][1651669] Updated weights for policy 0, policy_version 568901 (0.0012) [2024-06-15 18:19:23,718][1651669] Updated weights for policy 0, policy_version 568957 (0.0029) [2024-06-15 18:19:25,537][1651669] Updated weights for policy 0, policy_version 568995 (0.0011) [2024-06-15 18:19:25,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 47516.2, 300 sec: 48652.1). Total num frames: 1165328384. Throughput: 0: 11878.4. Samples: 291373056. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:25,767][1648981] Avg episode reward: [(0, '387.680')] [2024-06-15 18:19:27,706][1651669] Updated weights for policy 0, policy_version 569072 (0.0022) [2024-06-15 18:19:30,256][1651669] Updated weights for policy 0, policy_version 569109 (0.0011) [2024-06-15 18:19:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1165557760. Throughput: 0: 12156.9. 
Samples: 291451392. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:30,767][1648981] Avg episode reward: [(0, '388.650')] [2024-06-15 18:19:33,390][1651669] Updated weights for policy 0, policy_version 569154 (0.0011) [2024-06-15 18:19:34,485][1651669] Updated weights for policy 0, policy_version 569206 (0.0011) [2024-06-15 18:19:35,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 48652.2). Total num frames: 1165754368. Throughput: 0: 11958.2. Samples: 291522560. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:35,767][1648981] Avg episode reward: [(0, '396.210')] [2024-06-15 18:19:36,671][1651669] Updated weights for policy 0, policy_version 569264 (0.0014) [2024-06-15 18:19:37,988][1651669] Updated weights for policy 0, policy_version 569315 (0.0010) [2024-06-15 18:19:40,680][1651669] Updated weights for policy 0, policy_version 569361 (0.0011) [2024-06-15 18:19:40,778][1648981] Fps is (10 sec: 49093.1, 60 sec: 49142.2, 300 sec: 48428.0). Total num frames: 1166049280. Throughput: 0: 12125.6. Samples: 291561472. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:40,779][1648981] Avg episode reward: [(0, '394.690')] [2024-06-15 18:19:41,587][1651669] Updated weights for policy 0, policy_version 569408 (0.0013) [2024-06-15 18:19:44,606][1651669] Updated weights for policy 0, policy_version 569464 (0.0013) [2024-06-15 18:19:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48075.8, 300 sec: 48763.2). Total num frames: 1166278656. Throughput: 0: 12336.6. Samples: 291637760. 
Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:45,767][1648981] Avg episode reward: [(0, '404.610')] [2024-06-15 18:19:46,944][1651669] Updated weights for policy 0, policy_version 569529 (0.0211) [2024-06-15 18:19:48,620][1651669] Updated weights for policy 0, policy_version 569584 (0.0013) [2024-06-15 18:19:50,763][1651669] Updated weights for policy 0, policy_version 569617 (0.0011) [2024-06-15 18:19:50,766][1648981] Fps is (10 sec: 52492.0, 60 sec: 50244.9, 300 sec: 48541.1). Total num frames: 1166573568. Throughput: 0: 12629.3. Samples: 291724288. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:50,767][1648981] Avg episode reward: [(0, '425.770')] [2024-06-15 18:19:51,972][1651669] Updated weights for policy 0, policy_version 569664 (0.0011) [2024-06-15 18:19:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48060.0, 300 sec: 48874.3). Total num frames: 1166802944. Throughput: 0: 12470.0. Samples: 291757056. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:19:55,767][1648981] Avg episode reward: [(0, '423.160')] [2024-06-15 18:19:56,002][1651669] Updated weights for policy 0, policy_version 569729 (0.0163) [2024-06-15 18:19:57,435][1651669] Updated weights for policy 0, policy_version 569784 (0.0041) [2024-06-15 18:19:59,104][1651669] Updated weights for policy 0, policy_version 569827 (0.0093) [2024-06-15 18:20:00,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 50253.7, 300 sec: 48541.0). Total num frames: 1167065088. Throughput: 0: 12287.9. Samples: 291822592. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:20:00,767][1648981] Avg episode reward: [(0, '413.850')] [2024-06-15 18:20:01,577][1651669] Updated weights for policy 0, policy_version 569872 (0.0013) [2024-06-15 18:20:02,704][1651669] Updated weights for policy 0, policy_version 569914 (0.0010) [2024-06-15 18:20:04,200][1651274] Signal inference workers to stop experience collection... 
(29900 times) [2024-06-15 18:20:04,232][1651669] InferenceWorker_p0-w0: stopping experience collection (29900 times) [2024-06-15 18:20:04,420][1651274] Signal inference workers to resume experience collection... (29900 times) [2024-06-15 18:20:04,421][1651669] InferenceWorker_p0-w0: resuming experience collection (29900 times) [2024-06-15 18:20:05,294][1651669] Updated weights for policy 0, policy_version 569975 (0.0021) [2024-06-15 18:20:05,767][1648981] Fps is (10 sec: 52426.5, 60 sec: 49697.8, 300 sec: 48874.3). Total num frames: 1167327232. Throughput: 0: 12583.7. Samples: 291902976. Policy #0 lag: (min: 8.0, avg: 128.0, max: 264.0) [2024-06-15 18:20:05,767][1648981] Avg episode reward: [(0, '431.490')] [2024-06-15 18:20:06,800][1651669] Updated weights for policy 0, policy_version 570021 (0.0157) [2024-06-15 18:20:07,289][1651669] Updated weights for policy 0, policy_version 570044 (0.0010) [2024-06-15 18:20:10,279][1651669] Updated weights for policy 0, policy_version 570105 (0.0013) [2024-06-15 18:20:10,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 1167589376. Throughput: 0: 12538.3. Samples: 291937280. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:10,767][1648981] Avg episode reward: [(0, '469.780')] [2024-06-15 18:20:13,014][1651669] Updated weights for policy 0, policy_version 570168 (0.0015) [2024-06-15 18:20:15,776][1648981] Fps is (10 sec: 45831.8, 60 sec: 49689.9, 300 sec: 48650.5). Total num frames: 1167785984. Throughput: 0: 12660.7. Samples: 292021248. 
Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:15,777][1648981] Avg episode reward: [(0, '484.310')] [2024-06-15 18:20:15,959][1651669] Updated weights for policy 0, policy_version 570225 (0.0012) [2024-06-15 18:20:17,651][1651669] Updated weights for policy 0, policy_version 570297 (0.0013) [2024-06-15 18:20:20,546][1651669] Updated weights for policy 0, policy_version 570337 (0.0015) [2024-06-15 18:20:20,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.1, 300 sec: 49096.5). Total num frames: 1168048128. Throughput: 0: 12435.9. Samples: 292082176. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:20,767][1648981] Avg episode reward: [(0, '486.750')] [2024-06-15 18:20:23,609][1651669] Updated weights for policy 0, policy_version 570400 (0.0029) [2024-06-15 18:20:24,323][1651669] Updated weights for policy 0, policy_version 570432 (0.0011) [2024-06-15 18:20:25,766][1648981] Fps is (10 sec: 49200.7, 60 sec: 49152.0, 300 sec: 48874.7). Total num frames: 1168277504. Throughput: 0: 12609.9. Samples: 292128768. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:25,767][1648981] Avg episode reward: [(0, '481.150')] [2024-06-15 18:20:26,712][1651669] Updated weights for policy 0, policy_version 570496 (0.0014) [2024-06-15 18:20:28,373][1651669] Updated weights for policy 0, policy_version 570551 (0.0139) [2024-06-15 18:20:30,787][1648981] Fps is (10 sec: 45782.7, 60 sec: 49135.5, 300 sec: 49206.1). Total num frames: 1168506880. Throughput: 0: 12384.8. Samples: 292195328. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:30,787][1648981] Avg episode reward: [(0, '483.840')] [2024-06-15 18:20:31,128][1651669] Updated weights for policy 0, policy_version 570581 (0.0026) [2024-06-15 18:20:34,712][1651669] Updated weights for policy 0, policy_version 570646 (0.0069) [2024-06-15 18:20:35,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 1168769024. 
Throughput: 0: 12197.0. Samples: 292273152. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:35,767][1648981] Avg episode reward: [(0, '482.540')] [2024-06-15 18:20:36,442][1651669] Updated weights for policy 0, policy_version 570705 (0.0023) [2024-06-15 18:20:38,172][1651669] Updated weights for policy 0, policy_version 570769 (0.0025) [2024-06-15 18:20:39,212][1651669] Updated weights for policy 0, policy_version 570814 (0.0018) [2024-06-15 18:20:40,766][1648981] Fps is (10 sec: 52534.3, 60 sec: 49708.0, 300 sec: 49319.2). Total num frames: 1169031168. Throughput: 0: 12140.1. Samples: 292303360. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:40,767][1648981] Avg episode reward: [(0, '474.180')] [2024-06-15 18:20:42,279][1651669] Updated weights for policy 0, policy_version 570870 (0.0012) [2024-06-15 18:20:45,422][1651669] Updated weights for policy 0, policy_version 570912 (0.0012) [2024-06-15 18:20:45,537][1651274] Signal inference workers to stop experience collection... (29950 times) [2024-06-15 18:20:45,615][1651669] InferenceWorker_p0-w0: stopping experience collection (29950 times) [2024-06-15 18:20:45,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 1169227776. Throughput: 0: 12436.0. Samples: 292382208. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:45,767][1648981] Avg episode reward: [(0, '478.650')] [2024-06-15 18:20:45,769][1651274] Signal inference workers to resume experience collection... (29950 times) [2024-06-15 18:20:45,770][1651669] InferenceWorker_p0-w0: resuming experience collection (29950 times) [2024-06-15 18:20:47,400][1651669] Updated weights for policy 0, policy_version 570981 (0.0012) [2024-06-15 18:20:48,980][1651669] Updated weights for policy 0, policy_version 571040 (0.0011) [2024-06-15 18:20:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 1169555456. Throughput: 0: 12197.1. 
Samples: 292451840. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:50,767][1648981] Avg episode reward: [(0, '491.970')] [2024-06-15 18:20:52,305][1651669] Updated weights for policy 0, policy_version 571091 (0.0012) [2024-06-15 18:20:54,852][1651669] Updated weights for policy 0, policy_version 571152 (0.0011) [2024-06-15 18:20:55,766][1648981] Fps is (10 sec: 55705.0, 60 sec: 49698.1, 300 sec: 48874.3). Total num frames: 1169784832. Throughput: 0: 12344.9. Samples: 292492800. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:20:55,767][1648981] Avg episode reward: [(0, '502.510')] [2024-06-15 18:20:55,861][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000571200_1169817600.pth... [2024-06-15 18:20:55,900][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000565504_1158152192.pth [2024-06-15 18:20:57,181][1651669] Updated weights for policy 0, policy_version 571216 (0.0037) [2024-06-15 18:20:59,356][1651669] Updated weights for policy 0, policy_version 571296 (0.0012) [2024-06-15 18:21:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.4, 300 sec: 49319.5). Total num frames: 1170079744. Throughput: 0: 12142.8. Samples: 292567552. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:00,767][1648981] Avg episode reward: [(0, '493.620')] [2024-06-15 18:21:03,032][1651669] Updated weights for policy 0, policy_version 571344 (0.0114) [2024-06-15 18:21:04,253][1651669] Updated weights for policy 0, policy_version 571385 (0.0012) [2024-06-15 18:21:05,627][1651669] Updated weights for policy 0, policy_version 571440 (0.0011) [2024-06-15 18:21:05,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 49698.1, 300 sec: 48874.3). Total num frames: 1170309120. Throughput: 0: 12435.8. Samples: 292641792. 
Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:05,767][1648981] Avg episode reward: [(0, '479.360')] [2024-06-15 18:21:08,675][1651669] Updated weights for policy 0, policy_version 571493 (0.0011) [2024-06-15 18:21:10,347][1651669] Updated weights for policy 0, policy_version 571573 (0.0221) [2024-06-15 18:21:10,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 50244.2, 300 sec: 49318.6). Total num frames: 1170604032. Throughput: 0: 12253.8. Samples: 292680192. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:10,767][1648981] Avg episode reward: [(0, '474.060')] [2024-06-15 18:21:15,427][1651669] Updated weights for policy 0, policy_version 571635 (0.0131) [2024-06-15 18:21:15,782][1648981] Fps is (10 sec: 42533.2, 60 sec: 49147.2, 300 sec: 48873.5). Total num frames: 1170735104. Throughput: 0: 12482.6. Samples: 292756992. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:15,783][1648981] Avg episode reward: [(0, '465.070')] [2024-06-15 18:21:17,267][1651669] Updated weights for policy 0, policy_version 571707 (0.0011) [2024-06-15 18:21:20,543][1651669] Updated weights for policy 0, policy_version 571760 (0.0011) [2024-06-15 18:21:20,766][1648981] Fps is (10 sec: 36045.1, 60 sec: 48605.8, 300 sec: 48763.3). Total num frames: 1170964480. Throughput: 0: 12208.4. Samples: 292822528. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:20,767][1648981] Avg episode reward: [(0, '423.870')] [2024-06-15 18:21:21,601][1651669] Updated weights for policy 0, policy_version 571812 (0.0013) [2024-06-15 18:21:24,981][1651669] Updated weights for policy 0, policy_version 571857 (0.0016) [2024-06-15 18:21:25,766][1648981] Fps is (10 sec: 45947.6, 60 sec: 48605.9, 300 sec: 48652.1). Total num frames: 1171193856. Throughput: 0: 12538.3. Samples: 292867584. 
Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:25,767][1648981] Avg episode reward: [(0, '413.210')] [2024-06-15 18:21:25,918][1651274] Signal inference workers to stop experience collection... (30000 times) [2024-06-15 18:21:26,000][1651669] InferenceWorker_p0-w0: stopping experience collection (30000 times) [2024-06-15 18:21:26,287][1651274] Signal inference workers to resume experience collection... (30000 times) [2024-06-15 18:21:26,288][1651669] InferenceWorker_p0-w0: resuming experience collection (30000 times) [2024-06-15 18:21:26,917][1651669] Updated weights for policy 0, policy_version 571922 (0.0137) [2024-06-15 18:21:30,721][1651669] Updated weights for policy 0, policy_version 571984 (0.0012) [2024-06-15 18:21:30,774][1648981] Fps is (10 sec: 45839.8, 60 sec: 48615.9, 300 sec: 48539.8). Total num frames: 1171423232. Throughput: 0: 12183.5. Samples: 292930560. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:30,775][1648981] Avg episode reward: [(0, '411.030')] [2024-06-15 18:21:32,435][1651669] Updated weights for policy 0, policy_version 572064 (0.0122) [2024-06-15 18:21:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 1171652608. Throughput: 0: 12356.3. Samples: 293007872. Policy #0 lag: (min: 63.0, avg: 186.6, max: 319.0) [2024-06-15 18:21:35,767][1648981] Avg episode reward: [(0, '411.060')] [2024-06-15 18:21:36,382][1651669] Updated weights for policy 0, policy_version 572112 (0.0013) [2024-06-15 18:21:38,309][1651669] Updated weights for policy 0, policy_version 572179 (0.0010) [2024-06-15 18:21:39,410][1651669] Updated weights for policy 0, policy_version 572223 (0.0011) [2024-06-15 18:21:40,766][1648981] Fps is (10 sec: 49190.4, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1171914752. Throughput: 0: 12083.2. Samples: 293036544. 
Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:21:40,767][1648981] Avg episode reward: [(0, '409.720')] [2024-06-15 18:21:42,411][1651669] Updated weights for policy 0, policy_version 572272 (0.0016) [2024-06-15 18:21:43,775][1651669] Updated weights for policy 0, policy_version 572336 (0.0011) [2024-06-15 18:21:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1172176896. Throughput: 0: 12049.1. Samples: 293109760. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:21:45,767][1648981] Avg episode reward: [(0, '406.490')] [2024-06-15 18:21:47,839][1651669] Updated weights for policy 0, policy_version 572385 (0.0021) [2024-06-15 18:21:49,509][1651669] Updated weights for policy 0, policy_version 572450 (0.0012) [2024-06-15 18:21:50,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 48059.6, 300 sec: 48541.0). Total num frames: 1172439040. Throughput: 0: 11935.3. Samples: 293178880. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:21:50,767][1648981] Avg episode reward: [(0, '399.770')] [2024-06-15 18:21:52,357][1651669] Updated weights for policy 0, policy_version 572500 (0.0026) [2024-06-15 18:21:53,241][1651669] Updated weights for policy 0, policy_version 572544 (0.0034) [2024-06-15 18:21:55,768][1648981] Fps is (10 sec: 52417.7, 60 sec: 48604.2, 300 sec: 48985.0). Total num frames: 1172701184. Throughput: 0: 12003.0. Samples: 293220352. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:21:55,769][1648981] Avg episode reward: [(0, '379.840')] [2024-06-15 18:21:58,226][1651669] Updated weights for policy 0, policy_version 572624 (0.0016) [2024-06-15 18:22:00,246][1651669] Updated weights for policy 0, policy_version 572689 (0.0012) [2024-06-15 18:22:00,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 46967.4, 300 sec: 48541.1). Total num frames: 1172897792. Throughput: 0: 11814.3. Samples: 293288448. 
Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:00,767][1648981] Avg episode reward: [(0, '392.860')] [2024-06-15 18:22:03,286][1651669] Updated weights for policy 0, policy_version 572753 (0.0108) [2024-06-15 18:22:04,277][1651669] Updated weights for policy 0, policy_version 572800 (0.0013) [2024-06-15 18:22:04,933][1651274] Signal inference workers to stop experience collection... (30050 times) [2024-06-15 18:22:04,981][1651669] InferenceWorker_p0-w0: stopping experience collection (30050 times) [2024-06-15 18:22:05,151][1651274] Signal inference workers to resume experience collection... (30050 times) [2024-06-15 18:22:05,152][1651669] InferenceWorker_p0-w0: resuming experience collection (30050 times) [2024-06-15 18:22:05,766][1648981] Fps is (10 sec: 49162.2, 60 sec: 48060.1, 300 sec: 49096.5). Total num frames: 1173192704. Throughput: 0: 11753.3. Samples: 293351424. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:05,767][1648981] Avg episode reward: [(0, '395.820')] [2024-06-15 18:22:05,811][1651669] Updated weights for policy 0, policy_version 572864 (0.0012) [2024-06-15 18:22:10,343][1651669] Updated weights for policy 0, policy_version 572928 (0.0012) [2024-06-15 18:22:10,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1173356544. Throughput: 0: 11673.6. Samples: 293392896. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:10,767][1648981] Avg episode reward: [(0, '389.040')] [2024-06-15 18:22:12,418][1651669] Updated weights for policy 0, policy_version 572987 (0.0012) [2024-06-15 18:22:14,528][1651669] Updated weights for policy 0, policy_version 573035 (0.0012) [2024-06-15 18:22:15,092][1651669] Updated weights for policy 0, policy_version 573059 (0.0012) [2024-06-15 18:22:15,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49165.0, 300 sec: 49096.5). Total num frames: 1173684224. Throughput: 0: 11982.9. Samples: 293469696. 
Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:15,767][1648981] Avg episode reward: [(0, '396.000')] [2024-06-15 18:22:16,206][1651669] Updated weights for policy 0, policy_version 573115 (0.0014) [2024-06-15 18:22:19,898][1651669] Updated weights for policy 0, policy_version 573157 (0.0031) [2024-06-15 18:22:20,768][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 1173880832. Throughput: 0: 11992.1. Samples: 293547520. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:20,770][1648981] Avg episode reward: [(0, '406.100')] [2024-06-15 18:22:21,364][1651669] Updated weights for policy 0, policy_version 573200 (0.0013) [2024-06-15 18:22:22,232][1651669] Updated weights for policy 0, policy_version 573244 (0.0015) [2024-06-15 18:22:24,260][1651669] Updated weights for policy 0, policy_version 573296 (0.0130) [2024-06-15 18:22:25,456][1651669] Updated weights for policy 0, policy_version 573330 (0.0012) [2024-06-15 18:22:25,770][1648981] Fps is (10 sec: 52407.8, 60 sec: 50241.0, 300 sec: 49096.1). Total num frames: 1174208512. Throughput: 0: 12264.2. Samples: 293588480. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:25,771][1648981] Avg episode reward: [(0, '407.720')] [2024-06-15 18:22:29,319][1651669] Updated weights for policy 0, policy_version 573379 (0.0026) [2024-06-15 18:22:30,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 49158.4, 300 sec: 48652.2). Total num frames: 1174372352. Throughput: 0: 12515.6. Samples: 293672960. 
Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:30,767][1648981] Avg episode reward: [(0, '398.660')] [2024-06-15 18:22:30,813][1651669] Updated weights for policy 0, policy_version 573440 (0.0012) [2024-06-15 18:22:32,232][1651669] Updated weights for policy 0, policy_version 573502 (0.0012) [2024-06-15 18:22:34,666][1651669] Updated weights for policy 0, policy_version 573560 (0.0012) [2024-06-15 18:22:35,766][1648981] Fps is (10 sec: 49171.6, 60 sec: 50790.4, 300 sec: 48985.5). Total num frames: 1174700032. Throughput: 0: 12424.6. Samples: 293737984. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:35,767][1648981] Avg episode reward: [(0, '399.170')] [2024-06-15 18:22:36,588][1651669] Updated weights for policy 0, policy_version 573628 (0.0013) [2024-06-15 18:22:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49152.0, 300 sec: 48653.3). Total num frames: 1174863872. Throughput: 0: 12334.1. Samples: 293775360. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:40,767][1648981] Avg episode reward: [(0, '392.390')] [2024-06-15 18:22:40,946][1651669] Updated weights for policy 0, policy_version 573668 (0.0015) [2024-06-15 18:22:42,537][1651669] Updated weights for policy 0, policy_version 573729 (0.0010) [2024-06-15 18:22:44,669][1651669] Updated weights for policy 0, policy_version 573778 (0.0013) [2024-06-15 18:22:45,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 50244.2, 300 sec: 48874.8). Total num frames: 1175191552. Throughput: 0: 12435.9. Samples: 293848064. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:45,767][1648981] Avg episode reward: [(0, '401.090')] [2024-06-15 18:22:46,460][1651274] Signal inference workers to stop experience collection... (30100 times) [2024-06-15 18:22:46,496][1651669] InferenceWorker_p0-w0: stopping experience collection (30100 times) [2024-06-15 18:22:46,674][1651274] Signal inference workers to resume experience collection... 
(30100 times) [2024-06-15 18:22:46,675][1651669] InferenceWorker_p0-w0: resuming experience collection (30100 times) [2024-06-15 18:22:46,812][1651669] Updated weights for policy 0, policy_version 573844 (0.0048) [2024-06-15 18:22:50,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 48059.8, 300 sec: 48656.5). Total num frames: 1175322624. Throughput: 0: 12777.2. Samples: 293926400. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:50,767][1648981] Avg episode reward: [(0, '414.600')] [2024-06-15 18:22:51,514][1651669] Updated weights for policy 0, policy_version 573908 (0.0013) [2024-06-15 18:22:53,546][1651669] Updated weights for policy 0, policy_version 573986 (0.0012) [2024-06-15 18:22:55,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48607.5, 300 sec: 48541.1). Total num frames: 1175617536. Throughput: 0: 12492.8. Samples: 293955072. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:22:55,767][1648981] Avg episode reward: [(0, '422.300')] [2024-06-15 18:22:56,056][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000574048_1175650304.pth... [2024-06-15 18:22:56,213][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000568320_1163919360.pth [2024-06-15 18:22:56,418][1651669] Updated weights for policy 0, policy_version 574064 (0.0017) [2024-06-15 18:22:58,383][1651669] Updated weights for policy 0, policy_version 574134 (0.0013) [2024-06-15 18:23:00,772][1648981] Fps is (10 sec: 52397.7, 60 sec: 49147.0, 300 sec: 48873.3). Total num frames: 1175846912. Throughput: 0: 12320.4. Samples: 294024192. 
Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:23:00,773][1648981] Avg episode reward: [(0, '420.640')] [2024-06-15 18:23:02,475][1651669] Updated weights for policy 0, policy_version 574176 (0.0011) [2024-06-15 18:23:04,292][1651669] Updated weights for policy 0, policy_version 574240 (0.0081) [2024-06-15 18:23:05,086][1651669] Updated weights for policy 0, policy_version 574269 (0.0013) [2024-06-15 18:23:05,778][1648981] Fps is (10 sec: 49094.3, 60 sec: 48596.3, 300 sec: 48428.1). Total num frames: 1176109056. Throughput: 0: 12284.8. Samples: 294100480. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:23:05,779][1648981] Avg episode reward: [(0, '423.990')] [2024-06-15 18:23:07,175][1651669] Updated weights for policy 0, policy_version 574325 (0.0089) [2024-06-15 18:23:09,574][1651669] Updated weights for policy 0, policy_version 574384 (0.0012) [2024-06-15 18:23:10,766][1648981] Fps is (10 sec: 52461.0, 60 sec: 50244.4, 300 sec: 48874.3). Total num frames: 1176371200. Throughput: 0: 12209.4. Samples: 294137856. Policy #0 lag: (min: 127.0, avg: 209.1, max: 366.0) [2024-06-15 18:23:10,767][1648981] Avg episode reward: [(0, '442.650')] [2024-06-15 18:23:13,503][1651669] Updated weights for policy 0, policy_version 574435 (0.0011) [2024-06-15 18:23:15,120][1651669] Updated weights for policy 0, policy_version 574503 (0.0118) [2024-06-15 18:23:15,786][1648981] Fps is (10 sec: 52386.9, 60 sec: 49135.8, 300 sec: 48759.9). Total num frames: 1176633344. Throughput: 0: 11952.8. Samples: 294211072. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:15,787][1648981] Avg episode reward: [(0, '433.320')] [2024-06-15 18:23:16,616][1651669] Updated weights for policy 0, policy_version 574530 (0.0012) [2024-06-15 18:23:17,920][1651669] Updated weights for policy 0, policy_version 574592 (0.0135) [2024-06-15 18:23:20,767][1648981] Fps is (10 sec: 49149.5, 60 sec: 49697.8, 300 sec: 48763.7). Total num frames: 1176862720. 
Throughput: 0: 11992.0. Samples: 294277632. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:20,767][1648981] Avg episode reward: [(0, '426.960')] [2024-06-15 18:23:23,600][1651669] Updated weights for policy 0, policy_version 574657 (0.0011) [2024-06-15 18:23:24,793][1651669] Updated weights for policy 0, policy_version 574720 (0.0021) [2024-06-15 18:23:25,769][1648981] Fps is (10 sec: 42671.6, 60 sec: 47514.7, 300 sec: 48762.8). Total num frames: 1177059328. Throughput: 0: 12150.8. Samples: 294322176. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:25,769][1648981] Avg episode reward: [(0, '415.310')] [2024-06-15 18:23:26,446][1651669] Updated weights for policy 0, policy_version 574775 (0.0014) [2024-06-15 18:23:28,451][1651669] Updated weights for policy 0, policy_version 574838 (0.0016) [2024-06-15 18:23:30,348][1651274] Signal inference workers to stop experience collection... (30150 times) [2024-06-15 18:23:30,404][1651669] InferenceWorker_p0-w0: stopping experience collection (30150 times) [2024-06-15 18:23:30,681][1651274] Signal inference workers to resume experience collection... (30150 times) [2024-06-15 18:23:30,683][1651669] InferenceWorker_p0-w0: resuming experience collection (30150 times) [2024-06-15 18:23:30,786][1648981] Fps is (10 sec: 45786.5, 60 sec: 49135.7, 300 sec: 48871.0). Total num frames: 1177321472. Throughput: 0: 11975.5. Samples: 294387200. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:30,787][1648981] Avg episode reward: [(0, '409.700')] [2024-06-15 18:23:31,764][1651669] Updated weights for policy 0, policy_version 574902 (0.0015) [2024-06-15 18:23:35,487][1651669] Updated weights for policy 0, policy_version 574929 (0.0030) [2024-06-15 18:23:35,767][1648981] Fps is (10 sec: 42609.0, 60 sec: 46421.2, 300 sec: 48763.2). Total num frames: 1177485312. Throughput: 0: 11935.3. Samples: 294463488. 
Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:35,767][1648981] Avg episode reward: [(0, '420.670')] [2024-06-15 18:23:37,039][1651669] Updated weights for policy 0, policy_version 574995 (0.0011) [2024-06-15 18:23:39,060][1651669] Updated weights for policy 0, policy_version 575041 (0.0016) [2024-06-15 18:23:40,269][1651669] Updated weights for policy 0, policy_version 575104 (0.0012) [2024-06-15 18:23:40,767][1648981] Fps is (10 sec: 49248.2, 60 sec: 49151.7, 300 sec: 48877.6). Total num frames: 1177812992. Throughput: 0: 12003.5. Samples: 294495232. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:40,769][1648981] Avg episode reward: [(0, '427.100')] [2024-06-15 18:23:45,776][1648981] Fps is (10 sec: 45830.3, 60 sec: 45867.7, 300 sec: 48761.7). Total num frames: 1177944064. Throughput: 0: 11957.0. Samples: 294562304. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:45,777][1648981] Avg episode reward: [(0, '435.100')] [2024-06-15 18:23:46,899][1651669] Updated weights for policy 0, policy_version 575200 (0.0134) [2024-06-15 18:23:48,597][1651669] Updated weights for policy 0, policy_version 575280 (0.0043) [2024-06-15 18:23:50,321][1651669] Updated weights for policy 0, policy_version 575312 (0.0015) [2024-06-15 18:23:50,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 48606.0, 300 sec: 48541.1). Total num frames: 1178238976. Throughput: 0: 12018.1. Samples: 294641152. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:50,767][1648981] Avg episode reward: [(0, '443.690')] [2024-06-15 18:23:51,505][1651669] Updated weights for policy 0, policy_version 575357 (0.0011) [2024-06-15 18:23:53,409][1651669] Updated weights for policy 0, policy_version 575417 (0.0014) [2024-06-15 18:23:55,766][1648981] Fps is (10 sec: 52480.8, 60 sec: 47513.6, 300 sec: 48876.2). Total num frames: 1178468352. Throughput: 0: 11832.9. Samples: 294670336. 
Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:23:55,767][1648981] Avg episode reward: [(0, '435.410')] [2024-06-15 18:23:58,008][1651669] Updated weights for policy 0, policy_version 575458 (0.0012) [2024-06-15 18:23:59,155][1651669] Updated weights for policy 0, policy_version 575507 (0.0011) [2024-06-15 18:24:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48064.6, 300 sec: 48763.2). Total num frames: 1178730496. Throughput: 0: 11792.5. Samples: 294741504. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:00,767][1648981] Avg episode reward: [(0, '423.780')] [2024-06-15 18:24:01,519][1651669] Updated weights for policy 0, policy_version 575568 (0.0010) [2024-06-15 18:24:02,705][1651669] Updated weights for policy 0, policy_version 575616 (0.0012) [2024-06-15 18:24:04,739][1651669] Updated weights for policy 0, policy_version 575677 (0.0012) [2024-06-15 18:24:05,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 48069.0, 300 sec: 48874.3). Total num frames: 1178992640. Throughput: 0: 11855.7. Samples: 294811136. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:05,767][1648981] Avg episode reward: [(0, '419.180')] [2024-06-15 18:24:09,556][1651669] Updated weights for policy 0, policy_version 575728 (0.0010) [2024-06-15 18:24:10,767][1648981] Fps is (10 sec: 42598.1, 60 sec: 46421.2, 300 sec: 48652.1). Total num frames: 1179156480. Throughput: 0: 11788.0. Samples: 294852608. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:10,767][1648981] Avg episode reward: [(0, '416.620')] [2024-06-15 18:24:11,495][1651669] Updated weights for policy 0, policy_version 575805 (0.0010) [2024-06-15 18:24:13,162][1651274] Signal inference workers to stop experience collection... (30200 times) [2024-06-15 18:24:13,283][1651669] InferenceWorker_p0-w0: stopping experience collection (30200 times) [2024-06-15 18:24:13,352][1651274] Signal inference workers to resume experience collection... 
(30200 times) [2024-06-15 18:24:13,352][1651669] InferenceWorker_p0-w0: resuming experience collection (30200 times) [2024-06-15 18:24:13,524][1651669] Updated weights for policy 0, policy_version 575847 (0.0010) [2024-06-15 18:24:15,608][1651669] Updated weights for policy 0, policy_version 575928 (0.0012) [2024-06-15 18:24:15,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 48075.6, 300 sec: 48874.3). Total num frames: 1179516928. Throughput: 0: 11849.5. Samples: 294920192. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:15,767][1648981] Avg episode reward: [(0, '400.610')] [2024-06-15 18:24:20,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 45875.4, 300 sec: 48430.0). Total num frames: 1179615232. Throughput: 0: 11707.7. Samples: 294990336. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:20,767][1648981] Avg episode reward: [(0, '406.200')] [2024-06-15 18:24:20,877][1651669] Updated weights for policy 0, policy_version 575988 (0.0014) [2024-06-15 18:24:22,557][1651669] Updated weights for policy 0, policy_version 576058 (0.0012) [2024-06-15 18:24:25,316][1651669] Updated weights for policy 0, policy_version 576113 (0.0013) [2024-06-15 18:24:25,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 47515.7, 300 sec: 48652.2). Total num frames: 1179910144. Throughput: 0: 11742.0. Samples: 295023616. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:25,767][1648981] Avg episode reward: [(0, '443.580')] [2024-06-15 18:24:26,956][1651669] Updated weights for policy 0, policy_version 576184 (0.0013) [2024-06-15 18:24:30,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 45344.1, 300 sec: 48430.0). Total num frames: 1180041216. Throughput: 0: 11972.1. Samples: 295100928. 
Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:30,767][1648981] Avg episode reward: [(0, '447.470')] [2024-06-15 18:24:31,399][1651669] Updated weights for policy 0, policy_version 576225 (0.0011) [2024-06-15 18:24:32,502][1651669] Updated weights for policy 0, policy_version 576276 (0.0011) [2024-06-15 18:24:35,399][1651669] Updated weights for policy 0, policy_version 576336 (0.0013) [2024-06-15 18:24:35,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 47513.8, 300 sec: 48432.0). Total num frames: 1180336128. Throughput: 0: 11855.7. Samples: 295174656. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:35,767][1648981] Avg episode reward: [(0, '447.110')] [2024-06-15 18:24:36,908][1651669] Updated weights for policy 0, policy_version 576403 (0.0101) [2024-06-15 18:24:37,729][1651669] Updated weights for policy 0, policy_version 576448 (0.0013) [2024-06-15 18:24:40,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 45875.4, 300 sec: 48430.0). Total num frames: 1180565504. Throughput: 0: 11958.0. Samples: 295208448. Policy #0 lag: (min: 2.0, avg: 89.5, max: 258.0) [2024-06-15 18:24:40,767][1648981] Avg episode reward: [(0, '451.410')] [2024-06-15 18:24:42,882][1651669] Updated weights for policy 0, policy_version 576528 (0.0013) [2024-06-15 18:24:45,539][1651669] Updated weights for policy 0, policy_version 576592 (0.0012) [2024-06-15 18:24:45,777][1648981] Fps is (10 sec: 52375.6, 60 sec: 48605.7, 300 sec: 48428.3). Total num frames: 1180860416. Throughput: 0: 11921.3. Samples: 295278080. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:24:45,777][1648981] Avg episode reward: [(0, '457.810')] [2024-06-15 18:24:46,657][1651669] Updated weights for policy 0, policy_version 576641 (0.0012) [2024-06-15 18:24:47,806][1651669] Updated weights for policy 0, policy_version 576698 (0.0012) [2024-06-15 18:24:50,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 47513.5, 300 sec: 48430.0). Total num frames: 1181089792. 
Throughput: 0: 12140.1. Samples: 295357440. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:24:50,767][1648981] Avg episode reward: [(0, '448.610')] [2024-06-15 18:24:52,804][1651274] Signal inference workers to stop experience collection... (30250 times) [2024-06-15 18:24:52,876][1651669] InferenceWorker_p0-w0: stopping experience collection (30250 times) [2024-06-15 18:24:53,094][1651274] Signal inference workers to resume experience collection... (30250 times) [2024-06-15 18:24:53,095][1651669] InferenceWorker_p0-w0: resuming experience collection (30250 times) [2024-06-15 18:24:53,691][1651669] Updated weights for policy 0, policy_version 576769 (0.0131) [2024-06-15 18:24:54,823][1651669] Updated weights for policy 0, policy_version 576825 (0.0011) [2024-06-15 18:24:55,767][1648981] Fps is (10 sec: 52480.3, 60 sec: 48605.6, 300 sec: 48541.0). Total num frames: 1181384704. Throughput: 0: 11958.0. Samples: 295390720. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:24:55,767][1648981] Avg episode reward: [(0, '438.330')] [2024-06-15 18:24:56,126][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000576864_1181417472.pth... [2024-06-15 18:24:56,243][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000571200_1169817600.pth [2024-06-15 18:24:56,703][1651669] Updated weights for policy 0, policy_version 576887 (0.0011) [2024-06-15 18:24:58,409][1651669] Updated weights for policy 0, policy_version 576929 (0.0012) [2024-06-15 18:25:00,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 48059.8, 300 sec: 48430.1). Total num frames: 1181614080. Throughput: 0: 12083.2. Samples: 295463936. 
Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:00,767][1648981] Avg episode reward: [(0, '440.460')] [2024-06-15 18:25:02,993][1651669] Updated weights for policy 0, policy_version 576979 (0.0012) [2024-06-15 18:25:04,207][1651669] Updated weights for policy 0, policy_version 577027 (0.0012) [2024-06-15 18:25:05,716][1651669] Updated weights for policy 0, policy_version 577083 (0.0012) [2024-06-15 18:25:05,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 47513.8, 300 sec: 48318.9). Total num frames: 1181843456. Throughput: 0: 12117.4. Samples: 295535616. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:05,767][1648981] Avg episode reward: [(0, '458.210')] [2024-06-15 18:25:07,149][1651669] Updated weights for policy 0, policy_version 577136 (0.0032) [2024-06-15 18:25:09,899][1651669] Updated weights for policy 0, policy_version 577189 (0.0065) [2024-06-15 18:25:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49698.2, 300 sec: 48653.8). Total num frames: 1182138368. Throughput: 0: 12174.2. Samples: 295571456. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:10,767][1648981] Avg episode reward: [(0, '451.300')] [2024-06-15 18:25:13,769][1651669] Updated weights for policy 0, policy_version 577232 (0.0126) [2024-06-15 18:25:15,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1182334976. Throughput: 0: 12162.8. Samples: 295648256. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:15,767][1648981] Avg episode reward: [(0, '450.390')] [2024-06-15 18:25:15,908][1651669] Updated weights for policy 0, policy_version 577313 (0.0013) [2024-06-15 18:25:17,553][1651669] Updated weights for policy 0, policy_version 577360 (0.0014) [2024-06-15 18:25:20,769][1648981] Fps is (10 sec: 39313.1, 60 sec: 48604.3, 300 sec: 48318.6). Total num frames: 1182531584. Throughput: 0: 12037.1. Samples: 295716352. 
Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:20,769][1648981] Avg episode reward: [(0, '475.940')] [2024-06-15 18:25:20,820][1651669] Updated weights for policy 0, policy_version 577424 (0.0014) [2024-06-15 18:25:25,206][1651669] Updated weights for policy 0, policy_version 577505 (0.0015) [2024-06-15 18:25:25,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 47513.6, 300 sec: 48322.2). Total num frames: 1182760960. Throughput: 0: 12037.7. Samples: 295750144. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:25,767][1648981] Avg episode reward: [(0, '464.050')] [2024-06-15 18:25:27,254][1651669] Updated weights for policy 0, policy_version 577588 (0.0013) [2024-06-15 18:25:28,480][1651669] Updated weights for policy 0, policy_version 577602 (0.0011) [2024-06-15 18:25:29,544][1651669] Updated weights for policy 0, policy_version 577655 (0.0012) [2024-06-15 18:25:30,774][1648981] Fps is (10 sec: 52399.6, 60 sec: 50237.7, 300 sec: 48428.7). Total num frames: 1183055872. Throughput: 0: 11913.2. Samples: 295814144. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:30,775][1648981] Avg episode reward: [(0, '461.510')] [2024-06-15 18:25:33,034][1651669] Updated weights for policy 0, policy_version 577726 (0.0012) [2024-06-15 18:25:34,980][1651274] Signal inference workers to stop experience collection... (30300 times) [2024-06-15 18:25:35,031][1651669] InferenceWorker_p0-w0: stopping experience collection (30300 times) [2024-06-15 18:25:35,238][1651274] Signal inference workers to resume experience collection... (30300 times) [2024-06-15 18:25:35,242][1651669] InferenceWorker_p0-w0: resuming experience collection (30300 times) [2024-06-15 18:25:35,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48605.8, 300 sec: 48207.8). Total num frames: 1183252480. Throughput: 0: 11844.3. Samples: 295890432. 
Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:35,767][1648981] Avg episode reward: [(0, '459.520')] [2024-06-15 18:25:36,018][1651669] Updated weights for policy 0, policy_version 577792 (0.0078) [2024-06-15 18:25:38,575][1651669] Updated weights for policy 0, policy_version 577851 (0.0011) [2024-06-15 18:25:39,722][1651669] Updated weights for policy 0, policy_version 577878 (0.0099) [2024-06-15 18:25:40,579][1651669] Updated weights for policy 0, policy_version 577916 (0.0015) [2024-06-15 18:25:40,766][1648981] Fps is (10 sec: 52469.1, 60 sec: 50244.3, 300 sec: 48652.1). Total num frames: 1183580160. Throughput: 0: 11855.7. Samples: 295924224. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:40,767][1648981] Avg episode reward: [(0, '455.310')] [2024-06-15 18:25:45,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47521.6, 300 sec: 47985.7). Total num frames: 1183711232. Throughput: 0: 11946.7. Samples: 296001536. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:45,767][1648981] Avg episode reward: [(0, '466.740')] [2024-06-15 18:25:46,177][1651669] Updated weights for policy 0, policy_version 577985 (0.0028) [2024-06-15 18:25:47,534][1651669] Updated weights for policy 0, policy_version 578048 (0.0012) [2024-06-15 18:25:49,408][1651669] Updated weights for policy 0, policy_version 578108 (0.0048) [2024-06-15 18:25:50,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 49152.2, 300 sec: 48318.9). Total num frames: 1184038912. Throughput: 0: 11935.3. Samples: 296072704. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:50,767][1648981] Avg episode reward: [(0, '463.520')] [2024-06-15 18:25:51,123][1651669] Updated weights for policy 0, policy_version 578166 (0.0010) [2024-06-15 18:25:55,032][1651669] Updated weights for policy 0, policy_version 578227 (0.0083) [2024-06-15 18:25:55,796][1648981] Fps is (10 sec: 52275.5, 60 sec: 47490.6, 300 sec: 47980.9). Total num frames: 1184235520. 
Throughput: 0: 12109.5. Samples: 296116736. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:25:55,796][1648981] Avg episode reward: [(0, '448.530')] [2024-06-15 18:25:57,331][1651669] Updated weights for policy 0, policy_version 578274 (0.0012) [2024-06-15 18:25:59,678][1651669] Updated weights for policy 0, policy_version 578352 (0.0012) [2024-06-15 18:26:00,771][1648981] Fps is (10 sec: 45852.8, 60 sec: 48055.8, 300 sec: 48096.0). Total num frames: 1184497664. Throughput: 0: 11786.1. Samples: 296178688. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:26:00,772][1648981] Avg episode reward: [(0, '451.490')] [2024-06-15 18:26:01,022][1651669] Updated weights for policy 0, policy_version 578387 (0.0011) [2024-06-15 18:26:04,431][1651669] Updated weights for policy 0, policy_version 578448 (0.0101) [2024-06-15 18:26:05,138][1651669] Updated weights for policy 0, policy_version 578492 (0.0013) [2024-06-15 18:26:05,766][1648981] Fps is (10 sec: 52582.7, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1184759808. Throughput: 0: 12243.1. Samples: 296267264. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:26:05,767][1648981] Avg episode reward: [(0, '454.630')] [2024-06-15 18:26:08,229][1651669] Updated weights for policy 0, policy_version 578548 (0.0011) [2024-06-15 18:26:09,272][1651669] Updated weights for policy 0, policy_version 578599 (0.0012) [2024-06-15 18:26:10,766][1648981] Fps is (10 sec: 52453.9, 60 sec: 48059.7, 300 sec: 48432.6). Total num frames: 1185021952. Throughput: 0: 12197.0. Samples: 296299008. 
Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:26:10,767][1648981] Avg episode reward: [(0, '454.480')] [2024-06-15 18:26:11,716][1651669] Updated weights for policy 0, policy_version 578658 (0.0014) [2024-06-15 18:26:14,485][1651669] Updated weights for policy 0, policy_version 578704 (0.0013) [2024-06-15 18:26:15,669][1651669] Updated weights for policy 0, policy_version 578750 (0.0010) [2024-06-15 18:26:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49151.9, 300 sec: 48541.1). Total num frames: 1185284096. Throughput: 0: 12677.0. Samples: 296384512. Policy #0 lag: (min: 15.0, avg: 126.8, max: 271.0) [2024-06-15 18:26:15,767][1648981] Avg episode reward: [(0, '447.130')] [2024-06-15 18:26:17,161][1651274] Signal inference workers to stop experience collection... (30350 times) [2024-06-15 18:26:17,199][1651669] InferenceWorker_p0-w0: stopping experience collection (30350 times) [2024-06-15 18:26:17,500][1651274] Signal inference workers to resume experience collection... (30350 times) [2024-06-15 18:26:17,501][1651669] InferenceWorker_p0-w0: resuming experience collection (30350 times) [2024-06-15 18:26:18,073][1651669] Updated weights for policy 0, policy_version 578802 (0.0013) [2024-06-15 18:26:19,627][1651669] Updated weights for policy 0, policy_version 578873 (0.0012) [2024-06-15 18:26:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50246.1, 300 sec: 48652.1). Total num frames: 1185546240. Throughput: 0: 12458.7. Samples: 296451072. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:20,767][1648981] Avg episode reward: [(0, '437.660')] [2024-06-15 18:26:22,615][1651669] Updated weights for policy 0, policy_version 578928 (0.0013) [2024-06-15 18:26:25,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 49152.0, 300 sec: 48431.3). Total num frames: 1185710080. Throughput: 0: 12504.2. Samples: 296486912. 
Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:25,767][1648981] Avg episode reward: [(0, '410.100')] [2024-06-15 18:26:26,401][1651669] Updated weights for policy 0, policy_version 578996 (0.0012) [2024-06-15 18:26:28,879][1651669] Updated weights for policy 0, policy_version 579045 (0.0108) [2024-06-15 18:26:30,411][1651669] Updated weights for policy 0, policy_version 579109 (0.0011) [2024-06-15 18:26:30,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49704.6, 300 sec: 48763.2). Total num frames: 1186037760. Throughput: 0: 12344.9. Samples: 296557056. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:30,767][1648981] Avg episode reward: [(0, '411.410')] [2024-06-15 18:26:33,121][1651669] Updated weights for policy 0, policy_version 579156 (0.0017) [2024-06-15 18:26:35,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 49151.9, 300 sec: 48430.0). Total num frames: 1186201600. Throughput: 0: 12424.5. Samples: 296631808. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:35,767][1648981] Avg episode reward: [(0, '396.630')] [2024-06-15 18:26:36,162][1651669] Updated weights for policy 0, policy_version 579216 (0.0021) [2024-06-15 18:26:37,032][1651669] Updated weights for policy 0, policy_version 579263 (0.0011) [2024-06-15 18:26:39,607][1651669] Updated weights for policy 0, policy_version 579315 (0.0107) [2024-06-15 18:26:40,688][1651669] Updated weights for policy 0, policy_version 579345 (0.0015) [2024-06-15 18:26:40,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1186496512. Throughput: 0: 12478.1. Samples: 296677888. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:40,767][1648981] Avg episode reward: [(0, '381.800')] [2024-06-15 18:26:43,399][1651669] Updated weights for policy 0, policy_version 579408 (0.0011) [2024-06-15 18:26:45,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1186725888. 
Throughput: 0: 12505.5. Samples: 296741376. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:45,767][1648981] Avg episode reward: [(0, '387.990')] [2024-06-15 18:26:46,989][1651669] Updated weights for policy 0, policy_version 579472 (0.0020) [2024-06-15 18:26:49,673][1651669] Updated weights for policy 0, policy_version 579536 (0.0010) [2024-06-15 18:26:50,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49151.9, 300 sec: 48430.3). Total num frames: 1186988032. Throughput: 0: 12253.8. Samples: 296818688. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:50,767][1648981] Avg episode reward: [(0, '407.580')] [2024-06-15 18:26:51,822][1651669] Updated weights for policy 0, policy_version 579616 (0.0014) [2024-06-15 18:26:53,483][1651669] Updated weights for policy 0, policy_version 579649 (0.0010) [2024-06-15 18:26:55,014][1651669] Updated weights for policy 0, policy_version 579712 (0.0013) [2024-06-15 18:26:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50268.8, 300 sec: 48652.1). Total num frames: 1187250176. Throughput: 0: 12288.0. Samples: 296851968. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:26:55,767][1648981] Avg episode reward: [(0, '404.230')] [2024-06-15 18:26:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000579712_1187250176.pth... [2024-06-15 18:26:55,845][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000574048_1175650304.pth [2024-06-15 18:26:59,155][1651669] Updated weights for policy 0, policy_version 579772 (0.0083) [2024-06-15 18:27:00,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 48063.6, 300 sec: 48096.8). Total num frames: 1187381248. Throughput: 0: 11901.2. Samples: 296920064. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:00,767][1648981] Avg episode reward: [(0, '396.800')] [2024-06-15 18:27:01,674][1651274] Signal inference workers to stop experience collection... 
(30400 times) [2024-06-15 18:27:01,750][1651669] InferenceWorker_p0-w0: stopping experience collection (30400 times) [2024-06-15 18:27:01,868][1651274] Signal inference workers to resume experience collection... (30400 times) [2024-06-15 18:27:01,869][1651669] InferenceWorker_p0-w0: resuming experience collection (30400 times) [2024-06-15 18:27:02,555][1651669] Updated weights for policy 0, policy_version 579840 (0.0032) [2024-06-15 18:27:04,130][1651669] Updated weights for policy 0, policy_version 579902 (0.0082) [2024-06-15 18:27:05,714][1651669] Updated weights for policy 0, policy_version 579961 (0.0014) [2024-06-15 18:27:05,767][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1187741696. Throughput: 0: 11946.7. Samples: 296988672. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:05,767][1648981] Avg episode reward: [(0, '441.070')] [2024-06-15 18:27:08,416][1651669] Updated weights for policy 0, policy_version 579991 (0.0033) [2024-06-15 18:27:10,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1187905536. Throughput: 0: 12288.0. Samples: 297039872. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:10,767][1648981] Avg episode reward: [(0, '470.550')] [2024-06-15 18:27:12,534][1651669] Updated weights for policy 0, policy_version 580064 (0.0014) [2024-06-15 18:27:14,579][1651669] Updated weights for policy 0, policy_version 580144 (0.0014) [2024-06-15 18:27:15,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1188200448. Throughput: 0: 12310.8. Samples: 297111040. 
Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:15,767][1648981] Avg episode reward: [(0, '468.060')] [2024-06-15 18:27:16,175][1651669] Updated weights for policy 0, policy_version 580197 (0.0079) [2024-06-15 18:27:19,571][1651669] Updated weights for policy 0, policy_version 580256 (0.0036) [2024-06-15 18:27:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 48208.5). Total num frames: 1188429824. Throughput: 0: 12356.3. Samples: 297187840. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:20,767][1648981] Avg episode reward: [(0, '468.700')] [2024-06-15 18:27:22,956][1651669] Updated weights for policy 0, policy_version 580320 (0.0105) [2024-06-15 18:27:25,433][1651669] Updated weights for policy 0, policy_version 580418 (0.0098) [2024-06-15 18:27:25,766][1648981] Fps is (10 sec: 52427.9, 60 sec: 50244.1, 300 sec: 48652.1). Total num frames: 1188724736. Throughput: 0: 12060.4. Samples: 297220608. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:25,767][1648981] Avg episode reward: [(0, '462.420')] [2024-06-15 18:27:26,425][1651669] Updated weights for policy 0, policy_version 580466 (0.0019) [2024-06-15 18:27:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1188855808. Throughput: 0: 12276.6. Samples: 297293824. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:30,767][1648981] Avg episode reward: [(0, '477.230')] [2024-06-15 18:27:31,308][1651669] Updated weights for policy 0, policy_version 580528 (0.0011) [2024-06-15 18:27:33,446][1651669] Updated weights for policy 0, policy_version 580576 (0.0012) [2024-06-15 18:27:34,896][1651669] Updated weights for policy 0, policy_version 580624 (0.0011) [2024-06-15 18:27:35,768][1648981] Fps is (10 sec: 45867.5, 60 sec: 49696.7, 300 sec: 48540.8). Total num frames: 1189183488. Throughput: 0: 12151.0. Samples: 297365504. 
Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:35,769][1648981] Avg episode reward: [(0, '484.530')] [2024-06-15 18:27:37,051][1651274] Signal inference workers to stop experience collection... (30450 times) [2024-06-15 18:27:37,105][1651669] InferenceWorker_p0-w0: stopping experience collection (30450 times) [2024-06-15 18:27:37,119][1651669] Updated weights for policy 0, policy_version 580711 (0.0149) [2024-06-15 18:27:37,214][1651274] Signal inference workers to resume experience collection... (30450 times) [2024-06-15 18:27:37,215][1651669] InferenceWorker_p0-w0: resuming experience collection (30450 times) [2024-06-15 18:27:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 1189347328. Throughput: 0: 11980.8. Samples: 297391104. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:40,767][1648981] Avg episode reward: [(0, '503.860')] [2024-06-15 18:27:42,164][1651669] Updated weights for policy 0, policy_version 580784 (0.0016) [2024-06-15 18:27:43,613][1651669] Updated weights for policy 0, policy_version 580816 (0.0011) [2024-06-15 18:27:45,533][1651669] Updated weights for policy 0, policy_version 580865 (0.0016) [2024-06-15 18:27:45,766][1648981] Fps is (10 sec: 42606.1, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1189609472. Throughput: 0: 12356.3. Samples: 297476096. Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:45,767][1648981] Avg episode reward: [(0, '497.170')] [2024-06-15 18:27:46,890][1651669] Updated weights for policy 0, policy_version 580928 (0.0014) [2024-06-15 18:27:50,775][1648981] Fps is (10 sec: 52381.6, 60 sec: 48052.6, 300 sec: 48317.4). Total num frames: 1189871616. Throughput: 0: 12365.2. Samples: 297545216. 
Policy #0 lag: (min: 43.0, avg: 141.0, max: 299.0) [2024-06-15 18:27:50,776][1648981] Avg episode reward: [(0, '510.330')] [2024-06-15 18:27:51,988][1651669] Updated weights for policy 0, policy_version 581024 (0.0014) [2024-06-15 18:27:54,737][1651669] Updated weights for policy 0, policy_version 581089 (0.0016) [2024-06-15 18:27:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 48431.0). Total num frames: 1190133760. Throughput: 0: 12162.9. Samples: 297587200. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:27:55,767][1648981] Avg episode reward: [(0, '494.320')] [2024-06-15 18:27:56,247][1651669] Updated weights for policy 0, policy_version 581140 (0.0032) [2024-06-15 18:27:57,156][1651669] Updated weights for policy 0, policy_version 581180 (0.0013) [2024-06-15 18:28:00,806][1648981] Fps is (10 sec: 52268.0, 60 sec: 50210.9, 300 sec: 48425.4). Total num frames: 1190395904. Throughput: 0: 12049.8. Samples: 297653760. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:00,807][1648981] Avg episode reward: [(0, '500.320')] [2024-06-15 18:28:02,002][1651669] Updated weights for policy 0, policy_version 581249 (0.0116) [2024-06-15 18:28:03,364][1651669] Updated weights for policy 0, policy_version 581312 (0.0032) [2024-06-15 18:28:05,774][1648981] Fps is (10 sec: 45838.9, 60 sec: 47507.4, 300 sec: 48206.5). Total num frames: 1190592512. Throughput: 0: 12206.2. Samples: 297737216. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:05,775][1648981] Avg episode reward: [(0, '489.930')] [2024-06-15 18:28:05,982][1651669] Updated weights for policy 0, policy_version 581368 (0.0012) [2024-06-15 18:28:07,798][1651669] Updated weights for policy 0, policy_version 581440 (0.0114) [2024-06-15 18:28:09,148][1651669] Updated weights for policy 0, policy_version 581504 (0.0011) [2024-06-15 18:28:10,766][1648981] Fps is (10 sec: 52638.0, 60 sec: 50244.3, 300 sec: 48433.2). Total num frames: 1190920192. 
Throughput: 0: 12106.0. Samples: 297765376. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:10,767][1648981] Avg episode reward: [(0, '472.030')] [2024-06-15 18:28:13,648][1651669] Updated weights for policy 0, policy_version 581567 (0.0014) [2024-06-15 18:28:15,766][1648981] Fps is (10 sec: 45911.7, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 1191051264. Throughput: 0: 12174.2. Samples: 297841664. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:15,767][1648981] Avg episode reward: [(0, '457.320')] [2024-06-15 18:28:17,959][1651669] Updated weights for policy 0, policy_version 581636 (0.0011) [2024-06-15 18:28:18,909][1651669] Updated weights for policy 0, policy_version 581682 (0.0011) [2024-06-15 18:28:19,153][1651274] Signal inference workers to stop experience collection... (30500 times) [2024-06-15 18:28:19,195][1651669] InferenceWorker_p0-w0: stopping experience collection (30500 times) [2024-06-15 18:28:19,334][1651274] Signal inference workers to resume experience collection... (30500 times) [2024-06-15 18:28:19,335][1651669] InferenceWorker_p0-w0: resuming experience collection (30500 times) [2024-06-15 18:28:19,956][1651669] Updated weights for policy 0, policy_version 581734 (0.0011) [2024-06-15 18:28:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 48763.7). Total num frames: 1191444480. Throughput: 0: 12129.2. Samples: 297911296. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:20,767][1648981] Avg episode reward: [(0, '434.100')] [2024-06-15 18:28:24,259][1651669] Updated weights for policy 0, policy_version 581792 (0.0051) [2024-06-15 18:28:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 47513.7, 300 sec: 48322.2). Total num frames: 1191575552. Throughput: 0: 12481.4. Samples: 297952768. 
Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:25,767][1648981] Avg episode reward: [(0, '424.610')] [2024-06-15 18:28:26,431][1651669] Updated weights for policy 0, policy_version 581825 (0.0012) [2024-06-15 18:28:27,653][1651669] Updated weights for policy 0, policy_version 581883 (0.0011) [2024-06-15 18:28:29,165][1651669] Updated weights for policy 0, policy_version 581924 (0.0229) [2024-06-15 18:28:29,998][1651669] Updated weights for policy 0, policy_version 581956 (0.0010) [2024-06-15 18:28:30,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 51336.3, 300 sec: 48985.4). Total num frames: 1191936000. Throughput: 0: 12117.3. Samples: 298021376. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:30,767][1648981] Avg episode reward: [(0, '417.800')] [2024-06-15 18:28:34,852][1651669] Updated weights for policy 0, policy_version 582048 (0.0209) [2024-06-15 18:28:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48607.3, 300 sec: 48430.1). Total num frames: 1192099840. Throughput: 0: 12245.0. Samples: 298096128. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:35,767][1648981] Avg episode reward: [(0, '431.140')] [2024-06-15 18:28:37,574][1651669] Updated weights for policy 0, policy_version 582098 (0.0011) [2024-06-15 18:28:39,565][1651669] Updated weights for policy 0, policy_version 582161 (0.0018) [2024-06-15 18:28:40,704][1651669] Updated weights for policy 0, policy_version 582211 (0.0013) [2024-06-15 18:28:40,799][1648981] Fps is (10 sec: 42461.6, 60 sec: 50217.1, 300 sec: 48870.6). Total num frames: 1192361984. Throughput: 0: 12074.5. Samples: 298130944. 
Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:40,799][1648981] Avg episode reward: [(0, '426.560')] [2024-06-15 18:28:41,791][1651669] Updated weights for policy 0, policy_version 582272 (0.0012) [2024-06-15 18:28:45,263][1651669] Updated weights for policy 0, policy_version 582330 (0.0012) [2024-06-15 18:28:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 1192624128. Throughput: 0: 12344.4. Samples: 298208768. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:45,767][1648981] Avg episode reward: [(0, '430.680')] [2024-06-15 18:28:48,498][1651669] Updated weights for policy 0, policy_version 582384 (0.0014) [2024-06-15 18:28:50,747][1651669] Updated weights for policy 0, policy_version 582448 (0.0069) [2024-06-15 18:28:50,766][1648981] Fps is (10 sec: 49312.7, 60 sec: 49705.7, 300 sec: 48763.2). Total num frames: 1192853504. Throughput: 0: 12210.5. Samples: 298286592. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:50,767][1648981] Avg episode reward: [(0, '429.260')] [2024-06-15 18:28:52,489][1651669] Updated weights for policy 0, policy_version 582512 (0.0187) [2024-06-15 18:28:55,778][1648981] Fps is (10 sec: 39275.2, 60 sec: 48050.2, 300 sec: 48428.1). Total num frames: 1193017344. Throughput: 0: 12216.5. Samples: 298315264. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:28:55,779][1648981] Avg episode reward: [(0, '435.290')] [2024-06-15 18:28:56,312][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000582560_1193082880.pth... 
[2024-06-15 18:28:56,417][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000576864_1181417472.pth [2024-06-15 18:28:56,736][1651669] Updated weights for policy 0, policy_version 582576 (0.0029) [2024-06-15 18:28:59,826][1651669] Updated weights for policy 0, policy_version 582626 (0.0014) [2024-06-15 18:29:00,299][1651669] Updated weights for policy 0, policy_version 582656 (0.0012) [2024-06-15 18:29:00,769][1648981] Fps is (10 sec: 42587.0, 60 sec: 48089.6, 300 sec: 48429.6). Total num frames: 1193279488. Throughput: 0: 12139.4. Samples: 298387968. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:29:00,769][1648981] Avg episode reward: [(0, '457.310')] [2024-06-15 18:29:01,959][1651274] Signal inference workers to stop experience collection... (30550 times) [2024-06-15 18:29:02,026][1651669] InferenceWorker_p0-w0: stopping experience collection (30550 times) [2024-06-15 18:29:02,241][1651274] Signal inference workers to resume experience collection... (30550 times) [2024-06-15 18:29:02,242][1651669] InferenceWorker_p0-w0: resuming experience collection (30550 times) [2024-06-15 18:29:02,244][1651669] Updated weights for policy 0, policy_version 582720 (0.0010) [2024-06-15 18:29:03,462][1651669] Updated weights for policy 0, policy_version 582774 (0.0013) [2024-06-15 18:29:05,766][1648981] Fps is (10 sec: 52491.7, 60 sec: 49158.6, 300 sec: 48763.3). Total num frames: 1193541632. Throughput: 0: 12197.0. Samples: 298460160. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:29:05,767][1648981] Avg episode reward: [(0, '459.320')] [2024-06-15 18:29:07,316][1651669] Updated weights for policy 0, policy_version 582816 (0.0140) [2024-06-15 18:29:09,881][1651669] Updated weights for policy 0, policy_version 582851 (0.0022) [2024-06-15 18:29:10,766][1648981] Fps is (10 sec: 45886.8, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 1193738240. Throughput: 0: 12049.1. Samples: 298494976. 
Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:29:10,767][1648981] Avg episode reward: [(0, '459.320')] [2024-06-15 18:29:11,216][1651669] Updated weights for policy 0, policy_version 582912 (0.0010) [2024-06-15 18:29:13,534][1651669] Updated weights for policy 0, policy_version 582992 (0.0013) [2024-06-15 18:29:14,644][1651669] Updated weights for policy 0, policy_version 583039 (0.0012) [2024-06-15 18:29:15,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 50244.0, 300 sec: 48985.4). Total num frames: 1194065920. Throughput: 0: 12026.3. Samples: 298562560. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:29:15,768][1648981] Avg episode reward: [(0, '454.040')] [2024-06-15 18:29:18,824][1651669] Updated weights for policy 0, policy_version 583097 (0.0013) [2024-06-15 18:29:20,244][1651669] Updated weights for policy 0, policy_version 583125 (0.0010) [2024-06-15 18:29:20,769][1648981] Fps is (10 sec: 55689.9, 60 sec: 47511.4, 300 sec: 48762.8). Total num frames: 1194295296. Throughput: 0: 12275.8. Samples: 298648576. Policy #0 lag: (min: 15.0, avg: 109.5, max: 271.0) [2024-06-15 18:29:20,770][1648981] Avg episode reward: [(0, '450.140')] [2024-06-15 18:29:21,612][1651669] Updated weights for policy 0, policy_version 583169 (0.0013) [2024-06-15 18:29:22,859][1651669] Updated weights for policy 0, policy_version 583226 (0.0011) [2024-06-15 18:29:23,953][1651669] Updated weights for policy 0, policy_version 583264 (0.0016) [2024-06-15 18:29:25,770][1648981] Fps is (10 sec: 52410.2, 60 sec: 50241.1, 300 sec: 49318.0). Total num frames: 1194590208. Throughput: 0: 12318.6. Samples: 298684928. 
Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:29:25,771][1648981] Avg episode reward: [(0, '471.770')] [2024-06-15 18:29:28,098][1651669] Updated weights for policy 0, policy_version 583304 (0.0011) [2024-06-15 18:29:29,587][1651669] Updated weights for policy 0, policy_version 583360 (0.0138) [2024-06-15 18:29:30,767][1648981] Fps is (10 sec: 52443.1, 60 sec: 48059.8, 300 sec: 49096.4). Total num frames: 1194819584. Throughput: 0: 12379.0. Samples: 298765824. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:29:30,767][1648981] Avg episode reward: [(0, '469.180')] [2024-06-15 18:29:31,073][1651669] Updated weights for policy 0, policy_version 583424 (0.0103) [2024-06-15 18:29:34,051][1651669] Updated weights for policy 0, policy_version 583490 (0.0012) [2024-06-15 18:29:35,408][1651669] Updated weights for policy 0, policy_version 583552 (0.0011) [2024-06-15 18:29:35,779][1648981] Fps is (10 sec: 52385.1, 60 sec: 50234.1, 300 sec: 49316.6). Total num frames: 1195114496. Throughput: 0: 12057.2. Samples: 298829312. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:29:35,779][1648981] Avg episode reward: [(0, '457.530')] [2024-06-15 18:29:40,766][1648981] Fps is (10 sec: 39322.3, 60 sec: 47539.4, 300 sec: 48653.8). Total num frames: 1195212800. Throughput: 0: 12462.0. Samples: 298875904. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:29:40,767][1648981] Avg episode reward: [(0, '469.480')] [2024-06-15 18:29:41,160][1651669] Updated weights for policy 0, policy_version 583617 (0.0016) [2024-06-15 18:29:42,968][1651669] Updated weights for policy 0, policy_version 583683 (0.0012) [2024-06-15 18:29:43,677][1651274] Signal inference workers to stop experience collection... (30600 times) [2024-06-15 18:29:43,766][1651669] InferenceWorker_p0-w0: stopping experience collection (30600 times) [2024-06-15 18:29:43,990][1651274] Signal inference workers to resume experience collection... 
(30600 times) [2024-06-15 18:29:43,991][1651669] InferenceWorker_p0-w0: resuming experience collection (30600 times) [2024-06-15 18:29:45,173][1651669] Updated weights for policy 0, policy_version 583746 (0.0010) [2024-06-15 18:29:45,766][1648981] Fps is (10 sec: 42649.6, 60 sec: 48605.8, 300 sec: 48985.4). Total num frames: 1195540480. Throughput: 0: 12129.4. Samples: 298933760. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:29:45,767][1648981] Avg episode reward: [(0, '477.000')] [2024-06-15 18:29:50,770][1648981] Fps is (10 sec: 42583.0, 60 sec: 46418.5, 300 sec: 48318.4). Total num frames: 1195638784. Throughput: 0: 12378.0. Samples: 299017216. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:29:50,770][1648981] Avg episode reward: [(0, '475.840')] [2024-06-15 18:29:51,032][1651669] Updated weights for policy 0, policy_version 583824 (0.0034) [2024-06-15 18:29:53,021][1651669] Updated weights for policy 0, policy_version 583904 (0.0013) [2024-06-15 18:29:54,341][1651669] Updated weights for policy 0, policy_version 583939 (0.0040) [2024-06-15 18:29:55,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 50254.2, 300 sec: 48874.3). Total num frames: 1196032000. Throughput: 0: 12208.4. Samples: 299044352. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:29:55,767][1648981] Avg episode reward: [(0, '462.840')] [2024-06-15 18:29:56,235][1651669] Updated weights for policy 0, policy_version 584002 (0.0013) [2024-06-15 18:29:57,515][1651669] Updated weights for policy 0, policy_version 584064 (0.0012) [2024-06-15 18:30:00,767][1648981] Fps is (10 sec: 52446.4, 60 sec: 48061.6, 300 sec: 48541.0). Total num frames: 1196163072. Throughput: 0: 12356.3. Samples: 299118592. 
Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:00,767][1648981] Avg episode reward: [(0, '474.390')] [2024-06-15 18:30:03,505][1651669] Updated weights for policy 0, policy_version 584144 (0.0045) [2024-06-15 18:30:04,755][1651669] Updated weights for policy 0, policy_version 584191 (0.0011) [2024-06-15 18:30:05,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1196457984. Throughput: 0: 11993.0. Samples: 299188224. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:05,767][1648981] Avg episode reward: [(0, '468.060')] [2024-06-15 18:30:06,579][1651669] Updated weights for policy 0, policy_version 584247 (0.0012) [2024-06-15 18:30:07,714][1651669] Updated weights for policy 0, policy_version 584274 (0.0023) [2024-06-15 18:30:10,768][1648981] Fps is (10 sec: 52419.6, 60 sec: 49150.4, 300 sec: 48651.8). Total num frames: 1196687360. Throughput: 0: 11878.9. Samples: 299219456. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:10,769][1648981] Avg episode reward: [(0, '473.010')] [2024-06-15 18:30:12,537][1651669] Updated weights for policy 0, policy_version 584341 (0.0011) [2024-06-15 18:30:13,914][1651669] Updated weights for policy 0, policy_version 584400 (0.0121) [2024-06-15 18:30:15,150][1651669] Updated weights for policy 0, policy_version 584444 (0.0060) [2024-06-15 18:30:15,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 48059.8, 300 sec: 48874.6). Total num frames: 1196949504. Throughput: 0: 11787.4. Samples: 299296256. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:15,767][1648981] Avg episode reward: [(0, '467.040')] [2024-06-15 18:30:17,162][1651669] Updated weights for policy 0, policy_version 584498 (0.0012) [2024-06-15 18:30:19,153][1651669] Updated weights for policy 0, policy_version 584544 (0.0013) [2024-06-15 18:30:20,766][1648981] Fps is (10 sec: 52439.1, 60 sec: 48608.2, 300 sec: 48985.4). Total num frames: 1197211648. 
Throughput: 0: 12075.1. Samples: 299372544. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:20,767][1648981] Avg episode reward: [(0, '459.160')] [2024-06-15 18:30:23,064][1651669] Updated weights for policy 0, policy_version 584595 (0.0012) [2024-06-15 18:30:24,407][1651669] Updated weights for policy 0, policy_version 584658 (0.0015) [2024-06-15 18:30:25,433][1651669] Updated weights for policy 0, policy_version 584704 (0.0012) [2024-06-15 18:30:25,781][1648981] Fps is (10 sec: 52353.7, 60 sec: 48051.2, 300 sec: 48873.2). Total num frames: 1197473792. Throughput: 0: 11999.7. Samples: 299416064. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:25,783][1648981] Avg episode reward: [(0, '455.740')] [2024-06-15 18:30:25,889][1651274] Signal inference workers to stop experience collection... (30650 times) [2024-06-15 18:30:25,951][1651669] InferenceWorker_p0-w0: stopping experience collection (30650 times) [2024-06-15 18:30:26,219][1651274] Signal inference workers to resume experience collection... (30650 times) [2024-06-15 18:30:26,219][1651669] InferenceWorker_p0-w0: resuming experience collection (30650 times) [2024-06-15 18:30:27,433][1651669] Updated weights for policy 0, policy_version 584767 (0.0012) [2024-06-15 18:30:30,770][1648981] Fps is (10 sec: 49133.4, 60 sec: 48056.8, 300 sec: 48984.8). Total num frames: 1197703168. Throughput: 0: 12196.0. Samples: 299482624. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:30,771][1648981] Avg episode reward: [(0, '460.940')] [2024-06-15 18:30:30,963][1651669] Updated weights for policy 0, policy_version 584830 (0.0012) [2024-06-15 18:30:34,810][1651669] Updated weights for policy 0, policy_version 584880 (0.0018) [2024-06-15 18:30:35,766][1648981] Fps is (10 sec: 42660.5, 60 sec: 46430.8, 300 sec: 48541.1). Total num frames: 1197899776. Throughput: 0: 11970.4. Samples: 299555840. 
Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:35,767][1648981] Avg episode reward: [(0, '460.520')] [2024-06-15 18:30:36,579][1651669] Updated weights for policy 0, policy_version 584945 (0.0010) [2024-06-15 18:30:37,994][1651669] Updated weights for policy 0, policy_version 585008 (0.0024) [2024-06-15 18:30:40,770][1648981] Fps is (10 sec: 42597.5, 60 sec: 48602.6, 300 sec: 48873.6). Total num frames: 1198129152. Throughput: 0: 11957.0. Samples: 299582464. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:40,771][1648981] Avg episode reward: [(0, '463.620')] [2024-06-15 18:30:41,798][1651669] Updated weights for policy 0, policy_version 585057 (0.0072) [2024-06-15 18:30:44,872][1651669] Updated weights for policy 0, policy_version 585120 (0.0010) [2024-06-15 18:30:45,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 47513.7, 300 sec: 48652.1). Total num frames: 1198391296. Throughput: 0: 12151.5. Samples: 299665408. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:45,767][1648981] Avg episode reward: [(0, '455.150')] [2024-06-15 18:30:46,220][1651669] Updated weights for policy 0, policy_version 585171 (0.0013) [2024-06-15 18:30:47,445][1651669] Updated weights for policy 0, policy_version 585217 (0.0013) [2024-06-15 18:30:48,924][1651669] Updated weights for policy 0, policy_version 585275 (0.0011) [2024-06-15 18:30:50,766][1648981] Fps is (10 sec: 52450.1, 60 sec: 50247.3, 300 sec: 48879.2). Total num frames: 1198653440. Throughput: 0: 12174.2. Samples: 299736064. Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:50,767][1648981] Avg episode reward: [(0, '444.770')] [2024-06-15 18:30:52,746][1651669] Updated weights for policy 0, policy_version 585328 (0.0015) [2024-06-15 18:30:55,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 48652.9). Total num frames: 1198850048. Throughput: 0: 12254.4. Samples: 299770880. 
Policy #0 lag: (min: 48.0, avg: 205.3, max: 351.0) [2024-06-15 18:30:55,767][1648981] Avg episode reward: [(0, '455.560')] [2024-06-15 18:30:56,142][1651669] Updated weights for policy 0, policy_version 585402 (0.0014) [2024-06-15 18:30:56,203][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000585408_1198915584.pth... [2024-06-15 18:30:56,280][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000579712_1187250176.pth [2024-06-15 18:30:56,286][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000585408_1198915584.pth [2024-06-15 18:30:58,078][1651669] Updated weights for policy 0, policy_version 585456 (0.0012) [2024-06-15 18:30:59,185][1651669] Updated weights for policy 0, policy_version 585489 (0.0011) [2024-06-15 18:31:00,276][1651669] Updated weights for policy 0, policy_version 585531 (0.0009) [2024-06-15 18:31:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.5, 300 sec: 48874.3). Total num frames: 1199177728. Throughput: 0: 12049.1. Samples: 299838464. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:00,767][1648981] Avg episode reward: [(0, '459.670')] [2024-06-15 18:31:03,326][1651669] Updated weights for policy 0, policy_version 585584 (0.0011) [2024-06-15 18:31:05,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 1199341568. Throughput: 0: 12208.4. Samples: 299921920. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:05,767][1648981] Avg episode reward: [(0, '443.960')] [2024-06-15 18:31:05,837][1651669] Updated weights for policy 0, policy_version 585619 (0.0039) [2024-06-15 18:31:08,510][1651274] Signal inference workers to stop experience collection... 
(30700 times) [2024-06-15 18:31:08,572][1651669] Updated weights for policy 0, policy_version 585700 (0.0018) [2024-06-15 18:31:08,659][1651669] InferenceWorker_p0-w0: stopping experience collection (30700 times) [2024-06-15 18:31:08,755][1651274] Signal inference workers to resume experience collection... (30700 times) [2024-06-15 18:31:08,756][1651669] InferenceWorker_p0-w0: resuming experience collection (30700 times) [2024-06-15 18:31:09,675][1651669] Updated weights for policy 0, policy_version 585744 (0.0049) [2024-06-15 18:31:10,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49699.9, 300 sec: 48763.3). Total num frames: 1199669248. Throughput: 0: 11893.6. Samples: 299951104. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:10,767][1648981] Avg episode reward: [(0, '455.600')] [2024-06-15 18:31:10,769][1651669] Updated weights for policy 0, policy_version 585790 (0.0011) [2024-06-15 18:31:13,643][1651669] Updated weights for policy 0, policy_version 585825 (0.0014) [2024-06-15 18:31:15,784][1648981] Fps is (10 sec: 55605.6, 60 sec: 49137.4, 300 sec: 48649.2). Total num frames: 1199898624. Throughput: 0: 12409.2. Samples: 300041216. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:15,785][1648981] Avg episode reward: [(0, '451.210')] [2024-06-15 18:31:15,872][1651669] Updated weights for policy 0, policy_version 585904 (0.0015) [2024-06-15 18:31:18,284][1651669] Updated weights for policy 0, policy_version 585937 (0.0017) [2024-06-15 18:31:19,431][1651669] Updated weights for policy 0, policy_version 585984 (0.0030) [2024-06-15 18:31:20,766][1648981] Fps is (10 sec: 52427.9, 60 sec: 49698.1, 300 sec: 49096.4). Total num frames: 1200193536. Throughput: 0: 12094.5. Samples: 300100096. 
Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:20,767][1648981] Avg episode reward: [(0, '468.360')] [2024-06-15 18:31:20,919][1651669] Updated weights for policy 0, policy_version 586040 (0.0011) [2024-06-15 18:31:23,932][1651669] Updated weights for policy 0, policy_version 586067 (0.0031) [2024-06-15 18:31:24,794][1651669] Updated weights for policy 0, policy_version 586109 (0.0106) [2024-06-15 18:31:25,766][1648981] Fps is (10 sec: 55806.7, 60 sec: 49710.2, 300 sec: 48874.3). Total num frames: 1200455680. Throughput: 0: 12710.1. Samples: 300154368. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:25,767][1648981] Avg episode reward: [(0, '480.570')] [2024-06-15 18:31:25,961][1651669] Updated weights for policy 0, policy_version 586175 (0.0100) [2024-06-15 18:31:30,415][1651669] Updated weights for policy 0, policy_version 586242 (0.0012) [2024-06-15 18:31:30,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49155.1, 300 sec: 48985.4). Total num frames: 1200652288. Throughput: 0: 12356.3. Samples: 300221440. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:30,767][1648981] Avg episode reward: [(0, '482.110')] [2024-06-15 18:31:31,766][1651669] Updated weights for policy 0, policy_version 586303 (0.0108) [2024-06-15 18:31:35,538][1651669] Updated weights for policy 0, policy_version 586362 (0.0010) [2024-06-15 18:31:35,766][1648981] Fps is (10 sec: 42597.8, 60 sec: 49698.0, 300 sec: 48763.2). Total num frames: 1200881664. Throughput: 0: 12413.1. Samples: 300294656. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:35,767][1648981] Avg episode reward: [(0, '510.600')] [2024-06-15 18:31:36,622][1651669] Updated weights for policy 0, policy_version 586406 (0.0015) [2024-06-15 18:31:40,778][1648981] Fps is (10 sec: 42548.1, 60 sec: 49145.6, 300 sec: 48650.2). Total num frames: 1201078272. Throughput: 0: 12535.0. Samples: 300335104. 
Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:40,779][1648981] Avg episode reward: [(0, '531.990')] [2024-06-15 18:31:40,786][1651669] Updated weights for policy 0, policy_version 586470 (0.0012) [2024-06-15 18:31:42,984][1651669] Updated weights for policy 0, policy_version 586555 (0.0012) [2024-06-15 18:31:45,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1201307648. Throughput: 0: 12424.5. Samples: 300397568. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:45,767][1648981] Avg episode reward: [(0, '544.000')] [2024-06-15 18:31:46,139][1651669] Updated weights for policy 0, policy_version 586595 (0.0011) [2024-06-15 18:31:47,551][1651274] Signal inference workers to stop experience collection... (30750 times) [2024-06-15 18:31:47,592][1651669] InferenceWorker_p0-w0: stopping experience collection (30750 times) [2024-06-15 18:31:47,843][1651274] Signal inference workers to resume experience collection... (30750 times) [2024-06-15 18:31:47,844][1651669] InferenceWorker_p0-w0: resuming experience collection (30750 times) [2024-06-15 18:31:47,906][1651669] Updated weights for policy 0, policy_version 586674 (0.0012) [2024-06-15 18:31:50,766][1648981] Fps is (10 sec: 45929.8, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1201537024. Throughput: 0: 12310.8. Samples: 300475904. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:50,767][1648981] Avg episode reward: [(0, '548.420')] [2024-06-15 18:31:51,367][1651669] Updated weights for policy 0, policy_version 586721 (0.0012) [2024-06-15 18:31:53,230][1651669] Updated weights for policy 0, policy_version 586800 (0.0012) [2024-06-15 18:31:55,768][1648981] Fps is (10 sec: 49142.4, 60 sec: 49150.4, 300 sec: 48874.0). Total num frames: 1201799168. Throughput: 0: 12298.8. Samples: 300504576. 
Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:31:55,769][1648981] Avg episode reward: [(0, '527.020')] [2024-06-15 18:31:57,694][1651669] Updated weights for policy 0, policy_version 586869 (0.0012) [2024-06-15 18:31:59,416][1651669] Updated weights for policy 0, policy_version 586942 (0.0022) [2024-06-15 18:32:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 1202061312. Throughput: 0: 11837.6. Samples: 300573696. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:32:00,767][1648981] Avg episode reward: [(0, '511.630')] [2024-06-15 18:32:02,660][1651669] Updated weights for policy 0, policy_version 587003 (0.0024) [2024-06-15 18:32:05,293][1651669] Updated weights for policy 0, policy_version 587068 (0.0010) [2024-06-15 18:32:05,766][1648981] Fps is (10 sec: 52439.0, 60 sec: 49698.2, 300 sec: 48874.3). Total num frames: 1202323456. Throughput: 0: 12140.1. Samples: 300646400. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:32:05,767][1648981] Avg episode reward: [(0, '515.550')] [2024-06-15 18:32:08,680][1651669] Updated weights for policy 0, policy_version 587120 (0.0011) [2024-06-15 18:32:10,386][1651669] Updated weights for policy 0, policy_version 587184 (0.0013) [2024-06-15 18:32:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 1202585600. Throughput: 0: 11787.4. Samples: 300684800. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:32:10,767][1648981] Avg episode reward: [(0, '513.970')] [2024-06-15 18:32:13,390][1651669] Updated weights for policy 0, policy_version 587232 (0.0017) [2024-06-15 18:32:14,994][1651669] Updated weights for policy 0, policy_version 587269 (0.0013) [2024-06-15 18:32:15,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48074.2, 300 sec: 48652.2). Total num frames: 1202782208. Throughput: 0: 11923.9. Samples: 300758016. 
Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:32:15,767][1648981] Avg episode reward: [(0, '502.960')] [2024-06-15 18:32:16,227][1651669] Updated weights for policy 0, policy_version 587323 (0.0094) [2024-06-15 18:32:19,431][1651669] Updated weights for policy 0, policy_version 587376 (0.0188) [2024-06-15 18:32:20,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.7, 300 sec: 48541.1). Total num frames: 1203044352. Throughput: 0: 11764.6. Samples: 300824064. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:32:20,767][1648981] Avg episode reward: [(0, '521.450')] [2024-06-15 18:32:20,855][1651669] Updated weights for policy 0, policy_version 587445 (0.0012) [2024-06-15 18:32:23,928][1651669] Updated weights for policy 0, policy_version 587488 (0.0011) [2024-06-15 18:32:25,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46967.4, 300 sec: 48874.3). Total num frames: 1203273728. Throughput: 0: 11961.2. Samples: 300873216. Policy #0 lag: (min: 11.0, avg: 132.0, max: 281.0) [2024-06-15 18:32:25,767][1648981] Avg episode reward: [(0, '534.960')] [2024-06-15 18:32:26,185][1651669] Updated weights for policy 0, policy_version 587568 (0.0100) [2024-06-15 18:32:29,401][1651669] Updated weights for policy 0, policy_version 587616 (0.0012) [2024-06-15 18:32:30,075][1651274] Signal inference workers to stop experience collection... (30800 times) [2024-06-15 18:32:30,137][1651669] InferenceWorker_p0-w0: stopping experience collection (30800 times) [2024-06-15 18:32:30,279][1651274] Signal inference workers to resume experience collection... (30800 times) [2024-06-15 18:32:30,289][1651669] InferenceWorker_p0-w0: resuming experience collection (30800 times) [2024-06-15 18:32:30,291][1651669] Updated weights for policy 0, policy_version 587667 (0.0012) [2024-06-15 18:32:30,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 48605.6, 300 sec: 48763.5). Total num frames: 1203568640. Throughput: 0: 12140.0. Samples: 300943872. 
Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:32:30,768][1648981] Avg episode reward: [(0, '516.700')] [2024-06-15 18:32:34,585][1651669] Updated weights for policy 0, policy_version 587729 (0.0012) [2024-06-15 18:32:35,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1203765248. Throughput: 0: 12094.6. Samples: 301020160. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:32:35,767][1648981] Avg episode reward: [(0, '483.910')] [2024-06-15 18:32:35,811][1651669] Updated weights for policy 0, policy_version 587778 (0.0011) [2024-06-15 18:32:37,126][1651669] Updated weights for policy 0, policy_version 587830 (0.0019) [2024-06-15 18:32:40,256][1651669] Updated weights for policy 0, policy_version 587872 (0.0014) [2024-06-15 18:32:40,767][1648981] Fps is (10 sec: 42598.4, 60 sec: 48615.2, 300 sec: 48763.2). Total num frames: 1203994624. Throughput: 0: 12299.8. Samples: 301058048. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:32:40,768][1648981] Avg episode reward: [(0, '474.120')] [2024-06-15 18:32:41,535][1651669] Updated weights for policy 0, policy_version 587920 (0.0091) [2024-06-15 18:32:42,491][1651669] Updated weights for policy 0, policy_version 587968 (0.0021) [2024-06-15 18:32:45,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48059.7, 300 sec: 48542.6). Total num frames: 1204191232. Throughput: 0: 12379.0. Samples: 301130752. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:32:45,767][1648981] Avg episode reward: [(0, '453.770')] [2024-06-15 18:32:46,970][1651669] Updated weights for policy 0, policy_version 588034 (0.0011) [2024-06-15 18:32:48,200][1651669] Updated weights for policy 0, policy_version 588084 (0.0011) [2024-06-15 18:32:50,766][1648981] Fps is (10 sec: 49153.6, 60 sec: 49151.9, 300 sec: 48652.1). Total num frames: 1204486144. Throughput: 0: 12344.9. Samples: 301201920. 
Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:32:50,767][1648981] Avg episode reward: [(0, '447.430')] [2024-06-15 18:32:50,788][1651669] Updated weights for policy 0, policy_version 588130 (0.0011) [2024-06-15 18:32:52,433][1651669] Updated weights for policy 0, policy_version 588222 (0.0016) [2024-06-15 18:32:55,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 48061.1, 300 sec: 48436.5). Total num frames: 1204682752. Throughput: 0: 12208.3. Samples: 301234176. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:32:55,767][1648981] Avg episode reward: [(0, '441.790')] [2024-06-15 18:32:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000588224_1204682752.pth... [2024-06-15 18:32:55,849][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000582560_1193082880.pth [2024-06-15 18:32:57,325][1651669] Updated weights for policy 0, policy_version 588274 (0.0014) [2024-06-15 18:32:58,917][1651669] Updated weights for policy 0, policy_version 588343 (0.0013) [2024-06-15 18:33:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 48653.5). Total num frames: 1204944896. Throughput: 0: 12174.2. Samples: 301305856. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:00,767][1648981] Avg episode reward: [(0, '466.750')] [2024-06-15 18:33:01,519][1651669] Updated weights for policy 0, policy_version 588400 (0.0121) [2024-06-15 18:33:02,996][1651669] Updated weights for policy 0, policy_version 588464 (0.0011) [2024-06-15 18:33:05,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1205207040. Throughput: 0: 12640.7. Samples: 301392896. 
Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:05,767][1648981] Avg episode reward: [(0, '458.010')] [2024-06-15 18:33:08,468][1651669] Updated weights for policy 0, policy_version 588546 (0.0084) [2024-06-15 18:33:08,816][1651274] Signal inference workers to stop experience collection... (30850 times) [2024-06-15 18:33:08,863][1651669] InferenceWorker_p0-w0: stopping experience collection (30850 times) [2024-06-15 18:33:09,124][1651274] Signal inference workers to resume experience collection... (30850 times) [2024-06-15 18:33:09,125][1651669] InferenceWorker_p0-w0: resuming experience collection (30850 times) [2024-06-15 18:33:09,980][1651669] Updated weights for policy 0, policy_version 588608 (0.0012) [2024-06-15 18:33:10,772][1648981] Fps is (10 sec: 52397.8, 60 sec: 48055.0, 300 sec: 48873.3). Total num frames: 1205469184. Throughput: 0: 12286.4. Samples: 301426176. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:10,774][1648981] Avg episode reward: [(0, '475.340')] [2024-06-15 18:33:13,166][1651669] Updated weights for policy 0, policy_version 588704 (0.0014) [2024-06-15 18:33:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1205731328. Throughput: 0: 12003.6. Samples: 301484032. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:15,767][1648981] Avg episode reward: [(0, '464.720')] [2024-06-15 18:33:17,707][1651669] Updated weights for policy 0, policy_version 588738 (0.0015) [2024-06-15 18:33:19,486][1651669] Updated weights for policy 0, policy_version 588806 (0.0106) [2024-06-15 18:33:20,522][1651669] Updated weights for policy 0, policy_version 588854 (0.0020) [2024-06-15 18:33:20,766][1648981] Fps is (10 sec: 52459.8, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1205993472. Throughput: 0: 12140.1. Samples: 301566464. 
Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:20,767][1648981] Avg episode reward: [(0, '463.000')] [2024-06-15 18:33:22,636][1651669] Updated weights for policy 0, policy_version 588897 (0.0016) [2024-06-15 18:33:24,079][1651669] Updated weights for policy 0, policy_version 588962 (0.0120) [2024-06-15 18:33:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 1206255616. Throughput: 0: 12026.4. Samples: 301599232. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:25,767][1648981] Avg episode reward: [(0, '481.650')] [2024-06-15 18:33:28,786][1651669] Updated weights for policy 0, policy_version 589008 (0.0057) [2024-06-15 18:33:29,981][1651669] Updated weights for policy 0, policy_version 589059 (0.0011) [2024-06-15 18:33:30,786][1648981] Fps is (10 sec: 45784.8, 60 sec: 48044.2, 300 sec: 48648.9). Total num frames: 1206452224. Throughput: 0: 12259.9. Samples: 301682688. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:30,787][1648981] Avg episode reward: [(0, '463.820')] [2024-06-15 18:33:32,146][1651669] Updated weights for policy 0, policy_version 589121 (0.0011) [2024-06-15 18:33:33,145][1651669] Updated weights for policy 0, policy_version 589172 (0.0013) [2024-06-15 18:33:33,969][1651669] Updated weights for policy 0, policy_version 589202 (0.0073) [2024-06-15 18:33:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.2, 300 sec: 48879.7). Total num frames: 1206779904. Throughput: 0: 12140.1. Samples: 301748224. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:35,767][1648981] Avg episode reward: [(0, '446.570')] [2024-06-15 18:33:38,637][1651669] Updated weights for policy 0, policy_version 589249 (0.0015) [2024-06-15 18:33:39,987][1651669] Updated weights for policy 0, policy_version 589303 (0.0012) [2024-06-15 18:33:40,768][1648981] Fps is (10 sec: 49240.5, 60 sec: 49150.8, 300 sec: 48540.8). Total num frames: 1206943744. 
Throughput: 0: 12435.5. Samples: 301793792. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:40,769][1648981] Avg episode reward: [(0, '435.990')] [2024-06-15 18:33:40,996][1651669] Updated weights for policy 0, policy_version 589345 (0.0011) [2024-06-15 18:33:43,157][1651669] Updated weights for policy 0, policy_version 589400 (0.0017) [2024-06-15 18:33:44,577][1651669] Updated weights for policy 0, policy_version 589458 (0.0011) [2024-06-15 18:33:44,840][1651274] Signal inference workers to stop experience collection... (30900 times) [2024-06-15 18:33:44,934][1651669] InferenceWorker_p0-w0: stopping experience collection (30900 times) [2024-06-15 18:33:45,093][1651274] Signal inference workers to resume experience collection... (30900 times) [2024-06-15 18:33:45,095][1651669] InferenceWorker_p0-w0: resuming experience collection (30900 times) [2024-06-15 18:33:45,417][1651669] Updated weights for policy 0, policy_version 589503 (0.0098) [2024-06-15 18:33:45,767][1648981] Fps is (10 sec: 52425.6, 60 sec: 51882.2, 300 sec: 48985.3). Total num frames: 1207304192. Throughput: 0: 12401.6. Samples: 301863936. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:45,768][1648981] Avg episode reward: [(0, '471.510')] [2024-06-15 18:33:49,909][1651669] Updated weights for policy 0, policy_version 589567 (0.0015) [2024-06-15 18:33:50,774][1648981] Fps is (10 sec: 52397.3, 60 sec: 49691.7, 300 sec: 48986.1). Total num frames: 1207468032. Throughput: 0: 12251.7. Samples: 301944320. 
Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:50,775][1648981] Avg episode reward: [(0, '473.630')] [2024-06-15 18:33:51,386][1651669] Updated weights for policy 0, policy_version 589616 (0.0013) [2024-06-15 18:33:52,831][1651669] Updated weights for policy 0, policy_version 589633 (0.0017) [2024-06-15 18:33:53,814][1651669] Updated weights for policy 0, policy_version 589683 (0.0010) [2024-06-15 18:33:54,887][1651669] Updated weights for policy 0, policy_version 589730 (0.0015) [2024-06-15 18:33:55,331][1651669] Updated weights for policy 0, policy_version 589760 (0.0046) [2024-06-15 18:33:55,790][1648981] Fps is (10 sec: 52307.4, 60 sec: 52408.3, 300 sec: 49315.1). Total num frames: 1207828480. Throughput: 0: 12453.7. Samples: 301986816. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:33:55,791][1648981] Avg episode reward: [(0, '498.740')] [2024-06-15 18:33:59,682][1651669] Updated weights for policy 0, policy_version 589821 (0.0108) [2024-06-15 18:34:00,781][1648981] Fps is (10 sec: 55665.3, 60 sec: 51323.7, 300 sec: 49093.9). Total num frames: 1208025088. Throughput: 0: 12750.2. Samples: 302057984. Policy #0 lag: (min: 63.0, avg: 145.2, max: 287.0) [2024-06-15 18:34:00,784][1648981] Avg episode reward: [(0, '490.670')] [2024-06-15 18:34:01,317][1651669] Updated weights for policy 0, policy_version 589877 (0.0014) [2024-06-15 18:34:04,415][1651669] Updated weights for policy 0, policy_version 589943 (0.0012) [2024-06-15 18:34:05,766][1648981] Fps is (10 sec: 45984.4, 60 sec: 51336.5, 300 sec: 49318.6). Total num frames: 1208287232. Throughput: 0: 12652.1. Samples: 302135808. 
Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:05,767][1648981] Avg episode reward: [(0, '501.680')] [2024-06-15 18:34:06,070][1651669] Updated weights for policy 0, policy_version 590004 (0.0163) [2024-06-15 18:34:09,847][1651669] Updated weights for policy 0, policy_version 590048 (0.0013) [2024-06-15 18:34:10,785][1648981] Fps is (10 sec: 45860.7, 60 sec: 50234.0, 300 sec: 48871.3). Total num frames: 1208483840. Throughput: 0: 12726.6. Samples: 302172160. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:10,785][1648981] Avg episode reward: [(0, '506.490')] [2024-06-15 18:34:10,873][1651669] Updated weights for policy 0, policy_version 590083 (0.0014) [2024-06-15 18:34:11,902][1651669] Updated weights for policy 0, policy_version 590135 (0.0018) [2024-06-15 18:34:14,761][1651669] Updated weights for policy 0, policy_version 590208 (0.0014) [2024-06-15 18:34:15,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 50244.3, 300 sec: 48985.9). Total num frames: 1208745984. Throughput: 0: 12589.4. Samples: 302248960. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:15,767][1648981] Avg episode reward: [(0, '506.380')] [2024-06-15 18:34:16,881][1651669] Updated weights for policy 0, policy_version 590264 (0.0014) [2024-06-15 18:34:20,168][1651669] Updated weights for policy 0, policy_version 590305 (0.0011) [2024-06-15 18:34:20,766][1648981] Fps is (10 sec: 52524.1, 60 sec: 50244.2, 300 sec: 48874.9). Total num frames: 1209008128. Throughput: 0: 12845.5. Samples: 302326272. 
Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:20,767][1648981] Avg episode reward: [(0, '499.300')] [2024-06-15 18:34:21,785][1651669] Updated weights for policy 0, policy_version 590384 (0.0013) [2024-06-15 18:34:24,697][1651669] Updated weights for policy 0, policy_version 590417 (0.0012) [2024-06-15 18:34:25,312][1651669] Updated weights for policy 0, policy_version 590460 (0.0013) [2024-06-15 18:34:25,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 50244.2, 300 sec: 48985.4). Total num frames: 1209270272. Throughput: 0: 12572.9. Samples: 302359552. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:25,767][1648981] Avg episode reward: [(0, '475.710')] [2024-06-15 18:34:26,935][1651274] Signal inference workers to stop experience collection... (30950 times) [2024-06-15 18:34:27,005][1651669] InferenceWorker_p0-w0: stopping experience collection (30950 times) [2024-06-15 18:34:27,261][1651274] Signal inference workers to resume experience collection... (30950 times) [2024-06-15 18:34:27,262][1651669] InferenceWorker_p0-w0: resuming experience collection (30950 times) [2024-06-15 18:34:27,548][1651669] Updated weights for policy 0, policy_version 590524 (0.0082) [2024-06-15 18:34:30,648][1651669] Updated weights for policy 0, policy_version 590565 (0.0023) [2024-06-15 18:34:30,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 50807.2, 300 sec: 48765.3). Total num frames: 1209499648. Throughput: 0: 12857.1. Samples: 302442496. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:30,767][1648981] Avg episode reward: [(0, '478.280')] [2024-06-15 18:34:32,327][1651669] Updated weights for policy 0, policy_version 590640 (0.0013) [2024-06-15 18:34:34,725][1651669] Updated weights for policy 0, policy_version 590679 (0.0033) [2024-06-15 18:34:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.2, 300 sec: 49429.7). Total num frames: 1209794560. Throughput: 0: 12574.6. Samples: 302510080. 
Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:35,767][1648981] Avg episode reward: [(0, '483.280')] [2024-06-15 18:34:37,204][1651669] Updated weights for policy 0, policy_version 590724 (0.0012) [2024-06-15 18:34:38,296][1651669] Updated weights for policy 0, policy_version 590782 (0.0014) [2024-06-15 18:34:40,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 50791.9, 300 sec: 48985.4). Total num frames: 1209991168. Throughput: 0: 12579.1. Samples: 302552576. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:40,767][1648981] Avg episode reward: [(0, '476.890')] [2024-06-15 18:34:41,066][1651669] Updated weights for policy 0, policy_version 590837 (0.0011) [2024-06-15 18:34:41,964][1651669] Updated weights for policy 0, policy_version 590892 (0.0169) [2024-06-15 18:34:43,475][1651669] Updated weights for policy 0, policy_version 590932 (0.0012) [2024-06-15 18:34:44,441][1651669] Updated weights for policy 0, policy_version 590973 (0.0012) [2024-06-15 18:34:45,770][1648981] Fps is (10 sec: 52408.4, 60 sec: 50241.5, 300 sec: 49762.9). Total num frames: 1210318848. Throughput: 0: 12837.3. Samples: 302635520. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:45,773][1648981] Avg episode reward: [(0, '480.150')] [2024-06-15 18:34:48,320][1651669] Updated weights for policy 0, policy_version 591029 (0.0012) [2024-06-15 18:34:50,308][1651669] Updated weights for policy 0, policy_version 591056 (0.0013) [2024-06-15 18:34:50,774][1648981] Fps is (10 sec: 52387.6, 60 sec: 50790.3, 300 sec: 49095.2). Total num frames: 1210515456. Throughput: 0: 12888.8. Samples: 302715904. 
Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:50,775][1648981] Avg episode reward: [(0, '452.840')] [2024-06-15 18:34:52,076][1651669] Updated weights for policy 0, policy_version 591122 (0.0012) [2024-06-15 18:34:53,816][1651669] Updated weights for policy 0, policy_version 591184 (0.0014) [2024-06-15 18:34:55,041][1651669] Updated weights for policy 0, policy_version 591232 (0.0015) [2024-06-15 18:34:55,766][1648981] Fps is (10 sec: 52449.4, 60 sec: 50264.2, 300 sec: 49763.0). Total num frames: 1210843136. Throughput: 0: 12907.6. Samples: 302752768. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:34:55,767][1648981] Avg episode reward: [(0, '427.300')] [2024-06-15 18:34:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000591232_1210843136.pth... [2024-06-15 18:34:55,845][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000585408_1198915584.pth [2024-06-15 18:34:59,722][1651669] Updated weights for policy 0, policy_version 591296 (0.0013) [2024-06-15 18:35:00,766][1648981] Fps is (10 sec: 49190.5, 60 sec: 49710.6, 300 sec: 49318.6). Total num frames: 1211006976. Throughput: 0: 12731.7. Samples: 302821888. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:00,767][1648981] Avg episode reward: [(0, '437.900')] [2024-06-15 18:35:01,529][1651669] Updated weights for policy 0, policy_version 591351 (0.0013) [2024-06-15 18:35:02,611][1651669] Updated weights for policy 0, policy_version 591393 (0.0012) [2024-06-15 18:35:05,277][1651669] Updated weights for policy 0, policy_version 591458 (0.0013) [2024-06-15 18:35:05,767][1648981] Fps is (10 sec: 52426.9, 60 sec: 51336.3, 300 sec: 49763.2). Total num frames: 1211367424. Throughput: 0: 12583.7. Samples: 302892544. 
Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:05,768][1648981] Avg episode reward: [(0, '417.640')] [2024-06-15 18:35:09,649][1651669] Updated weights for policy 0, policy_version 591504 (0.0012) [2024-06-15 18:35:09,791][1651274] Signal inference workers to stop experience collection... (31000 times) [2024-06-15 18:35:09,837][1651669] InferenceWorker_p0-w0: stopping experience collection (31000 times) [2024-06-15 18:35:10,064][1651274] Signal inference workers to resume experience collection... (31000 times) [2024-06-15 18:35:10,067][1651669] InferenceWorker_p0-w0: resuming experience collection (31000 times) [2024-06-15 18:35:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49713.2, 300 sec: 49207.6). Total num frames: 1211465728. Throughput: 0: 12777.3. Samples: 302934528. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:10,767][1648981] Avg episode reward: [(0, '425.940')] [2024-06-15 18:35:11,367][1651669] Updated weights for policy 0, policy_version 591553 (0.0011) [2024-06-15 18:35:12,807][1651669] Updated weights for policy 0, policy_version 591605 (0.0011) [2024-06-15 18:35:14,192][1651669] Updated weights for policy 0, policy_version 591668 (0.0013) [2024-06-15 18:35:15,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 50790.4, 300 sec: 49429.7). Total num frames: 1211793408. Throughput: 0: 12367.6. Samples: 302999040. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:15,767][1648981] Avg episode reward: [(0, '423.210')] [2024-06-15 18:35:16,558][1651669] Updated weights for policy 0, policy_version 591728 (0.0012) [2024-06-15 18:35:20,773][1648981] Fps is (10 sec: 45843.2, 60 sec: 48600.2, 300 sec: 48986.6). Total num frames: 1211924480. Throughput: 0: 12672.9. Samples: 303080448. 
Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:20,774][1648981] Avg episode reward: [(0, '427.680')] [2024-06-15 18:35:21,171][1651669] Updated weights for policy 0, policy_version 591778 (0.0011) [2024-06-15 18:35:22,092][1651669] Updated weights for policy 0, policy_version 591810 (0.0011) [2024-06-15 18:35:23,500][1651669] Updated weights for policy 0, policy_version 591865 (0.0012) [2024-06-15 18:35:24,243][1651669] Updated weights for policy 0, policy_version 591891 (0.0011) [2024-06-15 18:35:25,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50244.3, 300 sec: 49430.3). Total num frames: 1212284928. Throughput: 0: 12288.0. Samples: 303105536. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:25,767][1648981] Avg episode reward: [(0, '433.320')] [2024-06-15 18:35:26,138][1651669] Updated weights for policy 0, policy_version 591952 (0.0011) [2024-06-15 18:35:26,989][1651669] Updated weights for policy 0, policy_version 591997 (0.0013) [2024-06-15 18:35:30,766][1648981] Fps is (10 sec: 49186.5, 60 sec: 48605.8, 300 sec: 49207.5). Total num frames: 1212416000. Throughput: 0: 12186.7. Samples: 303183872. Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:30,767][1648981] Avg episode reward: [(0, '433.020')] [2024-06-15 18:35:33,253][1651669] Updated weights for policy 0, policy_version 592080 (0.0174) [2024-06-15 18:35:34,574][1651669] Updated weights for policy 0, policy_version 592124 (0.0028) [2024-06-15 18:35:35,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49152.1, 300 sec: 49541.5). Total num frames: 1212743680. Throughput: 0: 11835.0. Samples: 303248384. 
Policy #0 lag: (min: 15.0, avg: 126.0, max: 271.0) [2024-06-15 18:35:35,767][1648981] Avg episode reward: [(0, '428.290')] [2024-06-15 18:35:36,179][1651669] Updated weights for policy 0, policy_version 592190 (0.0015) [2024-06-15 18:35:38,053][1651669] Updated weights for policy 0, policy_version 592240 (0.0012) [2024-06-15 18:35:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 49318.6). Total num frames: 1212940288. Throughput: 0: 11889.8. Samples: 303287808. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:35:40,767][1648981] Avg episode reward: [(0, '451.700')] [2024-06-15 18:35:42,683][1651669] Updated weights for policy 0, policy_version 592276 (0.0037) [2024-06-15 18:35:44,215][1651669] Updated weights for policy 0, policy_version 592324 (0.0015) [2024-06-15 18:35:45,508][1651669] Updated weights for policy 0, policy_version 592384 (0.0011) [2024-06-15 18:35:45,770][1648981] Fps is (10 sec: 45857.4, 60 sec: 48059.8, 300 sec: 49318.0). Total num frames: 1213202432. Throughput: 0: 12025.3. Samples: 303363072. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:35:45,771][1648981] Avg episode reward: [(0, '456.860')] [2024-06-15 18:35:46,927][1651669] Updated weights for policy 0, policy_version 592432 (0.0011) [2024-06-15 18:35:48,178][1651274] Signal inference workers to stop experience collection... (31050 times) [2024-06-15 18:35:48,227][1651669] InferenceWorker_p0-w0: stopping experience collection (31050 times) [2024-06-15 18:35:48,467][1651274] Signal inference workers to resume experience collection... (31050 times) [2024-06-15 18:35:48,468][1651669] InferenceWorker_p0-w0: resuming experience collection (31050 times) [2024-06-15 18:35:48,470][1651669] Updated weights for policy 0, policy_version 592464 (0.0011) [2024-06-15 18:35:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49158.4, 300 sec: 49540.8). Total num frames: 1213464576. Throughput: 0: 12003.6. Samples: 303432704. 
Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:35:50,767][1648981] Avg episode reward: [(0, '474.760')] [2024-06-15 18:35:53,430][1651669] Updated weights for policy 0, policy_version 592514 (0.0011) [2024-06-15 18:35:55,109][1651669] Updated weights for policy 0, policy_version 592592 (0.0028) [2024-06-15 18:35:55,786][1648981] Fps is (10 sec: 45802.1, 60 sec: 46952.0, 300 sec: 49093.2). Total num frames: 1213661184. Throughput: 0: 11975.5. Samples: 303473664. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:35:55,787][1648981] Avg episode reward: [(0, '499.400')] [2024-06-15 18:35:56,089][1651669] Updated weights for policy 0, policy_version 592636 (0.0018) [2024-06-15 18:35:58,073][1651669] Updated weights for policy 0, policy_version 592696 (0.0014) [2024-06-15 18:36:00,258][1651669] Updated weights for policy 0, policy_version 592762 (0.0013) [2024-06-15 18:36:00,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 49698.0, 300 sec: 49651.8). Total num frames: 1213988864. Throughput: 0: 12026.3. Samples: 303540224. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:00,767][1648981] Avg episode reward: [(0, '492.340')] [2024-06-15 18:36:05,529][1651669] Updated weights for policy 0, policy_version 592823 (0.0129) [2024-06-15 18:36:05,766][1648981] Fps is (10 sec: 45966.4, 60 sec: 45875.5, 300 sec: 48985.4). Total num frames: 1214119936. Throughput: 0: 11857.5. Samples: 303613952. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:05,767][1648981] Avg episode reward: [(0, '505.770')] [2024-06-15 18:36:06,571][1651669] Updated weights for policy 0, policy_version 592888 (0.0083) [2024-06-15 18:36:09,115][1651669] Updated weights for policy 0, policy_version 592944 (0.0014) [2024-06-15 18:36:10,211][1651669] Updated weights for policy 0, policy_version 592980 (0.0011) [2024-06-15 18:36:10,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 49698.2, 300 sec: 49321.6). Total num frames: 1214447616. 
Throughput: 0: 12128.7. Samples: 303651328. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:10,767][1648981] Avg episode reward: [(0, '500.940')] [2024-06-15 18:36:11,164][1651669] Updated weights for policy 0, policy_version 593021 (0.0012) [2024-06-15 18:36:15,543][1651669] Updated weights for policy 0, policy_version 593075 (0.0018) [2024-06-15 18:36:15,786][1648981] Fps is (10 sec: 52325.2, 60 sec: 47498.0, 300 sec: 48982.1). Total num frames: 1214644224. Throughput: 0: 12214.4. Samples: 303733760. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:15,787][1648981] Avg episode reward: [(0, '488.990')] [2024-06-15 18:36:16,835][1651669] Updated weights for policy 0, policy_version 593136 (0.0013) [2024-06-15 18:36:19,270][1651669] Updated weights for policy 0, policy_version 593203 (0.0026) [2024-06-15 18:36:20,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49703.9, 300 sec: 48985.4). Total num frames: 1214906368. Throughput: 0: 12356.2. Samples: 303804416. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:20,767][1648981] Avg episode reward: [(0, '499.280')] [2024-06-15 18:36:21,888][1651669] Updated weights for policy 0, policy_version 593264 (0.0013) [2024-06-15 18:36:25,405][1651669] Updated weights for policy 0, policy_version 593303 (0.0020) [2024-06-15 18:36:25,766][1648981] Fps is (10 sec: 45965.8, 60 sec: 46967.5, 300 sec: 48985.4). Total num frames: 1215102976. Throughput: 0: 12322.1. Samples: 303842304. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:25,767][1648981] Avg episode reward: [(0, '501.050')] [2024-06-15 18:36:26,723][1651669] Updated weights for policy 0, policy_version 593376 (0.0013) [2024-06-15 18:36:28,971][1651669] Updated weights for policy 0, policy_version 593415 (0.0014) [2024-06-15 18:36:29,571][1651274] Signal inference workers to stop experience collection... 
(31100 times) [2024-06-15 18:36:29,653][1651669] InferenceWorker_p0-w0: stopping experience collection (31100 times) [2024-06-15 18:36:29,736][1651274] Signal inference workers to resume experience collection... (31100 times) [2024-06-15 18:36:29,737][1651669] InferenceWorker_p0-w0: resuming experience collection (31100 times) [2024-06-15 18:36:29,954][1651669] Updated weights for policy 0, policy_version 593472 (0.0089) [2024-06-15 18:36:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.3, 300 sec: 49318.6). Total num frames: 1215430656. Throughput: 0: 12357.3. Samples: 303919104. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:30,767][1648981] Avg episode reward: [(0, '492.370')] [2024-06-15 18:36:32,159][1651669] Updated weights for policy 0, policy_version 593532 (0.0037) [2024-06-15 18:36:35,766][1648981] Fps is (10 sec: 55705.9, 60 sec: 48605.8, 300 sec: 49431.7). Total num frames: 1215660032. Throughput: 0: 12504.2. Samples: 303995392. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:35,767][1648981] Avg episode reward: [(0, '494.810')] [2024-06-15 18:36:36,098][1651669] Updated weights for policy 0, policy_version 593600 (0.0020) [2024-06-15 18:36:39,888][1651669] Updated weights for policy 0, policy_version 593668 (0.0133) [2024-06-15 18:36:40,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 49151.8, 300 sec: 49429.7). Total num frames: 1215889408. Throughput: 0: 12270.6. Samples: 304025600. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:40,767][1648981] Avg episode reward: [(0, '495.590')] [2024-06-15 18:36:41,372][1651669] Updated weights for policy 0, policy_version 593731 (0.0113) [2024-06-15 18:36:42,287][1651669] Updated weights for policy 0, policy_version 593779 (0.0012) [2024-06-15 18:36:45,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48062.9, 300 sec: 49318.6). Total num frames: 1216086016. Throughput: 0: 12538.4. Samples: 304104448. 
Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:45,767][1648981] Avg episode reward: [(0, '483.140')] [2024-06-15 18:36:46,460][1651669] Updated weights for policy 0, policy_version 593825 (0.0033) [2024-06-15 18:36:47,664][1651669] Updated weights for policy 0, policy_version 593877 (0.0013) [2024-06-15 18:36:50,767][1648981] Fps is (10 sec: 45871.3, 60 sec: 48058.9, 300 sec: 49318.8). Total num frames: 1216348160. Throughput: 0: 12492.5. Samples: 304176128. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:50,768][1648981] Avg episode reward: [(0, '442.640')] [2024-06-15 18:36:50,985][1651669] Updated weights for policy 0, policy_version 593923 (0.0047) [2024-06-15 18:36:52,609][1651669] Updated weights for policy 0, policy_version 594000 (0.0013) [2024-06-15 18:36:55,768][1648981] Fps is (10 sec: 52421.5, 60 sec: 49167.1, 300 sec: 49318.4). Total num frames: 1216610304. Throughput: 0: 12276.2. Samples: 304203776. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:36:55,768][1648981] Avg episode reward: [(0, '415.600')] [2024-06-15 18:36:55,777][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000594048_1216610304.pth... [2024-06-15 18:36:55,869][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000588224_1204682752.pth [2024-06-15 18:36:58,034][1651669] Updated weights for policy 0, policy_version 594067 (0.0012) [2024-06-15 18:37:00,036][1651669] Updated weights for policy 0, policy_version 594146 (0.0014) [2024-06-15 18:37:00,766][1648981] Fps is (10 sec: 52434.3, 60 sec: 48059.9, 300 sec: 49318.6). Total num frames: 1216872448. Throughput: 0: 12031.6. Samples: 304274944. 
Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:37:00,769][1648981] Avg episode reward: [(0, '415.580')] [2024-06-15 18:37:02,799][1651669] Updated weights for policy 0, policy_version 594208 (0.0012) [2024-06-15 18:37:04,362][1651669] Updated weights for policy 0, policy_version 594261 (0.0011) [2024-06-15 18:37:05,766][1648981] Fps is (10 sec: 52435.9, 60 sec: 50244.2, 300 sec: 49318.6). Total num frames: 1217134592. Throughput: 0: 12037.7. Samples: 304346112. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:37:05,767][1648981] Avg episode reward: [(0, '415.980')] [2024-06-15 18:37:08,682][1651669] Updated weights for policy 0, policy_version 594309 (0.0012) [2024-06-15 18:37:10,622][1651669] Updated weights for policy 0, policy_version 594384 (0.0034) [2024-06-15 18:37:10,718][1651274] Signal inference workers to stop experience collection... (31150 times) [2024-06-15 18:37:10,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 49207.5). Total num frames: 1217298432. Throughput: 0: 12151.5. Samples: 304389120. Policy #0 lag: (min: 15.0, avg: 161.5, max: 271.0) [2024-06-15 18:37:10,767][1648981] Avg episode reward: [(0, '438.440')] [2024-06-15 18:37:10,770][1651669] InferenceWorker_p0-w0: stopping experience collection (31150 times) [2024-06-15 18:37:10,963][1651274] Signal inference workers to resume experience collection... (31150 times) [2024-06-15 18:37:10,964][1651669] InferenceWorker_p0-w0: resuming experience collection (31150 times) [2024-06-15 18:37:13,204][1651669] Updated weights for policy 0, policy_version 594433 (0.0014) [2024-06-15 18:37:14,508][1651669] Updated weights for policy 0, policy_version 594495 (0.0132) [2024-06-15 18:37:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49168.2, 300 sec: 49318.6). Total num frames: 1217593344. Throughput: 0: 11753.2. Samples: 304448000. 
Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:15,767][1648981] Avg episode reward: [(0, '430.320')] [2024-06-15 18:37:15,997][1651669] Updated weights for policy 0, policy_version 594557 (0.0014) [2024-06-15 18:37:20,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 46421.3, 300 sec: 48874.3). Total num frames: 1217691648. Throughput: 0: 11798.7. Samples: 304526336. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:20,767][1648981] Avg episode reward: [(0, '433.850')] [2024-06-15 18:37:21,005][1651669] Updated weights for policy 0, policy_version 594594 (0.0013) [2024-06-15 18:37:22,380][1651669] Updated weights for policy 0, policy_version 594656 (0.0011) [2024-06-15 18:37:23,533][1651669] Updated weights for policy 0, policy_version 594704 (0.0011) [2024-06-15 18:37:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 49096.5). Total num frames: 1218052096. Throughput: 0: 11798.8. Samples: 304556544. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:25,767][1648981] Avg episode reward: [(0, '452.170')] [2024-06-15 18:37:25,997][1651669] Updated weights for policy 0, policy_version 594755 (0.0012) [2024-06-15 18:37:27,301][1651669] Updated weights for policy 0, policy_version 594813 (0.0011) [2024-06-15 18:37:30,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 48874.3). Total num frames: 1218183168. Throughput: 0: 11821.5. Samples: 304636416. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:30,767][1648981] Avg episode reward: [(0, '447.720')] [2024-06-15 18:37:32,459][1651669] Updated weights for policy 0, policy_version 594880 (0.0100) [2024-06-15 18:37:33,671][1651669] Updated weights for policy 0, policy_version 594943 (0.0011) [2024-06-15 18:37:35,562][1651669] Updated weights for policy 0, policy_version 594999 (0.0013) [2024-06-15 18:37:35,779][1648981] Fps is (10 sec: 52362.4, 60 sec: 48595.6, 300 sec: 49427.6). Total num frames: 1218576384. 
Throughput: 0: 11511.3. Samples: 304694272. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:35,780][1648981] Avg episode reward: [(0, '455.430')] [2024-06-15 18:37:37,532][1651669] Updated weights for policy 0, policy_version 595040 (0.0010) [2024-06-15 18:37:40,788][1648981] Fps is (10 sec: 52318.2, 60 sec: 46951.1, 300 sec: 49204.0). Total num frames: 1218707456. Throughput: 0: 11793.6. Samples: 304734720. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:40,788][1648981] Avg episode reward: [(0, '476.640')] [2024-06-15 18:37:42,252][1651669] Updated weights for policy 0, policy_version 595089 (0.0026) [2024-06-15 18:37:43,458][1651669] Updated weights for policy 0, policy_version 595152 (0.0012) [2024-06-15 18:37:44,253][1651669] Updated weights for policy 0, policy_version 595199 (0.0020) [2024-06-15 18:37:45,768][1648981] Fps is (10 sec: 49204.9, 60 sec: 49696.5, 300 sec: 49429.4). Total num frames: 1219067904. Throughput: 0: 11968.9. Samples: 304813568. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:45,769][1648981] Avg episode reward: [(0, '470.030')] [2024-06-15 18:37:45,939][1651669] Updated weights for policy 0, policy_version 595257 (0.0015) [2024-06-15 18:37:47,492][1651669] Updated weights for policy 0, policy_version 595298 (0.0011) [2024-06-15 18:37:50,766][1648981] Fps is (10 sec: 52539.7, 60 sec: 48060.6, 300 sec: 49318.7). Total num frames: 1219231744. Throughput: 0: 12253.9. Samples: 304897536. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:50,767][1648981] Avg episode reward: [(0, '460.800')] [2024-06-15 18:37:51,952][1651669] Updated weights for policy 0, policy_version 595344 (0.0013) [2024-06-15 18:37:52,056][1651274] Signal inference workers to stop experience collection... 
(31200 times) [2024-06-15 18:37:52,088][1651669] InferenceWorker_p0-w0: stopping experience collection (31200 times) [2024-06-15 18:37:52,350][1651274] Signal inference workers to resume experience collection... (31200 times) [2024-06-15 18:37:52,351][1651669] InferenceWorker_p0-w0: resuming experience collection (31200 times) [2024-06-15 18:37:54,304][1651669] Updated weights for policy 0, policy_version 595440 (0.0011) [2024-06-15 18:37:55,414][1651669] Updated weights for policy 0, policy_version 595490 (0.0012) [2024-06-15 18:37:55,766][1648981] Fps is (10 sec: 52438.6, 60 sec: 49699.2, 300 sec: 49651.8). Total num frames: 1219592192. Throughput: 0: 12094.6. Samples: 304933376. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:37:55,767][1648981] Avg episode reward: [(0, '466.990')] [2024-06-15 18:37:57,337][1651669] Updated weights for policy 0, policy_version 595538 (0.0013) [2024-06-15 18:38:00,770][1648981] Fps is (10 sec: 52409.1, 60 sec: 48056.8, 300 sec: 49318.0). Total num frames: 1219756032. Throughput: 0: 12514.5. Samples: 305011200. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:00,771][1648981] Avg episode reward: [(0, '460.450')] [2024-06-15 18:38:01,910][1651669] Updated weights for policy 0, policy_version 595606 (0.0011) [2024-06-15 18:38:03,050][1651669] Updated weights for policy 0, policy_version 595659 (0.0010) [2024-06-15 18:38:03,961][1651669] Updated weights for policy 0, policy_version 595711 (0.0012) [2024-06-15 18:38:05,766][1648981] Fps is (10 sec: 55705.8, 60 sec: 50244.3, 300 sec: 49763.9). Total num frames: 1220149248. Throughput: 0: 12435.9. Samples: 305085952. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:05,767][1648981] Avg episode reward: [(0, '457.450')] [2024-06-15 18:38:07,823][1651669] Updated weights for policy 0, policy_version 595792 (0.0014) [2024-06-15 18:38:10,766][1648981] Fps is (10 sec: 52448.4, 60 sec: 49698.1, 300 sec: 49318.6). 
Total num frames: 1220280320. Throughput: 0: 12458.7. Samples: 305117184. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:10,767][1648981] Avg episode reward: [(0, '432.290')] [2024-06-15 18:38:12,489][1651669] Updated weights for policy 0, policy_version 595856 (0.0013) [2024-06-15 18:38:13,473][1651669] Updated weights for policy 0, policy_version 595904 (0.0011) [2024-06-15 18:38:14,730][1651669] Updated weights for policy 0, policy_version 595968 (0.0012) [2024-06-15 18:38:15,774][1648981] Fps is (10 sec: 45839.6, 60 sec: 50237.7, 300 sec: 49539.5). Total num frames: 1220608000. Throughput: 0: 12524.8. Samples: 305200128. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:15,775][1648981] Avg episode reward: [(0, '469.260')] [2024-06-15 18:38:18,795][1651669] Updated weights for policy 0, policy_version 596048 (0.0014) [2024-06-15 18:38:19,930][1651669] Updated weights for policy 0, policy_version 596094 (0.0012) [2024-06-15 18:38:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 51882.7, 300 sec: 49318.6). Total num frames: 1220804608. Throughput: 0: 12735.3. Samples: 305267200. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:20,767][1648981] Avg episode reward: [(0, '452.760')] [2024-06-15 18:38:23,699][1651669] Updated weights for policy 0, policy_version 596131 (0.0011) [2024-06-15 18:38:25,107][1651669] Updated weights for policy 0, policy_version 596162 (0.0014) [2024-06-15 18:38:25,767][1648981] Fps is (10 sec: 39351.6, 60 sec: 49151.9, 300 sec: 49321.9). Total num frames: 1221001216. Throughput: 0: 12760.4. Samples: 305308672. 
Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:25,767][1648981] Avg episode reward: [(0, '443.690')] [2024-06-15 18:38:26,834][1651669] Updated weights for policy 0, policy_version 596256 (0.0099) [2024-06-15 18:38:27,525][1651669] Updated weights for policy 0, policy_version 596288 (0.0019) [2024-06-15 18:38:29,254][1651274] Signal inference workers to stop experience collection... (31250 times) [2024-06-15 18:38:29,350][1651669] InferenceWorker_p0-w0: stopping experience collection (31250 times) [2024-06-15 18:38:29,612][1651274] Signal inference workers to resume experience collection... (31250 times) [2024-06-15 18:38:29,613][1651669] InferenceWorker_p0-w0: resuming experience collection (31250 times) [2024-06-15 18:38:29,883][1651669] Updated weights for policy 0, policy_version 596345 (0.0089) [2024-06-15 18:38:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 52428.8, 300 sec: 49318.6). Total num frames: 1221328896. Throughput: 0: 12573.0. Samples: 305379328. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:30,767][1648981] Avg episode reward: [(0, '455.360')] [2024-06-15 18:38:33,493][1651669] Updated weights for policy 0, policy_version 596387 (0.0015) [2024-06-15 18:38:35,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 48069.9, 300 sec: 49207.8). Total num frames: 1221459968. Throughput: 0: 12424.5. Samples: 305456640. Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:35,767][1648981] Avg episode reward: [(0, '454.820')] [2024-06-15 18:38:36,194][1651669] Updated weights for policy 0, policy_version 596432 (0.0010) [2024-06-15 18:38:37,638][1651669] Updated weights for policy 0, policy_version 596496 (0.0012) [2024-06-15 18:38:40,239][1651669] Updated weights for policy 0, policy_version 596560 (0.0014) [2024-06-15 18:38:40,767][1648981] Fps is (10 sec: 45873.3, 60 sec: 51354.2, 300 sec: 49096.5). Total num frames: 1221787648. Throughput: 0: 12276.5. Samples: 305485824. 
Policy #0 lag: (min: 84.0, avg: 167.8, max: 340.0) [2024-06-15 18:38:40,768][1648981] Avg episode reward: [(0, '443.950')] [2024-06-15 18:38:41,519][1651669] Updated weights for policy 0, policy_version 596606 (0.0012) [2024-06-15 18:38:45,530][1651669] Updated weights for policy 0, policy_version 596665 (0.0014) [2024-06-15 18:38:45,770][1648981] Fps is (10 sec: 52409.2, 60 sec: 48604.4, 300 sec: 49208.2). Total num frames: 1221984256. Throughput: 0: 12231.1. Samples: 305561600. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:38:45,771][1648981] Avg episode reward: [(0, '464.040')] [2024-06-15 18:38:48,155][1651669] Updated weights for policy 0, policy_version 596731 (0.0011) [2024-06-15 18:38:49,815][1651669] Updated weights for policy 0, policy_version 596784 (0.0024) [2024-06-15 18:38:50,689][1651669] Updated weights for policy 0, policy_version 596818 (0.0050) [2024-06-15 18:38:50,766][1648981] Fps is (10 sec: 49154.0, 60 sec: 50790.4, 300 sec: 48989.3). Total num frames: 1222279168. Throughput: 0: 12049.1. Samples: 305628160. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:38:50,767][1648981] Avg episode reward: [(0, '462.260')] [2024-06-15 18:38:55,156][1651669] Updated weights for policy 0, policy_version 596880 (0.0013) [2024-06-15 18:38:55,767][1648981] Fps is (10 sec: 45891.2, 60 sec: 47513.4, 300 sec: 48876.8). Total num frames: 1222443008. Throughput: 0: 12310.7. Samples: 305671168. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:38:55,767][1648981] Avg episode reward: [(0, '476.020')] [2024-06-15 18:38:56,066][1651669] Updated weights for policy 0, policy_version 596921 (0.0020) [2024-06-15 18:38:56,154][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000596928_1222508544.pth... 
[2024-06-15 18:38:56,230][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000591232_1210843136.pth [2024-06-15 18:38:58,494][1651669] Updated weights for policy 0, policy_version 596976 (0.0133) [2024-06-15 18:39:00,274][1651669] Updated weights for policy 0, policy_version 597047 (0.0015) [2024-06-15 18:39:00,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50247.4, 300 sec: 49096.5). Total num frames: 1222770688. Throughput: 0: 12051.1. Samples: 305742336. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:00,767][1648981] Avg episode reward: [(0, '478.200')] [2024-06-15 18:39:01,923][1651669] Updated weights for policy 0, policy_version 597104 (0.0012) [2024-06-15 18:39:05,774][1648981] Fps is (10 sec: 45841.0, 60 sec: 45869.3, 300 sec: 48876.0). Total num frames: 1222901760. Throughput: 0: 12285.9. Samples: 305820160. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:05,775][1648981] Avg episode reward: [(0, '481.390')] [2024-06-15 18:39:06,709][1651669] Updated weights for policy 0, policy_version 597159 (0.0023) [2024-06-15 18:39:08,575][1651669] Updated weights for policy 0, policy_version 597205 (0.0012) [2024-06-15 18:39:09,844][1651669] Updated weights for policy 0, policy_version 597266 (0.0018) [2024-06-15 18:39:10,768][1648981] Fps is (10 sec: 49146.3, 60 sec: 49697.2, 300 sec: 49207.3). Total num frames: 1223262208. Throughput: 0: 12242.2. Samples: 305859584. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:10,768][1648981] Avg episode reward: [(0, '488.190')] [2024-06-15 18:39:11,096][1651274] Signal inference workers to stop experience collection... (31300 times) [2024-06-15 18:39:11,171][1651669] InferenceWorker_p0-w0: stopping experience collection (31300 times) [2024-06-15 18:39:11,173][1651669] Updated weights for policy 0, policy_version 597317 (0.0011) [2024-06-15 18:39:11,362][1651274] Signal inference workers to resume experience collection... 
(31300 times) [2024-06-15 18:39:11,362][1651669] InferenceWorker_p0-w0: resuming experience collection (31300 times) [2024-06-15 18:39:12,502][1651669] Updated weights for policy 0, policy_version 597367 (0.0012) [2024-06-15 18:39:15,766][1648981] Fps is (10 sec: 52469.1, 60 sec: 46973.5, 300 sec: 48874.3). Total num frames: 1223426048. Throughput: 0: 12231.1. Samples: 305929728. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:15,767][1648981] Avg episode reward: [(0, '483.750')] [2024-06-15 18:39:16,641][1651669] Updated weights for policy 0, policy_version 597408 (0.0114) [2024-06-15 18:39:18,580][1651669] Updated weights for policy 0, policy_version 597443 (0.0014) [2024-06-15 18:39:20,520][1651669] Updated weights for policy 0, policy_version 597525 (0.0010) [2024-06-15 18:39:20,766][1648981] Fps is (10 sec: 49157.9, 60 sec: 49152.0, 300 sec: 49096.5). Total num frames: 1223753728. Throughput: 0: 12310.8. Samples: 306010624. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:20,767][1648981] Avg episode reward: [(0, '488.470')] [2024-06-15 18:39:21,927][1651669] Updated weights for policy 0, policy_version 597585 (0.0010) [2024-06-15 18:39:22,920][1651669] Updated weights for policy 0, policy_version 597627 (0.0012) [2024-06-15 18:39:25,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 49152.1, 300 sec: 48985.4). Total num frames: 1223950336. Throughput: 0: 12390.5. Samples: 306043392. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:25,768][1648981] Avg episode reward: [(0, '490.080')] [2024-06-15 18:39:27,292][1651669] Updated weights for policy 0, policy_version 597690 (0.0010) [2024-06-15 18:39:29,935][1651669] Updated weights for policy 0, policy_version 597733 (0.0012) [2024-06-15 18:39:30,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 1224245248. Throughput: 0: 12539.4. Samples: 306125824. 
Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:30,767][1648981] Avg episode reward: [(0, '466.470')] [2024-06-15 18:39:31,180][1651669] Updated weights for policy 0, policy_version 597793 (0.0011) [2024-06-15 18:39:32,967][1651669] Updated weights for policy 0, policy_version 597872 (0.0120) [2024-06-15 18:39:35,787][1648981] Fps is (10 sec: 52319.4, 60 sec: 50226.7, 300 sec: 49093.0). Total num frames: 1224474624. Throughput: 0: 12521.1. Samples: 306191872. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:35,788][1648981] Avg episode reward: [(0, '444.720')] [2024-06-15 18:39:37,808][1651669] Updated weights for policy 0, policy_version 597924 (0.0015) [2024-06-15 18:39:40,110][1651669] Updated weights for policy 0, policy_version 597955 (0.0011) [2024-06-15 18:39:40,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48060.0, 300 sec: 48652.8). Total num frames: 1224671232. Throughput: 0: 12424.6. Samples: 306230272. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:40,767][1648981] Avg episode reward: [(0, '436.540')] [2024-06-15 18:39:42,032][1651669] Updated weights for policy 0, policy_version 598034 (0.0011) [2024-06-15 18:39:43,214][1651669] Updated weights for policy 0, policy_version 598096 (0.0013) [2024-06-15 18:39:44,426][1651669] Updated weights for policy 0, policy_version 598141 (0.0012) [2024-06-15 18:39:45,766][1648981] Fps is (10 sec: 52538.8, 60 sec: 50247.4, 300 sec: 49097.8). Total num frames: 1224998912. Throughput: 0: 12299.4. Samples: 306295808. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:45,767][1648981] Avg episode reward: [(0, '442.350')] [2024-06-15 18:39:49,069][1651669] Updated weights for policy 0, policy_version 598192 (0.0017) [2024-06-15 18:39:50,455][1651669] Updated weights for policy 0, policy_version 598215 (0.0010) [2024-06-15 18:39:50,787][1648981] Fps is (10 sec: 49054.1, 60 sec: 48043.7, 300 sec: 48537.8). Total num frames: 1225162752. 
Throughput: 0: 12557.6. Samples: 306385408. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:50,787][1648981] Avg episode reward: [(0, '436.830')] [2024-06-15 18:39:51,279][1651274] Signal inference workers to stop experience collection... (31350 times) [2024-06-15 18:39:51,312][1651669] InferenceWorker_p0-w0: stopping experience collection (31350 times) [2024-06-15 18:39:51,497][1651274] Signal inference workers to resume experience collection... (31350 times) [2024-06-15 18:39:51,498][1651669] InferenceWorker_p0-w0: resuming experience collection (31350 times) [2024-06-15 18:39:51,675][1651669] Updated weights for policy 0, policy_version 598273 (0.0012) [2024-06-15 18:39:53,577][1651669] Updated weights for policy 0, policy_version 598355 (0.0014) [2024-06-15 18:39:54,379][1651669] Updated weights for policy 0, policy_version 598400 (0.0012) [2024-06-15 18:39:55,781][1648981] Fps is (10 sec: 52354.0, 60 sec: 51324.5, 300 sec: 49205.2). Total num frames: 1225523200. Throughput: 0: 12250.3. Samples: 306411008. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:39:55,781][1648981] Avg episode reward: [(0, '433.260')] [2024-06-15 18:39:59,525][1651669] Updated weights for policy 0, policy_version 598457 (0.0049) [2024-06-15 18:40:00,766][1648981] Fps is (10 sec: 49250.6, 60 sec: 48059.7, 300 sec: 48430.1). Total num frames: 1225654272. Throughput: 0: 12413.2. Samples: 306488320. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:40:00,767][1648981] Avg episode reward: [(0, '428.920')] [2024-06-15 18:40:02,222][1651669] Updated weights for policy 0, policy_version 598512 (0.0030) [2024-06-15 18:40:03,943][1651669] Updated weights for policy 0, policy_version 598596 (0.0012) [2024-06-15 18:40:05,341][1651669] Updated weights for policy 0, policy_version 598656 (0.0015) [2024-06-15 18:40:05,766][1648981] Fps is (10 sec: 52503.8, 60 sec: 52435.5, 300 sec: 49429.7). Total num frames: 1226047488. Throughput: 0: 12162.8. 
Samples: 306557952. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:40:05,767][1648981] Avg episode reward: [(0, '414.510')] [2024-06-15 18:40:09,917][1651669] Updated weights for policy 0, policy_version 598720 (0.0012) [2024-06-15 18:40:10,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48606.8, 300 sec: 48763.2). Total num frames: 1226178560. Throughput: 0: 12435.9. Samples: 306603008. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:40:10,767][1648981] Avg episode reward: [(0, '414.770')] [2024-06-15 18:40:13,926][1651669] Updated weights for policy 0, policy_version 598791 (0.0014) [2024-06-15 18:40:15,767][1648981] Fps is (10 sec: 42598.1, 60 sec: 50790.3, 300 sec: 49319.8). Total num frames: 1226473472. Throughput: 0: 12208.3. Samples: 306675200. Policy #0 lag: (min: 15.0, avg: 116.8, max: 271.0) [2024-06-15 18:40:15,767][1648981] Avg episode reward: [(0, '413.920')] [2024-06-15 18:40:16,631][1651669] Updated weights for policy 0, policy_version 598896 (0.0019) [2024-06-15 18:40:20,554][1651669] Updated weights for policy 0, policy_version 598928 (0.0042) [2024-06-15 18:40:20,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 1226604544. Throughput: 0: 12271.0. Samples: 306743808. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:20,794][1648981] Avg episode reward: [(0, '439.310')] [2024-06-15 18:40:21,570][1651669] Updated weights for policy 0, policy_version 598969 (0.0011) [2024-06-15 18:40:22,881][1651669] Updated weights for policy 0, policy_version 599010 (0.0011) [2024-06-15 18:40:25,641][1651669] Updated weights for policy 0, policy_version 599063 (0.0011) [2024-06-15 18:40:25,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 49152.0, 300 sec: 49096.5). Total num frames: 1226899456. Throughput: 0: 12185.6. Samples: 306778624. 
Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:25,767][1648981] Avg episode reward: [(0, '440.160')] [2024-06-15 18:40:27,248][1651669] Updated weights for policy 0, policy_version 599136 (0.0136) [2024-06-15 18:40:30,668][1651669] Updated weights for policy 0, policy_version 599170 (0.0011) [2024-06-15 18:40:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 48652.1). Total num frames: 1227096064. Throughput: 0: 12276.6. Samples: 306848256. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:30,767][1648981] Avg episode reward: [(0, '473.110')] [2024-06-15 18:40:31,466][1651274] Signal inference workers to stop experience collection... (31400 times) [2024-06-15 18:40:31,518][1651669] InferenceWorker_p0-w0: stopping experience collection (31400 times) [2024-06-15 18:40:31,647][1651274] Signal inference workers to resume experience collection... (31400 times) [2024-06-15 18:40:31,648][1651669] InferenceWorker_p0-w0: resuming experience collection (31400 times) [2024-06-15 18:40:31,925][1651669] Updated weights for policy 0, policy_version 599230 (0.0013) [2024-06-15 18:40:33,526][1651669] Updated weights for policy 0, policy_version 599293 (0.0012) [2024-06-15 18:40:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48076.5, 300 sec: 48874.3). Total num frames: 1227358208. Throughput: 0: 12088.6. Samples: 306929152. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:35,767][1648981] Avg episode reward: [(0, '479.610')] [2024-06-15 18:40:37,326][1651669] Updated weights for policy 0, policy_version 599360 (0.0018) [2024-06-15 18:40:38,827][1651669] Updated weights for policy 0, policy_version 599424 (0.0011) [2024-06-15 18:40:40,774][1648981] Fps is (10 sec: 52387.6, 60 sec: 49145.6, 300 sec: 48873.6). Total num frames: 1227620352. Throughput: 0: 12039.4. Samples: 306952704. 
Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:40,775][1648981] Avg episode reward: [(0, '486.320')] [2024-06-15 18:40:42,737][1651669] Updated weights for policy 0, policy_version 599484 (0.0012) [2024-06-15 18:40:44,576][1651669] Updated weights for policy 0, policy_version 599526 (0.0013) [2024-06-15 18:40:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1227882496. Throughput: 0: 12049.1. Samples: 307030528. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:45,767][1648981] Avg episode reward: [(0, '497.010')] [2024-06-15 18:40:47,294][1651669] Updated weights for policy 0, policy_version 599568 (0.0036) [2024-06-15 18:40:49,195][1651669] Updated weights for policy 0, policy_version 599650 (0.0012) [2024-06-15 18:40:50,790][1648981] Fps is (10 sec: 52346.8, 60 sec: 49695.2, 300 sec: 49095.8). Total num frames: 1228144640. Throughput: 0: 11997.3. Samples: 307098112. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:50,790][1648981] Avg episode reward: [(0, '507.280')] [2024-06-15 18:40:51,999][1651669] Updated weights for policy 0, policy_version 599696 (0.0012) [2024-06-15 18:40:53,160][1651669] Updated weights for policy 0, policy_version 599741 (0.0011) [2024-06-15 18:40:55,371][1651669] Updated weights for policy 0, policy_version 599778 (0.0012) [2024-06-15 18:40:55,774][1648981] Fps is (10 sec: 49116.4, 60 sec: 47519.2, 300 sec: 48762.1). Total num frames: 1228374016. Throughput: 0: 11944.7. Samples: 307140608. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:40:55,774][1648981] Avg episode reward: [(0, '550.920')] [2024-06-15 18:40:55,889][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000599808_1228406784.pth... 
[2024-06-15 18:40:55,978][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000594048_1216610304.pth [2024-06-15 18:40:58,188][1651669] Updated weights for policy 0, policy_version 599827 (0.0016) [2024-06-15 18:40:59,734][1651669] Updated weights for policy 0, policy_version 599893 (0.0014) [2024-06-15 18:41:00,771][1648981] Fps is (10 sec: 52527.8, 60 sec: 50240.4, 300 sec: 49317.8). Total num frames: 1228668928. Throughput: 0: 11979.6. Samples: 307214336. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:00,772][1648981] Avg episode reward: [(0, '545.440')] [2024-06-15 18:41:03,321][1651669] Updated weights for policy 0, policy_version 599968 (0.0014) [2024-06-15 18:41:05,747][1651669] Updated weights for policy 0, policy_version 600017 (0.0012) [2024-06-15 18:41:05,773][1648981] Fps is (10 sec: 45876.2, 60 sec: 46415.9, 300 sec: 48762.1). Total num frames: 1228832768. Throughput: 0: 12149.6. Samples: 307290624. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:05,787][1648981] Avg episode reward: [(0, '539.820')] [2024-06-15 18:41:08,769][1651669] Updated weights for policy 0, policy_version 600080 (0.0011) [2024-06-15 18:41:10,767][1648981] Fps is (10 sec: 42615.2, 60 sec: 48605.2, 300 sec: 48988.5). Total num frames: 1229094912. Throughput: 0: 12219.5. Samples: 307328512. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:10,768][1648981] Avg episode reward: [(0, '533.050')] [2024-06-15 18:41:11,047][1651669] Updated weights for policy 0, policy_version 600160 (0.0015) [2024-06-15 18:41:14,098][1651274] Signal inference workers to stop experience collection... (31450 times) [2024-06-15 18:41:14,140][1651669] InferenceWorker_p0-w0: stopping experience collection (31450 times) [2024-06-15 18:41:14,510][1651274] Signal inference workers to resume experience collection... 
(31450 times) [2024-06-15 18:41:14,511][1651669] InferenceWorker_p0-w0: resuming experience collection (31450 times) [2024-06-15 18:41:15,136][1651669] Updated weights for policy 0, policy_version 600225 (0.0011) [2024-06-15 18:41:15,766][1648981] Fps is (10 sec: 49186.8, 60 sec: 47513.7, 300 sec: 48874.3). Total num frames: 1229324288. Throughput: 0: 11969.4. Samples: 307386880. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:15,767][1648981] Avg episode reward: [(0, '537.000')] [2024-06-15 18:41:17,500][1651669] Updated weights for policy 0, policy_version 600290 (0.0015) [2024-06-15 18:41:19,865][1651669] Updated weights for policy 0, policy_version 600326 (0.0017) [2024-06-15 18:41:20,766][1648981] Fps is (10 sec: 42601.7, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 1229520896. Throughput: 0: 11946.7. Samples: 307466752. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:20,767][1648981] Avg episode reward: [(0, '542.150')] [2024-06-15 18:41:21,271][1651669] Updated weights for policy 0, policy_version 600384 (0.0012) [2024-06-15 18:41:22,815][1651669] Updated weights for policy 0, policy_version 600444 (0.0014) [2024-06-15 18:41:25,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 1229750272. Throughput: 0: 12062.5. Samples: 307495424. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:25,767][1648981] Avg episode reward: [(0, '562.630')] [2024-06-15 18:41:26,722][1651669] Updated weights for policy 0, policy_version 600506 (0.0013) [2024-06-15 18:41:29,184][1651669] Updated weights for policy 0, policy_version 600573 (0.0012) [2024-06-15 18:41:30,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 48059.6, 300 sec: 48541.0). Total num frames: 1229979648. Throughput: 0: 11867.0. Samples: 307564544. 
Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:30,767][1648981] Avg episode reward: [(0, '539.620')] [2024-06-15 18:41:32,757][1651669] Updated weights for policy 0, policy_version 600641 (0.0108) [2024-06-15 18:41:34,276][1651669] Updated weights for policy 0, policy_version 600701 (0.0011) [2024-06-15 18:41:35,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 48652.2). Total num frames: 1230241792. Throughput: 0: 11850.5. Samples: 307631104. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:35,767][1648981] Avg episode reward: [(0, '547.110')] [2024-06-15 18:41:38,020][1651669] Updated weights for policy 0, policy_version 600752 (0.0015) [2024-06-15 18:41:40,114][1651669] Updated weights for policy 0, policy_version 600825 (0.0023) [2024-06-15 18:41:40,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 48065.9, 300 sec: 48874.3). Total num frames: 1230503936. Throughput: 0: 11766.5. Samples: 307670016. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:40,767][1648981] Avg episode reward: [(0, '544.900')] [2024-06-15 18:41:42,757][1651669] Updated weights for policy 0, policy_version 600864 (0.0012) [2024-06-15 18:41:44,450][1651669] Updated weights for policy 0, policy_version 600928 (0.0011) [2024-06-15 18:41:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 48874.5). Total num frames: 1230766080. Throughput: 0: 11743.1. Samples: 307742720. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:45,767][1648981] Avg episode reward: [(0, '515.570')] [2024-06-15 18:41:47,557][1651669] Updated weights for policy 0, policy_version 600962 (0.0011) [2024-06-15 18:41:48,957][1651669] Updated weights for policy 0, policy_version 601021 (0.0011) [2024-06-15 18:41:50,399][1651669] Updated weights for policy 0, policy_version 601088 (0.0012) [2024-06-15 18:41:50,772][1648981] Fps is (10 sec: 52401.3, 60 sec: 48074.2, 300 sec: 48873.6). Total num frames: 1231028224. 
Throughput: 0: 11639.9. Samples: 307814400. Policy #0 lag: (min: 2.0, avg: 106.5, max: 258.0) [2024-06-15 18:41:50,772][1648981] Avg episode reward: [(0, '522.020')] [2024-06-15 18:41:54,302][1651669] Updated weights for policy 0, policy_version 601153 (0.0014) [2024-06-15 18:41:54,679][1651274] Signal inference workers to stop experience collection... (31500 times) [2024-06-15 18:41:54,719][1651669] InferenceWorker_p0-w0: stopping experience collection (31500 times) [2024-06-15 18:41:54,910][1651274] Signal inference workers to resume experience collection... (31500 times) [2024-06-15 18:41:54,923][1651669] InferenceWorker_p0-w0: resuming experience collection (31500 times) [2024-06-15 18:41:55,581][1651669] Updated weights for policy 0, policy_version 601209 (0.0011) [2024-06-15 18:41:55,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 48611.7, 300 sec: 48874.3). Total num frames: 1231290368. Throughput: 0: 11810.3. Samples: 307859968. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:41:55,767][1648981] Avg episode reward: [(0, '510.360')] [2024-06-15 18:41:59,143][1651669] Updated weights for policy 0, policy_version 601250 (0.0010) [2024-06-15 18:42:00,715][1651669] Updated weights for policy 0, policy_version 601312 (0.0022) [2024-06-15 18:42:00,795][1648981] Fps is (10 sec: 45766.6, 60 sec: 46948.3, 300 sec: 48647.3). Total num frames: 1231486976. Throughput: 0: 11882.1. Samples: 307921920. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:00,796][1648981] Avg episode reward: [(0, '513.520')] [2024-06-15 18:42:03,776][1651669] Updated weights for policy 0, policy_version 601347 (0.0010) [2024-06-15 18:42:05,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48065.3, 300 sec: 48874.3). Total num frames: 1231716352. Throughput: 0: 11992.2. Samples: 308006400. 
Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:05,767][1648981] Avg episode reward: [(0, '459.660')] [2024-06-15 18:42:06,044][1651669] Updated weights for policy 0, policy_version 601442 (0.0012) [2024-06-15 18:42:09,066][1651669] Updated weights for policy 0, policy_version 601509 (0.0012) [2024-06-15 18:42:10,766][1648981] Fps is (10 sec: 46009.6, 60 sec: 47514.3, 300 sec: 48652.2). Total num frames: 1231945728. Throughput: 0: 12094.6. Samples: 308039680. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:10,767][1648981] Avg episode reward: [(0, '457.980')] [2024-06-15 18:42:12,116][1651669] Updated weights for policy 0, policy_version 601590 (0.0086) [2024-06-15 18:42:15,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 46967.4, 300 sec: 48985.4). Total num frames: 1232142336. Throughput: 0: 12242.5. Samples: 308115456. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:15,767][1648981] Avg episode reward: [(0, '448.840')] [2024-06-15 18:42:16,190][1651669] Updated weights for policy 0, policy_version 601651 (0.0021) [2024-06-15 18:42:19,739][1651669] Updated weights for policy 0, policy_version 601749 (0.0017) [2024-06-15 18:42:20,767][1648981] Fps is (10 sec: 52426.9, 60 sec: 49151.8, 300 sec: 48874.3). Total num frames: 1232470016. Throughput: 0: 12014.8. Samples: 308171776. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:20,767][1648981] Avg episode reward: [(0, '448.650')] [2024-06-15 18:42:23,009][1651669] Updated weights for policy 0, policy_version 601824 (0.0031) [2024-06-15 18:42:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 48874.3). Total num frames: 1232601088. Throughput: 0: 11958.1. Samples: 308208128. 
Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:25,767][1648981] Avg episode reward: [(0, '450.090')] [2024-06-15 18:42:26,131][1651669] Updated weights for policy 0, policy_version 601857 (0.0016) [2024-06-15 18:42:28,164][1651669] Updated weights for policy 0, policy_version 601924 (0.0012) [2024-06-15 18:42:29,452][1651669] Updated weights for policy 0, policy_version 601978 (0.0013) [2024-06-15 18:42:30,778][1648981] Fps is (10 sec: 39276.4, 60 sec: 48050.4, 300 sec: 48430.1). Total num frames: 1232863232. Throughput: 0: 11863.9. Samples: 308276736. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:30,779][1648981] Avg episode reward: [(0, '442.320')] [2024-06-15 18:42:31,886][1651669] Updated weights for policy 0, policy_version 602038 (0.0021) [2024-06-15 18:42:35,046][1651669] Updated weights for policy 0, policy_version 602103 (0.0067) [2024-06-15 18:42:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48877.8). Total num frames: 1233125376. Throughput: 0: 11811.5. Samples: 308345856. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:35,767][1648981] Avg episode reward: [(0, '469.880')] [2024-06-15 18:42:39,086][1651274] Signal inference workers to stop experience collection... (31550 times) [2024-06-15 18:42:39,134][1651669] InferenceWorker_p0-w0: stopping experience collection (31550 times) [2024-06-15 18:42:39,372][1651274] Signal inference workers to resume experience collection... (31550 times) [2024-06-15 18:42:39,382][1651669] InferenceWorker_p0-w0: resuming experience collection (31550 times) [2024-06-15 18:42:39,559][1651669] Updated weights for policy 0, policy_version 602177 (0.0011) [2024-06-15 18:42:40,767][1648981] Fps is (10 sec: 49208.7, 60 sec: 47513.5, 300 sec: 48430.3). Total num frames: 1233354752. Throughput: 0: 11753.2. Samples: 308388864. 
Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:40,767][1648981] Avg episode reward: [(0, '469.330')] [2024-06-15 18:42:40,915][1651669] Updated weights for policy 0, policy_version 602235 (0.0012) [2024-06-15 18:42:43,127][1651669] Updated weights for policy 0, policy_version 602288 (0.0022) [2024-06-15 18:42:45,464][1651669] Updated weights for policy 0, policy_version 602358 (0.0012) [2024-06-15 18:42:45,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.6, 300 sec: 48874.3). Total num frames: 1233649664. Throughput: 0: 11920.2. Samples: 308457984. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:45,767][1648981] Avg episode reward: [(0, '488.450')] [2024-06-15 18:42:48,982][1651669] Updated weights for policy 0, policy_version 602387 (0.0012) [2024-06-15 18:42:50,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 46425.6, 300 sec: 48207.9). Total num frames: 1233813504. Throughput: 0: 11616.7. Samples: 308529152. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:50,767][1648981] Avg episode reward: [(0, '485.800')] [2024-06-15 18:42:50,832][1651669] Updated weights for policy 0, policy_version 602464 (0.0013) [2024-06-15 18:42:54,592][1651669] Updated weights for policy 0, policy_version 602529 (0.0012) [2024-06-15 18:42:55,580][1651669] Updated weights for policy 0, policy_version 602571 (0.0013) [2024-06-15 18:42:55,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 48541.7). Total num frames: 1234075648. Throughput: 0: 11684.9. Samples: 308565504. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:42:55,767][1648981] Avg episode reward: [(0, '483.620')] [2024-06-15 18:42:56,242][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000602608_1234141184.pth... 
[2024-06-15 18:42:56,286][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000596928_1222508544.pth [2024-06-15 18:42:56,462][1651669] Updated weights for policy 0, policy_version 602614 (0.0030) [2024-06-15 18:42:59,370][1651669] Updated weights for policy 0, policy_version 602656 (0.0010) [2024-06-15 18:43:00,802][1648981] Fps is (10 sec: 55506.6, 60 sec: 48054.4, 300 sec: 48202.0). Total num frames: 1234370560. Throughput: 0: 11789.4. Samples: 308646400. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:43:00,803][1648981] Avg episode reward: [(0, '483.890')] [2024-06-15 18:43:01,150][1651669] Updated weights for policy 0, policy_version 602736 (0.0013) [2024-06-15 18:43:04,436][1651669] Updated weights for policy 0, policy_version 602789 (0.0012) [2024-06-15 18:43:05,522][1651669] Updated weights for policy 0, policy_version 602824 (0.0010) [2024-06-15 18:43:05,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 48059.5, 300 sec: 48541.0). Total num frames: 1234599936. Throughput: 0: 12208.4. Samples: 308721152. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:43:05,768][1648981] Avg episode reward: [(0, '481.930')] [2024-06-15 18:43:08,805][1651669] Updated weights for policy 0, policy_version 602883 (0.0012) [2024-06-15 18:43:10,766][1648981] Fps is (10 sec: 46039.9, 60 sec: 48059.6, 300 sec: 48209.1). Total num frames: 1234829312. Throughput: 0: 12481.4. Samples: 308769792. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:43:10,767][1648981] Avg episode reward: [(0, '469.470')] [2024-06-15 18:43:10,839][1651669] Updated weights for policy 0, policy_version 602960 (0.0012) [2024-06-15 18:43:12,045][1651669] Updated weights for policy 0, policy_version 603008 (0.0013) [2024-06-15 18:43:15,767][1648981] Fps is (10 sec: 49151.9, 60 sec: 49151.7, 300 sec: 48429.9). Total num frames: 1235091456. Throughput: 0: 12348.0. Samples: 308832256. 
Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:43:15,767][1648981] Avg episode reward: [(0, '479.420')] [2024-06-15 18:43:16,278][1651669] Updated weights for policy 0, policy_version 603073 (0.0150) [2024-06-15 18:43:17,146][1651669] Updated weights for policy 0, policy_version 603120 (0.0037) [2024-06-15 18:43:20,551][1651274] Signal inference workers to stop experience collection... (31600 times) [2024-06-15 18:43:20,590][1651669] InferenceWorker_p0-w0: stopping experience collection (31600 times) [2024-06-15 18:43:20,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46421.6, 300 sec: 48318.9). Total num frames: 1235255296. Throughput: 0: 12549.7. Samples: 308910592. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:43:20,767][1648981] Avg episode reward: [(0, '474.850')] [2024-06-15 18:43:20,780][1651274] Signal inference workers to resume experience collection... (31600 times) [2024-06-15 18:43:20,798][1651669] InferenceWorker_p0-w0: resuming experience collection (31600 times) [2024-06-15 18:43:20,981][1651669] Updated weights for policy 0, policy_version 603170 (0.0016) [2024-06-15 18:43:23,034][1651669] Updated weights for policy 0, policy_version 603248 (0.0012) [2024-06-15 18:43:25,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 48605.8, 300 sec: 48096.7). Total num frames: 1235517440. Throughput: 0: 12071.9. Samples: 308932096. Policy #0 lag: (min: 58.0, avg: 142.0, max: 317.0) [2024-06-15 18:43:25,767][1648981] Avg episode reward: [(0, '501.830')] [2024-06-15 18:43:26,607][1651669] Updated weights for policy 0, policy_version 603315 (0.0012) [2024-06-15 18:43:28,087][1651669] Updated weights for policy 0, policy_version 603360 (0.0012) [2024-06-15 18:43:30,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 48069.1, 300 sec: 48430.0). Total num frames: 1235746816. Throughput: 0: 12322.1. Samples: 309012480. 
Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:43:30,767][1648981] Avg episode reward: [(0, '524.030')] [2024-06-15 18:43:31,200][1651669] Updated weights for policy 0, policy_version 603408 (0.0014) [2024-06-15 18:43:33,160][1651669] Updated weights for policy 0, policy_version 603472 (0.0011) [2024-06-15 18:43:34,289][1651669] Updated weights for policy 0, policy_version 603517 (0.0015) [2024-06-15 18:43:35,790][1648981] Fps is (10 sec: 52304.7, 60 sec: 48586.6, 300 sec: 48315.1). Total num frames: 1236041728. Throughput: 0: 12327.0. Samples: 309084160. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:43:35,791][1648981] Avg episode reward: [(0, '521.910')] [2024-06-15 18:43:36,839][1651669] Updated weights for policy 0, policy_version 603583 (0.0154) [2024-06-15 18:43:39,838][1651669] Updated weights for policy 0, policy_version 603638 (0.0014) [2024-06-15 18:43:40,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48606.1, 300 sec: 48430.6). Total num frames: 1236271104. Throughput: 0: 12367.7. Samples: 309122048. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:43:40,767][1648981] Avg episode reward: [(0, '524.680')] [2024-06-15 18:43:42,541][1651669] Updated weights for policy 0, policy_version 603681 (0.0011) [2024-06-15 18:43:44,997][1651669] Updated weights for policy 0, policy_version 603772 (0.0094) [2024-06-15 18:43:45,766][1648981] Fps is (10 sec: 49269.3, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1236533248. Throughput: 0: 11967.6. Samples: 309184512. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:43:45,767][1648981] Avg episode reward: [(0, '522.790')] [2024-06-15 18:43:47,728][1651669] Updated weights for policy 0, policy_version 603824 (0.0012) [2024-06-15 18:43:50,592][1651669] Updated weights for policy 0, policy_version 603876 (0.0012) [2024-06-15 18:43:50,778][1648981] Fps is (10 sec: 45820.9, 60 sec: 48596.3, 300 sec: 48428.1). Total num frames: 1236729856. 
Throughput: 0: 11909.5. Samples: 309257216. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:43:50,779][1648981] Avg episode reward: [(0, '532.780')] [2024-06-15 18:43:53,439][1651669] Updated weights for policy 0, policy_version 603936 (0.0012) [2024-06-15 18:43:54,520][1651669] Updated weights for policy 0, policy_version 603984 (0.0013) [2024-06-15 18:43:55,434][1651669] Updated weights for policy 0, policy_version 604020 (0.0032) [2024-06-15 18:43:55,770][1648981] Fps is (10 sec: 52408.0, 60 sec: 49695.0, 300 sec: 48429.4). Total num frames: 1237057536. Throughput: 0: 11900.1. Samples: 309305344. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:43:55,771][1648981] Avg episode reward: [(0, '522.810')] [2024-06-15 18:43:58,489][1651669] Updated weights for policy 0, policy_version 604092 (0.0035) [2024-06-15 18:44:00,766][1648981] Fps is (10 sec: 49210.5, 60 sec: 47542.0, 300 sec: 48542.3). Total num frames: 1237221376. Throughput: 0: 11992.3. Samples: 309371904. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:00,767][1648981] Avg episode reward: [(0, '514.240')] [2024-06-15 18:44:00,934][1651274] Signal inference workers to stop experience collection... (31650 times) [2024-06-15 18:44:00,976][1651669] InferenceWorker_p0-w0: stopping experience collection (31650 times) [2024-06-15 18:44:01,144][1651274] Signal inference workers to resume experience collection... (31650 times) [2024-06-15 18:44:01,145][1651669] InferenceWorker_p0-w0: resuming experience collection (31650 times) [2024-06-15 18:44:01,288][1651669] Updated weights for policy 0, policy_version 604150 (0.0015) [2024-06-15 18:44:03,913][1651669] Updated weights for policy 0, policy_version 604181 (0.0010) [2024-06-15 18:44:05,383][1651669] Updated weights for policy 0, policy_version 604242 (0.0011) [2024-06-15 18:44:05,768][1648981] Fps is (10 sec: 45887.3, 60 sec: 48605.0, 300 sec: 48318.9). Total num frames: 1237516288. Throughput: 0: 11912.2. 
Samples: 309446656. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:05,768][1648981] Avg episode reward: [(0, '507.220')] [2024-06-15 18:44:06,444][1651669] Updated weights for policy 0, policy_version 604288 (0.0010) [2024-06-15 18:44:09,496][1651669] Updated weights for policy 0, policy_version 604351 (0.0052) [2024-06-15 18:44:10,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1237712896. Throughput: 0: 12265.3. Samples: 309484032. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:10,767][1648981] Avg episode reward: [(0, '526.730')] [2024-06-15 18:44:14,875][1651669] Updated weights for policy 0, policy_version 604437 (0.0012) [2024-06-15 18:44:15,766][1648981] Fps is (10 sec: 42604.2, 60 sec: 47513.9, 300 sec: 48096.8). Total num frames: 1237942272. Throughput: 0: 12128.8. Samples: 309558272. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:15,767][1648981] Avg episode reward: [(0, '524.070')] [2024-06-15 18:44:16,652][1651669] Updated weights for policy 0, policy_version 604512 (0.0011) [2024-06-15 18:44:19,962][1651669] Updated weights for policy 0, policy_version 604552 (0.0017) [2024-06-15 18:44:20,796][1648981] Fps is (10 sec: 49009.6, 60 sec: 49128.2, 300 sec: 48314.2). Total num frames: 1238204416. Throughput: 0: 12093.2. Samples: 309628416. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:20,796][1648981] Avg episode reward: [(0, '540.790')] [2024-06-15 18:44:20,986][1651669] Updated weights for policy 0, policy_version 604608 (0.0013) [2024-06-15 18:44:22,558][1651669] Updated weights for policy 0, policy_version 604656 (0.0013) [2024-06-15 18:44:25,618][1651669] Updated weights for policy 0, policy_version 604704 (0.0033) [2024-06-15 18:44:25,789][1648981] Fps is (10 sec: 49043.0, 60 sec: 48588.0, 300 sec: 48093.1). Total num frames: 1238433792. Throughput: 0: 12054.5. Samples: 309664768. 
Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:25,789][1648981] Avg episode reward: [(0, '538.710')] [2024-06-15 18:44:27,021][1651669] Updated weights for policy 0, policy_version 604753 (0.0012) [2024-06-15 18:44:28,136][1651669] Updated weights for policy 0, policy_version 604796 (0.0016) [2024-06-15 18:44:30,786][1648981] Fps is (10 sec: 49197.9, 60 sec: 49136.0, 300 sec: 48208.0). Total num frames: 1238695936. Throughput: 0: 12248.5. Samples: 309735936. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:30,787][1648981] Avg episode reward: [(0, '561.350')] [2024-06-15 18:44:31,159][1651669] Updated weights for policy 0, policy_version 604851 (0.0011) [2024-06-15 18:44:33,094][1651669] Updated weights for policy 0, policy_version 604883 (0.0012) [2024-06-15 18:44:35,704][1651669] Updated weights for policy 0, policy_version 604930 (0.0012) [2024-06-15 18:44:35,769][1648981] Fps is (10 sec: 45970.1, 60 sec: 47531.2, 300 sec: 48207.6). Total num frames: 1238892544. Throughput: 0: 12370.5. Samples: 309813760. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:35,776][1648981] Avg episode reward: [(0, '561.460')] [2024-06-15 18:44:36,888][1651669] Updated weights for policy 0, policy_version 604978 (0.0010) [2024-06-15 18:44:38,502][1651669] Updated weights for policy 0, policy_version 605051 (0.0022) [2024-06-15 18:44:40,766][1648981] Fps is (10 sec: 49249.2, 60 sec: 48605.8, 300 sec: 48096.8). Total num frames: 1239187456. Throughput: 0: 11959.1. Samples: 309843456. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:40,767][1648981] Avg episode reward: [(0, '551.620')] [2024-06-15 18:44:41,597][1651669] Updated weights for policy 0, policy_version 605109 (0.0011) [2024-06-15 18:44:42,843][1651274] Signal inference workers to stop experience collection... 
(31700 times) [2024-06-15 18:44:42,870][1651669] InferenceWorker_p0-w0: stopping experience collection (31700 times) [2024-06-15 18:44:43,066][1651274] Signal inference workers to resume experience collection... (31700 times) [2024-06-15 18:44:43,067][1651669] InferenceWorker_p0-w0: resuming experience collection (31700 times) [2024-06-15 18:44:43,248][1651669] Updated weights for policy 0, policy_version 605141 (0.0009) [2024-06-15 18:44:45,767][1648981] Fps is (10 sec: 52431.1, 60 sec: 48058.8, 300 sec: 48322.0). Total num frames: 1239416832. Throughput: 0: 12287.7. Samples: 309924864. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:45,768][1648981] Avg episode reward: [(0, '550.730')] [2024-06-15 18:44:46,671][1651669] Updated weights for policy 0, policy_version 605189 (0.0012) [2024-06-15 18:44:48,459][1651669] Updated weights for policy 0, policy_version 605264 (0.0011) [2024-06-15 18:44:49,679][1651669] Updated weights for policy 0, policy_version 605312 (0.0027) [2024-06-15 18:44:50,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49161.7, 300 sec: 47988.0). Total num frames: 1239678976. Throughput: 0: 12129.1. Samples: 309992448. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:50,767][1648981] Avg episode reward: [(0, '532.330')] [2024-06-15 18:44:52,068][1651669] Updated weights for policy 0, policy_version 605363 (0.0104) [2024-06-15 18:44:53,545][1651669] Updated weights for policy 0, policy_version 605408 (0.0010) [2024-06-15 18:44:55,770][1648981] Fps is (10 sec: 52414.6, 60 sec: 48059.8, 300 sec: 48429.4). Total num frames: 1239941120. Throughput: 0: 12366.6. Samples: 310040576. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:44:55,771][1648981] Avg episode reward: [(0, '542.290')] [2024-06-15 18:44:55,782][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000605440_1239941120.pth... 
[2024-06-15 18:44:55,865][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000599808_1228406784.pth [2024-06-15 18:44:57,402][1651669] Updated weights for policy 0, policy_version 605456 (0.0016) [2024-06-15 18:44:58,724][1651669] Updated weights for policy 0, policy_version 605506 (0.0016) [2024-06-15 18:45:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1240203264. Throughput: 0: 12162.8. Samples: 310105600. Policy #0 lag: (min: 24.0, avg: 158.7, max: 280.0) [2024-06-15 18:45:00,767][1648981] Avg episode reward: [(0, '528.200')] [2024-06-15 18:45:01,439][1651669] Updated weights for policy 0, policy_version 605584 (0.0012) [2024-06-15 18:45:02,459][1651669] Updated weights for policy 0, policy_version 605630 (0.0010) [2024-06-15 18:45:04,380][1651669] Updated weights for policy 0, policy_version 605692 (0.0011) [2024-06-15 18:45:05,766][1648981] Fps is (10 sec: 52448.3, 60 sec: 49153.0, 300 sec: 48430.0). Total num frames: 1240465408. Throughput: 0: 12512.2. Samples: 310191104. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:05,767][1648981] Avg episode reward: [(0, '540.720')] [2024-06-15 18:45:08,377][1651669] Updated weights for policy 0, policy_version 605746 (0.0017) [2024-06-15 18:45:09,764][1651669] Updated weights for policy 0, policy_version 605799 (0.0013) [2024-06-15 18:45:10,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.2, 300 sec: 48318.9). Total num frames: 1240727552. Throughput: 0: 12442.0. Samples: 310224384. 
Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:10,767][1648981] Avg episode reward: [(0, '535.130')] [2024-06-15 18:45:12,270][1651669] Updated weights for policy 0, policy_version 605856 (0.0024) [2024-06-15 18:45:14,584][1651669] Updated weights for policy 0, policy_version 605904 (0.0012) [2024-06-15 18:45:15,377][1651669] Updated weights for policy 0, policy_version 605945 (0.0014) [2024-06-15 18:45:15,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50790.3, 300 sec: 48763.2). Total num frames: 1240989696. Throughput: 0: 12486.9. Samples: 310297600. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:15,767][1648981] Avg episode reward: [(0, '533.730')] [2024-06-15 18:45:19,034][1651669] Updated weights for policy 0, policy_version 606012 (0.0121) [2024-06-15 18:45:20,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49722.2, 300 sec: 48430.0). Total num frames: 1241186304. Throughput: 0: 12333.9. Samples: 310368768. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:20,767][1648981] Avg episode reward: [(0, '515.250')] [2024-06-15 18:45:21,004][1651669] Updated weights for policy 0, policy_version 606079 (0.0012) [2024-06-15 18:45:23,017][1651669] Updated weights for policy 0, policy_version 606133 (0.0012) [2024-06-15 18:45:25,410][1651274] Signal inference workers to stop experience collection... (31750 times) [2024-06-15 18:45:25,489][1651669] InferenceWorker_p0-w0: stopping experience collection (31750 times) [2024-06-15 18:45:25,673][1651274] Signal inference workers to resume experience collection... (31750 times) [2024-06-15 18:45:25,674][1651669] InferenceWorker_p0-w0: resuming experience collection (31750 times) [2024-06-15 18:45:25,767][1648981] Fps is (10 sec: 42598.3, 60 sec: 49716.4, 300 sec: 48541.0). Total num frames: 1241415680. Throughput: 0: 12481.4. Samples: 310405120. 
Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:25,767][1648981] Avg episode reward: [(0, '511.760')] [2024-06-15 18:45:26,021][1651669] Updated weights for policy 0, policy_version 606176 (0.0015) [2024-06-15 18:45:28,237][1651669] Updated weights for policy 0, policy_version 606224 (0.0014) [2024-06-15 18:45:29,102][1651669] Updated weights for policy 0, policy_version 606268 (0.0013) [2024-06-15 18:45:30,769][1648981] Fps is (10 sec: 49143.8, 60 sec: 49713.1, 300 sec: 48540.8). Total num frames: 1241677824. Throughput: 0: 12538.2. Samples: 310489088. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:30,776][1648981] Avg episode reward: [(0, '508.950')] [2024-06-15 18:45:31,602][1651669] Updated weights for policy 0, policy_version 606336 (0.0012) [2024-06-15 18:45:33,059][1651669] Updated weights for policy 0, policy_version 606400 (0.0012) [2024-06-15 18:45:35,767][1648981] Fps is (10 sec: 49151.7, 60 sec: 50245.4, 300 sec: 48431.3). Total num frames: 1241907200. Throughput: 0: 12640.7. Samples: 310561280. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:35,768][1648981] Avg episode reward: [(0, '491.240')] [2024-06-15 18:45:37,252][1651669] Updated weights for policy 0, policy_version 606457 (0.0013) [2024-06-15 18:45:39,612][1651669] Updated weights for policy 0, policy_version 606512 (0.0012) [2024-06-15 18:45:40,766][1648981] Fps is (10 sec: 52437.1, 60 sec: 50244.2, 300 sec: 48541.1). Total num frames: 1242202112. Throughput: 0: 12584.9. Samples: 310606848. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:40,767][1648981] Avg episode reward: [(0, '467.240')] [2024-06-15 18:45:41,076][1651669] Updated weights for policy 0, policy_version 606564 (0.0012) [2024-06-15 18:45:42,582][1651669] Updated weights for policy 0, policy_version 606626 (0.0012) [2024-06-15 18:45:45,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50245.1, 300 sec: 48433.8). Total num frames: 1242431488. 
Throughput: 0: 12583.8. Samples: 310671872. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:45,767][1648981] Avg episode reward: [(0, '487.780')] [2024-06-15 18:45:47,161][1651669] Updated weights for policy 0, policy_version 606677 (0.0014) [2024-06-15 18:45:48,908][1651669] Updated weights for policy 0, policy_version 606736 (0.0012) [2024-06-15 18:45:50,079][1651669] Updated weights for policy 0, policy_version 606782 (0.0010) [2024-06-15 18:45:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 48542.3). Total num frames: 1242693632. Throughput: 0: 12470.1. Samples: 310752256. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:50,767][1648981] Avg episode reward: [(0, '468.650')] [2024-06-15 18:45:52,287][1651669] Updated weights for policy 0, policy_version 606832 (0.0027) [2024-06-15 18:45:53,557][1651669] Updated weights for policy 0, policy_version 606882 (0.0030) [2024-06-15 18:45:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50247.4, 300 sec: 48430.7). Total num frames: 1242955776. Throughput: 0: 12367.6. Samples: 310780928. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:45:55,767][1648981] Avg episode reward: [(0, '472.920')] [2024-06-15 18:45:58,317][1651669] Updated weights for policy 0, policy_version 606929 (0.0013) [2024-06-15 18:45:59,750][1651669] Updated weights for policy 0, policy_version 607008 (0.0014) [2024-06-15 18:46:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.3, 300 sec: 48764.4). Total num frames: 1243217920. Throughput: 0: 12538.4. Samples: 310861824. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:00,767][1648981] Avg episode reward: [(0, '469.310')] [2024-06-15 18:46:02,292][1651669] Updated weights for policy 0, policy_version 607056 (0.0018) [2024-06-15 18:46:03,702][1651274] Signal inference workers to stop experience collection... 
(31800 times) [2024-06-15 18:46:03,769][1651669] InferenceWorker_p0-w0: stopping experience collection (31800 times) [2024-06-15 18:46:04,038][1651274] Signal inference workers to resume experience collection... (31800 times) [2024-06-15 18:46:04,039][1651669] InferenceWorker_p0-w0: resuming experience collection (31800 times) [2024-06-15 18:46:04,746][1651669] Updated weights for policy 0, policy_version 607141 (0.0014) [2024-06-15 18:46:05,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 48763.3). Total num frames: 1243480064. Throughput: 0: 12151.4. Samples: 310915584. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:05,767][1648981] Avg episode reward: [(0, '468.050')] [2024-06-15 18:46:10,267][1651669] Updated weights for policy 0, policy_version 607217 (0.0011) [2024-06-15 18:46:10,767][1648981] Fps is (10 sec: 39319.8, 60 sec: 48059.5, 300 sec: 48429.9). Total num frames: 1243611136. Throughput: 0: 12447.2. Samples: 310965248. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:10,768][1648981] Avg episode reward: [(0, '482.870')] [2024-06-15 18:46:11,907][1651669] Updated weights for policy 0, policy_version 607291 (0.0013) [2024-06-15 18:46:15,032][1651669] Updated weights for policy 0, policy_version 607347 (0.0012) [2024-06-15 18:46:15,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 48059.9, 300 sec: 48652.2). Total num frames: 1243873280. Throughput: 0: 12117.8. Samples: 311034368. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:15,767][1648981] Avg episode reward: [(0, '467.220')] [2024-06-15 18:46:16,651][1651669] Updated weights for policy 0, policy_version 607409 (0.0016) [2024-06-15 18:46:20,161][1651669] Updated weights for policy 0, policy_version 607441 (0.0011) [2024-06-15 18:46:20,766][1648981] Fps is (10 sec: 49154.4, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 1244102656. Throughput: 0: 12185.7. Samples: 311109632. 
Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:20,767][1648981] Avg episode reward: [(0, '457.150')] [2024-06-15 18:46:21,244][1651669] Updated weights for policy 0, policy_version 607491 (0.0013) [2024-06-15 18:46:22,511][1651669] Updated weights for policy 0, policy_version 607546 (0.0013) [2024-06-15 18:46:25,354][1651669] Updated weights for policy 0, policy_version 607603 (0.0020) [2024-06-15 18:46:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49698.2, 300 sec: 48874.3). Total num frames: 1244397568. Throughput: 0: 12049.1. Samples: 311149056. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:25,767][1648981] Avg episode reward: [(0, '450.750')] [2024-06-15 18:46:26,693][1651669] Updated weights for policy 0, policy_version 607654 (0.0083) [2024-06-15 18:46:30,521][1651669] Updated weights for policy 0, policy_version 607712 (0.0013) [2024-06-15 18:46:30,773][1648981] Fps is (10 sec: 49120.1, 60 sec: 48602.0, 300 sec: 48651.1). Total num frames: 1244594176. Throughput: 0: 12161.1. Samples: 311219200. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:30,773][1648981] Avg episode reward: [(0, '442.000')] [2024-06-15 18:46:32,804][1651669] Updated weights for policy 0, policy_version 607776 (0.0013) [2024-06-15 18:46:35,094][1651669] Updated weights for policy 0, policy_version 607840 (0.0013) [2024-06-15 18:46:35,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1244889088. Throughput: 0: 12003.5. Samples: 311292416. Policy #0 lag: (min: 45.0, avg: 174.1, max: 301.0) [2024-06-15 18:46:35,767][1648981] Avg episode reward: [(0, '440.190')] [2024-06-15 18:46:36,390][1651669] Updated weights for policy 0, policy_version 607889 (0.0012) [2024-06-15 18:46:40,585][1651669] Updated weights for policy 0, policy_version 607952 (0.0087) [2024-06-15 18:46:40,766][1648981] Fps is (10 sec: 49183.5, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 1245085696. 
Throughput: 0: 12117.3. Samples: 311326208. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:46:40,767][1648981] Avg episode reward: [(0, '435.030')] [2024-06-15 18:46:41,413][1651669] Updated weights for policy 0, policy_version 607999 (0.0012) [2024-06-15 18:46:43,085][1651669] Updated weights for policy 0, policy_version 608048 (0.0011) [2024-06-15 18:46:44,862][1651274] Signal inference workers to stop experience collection... (31850 times) [2024-06-15 18:46:44,892][1651669] InferenceWorker_p0-w0: stopping experience collection (31850 times) [2024-06-15 18:46:45,280][1651274] Signal inference workers to resume experience collection... (31850 times) [2024-06-15 18:46:45,281][1651669] InferenceWorker_p0-w0: resuming experience collection (31850 times) [2024-06-15 18:46:45,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 49698.2, 300 sec: 48764.1). Total num frames: 1245413376. Throughput: 0: 12310.8. Samples: 311415808. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:46:45,767][1648981] Avg episode reward: [(0, '420.820')] [2024-06-15 18:46:46,029][1651669] Updated weights for policy 0, policy_version 608128 (0.0012) [2024-06-15 18:46:47,601][1651669] Updated weights for policy 0, policy_version 608192 (0.0011) [2024-06-15 18:46:50,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1245577216. Throughput: 0: 12561.1. Samples: 311480832. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:46:50,767][1648981] Avg episode reward: [(0, '419.000')] [2024-06-15 18:46:52,065][1651669] Updated weights for policy 0, policy_version 608256 (0.0012) [2024-06-15 18:46:54,621][1651669] Updated weights for policy 0, policy_version 608314 (0.0114) [2024-06-15 18:46:55,767][1648981] Fps is (10 sec: 42595.2, 60 sec: 48059.2, 300 sec: 48656.8). Total num frames: 1245839360. Throughput: 0: 12401.7. Samples: 311523328. 
Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:46:55,768][1648981] Avg episode reward: [(0, '434.740')] [2024-06-15 18:46:55,810][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000608320_1245839360.pth... [2024-06-15 18:46:55,950][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000602608_1234141184.pth [2024-06-15 18:46:58,109][1651669] Updated weights for policy 0, policy_version 608403 (0.0074) [2024-06-15 18:47:00,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 48763.3). Total num frames: 1246101504. Throughput: 0: 12162.9. Samples: 311581696. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:00,767][1648981] Avg episode reward: [(0, '416.350')] [2024-06-15 18:47:02,545][1651669] Updated weights for policy 0, policy_version 608467 (0.0012) [2024-06-15 18:47:04,427][1651669] Updated weights for policy 0, policy_version 608513 (0.0104) [2024-06-15 18:47:05,552][1651669] Updated weights for policy 0, policy_version 608572 (0.0010) [2024-06-15 18:47:05,770][1648981] Fps is (10 sec: 52412.7, 60 sec: 48056.7, 300 sec: 48873.7). Total num frames: 1246363648. Throughput: 0: 12275.6. Samples: 311662080. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:05,771][1648981] Avg episode reward: [(0, '411.760')] [2024-06-15 18:47:07,772][1651669] Updated weights for policy 0, policy_version 608640 (0.0014) [2024-06-15 18:47:09,355][1651669] Updated weights for policy 0, policy_version 608700 (0.0126) [2024-06-15 18:47:10,770][1648981] Fps is (10 sec: 52408.6, 60 sec: 50241.5, 300 sec: 49095.8). Total num frames: 1246625792. Throughput: 0: 12275.6. Samples: 311701504. 
Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:10,771][1648981] Avg episode reward: [(0, '416.890')] [2024-06-15 18:47:13,940][1651669] Updated weights for policy 0, policy_version 608758 (0.0013) [2024-06-15 18:47:15,481][1651669] Updated weights for policy 0, policy_version 608801 (0.0020) [2024-06-15 18:47:15,766][1648981] Fps is (10 sec: 49170.7, 60 sec: 49698.1, 300 sec: 48763.3). Total num frames: 1246855168. Throughput: 0: 12267.0. Samples: 311771136. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:15,767][1648981] Avg episode reward: [(0, '429.370')] [2024-06-15 18:47:17,773][1651669] Updated weights for policy 0, policy_version 608834 (0.0013) [2024-06-15 18:47:18,855][1651669] Updated weights for policy 0, policy_version 608883 (0.0012) [2024-06-15 18:47:20,527][1651669] Updated weights for policy 0, policy_version 608954 (0.0010) [2024-06-15 18:47:20,776][1648981] Fps is (10 sec: 52399.4, 60 sec: 50782.4, 300 sec: 49317.1). Total num frames: 1247150080. Throughput: 0: 12262.8. Samples: 311844352. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:20,778][1648981] Avg episode reward: [(0, '424.990')] [2024-06-15 18:47:24,308][1651669] Updated weights for policy 0, policy_version 609015 (0.0012) [2024-06-15 18:47:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48605.9, 300 sec: 48987.3). Total num frames: 1247313920. Throughput: 0: 12379.0. Samples: 311883264. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:25,767][1648981] Avg episode reward: [(0, '413.350')] [2024-06-15 18:47:26,182][1651274] Signal inference workers to stop experience collection... (31900 times) [2024-06-15 18:47:26,232][1651669] InferenceWorker_p0-w0: stopping experience collection (31900 times) [2024-06-15 18:47:26,387][1651274] Signal inference workers to resume experience collection... 
(31900 times) [2024-06-15 18:47:26,388][1651669] InferenceWorker_p0-w0: resuming experience collection (31900 times) [2024-06-15 18:47:26,390][1651669] Updated weights for policy 0, policy_version 609072 (0.0048) [2024-06-15 18:47:29,157][1651669] Updated weights for policy 0, policy_version 609121 (0.0013) [2024-06-15 18:47:30,766][1648981] Fps is (10 sec: 49197.8, 60 sec: 50795.8, 300 sec: 49207.5). Total num frames: 1247641600. Throughput: 0: 12003.5. Samples: 311955968. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:30,767][1648981] Avg episode reward: [(0, '444.130')] [2024-06-15 18:47:30,835][1651669] Updated weights for policy 0, policy_version 609209 (0.0115) [2024-06-15 18:47:35,798][1648981] Fps is (10 sec: 45729.7, 60 sec: 48034.4, 300 sec: 48869.1). Total num frames: 1247772672. Throughput: 0: 12120.1. Samples: 312026624. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:35,799][1648981] Avg episode reward: [(0, '439.100')] [2024-06-15 18:47:36,782][1651669] Updated weights for policy 0, policy_version 609296 (0.0015) [2024-06-15 18:47:37,910][1651669] Updated weights for policy 0, policy_version 609343 (0.0011) [2024-06-15 18:47:39,940][1651669] Updated weights for policy 0, policy_version 609406 (0.0099) [2024-06-15 18:47:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1248100352. Throughput: 0: 12037.9. Samples: 312065024. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:40,767][1648981] Avg episode reward: [(0, '450.490')] [2024-06-15 18:47:41,391][1651669] Updated weights for policy 0, policy_version 609461 (0.0010) [2024-06-15 18:47:45,766][1648981] Fps is (10 sec: 42734.1, 60 sec: 46421.3, 300 sec: 48763.2). Total num frames: 1248198656. Throughput: 0: 12390.4. Samples: 312139264. 
Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:45,767][1648981] Avg episode reward: [(0, '465.150')] [2024-06-15 18:47:46,032][1651669] Updated weights for policy 0, policy_version 609489 (0.0033) [2024-06-15 18:47:47,748][1651669] Updated weights for policy 0, policy_version 609568 (0.0012) [2024-06-15 18:47:49,442][1651669] Updated weights for policy 0, policy_version 609616 (0.0028) [2024-06-15 18:47:50,770][1648981] Fps is (10 sec: 49133.8, 60 sec: 50241.1, 300 sec: 49206.9). Total num frames: 1248591872. Throughput: 0: 12162.9. Samples: 312209408. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:50,771][1648981] Avg episode reward: [(0, '478.710')] [2024-06-15 18:47:51,037][1651669] Updated weights for policy 0, policy_version 609684 (0.0012) [2024-06-15 18:47:55,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48060.3, 300 sec: 48658.1). Total num frames: 1248722944. Throughput: 0: 12050.1. Samples: 312243712. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:47:55,767][1648981] Avg episode reward: [(0, '484.920')] [2024-06-15 18:47:57,652][1651669] Updated weights for policy 0, policy_version 609760 (0.0016) [2024-06-15 18:47:59,947][1651669] Updated weights for policy 0, policy_version 609844 (0.0095) [2024-06-15 18:48:00,766][1648981] Fps is (10 sec: 39336.3, 60 sec: 48059.6, 300 sec: 48763.3). Total num frames: 1248985088. Throughput: 0: 11969.4. Samples: 312309760. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:48:00,767][1648981] Avg episode reward: [(0, '482.230')] [2024-06-15 18:48:01,945][1651669] Updated weights for policy 0, policy_version 609889 (0.0013) [2024-06-15 18:48:03,009][1651669] Updated weights for policy 0, policy_version 609939 (0.0010) [2024-06-15 18:48:03,846][1651669] Updated weights for policy 0, policy_version 609978 (0.0012) [2024-06-15 18:48:05,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 48062.7, 300 sec: 48874.3). Total num frames: 1249247232. 
Throughput: 0: 12062.9. Samples: 312387072. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:48:05,769][1648981] Avg episode reward: [(0, '496.030')] [2024-06-15 18:48:08,730][1651274] Signal inference workers to stop experience collection... (31950 times) [2024-06-15 18:48:08,781][1651669] Updated weights for policy 0, policy_version 610021 (0.0013) [2024-06-15 18:48:08,808][1651669] InferenceWorker_p0-w0: stopping experience collection (31950 times) [2024-06-15 18:48:08,921][1651274] Signal inference workers to resume experience collection... (31950 times) [2024-06-15 18:48:08,934][1651669] InferenceWorker_p0-w0: resuming experience collection (31950 times) [2024-06-15 18:48:10,329][1651669] Updated weights for policy 0, policy_version 610080 (0.0013) [2024-06-15 18:48:10,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46970.4, 300 sec: 48652.2). Total num frames: 1249443840. Throughput: 0: 12094.6. Samples: 312427520. Policy #0 lag: (min: 55.0, avg: 183.6, max: 311.0) [2024-06-15 18:48:10,767][1648981] Avg episode reward: [(0, '492.620')] [2024-06-15 18:48:11,798][1651669] Updated weights for policy 0, policy_version 610114 (0.0012) [2024-06-15 18:48:13,806][1651669] Updated weights for policy 0, policy_version 610192 (0.0201) [2024-06-15 18:48:15,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 49207.5). Total num frames: 1249771520. Throughput: 0: 11707.7. Samples: 312482816. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:15,767][1648981] Avg episode reward: [(0, '485.820')] [2024-06-15 18:48:19,502][1651669] Updated weights for policy 0, policy_version 610247 (0.0012) [2024-06-15 18:48:20,695][1651669] Updated weights for policy 0, policy_version 610304 (0.0021) [2024-06-15 18:48:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45882.3, 300 sec: 48763.2). Total num frames: 1249902592. Throughput: 0: 12091.7. Samples: 312570368. 
Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:20,767][1648981] Avg episode reward: [(0, '490.280')] [2024-06-15 18:48:22,113][1651669] Updated weights for policy 0, policy_version 610357 (0.0013) [2024-06-15 18:48:23,350][1651669] Updated weights for policy 0, policy_version 610403 (0.0013) [2024-06-15 18:48:25,054][1651669] Updated weights for policy 0, policy_version 610467 (0.0011) [2024-06-15 18:48:25,601][1651669] Updated weights for policy 0, policy_version 610496 (0.0012) [2024-06-15 18:48:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 1250295808. Throughput: 0: 11832.9. Samples: 312597504. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:25,767][1648981] Avg episode reward: [(0, '479.130')] [2024-06-15 18:48:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 48767.2). Total num frames: 1250426880. Throughput: 0: 12219.7. Samples: 312689152. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:30,767][1648981] Avg episode reward: [(0, '470.330')] [2024-06-15 18:48:30,826][1651669] Updated weights for policy 0, policy_version 610576 (0.0013) [2024-06-15 18:48:32,529][1651669] Updated weights for policy 0, policy_version 610644 (0.0011) [2024-06-15 18:48:33,268][1651669] Updated weights for policy 0, policy_version 610682 (0.0015) [2024-06-15 18:48:35,767][1648981] Fps is (10 sec: 49146.5, 60 sec: 50269.9, 300 sec: 49207.3). Total num frames: 1250787328. Throughput: 0: 12027.0. Samples: 312750592. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:35,768][1648981] Avg episode reward: [(0, '447.660')] [2024-06-15 18:48:35,873][1651669] Updated weights for policy 0, policy_version 610740 (0.0012) [2024-06-15 18:48:39,826][1651669] Updated weights for policy 0, policy_version 610784 (0.0013) [2024-06-15 18:48:40,830][1648981] Fps is (10 sec: 52097.9, 60 sec: 47463.4, 300 sec: 48863.8). Total num frames: 1250951168. 
Throughput: 0: 12384.3. Samples: 312801792. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:40,831][1648981] Avg episode reward: [(0, '421.020')] [2024-06-15 18:48:41,732][1651669] Updated weights for policy 0, policy_version 610864 (0.0017) [2024-06-15 18:48:43,371][1651669] Updated weights for policy 0, policy_version 610940 (0.0012) [2024-06-15 18:48:44,576][1651274] Signal inference workers to stop experience collection... (32000 times) [2024-06-15 18:48:44,641][1651669] InferenceWorker_p0-w0: stopping experience collection (32000 times) [2024-06-15 18:48:44,861][1651274] Signal inference workers to resume experience collection... (32000 times) [2024-06-15 18:48:44,862][1651669] InferenceWorker_p0-w0: resuming experience collection (32000 times) [2024-06-15 18:48:45,754][1651669] Updated weights for policy 0, policy_version 611001 (0.0013) [2024-06-15 18:48:45,766][1648981] Fps is (10 sec: 52434.8, 60 sec: 51882.7, 300 sec: 49431.7). Total num frames: 1251311616. Throughput: 0: 12435.9. Samples: 312869376. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:45,767][1648981] Avg episode reward: [(0, '425.910')] [2024-06-15 18:48:50,766][1648981] Fps is (10 sec: 49466.4, 60 sec: 47516.6, 300 sec: 48763.9). Total num frames: 1251442688. Throughput: 0: 12527.0. Samples: 312950784. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:50,767][1648981] Avg episode reward: [(0, '409.780')] [2024-06-15 18:48:50,957][1651669] Updated weights for policy 0, policy_version 611069 (0.0011) [2024-06-15 18:48:53,178][1651669] Updated weights for policy 0, policy_version 611122 (0.0013) [2024-06-15 18:48:54,331][1651669] Updated weights for policy 0, policy_version 611188 (0.0012) [2024-06-15 18:48:55,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 50790.2, 300 sec: 49318.6). Total num frames: 1251770368. Throughput: 0: 12287.9. Samples: 312980480. 
Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:48:55,767][1648981] Avg episode reward: [(0, '404.270')] [2024-06-15 18:48:56,381][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000611248_1251835904.pth... [2024-06-15 18:48:56,438][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000605440_1239941120.pth [2024-06-15 18:48:56,518][1651669] Updated weights for policy 0, policy_version 611252 (0.0123) [2024-06-15 18:49:00,727][1651669] Updated weights for policy 0, policy_version 611296 (0.0059) [2024-06-15 18:49:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 48874.5). Total num frames: 1251934208. Throughput: 0: 12743.1. Samples: 313056256. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:00,767][1648981] Avg episode reward: [(0, '402.490')] [2024-06-15 18:49:02,578][1651669] Updated weights for policy 0, policy_version 611360 (0.0012) [2024-06-15 18:49:04,367][1651669] Updated weights for policy 0, policy_version 611408 (0.0023) [2024-06-15 18:49:05,766][1648981] Fps is (10 sec: 49152.9, 60 sec: 50244.3, 300 sec: 49318.6). Total num frames: 1252261888. Throughput: 0: 12333.5. Samples: 313125376. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:05,767][1648981] Avg episode reward: [(0, '405.380')] [2024-06-15 18:49:06,653][1651669] Updated weights for policy 0, policy_version 611488 (0.0011) [2024-06-15 18:49:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 1252392960. Throughput: 0: 12515.6. Samples: 313160704. 
Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:10,767][1648981] Avg episode reward: [(0, '417.010')] [2024-06-15 18:49:11,072][1651669] Updated weights for policy 0, policy_version 611523 (0.0022) [2024-06-15 18:49:13,307][1651669] Updated weights for policy 0, policy_version 611616 (0.0015) [2024-06-15 18:49:14,772][1651669] Updated weights for policy 0, policy_version 611669 (0.0013) [2024-06-15 18:49:15,770][1648981] Fps is (10 sec: 49136.7, 60 sec: 49695.5, 300 sec: 49322.9). Total num frames: 1252753408. Throughput: 0: 12344.0. Samples: 313244672. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:15,770][1648981] Avg episode reward: [(0, '401.070')] [2024-06-15 18:49:17,120][1651669] Updated weights for policy 0, policy_version 611760 (0.0083) [2024-06-15 18:49:20,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 49100.1). Total num frames: 1252917248. Throughput: 0: 12595.5. Samples: 313317376. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:20,767][1648981] Avg episode reward: [(0, '417.610')] [2024-06-15 18:49:21,776][1651669] Updated weights for policy 0, policy_version 611793 (0.0012) [2024-06-15 18:49:22,775][1651669] Updated weights for policy 0, policy_version 611840 (0.0159) [2024-06-15 18:49:24,413][1651669] Updated weights for policy 0, policy_version 611895 (0.0014) [2024-06-15 18:49:25,766][1648981] Fps is (10 sec: 49167.3, 60 sec: 49152.0, 300 sec: 49321.9). Total num frames: 1253244928. Throughput: 0: 12305.4. Samples: 313354752. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:25,767][1648981] Avg episode reward: [(0, '410.130')] [2024-06-15 18:49:26,035][1651669] Updated weights for policy 0, policy_version 611952 (0.0016) [2024-06-15 18:49:26,934][1651274] Signal inference workers to stop experience collection... 
(32050 times) [2024-06-15 18:49:26,954][1651669] InferenceWorker_p0-w0: stopping experience collection (32050 times) [2024-06-15 18:49:27,196][1651274] Signal inference workers to resume experience collection... (32050 times) [2024-06-15 18:49:27,226][1651669] InferenceWorker_p0-w0: resuming experience collection (32050 times) [2024-06-15 18:49:28,183][1651669] Updated weights for policy 0, policy_version 612021 (0.0122) [2024-06-15 18:49:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 49318.9). Total num frames: 1253441536. Throughput: 0: 12299.4. Samples: 313422848. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:30,767][1648981] Avg episode reward: [(0, '412.120')] [2024-06-15 18:49:33,116][1651669] Updated weights for policy 0, policy_version 612066 (0.0012) [2024-06-15 18:49:35,015][1651669] Updated weights for policy 0, policy_version 612129 (0.0061) [2024-06-15 18:49:35,575][1651669] Updated weights for policy 0, policy_version 612159 (0.0030) [2024-06-15 18:49:35,767][1648981] Fps is (10 sec: 45875.2, 60 sec: 48606.7, 300 sec: 49207.5). Total num frames: 1253703680. Throughput: 0: 12105.9. Samples: 313495552. Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:35,767][1648981] Avg episode reward: [(0, '406.410')] [2024-06-15 18:49:37,108][1651669] Updated weights for policy 0, policy_version 612210 (0.0012) [2024-06-15 18:49:38,749][1651669] Updated weights for policy 0, policy_version 612261 (0.0013) [2024-06-15 18:49:40,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50297.6, 300 sec: 49318.8). Total num frames: 1253965824. Throughput: 0: 12197.1. Samples: 313529344. 
Policy #0 lag: (min: 127.0, avg: 182.0, max: 319.0) [2024-06-15 18:49:40,767][1648981] Avg episode reward: [(0, '419.790')] [2024-06-15 18:49:43,117][1651669] Updated weights for policy 0, policy_version 612290 (0.0014) [2024-06-15 18:49:44,728][1651669] Updated weights for policy 0, policy_version 612353 (0.0018) [2024-06-15 18:49:45,732][1651669] Updated weights for policy 0, policy_version 612408 (0.0013) [2024-06-15 18:49:45,767][1648981] Fps is (10 sec: 49149.6, 60 sec: 48059.3, 300 sec: 49207.5). Total num frames: 1254195200. Throughput: 0: 12344.7. Samples: 313611776. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:49:45,767][1648981] Avg episode reward: [(0, '414.350')] [2024-06-15 18:49:46,828][1651669] Updated weights for policy 0, policy_version 612448 (0.0032) [2024-06-15 18:49:48,785][1651669] Updated weights for policy 0, policy_version 612502 (0.0025) [2024-06-15 18:49:50,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50790.4, 300 sec: 49319.2). Total num frames: 1254490112. Throughput: 0: 12447.3. Samples: 313685504. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:49:50,767][1648981] Avg episode reward: [(0, '406.340')] [2024-06-15 18:49:53,054][1651669] Updated weights for policy 0, policy_version 612547 (0.0013) [2024-06-15 18:49:54,988][1651669] Updated weights for policy 0, policy_version 612626 (0.0028) [2024-06-15 18:49:55,767][1648981] Fps is (10 sec: 55707.8, 60 sec: 49698.2, 300 sec: 49318.6). Total num frames: 1254752256. Throughput: 0: 12754.4. Samples: 313734656. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:49:55,770][1648981] Avg episode reward: [(0, '410.370')] [2024-06-15 18:49:57,646][1651669] Updated weights for policy 0, policy_version 612704 (0.0029) [2024-06-15 18:49:59,336][1651669] Updated weights for policy 0, policy_version 612769 (0.0015) [2024-06-15 18:50:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 51336.5, 300 sec: 49318.6). Total num frames: 1255014400. 
Throughput: 0: 12232.0. Samples: 313795072. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:00,767][1648981] Avg episode reward: [(0, '415.560')] [2024-06-15 18:50:04,010][1651669] Updated weights for policy 0, policy_version 612832 (0.0012) [2024-06-15 18:50:05,187][1651669] Updated weights for policy 0, policy_version 612866 (0.0011) [2024-06-15 18:50:05,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 1255178240. Throughput: 0: 12492.8. Samples: 313879552. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:05,767][1648981] Avg episode reward: [(0, '438.100')] [2024-06-15 18:50:06,395][1651669] Updated weights for policy 0, policy_version 612928 (0.0012) [2024-06-15 18:50:08,110][1651669] Updated weights for policy 0, policy_version 612989 (0.0011) [2024-06-15 18:50:10,455][1651274] Signal inference workers to stop experience collection... (32100 times) [2024-06-15 18:50:10,556][1651669] InferenceWorker_p0-w0: stopping experience collection (32100 times) [2024-06-15 18:50:10,660][1651274] Signal inference workers to resume experience collection... (32100 times) [2024-06-15 18:50:10,661][1651669] InferenceWorker_p0-w0: resuming experience collection (32100 times) [2024-06-15 18:50:10,739][1651669] Updated weights for policy 0, policy_version 613042 (0.0013) [2024-06-15 18:50:10,767][1648981] Fps is (10 sec: 49151.7, 60 sec: 51882.6, 300 sec: 49207.5). Total num frames: 1255505920. Throughput: 0: 12379.0. Samples: 313911808. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:10,767][1648981] Avg episode reward: [(0, '434.030')] [2024-06-15 18:50:14,253][1651669] Updated weights for policy 0, policy_version 613089 (0.0011) [2024-06-15 18:50:15,631][1651669] Updated weights for policy 0, policy_version 613141 (0.0011) [2024-06-15 18:50:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49154.6, 300 sec: 49207.5). Total num frames: 1255702528. Throughput: 0: 12765.9. 
Samples: 313997312. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:15,767][1648981] Avg episode reward: [(0, '447.430')] [2024-06-15 18:50:17,566][1651669] Updated weights for policy 0, policy_version 613200 (0.0013) [2024-06-15 18:50:20,185][1651669] Updated weights for policy 0, policy_version 613254 (0.0093) [2024-06-15 18:50:20,769][1648981] Fps is (10 sec: 49137.7, 60 sec: 51334.0, 300 sec: 49429.2). Total num frames: 1255997440. Throughput: 0: 12662.6. Samples: 314065408. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:20,770][1648981] Avg episode reward: [(0, '459.140')] [2024-06-15 18:50:21,201][1651669] Updated weights for policy 0, policy_version 613305 (0.0014) [2024-06-15 18:50:25,390][1651669] Updated weights for policy 0, policy_version 613367 (0.0013) [2024-06-15 18:50:25,778][1648981] Fps is (10 sec: 49093.9, 60 sec: 49142.4, 300 sec: 49205.8). Total num frames: 1256194048. Throughput: 0: 12717.0. Samples: 314101760. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:25,779][1648981] Avg episode reward: [(0, '463.260')] [2024-06-15 18:50:26,559][1651669] Updated weights for policy 0, policy_version 613410 (0.0011) [2024-06-15 18:50:28,576][1651669] Updated weights for policy 0, policy_version 613472 (0.0020) [2024-06-15 18:50:30,725][1651669] Updated weights for policy 0, policy_version 613524 (0.0091) [2024-06-15 18:50:30,766][1648981] Fps is (10 sec: 49167.0, 60 sec: 50790.4, 300 sec: 49429.7). Total num frames: 1256488960. Throughput: 0: 12561.2. Samples: 314177024. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:30,767][1648981] Avg episode reward: [(0, '465.790')] [2024-06-15 18:50:34,935][1651669] Updated weights for policy 0, policy_version 613584 (0.0108) [2024-06-15 18:50:35,766][1648981] Fps is (10 sec: 49210.2, 60 sec: 49698.2, 300 sec: 49096.5). Total num frames: 1256685568. Throughput: 0: 12583.8. Samples: 314251776. 
Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:35,767][1648981] Avg episode reward: [(0, '460.500')] [2024-06-15 18:50:36,372][1651669] Updated weights for policy 0, policy_version 613648 (0.0012) [2024-06-15 18:50:37,549][1651669] Updated weights for policy 0, policy_version 613690 (0.0012) [2024-06-15 18:50:39,479][1651669] Updated weights for policy 0, policy_version 613744 (0.0265) [2024-06-15 18:50:40,771][1648981] Fps is (10 sec: 49128.3, 60 sec: 50240.2, 300 sec: 49317.8). Total num frames: 1256980480. Throughput: 0: 12309.5. Samples: 314288640. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:40,772][1648981] Avg episode reward: [(0, '437.120')] [2024-06-15 18:50:41,143][1651669] Updated weights for policy 0, policy_version 613766 (0.0026) [2024-06-15 18:50:44,838][1651669] Updated weights for policy 0, policy_version 613844 (0.0012) [2024-06-15 18:50:45,766][1648981] Fps is (10 sec: 55705.4, 60 sec: 50790.8, 300 sec: 49318.6). Total num frames: 1257242624. Throughput: 0: 12822.8. Samples: 314372096. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:45,767][1648981] Avg episode reward: [(0, '460.930')] [2024-06-15 18:50:46,478][1651669] Updated weights for policy 0, policy_version 613904 (0.0013) [2024-06-15 18:50:47,663][1651669] Updated weights for policy 0, policy_version 613952 (0.0020) [2024-06-15 18:50:50,766][1648981] Fps is (10 sec: 52454.0, 60 sec: 50244.3, 300 sec: 49318.6). Total num frames: 1257504768. Throughput: 0: 12379.0. Samples: 314436608. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:50,767][1648981] Avg episode reward: [(0, '482.740')] [2024-06-15 18:50:52,093][1651669] Updated weights for policy 0, policy_version 614018 (0.0167) [2024-06-15 18:50:52,728][1651274] Signal inference workers to stop experience collection... 
(32150 times) [2024-06-15 18:50:52,755][1651669] InferenceWorker_p0-w0: stopping experience collection (32150 times) [2024-06-15 18:50:52,933][1651274] Signal inference workers to resume experience collection... (32150 times) [2024-06-15 18:50:52,934][1651669] InferenceWorker_p0-w0: resuming experience collection (32150 times) [2024-06-15 18:50:53,121][1651669] Updated weights for policy 0, policy_version 614077 (0.0011) [2024-06-15 18:50:55,354][1651669] Updated weights for policy 0, policy_version 614128 (0.0011) [2024-06-15 18:50:55,771][1648981] Fps is (10 sec: 49131.7, 60 sec: 49694.8, 300 sec: 49206.8). Total num frames: 1257734144. Throughput: 0: 12673.7. Samples: 314482176. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:50:55,771][1648981] Avg episode reward: [(0, '493.960')] [2024-06-15 18:50:55,792][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000614144_1257766912.pth... [2024-06-15 18:50:55,852][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000608320_1245839360.pth [2024-06-15 18:50:55,857][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000614144_1257766912.pth [2024-06-15 18:50:57,293][1651669] Updated weights for policy 0, policy_version 614180 (0.0013) [2024-06-15 18:50:59,660][1651669] Updated weights for policy 0, policy_version 614224 (0.0012) [2024-06-15 18:51:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.3, 300 sec: 49318.6). Total num frames: 1258029056. Throughput: 0: 12447.3. Samples: 314557440. 
Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:51:00,767][1648981] Avg episode reward: [(0, '513.040')] [2024-06-15 18:51:02,178][1651669] Updated weights for policy 0, policy_version 614274 (0.0012) [2024-06-15 18:51:03,334][1651669] Updated weights for policy 0, policy_version 614336 (0.0026) [2024-06-15 18:51:05,767][1648981] Fps is (10 sec: 45893.9, 60 sec: 50244.2, 300 sec: 49429.8). Total num frames: 1258192896. Throughput: 0: 12687.0. Samples: 314636288. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:51:05,769][1648981] Avg episode reward: [(0, '516.530')] [2024-06-15 18:51:06,360][1651669] Updated weights for policy 0, policy_version 614398 (0.0059) [2024-06-15 18:51:08,691][1651669] Updated weights for policy 0, policy_version 614439 (0.0031) [2024-06-15 18:51:10,161][1651669] Updated weights for policy 0, policy_version 614484 (0.0011) [2024-06-15 18:51:10,768][1648981] Fps is (10 sec: 45869.0, 60 sec: 49697.1, 300 sec: 49540.5). Total num frames: 1258487808. Throughput: 0: 12598.1. Samples: 314668544. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:51:10,768][1648981] Avg episode reward: [(0, '490.000')] [2024-06-15 18:51:11,153][1651669] Updated weights for policy 0, policy_version 614525 (0.0012) [2024-06-15 18:51:14,132][1651669] Updated weights for policy 0, policy_version 614586 (0.0032) [2024-06-15 18:51:15,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 1258684416. Throughput: 0: 12435.9. Samples: 314736640. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:51:15,767][1648981] Avg episode reward: [(0, '513.460')] [2024-06-15 18:51:17,304][1651669] Updated weights for policy 0, policy_version 614640 (0.0013) [2024-06-15 18:51:19,253][1651669] Updated weights for policy 0, policy_version 614704 (0.0010) [2024-06-15 18:51:20,776][1648981] Fps is (10 sec: 45837.9, 60 sec: 49146.7, 300 sec: 49317.0). Total num frames: 1258946560. 
Throughput: 0: 12467.4. Samples: 314812928. Policy #0 lag: (min: 47.0, avg: 149.1, max: 303.0) [2024-06-15 18:51:20,776][1648981] Avg episode reward: [(0, '508.510')] [2024-06-15 18:51:21,035][1651669] Updated weights for policy 0, policy_version 614741 (0.0031) [2024-06-15 18:51:23,318][1651669] Updated weights for policy 0, policy_version 614800 (0.0014) [2024-06-15 18:51:25,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50254.2, 300 sec: 49541.9). Total num frames: 1259208704. Throughput: 0: 12494.1. Samples: 314850816. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:51:25,767][1648981] Avg episode reward: [(0, '512.790')] [2024-06-15 18:51:27,062][1651669] Updated weights for policy 0, policy_version 614850 (0.0012) [2024-06-15 18:51:28,326][1651669] Updated weights for policy 0, policy_version 614912 (0.0017) [2024-06-15 18:51:30,444][1651669] Updated weights for policy 0, policy_version 614969 (0.0012) [2024-06-15 18:51:30,766][1648981] Fps is (10 sec: 52478.9, 60 sec: 49698.1, 300 sec: 49429.7). Total num frames: 1259470848. Throughput: 0: 12367.7. Samples: 314928640. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:51:30,767][1648981] Avg episode reward: [(0, '529.880')] [2024-06-15 18:51:32,188][1651669] Updated weights for policy 0, policy_version 615030 (0.0015) [2024-06-15 18:51:34,295][1651669] Updated weights for policy 0, policy_version 615073 (0.0013) [2024-06-15 18:51:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50790.4, 300 sec: 49651.9). Total num frames: 1259732992. Throughput: 0: 12413.2. Samples: 314995200. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:51:35,767][1648981] Avg episode reward: [(0, '522.250')] [2024-06-15 18:51:37,821][1651669] Updated weights for policy 0, policy_version 615120 (0.0045) [2024-06-15 18:51:37,954][1651274] Signal inference workers to stop experience collection... 
(32200 times) [2024-06-15 18:51:38,034][1651669] InferenceWorker_p0-w0: stopping experience collection (32200 times) [2024-06-15 18:51:38,286][1651274] Signal inference workers to resume experience collection... (32200 times) [2024-06-15 18:51:38,286][1651669] InferenceWorker_p0-w0: resuming experience collection (32200 times) [2024-06-15 18:51:38,915][1651669] Updated weights for policy 0, policy_version 615167 (0.0012) [2024-06-15 18:51:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49155.9, 300 sec: 49207.5). Total num frames: 1259929600. Throughput: 0: 12266.4. Samples: 315034112. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:51:40,767][1648981] Avg episode reward: [(0, '528.780')] [2024-06-15 18:51:41,177][1651669] Updated weights for policy 0, policy_version 615228 (0.0011) [2024-06-15 18:51:42,674][1651669] Updated weights for policy 0, policy_version 615280 (0.0014) [2024-06-15 18:51:45,209][1651669] Updated weights for policy 0, policy_version 615344 (0.0014) [2024-06-15 18:51:45,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 50244.2, 300 sec: 49762.9). Total num frames: 1260257280. Throughput: 0: 12185.6. Samples: 315105792. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:51:45,767][1648981] Avg episode reward: [(0, '538.960')] [2024-06-15 18:51:49,993][1651669] Updated weights for policy 0, policy_version 615414 (0.0012) [2024-06-15 18:51:50,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.7, 300 sec: 49318.7). Total num frames: 1260388352. Throughput: 0: 12049.1. Samples: 315178496. 
Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:51:50,767][1648981] Avg episode reward: [(0, '536.950')] [2024-06-15 18:51:52,203][1651669] Updated weights for policy 0, policy_version 615459 (0.0014) [2024-06-15 18:51:53,219][1651669] Updated weights for policy 0, policy_version 615520 (0.0012) [2024-06-15 18:51:55,201][1651669] Updated weights for policy 0, policy_version 615554 (0.0046) [2024-06-15 18:51:55,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 49155.3, 300 sec: 49429.7). Total num frames: 1260683264. Throughput: 0: 12151.8. Samples: 315215360. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:51:55,767][1648981] Avg episode reward: [(0, '549.670')] [2024-06-15 18:51:59,433][1651669] Updated weights for policy 0, policy_version 615623 (0.0116) [2024-06-15 18:52:00,774][1648981] Fps is (10 sec: 52389.9, 60 sec: 48053.8, 300 sec: 49318.0). Total num frames: 1260912640. Throughput: 0: 12365.6. Samples: 315293184. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:00,774][1648981] Avg episode reward: [(0, '522.700')] [2024-06-15 18:52:01,794][1651669] Updated weights for policy 0, policy_version 615684 (0.0014) [2024-06-15 18:52:03,339][1651669] Updated weights for policy 0, policy_version 615747 (0.0013) [2024-06-15 18:52:04,316][1651669] Updated weights for policy 0, policy_version 615805 (0.0051) [2024-06-15 18:52:05,767][1648981] Fps is (10 sec: 49148.3, 60 sec: 49697.5, 300 sec: 49319.1). Total num frames: 1261174784. Throughput: 0: 12244.8. Samples: 315363840. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:05,768][1648981] Avg episode reward: [(0, '515.020')] [2024-06-15 18:52:07,796][1651669] Updated weights for policy 0, policy_version 615869 (0.0031) [2024-06-15 18:52:10,702][1651669] Updated weights for policy 0, policy_version 615932 (0.0017) [2024-06-15 18:52:10,767][1648981] Fps is (10 sec: 52464.0, 60 sec: 49152.5, 300 sec: 49429.6). Total num frames: 1261436928. 
Throughput: 0: 12151.3. Samples: 315397632. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:10,770][1648981] Avg episode reward: [(0, '513.580')] [2024-06-15 18:52:13,188][1651669] Updated weights for policy 0, policy_version 615984 (0.0018) [2024-06-15 18:52:14,588][1651669] Updated weights for policy 0, policy_version 616017 (0.0011) [2024-06-15 18:52:15,779][1648981] Fps is (10 sec: 52369.1, 60 sec: 50234.0, 300 sec: 49318.1). Total num frames: 1261699072. Throughput: 0: 12136.8. Samples: 315474944. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:15,779][1648981] Avg episode reward: [(0, '512.800')] [2024-06-15 18:52:17,676][1651669] Updated weights for policy 0, policy_version 616080 (0.0013) [2024-06-15 18:52:18,698][1651669] Updated weights for policy 0, policy_version 616125 (0.0046) [2024-06-15 18:52:20,196][1651274] Signal inference workers to stop experience collection... (32250 times) [2024-06-15 18:52:20,304][1651669] InferenceWorker_p0-w0: stopping experience collection (32250 times) [2024-06-15 18:52:20,466][1651274] Signal inference workers to resume experience collection... (32250 times) [2024-06-15 18:52:20,467][1651669] InferenceWorker_p0-w0: resuming experience collection (32250 times) [2024-06-15 18:52:20,683][1651669] Updated weights for policy 0, policy_version 616166 (0.0013) [2024-06-15 18:52:20,766][1648981] Fps is (10 sec: 45878.6, 60 sec: 49159.8, 300 sec: 49429.7). Total num frames: 1261895680. Throughput: 0: 12470.0. Samples: 315556352. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:20,767][1648981] Avg episode reward: [(0, '515.670')] [2024-06-15 18:52:22,695][1651669] Updated weights for policy 0, policy_version 616247 (0.0156) [2024-06-15 18:52:25,554][1651669] Updated weights for policy 0, policy_version 616292 (0.0012) [2024-06-15 18:52:25,766][1648981] Fps is (10 sec: 49212.1, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 1262190592. Throughput: 0: 12242.5. 
Samples: 315585024. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:25,767][1648981] Avg episode reward: [(0, '524.530')] [2024-06-15 18:52:28,355][1651669] Updated weights for policy 0, policy_version 616342 (0.0012) [2024-06-15 18:52:29,202][1651669] Updated weights for policy 0, policy_version 616384 (0.0011) [2024-06-15 18:52:30,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 49435.0). Total num frames: 1262354432. Throughput: 0: 12481.4. Samples: 315667456. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:30,767][1648981] Avg episode reward: [(0, '508.510')] [2024-06-15 18:52:31,997][1651669] Updated weights for policy 0, policy_version 616448 (0.0011) [2024-06-15 18:52:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 49207.5). Total num frames: 1262616576. Throughput: 0: 12367.7. Samples: 315735040. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:35,767][1648981] Avg episode reward: [(0, '498.350')] [2024-06-15 18:52:35,937][1651669] Updated weights for policy 0, policy_version 616528 (0.0013) [2024-06-15 18:52:36,977][1651669] Updated weights for policy 0, policy_version 616575 (0.0015) [2024-06-15 18:52:39,540][1651669] Updated weights for policy 0, policy_version 616624 (0.0020) [2024-06-15 18:52:40,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49152.1, 300 sec: 49762.9). Total num frames: 1262878720. Throughput: 0: 12527.0. Samples: 315779072. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:40,767][1648981] Avg episode reward: [(0, '497.330')] [2024-06-15 18:52:41,119][1651669] Updated weights for policy 0, policy_version 616657 (0.0011) [2024-06-15 18:52:42,762][1651669] Updated weights for policy 0, policy_version 616736 (0.0020) [2024-06-15 18:52:45,774][1648981] Fps is (10 sec: 52387.0, 60 sec: 48053.4, 300 sec: 49317.9). Total num frames: 1263140864. Throughput: 0: 12390.3. Samples: 315850752. 
Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:45,775][1648981] Avg episode reward: [(0, '502.400')] [2024-06-15 18:52:46,805][1651669] Updated weights for policy 0, policy_version 616784 (0.0018) [2024-06-15 18:52:49,108][1651669] Updated weights for policy 0, policy_version 616837 (0.0013) [2024-06-15 18:52:50,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.3, 300 sec: 49762.9). Total num frames: 1263403008. Throughput: 0: 12538.5. Samples: 315928064. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:50,767][1648981] Avg episode reward: [(0, '490.990')] [2024-06-15 18:52:51,306][1651669] Updated weights for policy 0, policy_version 616912 (0.0013) [2024-06-15 18:52:53,144][1651669] Updated weights for policy 0, policy_version 616992 (0.0113) [2024-06-15 18:52:53,954][1651669] Updated weights for policy 0, policy_version 617024 (0.0011) [2024-06-15 18:52:55,768][1648981] Fps is (10 sec: 52461.3, 60 sec: 49696.7, 300 sec: 49762.6). Total num frames: 1263665152. Throughput: 0: 12481.1. Samples: 315959296. Policy #0 lag: (min: 15.0, avg: 143.6, max: 271.0) [2024-06-15 18:52:55,769][1648981] Avg episode reward: [(0, '497.840')] [2024-06-15 18:52:55,806][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000617024_1263665152.pth... [2024-06-15 18:52:55,849][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000611248_1251835904.pth [2024-06-15 18:52:58,712][1651669] Updated weights for policy 0, policy_version 617085 (0.0012) [2024-06-15 18:53:00,633][1651274] Signal inference workers to stop experience collection... (32300 times) [2024-06-15 18:53:00,659][1651669] InferenceWorker_p0-w0: stopping experience collection (32300 times) [2024-06-15 18:53:00,769][1648981] Fps is (10 sec: 45865.1, 60 sec: 49156.3, 300 sec: 49540.4). Total num frames: 1263861760. Throughput: 0: 12518.3. Samples: 316038144. 
Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:00,769][1648981] Avg episode reward: [(0, '486.070')] [2024-06-15 18:53:00,968][1651274] Signal inference workers to resume experience collection... (32300 times) [2024-06-15 18:53:00,969][1651669] InferenceWorker_p0-w0: resuming experience collection (32300 times) [2024-06-15 18:53:01,136][1651669] Updated weights for policy 0, policy_version 617142 (0.0016) [2024-06-15 18:53:02,702][1651669] Updated weights for policy 0, policy_version 617187 (0.0041) [2024-06-15 18:53:04,037][1651669] Updated weights for policy 0, policy_version 617250 (0.0012) [2024-06-15 18:53:05,766][1648981] Fps is (10 sec: 52438.3, 60 sec: 50245.0, 300 sec: 49985.1). Total num frames: 1264189440. Throughput: 0: 12356.3. Samples: 316112384. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:05,767][1648981] Avg episode reward: [(0, '492.910')] [2024-06-15 18:53:08,434][1651669] Updated weights for policy 0, policy_version 617302 (0.0013) [2024-06-15 18:53:10,630][1651669] Updated weights for policy 0, policy_version 617348 (0.0015) [2024-06-15 18:53:10,768][1648981] Fps is (10 sec: 49155.8, 60 sec: 48605.3, 300 sec: 49429.4). Total num frames: 1264353280. Throughput: 0: 12594.8. Samples: 316151808. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:10,768][1648981] Avg episode reward: [(0, '492.710')] [2024-06-15 18:53:11,967][1651669] Updated weights for policy 0, policy_version 617402 (0.0011) [2024-06-15 18:53:13,996][1651669] Updated weights for policy 0, policy_version 617457 (0.0016) [2024-06-15 18:53:15,484][1651669] Updated weights for policy 0, policy_version 617529 (0.0011) [2024-06-15 18:53:15,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50254.6, 300 sec: 50207.3). Total num frames: 1264713728. Throughput: 0: 12140.1. Samples: 316213760. 
Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:15,767][1648981] Avg episode reward: [(0, '502.980')] [2024-06-15 18:53:19,685][1651669] Updated weights for policy 0, policy_version 617590 (0.0012) [2024-06-15 18:53:20,766][1648981] Fps is (10 sec: 49159.2, 60 sec: 49152.1, 300 sec: 49318.6). Total num frames: 1264844800. Throughput: 0: 12492.8. Samples: 316297216. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:20,767][1648981] Avg episode reward: [(0, '520.580')] [2024-06-15 18:53:21,904][1651669] Updated weights for policy 0, policy_version 617633 (0.0019) [2024-06-15 18:53:23,437][1651669] Updated weights for policy 0, policy_version 617680 (0.0015) [2024-06-15 18:53:25,265][1651669] Updated weights for policy 0, policy_version 617746 (0.0010) [2024-06-15 18:53:25,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 49698.0, 300 sec: 49985.1). Total num frames: 1265172480. Throughput: 0: 12242.4. Samples: 316329984. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:25,767][1648981] Avg episode reward: [(0, '540.440')] [2024-06-15 18:53:26,080][1651669] Updated weights for policy 0, policy_version 617792 (0.0010) [2024-06-15 18:53:30,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 49207.7). Total num frames: 1265303552. Throughput: 0: 12369.8. Samples: 316407296. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:30,767][1648981] Avg episode reward: [(0, '546.170')] [2024-06-15 18:53:31,106][1651669] Updated weights for policy 0, policy_version 617851 (0.0152) [2024-06-15 18:53:33,747][1651669] Updated weights for policy 0, policy_version 617922 (0.0027) [2024-06-15 18:53:35,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 50790.4, 300 sec: 49884.8). Total num frames: 1265664000. Throughput: 0: 12071.8. Samples: 316471296. 
Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:35,767][1648981] Avg episode reward: [(0, '559.200')] [2024-06-15 18:53:35,816][1651669] Updated weights for policy 0, policy_version 618001 (0.0012) [2024-06-15 18:53:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.6, 300 sec: 48985.4). Total num frames: 1265762304. Throughput: 0: 12151.9. Samples: 316506112. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:40,767][1648981] Avg episode reward: [(0, '544.840')] [2024-06-15 18:53:41,168][1651669] Updated weights for policy 0, policy_version 618071 (0.0091) [2024-06-15 18:53:41,437][1651274] Signal inference workers to stop experience collection... (32350 times) [2024-06-15 18:53:41,521][1651669] InferenceWorker_p0-w0: stopping experience collection (32350 times) [2024-06-15 18:53:41,799][1651274] Signal inference workers to resume experience collection... (32350 times) [2024-06-15 18:53:41,800][1651669] InferenceWorker_p0-w0: resuming experience collection (32350 times) [2024-06-15 18:53:42,182][1651669] Updated weights for policy 0, policy_version 618110 (0.0011) [2024-06-15 18:53:44,799][1651669] Updated weights for policy 0, policy_version 618179 (0.0209) [2024-06-15 18:53:45,769][1648981] Fps is (10 sec: 45865.1, 60 sec: 49703.0, 300 sec: 49762.6). Total num frames: 1266122752. Throughput: 0: 12162.9. Samples: 316585472. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:45,769][1648981] Avg episode reward: [(0, '545.240')] [2024-06-15 18:53:45,808][1651669] Updated weights for policy 0, policy_version 618228 (0.0013) [2024-06-15 18:53:47,514][1651669] Updated weights for policy 0, policy_version 618288 (0.0167) [2024-06-15 18:53:50,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 49207.6). Total num frames: 1266286592. Throughput: 0: 12037.7. Samples: 316654080. 
Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:50,767][1648981] Avg episode reward: [(0, '539.760')] [2024-06-15 18:53:52,188][1651669] Updated weights for policy 0, policy_version 618320 (0.0011) [2024-06-15 18:53:54,810][1651669] Updated weights for policy 0, policy_version 618370 (0.0016) [2024-06-15 18:53:55,767][1648981] Fps is (10 sec: 36049.6, 60 sec: 46968.2, 300 sec: 49318.5). Total num frames: 1266483200. Throughput: 0: 12049.2. Samples: 316694016. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:53:55,768][1648981] Avg episode reward: [(0, '532.190')] [2024-06-15 18:53:57,061][1651669] Updated weights for policy 0, policy_version 618466 (0.0014) [2024-06-15 18:53:59,107][1651669] Updated weights for policy 0, policy_version 618550 (0.0132) [2024-06-15 18:54:00,778][1648981] Fps is (10 sec: 52366.7, 60 sec: 49144.1, 300 sec: 49316.7). Total num frames: 1266810880. Throughput: 0: 11863.9. Samples: 316747776. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:54:00,779][1648981] Avg episode reward: [(0, '531.620')] [2024-06-15 18:54:04,272][1651669] Updated weights for policy 0, policy_version 618595 (0.0013) [2024-06-15 18:54:05,788][1648981] Fps is (10 sec: 45779.0, 60 sec: 45858.5, 300 sec: 49315.0). Total num frames: 1266941952. Throughput: 0: 11895.4. Samples: 316832768. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:54:05,789][1648981] Avg episode reward: [(0, '524.800')] [2024-06-15 18:54:05,952][1651669] Updated weights for policy 0, policy_version 618629 (0.0014) [2024-06-15 18:54:07,903][1651669] Updated weights for policy 0, policy_version 618707 (0.0138) [2024-06-15 18:54:10,167][1651669] Updated weights for policy 0, policy_version 618789 (0.0138) [2024-06-15 18:54:10,768][1648981] Fps is (10 sec: 52483.1, 60 sec: 49698.1, 300 sec: 49430.0). Total num frames: 1267335168. Throughput: 0: 11730.2. Samples: 316857856. 
Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:54:10,768][1648981] Avg episode reward: [(0, '526.600')] [2024-06-15 18:54:14,908][1651669] Updated weights for policy 0, policy_version 618834 (0.0012) [2024-06-15 18:54:15,766][1648981] Fps is (10 sec: 52543.8, 60 sec: 45875.1, 300 sec: 49318.6). Total num frames: 1267466240. Throughput: 0: 11764.6. Samples: 316936704. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:54:15,767][1648981] Avg episode reward: [(0, '545.660')] [2024-06-15 18:54:18,319][1651669] Updated weights for policy 0, policy_version 618912 (0.0012) [2024-06-15 18:54:20,493][1651274] Signal inference workers to stop experience collection... (32400 times) [2024-06-15 18:54:20,535][1651669] InferenceWorker_p0-w0: stopping experience collection (32400 times) [2024-06-15 18:54:20,536][1651669] Updated weights for policy 0, policy_version 618996 (0.0017) [2024-06-15 18:54:20,767][1648981] Fps is (10 sec: 36049.3, 60 sec: 47513.4, 300 sec: 48985.4). Total num frames: 1267695616. Throughput: 0: 11696.3. Samples: 316997632. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:54:20,767][1648981] Avg episode reward: [(0, '550.540')] [2024-06-15 18:54:20,790][1651274] Signal inference workers to resume experience collection... (32400 times) [2024-06-15 18:54:20,791][1651669] InferenceWorker_p0-w0: resuming experience collection (32400 times) [2024-06-15 18:54:21,947][1651669] Updated weights for policy 0, policy_version 619059 (0.0013) [2024-06-15 18:54:25,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 44783.2, 300 sec: 48874.3). Total num frames: 1267859456. Throughput: 0: 11685.0. Samples: 317031936. 
Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:54:25,767][1648981] Avg episode reward: [(0, '552.240')] [2024-06-15 18:54:26,663][1651669] Updated weights for policy 0, policy_version 619120 (0.0012) [2024-06-15 18:54:29,694][1651669] Updated weights for policy 0, policy_version 619186 (0.0015) [2024-06-15 18:54:30,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 47513.7, 300 sec: 48985.4). Total num frames: 1268154368. Throughput: 0: 11617.3. Samples: 317108224. Policy #0 lag: (min: 15.0, avg: 110.8, max: 271.0) [2024-06-15 18:54:30,767][1648981] Avg episode reward: [(0, '554.930')] [2024-06-15 18:54:31,134][1651669] Updated weights for policy 0, policy_version 619233 (0.0011) [2024-06-15 18:54:32,744][1651669] Updated weights for policy 0, policy_version 619296 (0.0011) [2024-06-15 18:54:35,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 45329.0, 300 sec: 48874.3). Total num frames: 1268383744. Throughput: 0: 11548.4. Samples: 317173760. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:54:35,767][1648981] Avg episode reward: [(0, '560.100')] [2024-06-15 18:54:38,409][1651669] Updated weights for policy 0, policy_version 619347 (0.0016) [2024-06-15 18:54:39,813][1651669] Updated weights for policy 0, policy_version 619413 (0.0114) [2024-06-15 18:54:40,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 47513.5, 300 sec: 48874.4). Total num frames: 1268613120. Throughput: 0: 11605.5. Samples: 317216256. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:54:40,767][1648981] Avg episode reward: [(0, '544.750')] [2024-06-15 18:54:40,945][1651669] Updated weights for policy 0, policy_version 619456 (0.0012) [2024-06-15 18:54:43,671][1651669] Updated weights for policy 0, policy_version 619553 (0.0070) [2024-06-15 18:54:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46423.0, 300 sec: 48874.3). Total num frames: 1268908032. Throughput: 0: 11665.3. Samples: 317272576. 
Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:54:45,767][1648981] Avg episode reward: [(0, '558.870')] [2024-06-15 18:54:48,744][1651669] Updated weights for policy 0, policy_version 619588 (0.0012) [2024-06-15 18:54:49,965][1651669] Updated weights for policy 0, policy_version 619648 (0.0140) [2024-06-15 18:54:50,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 46421.3, 300 sec: 48541.1). Total num frames: 1269071872. Throughput: 0: 11611.0. Samples: 317355008. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:54:50,767][1648981] Avg episode reward: [(0, '532.630')] [2024-06-15 18:54:51,405][1651669] Updated weights for policy 0, policy_version 619704 (0.0011) [2024-06-15 18:54:52,640][1651669] Updated weights for policy 0, policy_version 619749 (0.0013) [2024-06-15 18:54:53,955][1651669] Updated weights for policy 0, policy_version 619795 (0.0010) [2024-06-15 18:54:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49152.7, 300 sec: 48874.3). Total num frames: 1269432320. Throughput: 0: 11730.9. Samples: 317385728. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:54:55,767][1648981] Avg episode reward: [(0, '523.270')] [2024-06-15 18:54:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000619840_1269432320.pth... [2024-06-15 18:54:55,816][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000614144_1257766912.pth [2024-06-15 18:55:00,040][1651669] Updated weights for policy 0, policy_version 619847 (0.0011) [2024-06-15 18:55:00,593][1651274] Signal inference workers to stop experience collection... (32450 times) [2024-06-15 18:55:00,662][1651669] InferenceWorker_p0-w0: stopping experience collection (32450 times) [2024-06-15 18:55:00,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 44791.8, 300 sec: 48541.1). Total num frames: 1269497856. Throughput: 0: 11912.5. Samples: 317472768. 
Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:00,767][1648981] Avg episode reward: [(0, '531.510')] [2024-06-15 18:55:00,806][1651274] Signal inference workers to resume experience collection... (32450 times) [2024-06-15 18:55:00,811][1651669] InferenceWorker_p0-w0: resuming experience collection (32450 times) [2024-06-15 18:55:01,369][1651669] Updated weights for policy 0, policy_version 619908 (0.0012) [2024-06-15 18:55:02,604][1651669] Updated weights for policy 0, policy_version 619970 (0.0013) [2024-06-15 18:55:04,354][1651669] Updated weights for policy 0, policy_version 620035 (0.0025) [2024-06-15 18:55:05,618][1651669] Updated weights for policy 0, policy_version 620091 (0.0012) [2024-06-15 18:55:05,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50262.5, 300 sec: 48985.4). Total num frames: 1269956608. Throughput: 0: 11719.1. Samples: 317524992. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:05,767][1648981] Avg episode reward: [(0, '516.450')] [2024-06-15 18:55:10,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 43691.6, 300 sec: 48318.9). Total num frames: 1269956608. Throughput: 0: 12014.9. Samples: 317572608. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:10,767][1648981] Avg episode reward: [(0, '504.060')] [2024-06-15 18:55:12,080][1651669] Updated weights for policy 0, policy_version 620148 (0.0015) [2024-06-15 18:55:13,522][1651669] Updated weights for policy 0, policy_version 620224 (0.0012) [2024-06-15 18:55:14,728][1651669] Updated weights for policy 0, policy_version 620277 (0.0012) [2024-06-15 18:55:15,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48605.8, 300 sec: 48763.7). Total num frames: 1270382592. Throughput: 0: 11901.1. Samples: 317643776. 
Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:15,767][1648981] Avg episode reward: [(0, '527.300')] [2024-06-15 18:55:16,383][1651669] Updated weights for policy 0, policy_version 620336 (0.0012) [2024-06-15 18:55:20,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 46421.5, 300 sec: 48431.9). Total num frames: 1270480896. Throughput: 0: 12219.7. Samples: 317723648. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:20,767][1648981] Avg episode reward: [(0, '536.100')] [2024-06-15 18:55:22,996][1651669] Updated weights for policy 0, policy_version 620400 (0.0060) [2024-06-15 18:55:24,832][1651669] Updated weights for policy 0, policy_version 620496 (0.0197) [2024-06-15 18:55:25,655][1651669] Updated weights for policy 0, policy_version 620544 (0.0016) [2024-06-15 18:55:25,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50244.1, 300 sec: 48763.2). Total num frames: 1270874112. Throughput: 0: 11992.2. Samples: 317755904. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:25,767][1648981] Avg episode reward: [(0, '534.930')] [2024-06-15 18:55:27,679][1651669] Updated weights for policy 0, policy_version 620606 (0.0012) [2024-06-15 18:55:30,768][1648981] Fps is (10 sec: 52418.2, 60 sec: 47512.0, 300 sec: 48540.7). Total num frames: 1271005184. Throughput: 0: 12401.2. Samples: 317830656. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:30,769][1648981] Avg episode reward: [(0, '513.260')] [2024-06-15 18:55:33,728][1651669] Updated weights for policy 0, policy_version 620672 (0.0014) [2024-06-15 18:55:35,135][1651274] Signal inference workers to stop experience collection... (32500 times) [2024-06-15 18:55:35,189][1651669] InferenceWorker_p0-w0: stopping experience collection (32500 times) [2024-06-15 18:55:35,191][1651669] Updated weights for policy 0, policy_version 620740 (0.0012) [2024-06-15 18:55:35,329][1651274] Signal inference workers to resume experience collection... 
(32500 times) [2024-06-15 18:55:35,330][1651669] InferenceWorker_p0-w0: resuming experience collection (32500 times) [2024-06-15 18:55:35,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.1, 300 sec: 48652.9). Total num frames: 1271332864. Throughput: 0: 12128.7. Samples: 317900800. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:35,767][1648981] Avg episode reward: [(0, '522.480')] [2024-06-15 18:55:36,563][1651669] Updated weights for policy 0, policy_version 620801 (0.0015) [2024-06-15 18:55:40,766][1648981] Fps is (10 sec: 52439.3, 60 sec: 48606.1, 300 sec: 48430.0). Total num frames: 1271529472. Throughput: 0: 12162.9. Samples: 317933056. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:40,767][1648981] Avg episode reward: [(0, '507.140')] [2024-06-15 18:55:42,624][1651669] Updated weights for policy 0, policy_version 620870 (0.0046) [2024-06-15 18:55:44,620][1651669] Updated weights for policy 0, policy_version 620944 (0.0012) [2024-06-15 18:55:45,786][1648981] Fps is (10 sec: 45784.3, 60 sec: 48043.9, 300 sec: 48426.7). Total num frames: 1271791616. Throughput: 0: 12180.2. Samples: 318021120. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:45,787][1648981] Avg episode reward: [(0, '542.340')] [2024-06-15 18:55:46,233][1651669] Updated weights for policy 0, policy_version 621010 (0.0012) [2024-06-15 18:55:47,739][1651669] Updated weights for policy 0, policy_version 621074 (0.0013) [2024-06-15 18:55:48,610][1651669] Updated weights for policy 0, policy_version 621119 (0.0013) [2024-06-15 18:55:50,770][1648981] Fps is (10 sec: 52408.7, 60 sec: 49695.0, 300 sec: 48541.1). Total num frames: 1272053760. Throughput: 0: 12434.9. Samples: 318084608. 
Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:50,771][1648981] Avg episode reward: [(0, '539.660')] [2024-06-15 18:55:54,766][1651669] Updated weights for policy 0, policy_version 621172 (0.0014) [2024-06-15 18:55:55,767][1648981] Fps is (10 sec: 45965.2, 60 sec: 46967.3, 300 sec: 48207.8). Total num frames: 1272250368. Throughput: 0: 12526.9. Samples: 318136320. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:55:55,768][1648981] Avg episode reward: [(0, '536.820')] [2024-06-15 18:55:56,788][1651669] Updated weights for policy 0, policy_version 621255 (0.0083) [2024-06-15 18:55:58,208][1651669] Updated weights for policy 0, policy_version 621317 (0.0014) [2024-06-15 18:56:00,766][1648981] Fps is (10 sec: 52448.9, 60 sec: 51336.5, 300 sec: 48763.2). Total num frames: 1272578048. Throughput: 0: 12151.5. Samples: 318190592. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:56:00,767][1648981] Avg episode reward: [(0, '523.070')] [2024-06-15 18:56:04,202][1651669] Updated weights for policy 0, policy_version 621377 (0.0013) [2024-06-15 18:56:05,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 45875.3, 300 sec: 48208.1). Total num frames: 1272709120. Throughput: 0: 12344.9. Samples: 318279168. Policy #0 lag: (min: 127.0, avg: 234.5, max: 415.0) [2024-06-15 18:56:05,767][1648981] Avg episode reward: [(0, '545.200')] [2024-06-15 18:56:06,323][1651669] Updated weights for policy 0, policy_version 621472 (0.0012) [2024-06-15 18:56:08,543][1651669] Updated weights for policy 0, policy_version 621559 (0.0095) [2024-06-15 18:56:09,905][1651274] Signal inference workers to stop experience collection... (32550 times) [2024-06-15 18:56:09,998][1651669] InferenceWorker_p0-w0: stopping experience collection (32550 times) [2024-06-15 18:56:10,094][1651274] Signal inference workers to resume experience collection... 
(32550 times) [2024-06-15 18:56:10,094][1651669] InferenceWorker_p0-w0: resuming experience collection (32550 times) [2024-06-15 18:56:10,111][1651669] Updated weights for policy 0, policy_version 621616 (0.0022) [2024-06-15 18:56:10,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 52429.1, 300 sec: 48874.3). Total num frames: 1273102336. Throughput: 0: 12015.0. Samples: 318296576. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:10,767][1648981] Avg episode reward: [(0, '552.410')] [2024-06-15 18:56:15,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 45329.1, 300 sec: 47987.2). Total num frames: 1273102336. Throughput: 0: 12117.9. Samples: 318375936. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:15,767][1648981] Avg episode reward: [(0, '588.290')] [2024-06-15 18:56:16,754][1651669] Updated weights for policy 0, policy_version 621664 (0.0142) [2024-06-15 18:56:18,122][1651669] Updated weights for policy 0, policy_version 621728 (0.0012) [2024-06-15 18:56:19,275][1651669] Updated weights for policy 0, policy_version 621767 (0.0013) [2024-06-15 18:56:20,778][1648981] Fps is (10 sec: 39274.7, 60 sec: 50234.4, 300 sec: 48428.1). Total num frames: 1273495552. Throughput: 0: 11932.2. Samples: 318437888. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:20,779][1648981] Avg episode reward: [(0, '591.710')] [2024-06-15 18:56:21,677][1651669] Updated weights for policy 0, policy_version 621861 (0.0013) [2024-06-15 18:56:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1273626624. Throughput: 0: 11844.3. Samples: 318466048. 
Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:25,767][1648981] Avg episode reward: [(0, '575.220')] [2024-06-15 18:56:27,923][1651669] Updated weights for policy 0, policy_version 621908 (0.0012) [2024-06-15 18:56:29,471][1651669] Updated weights for policy 0, policy_version 621968 (0.0017) [2024-06-15 18:56:30,783][1648981] Fps is (10 sec: 39301.8, 60 sec: 48047.9, 300 sec: 47982.9). Total num frames: 1273888768. Throughput: 0: 11708.5. Samples: 318547968. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:30,784][1648981] Avg episode reward: [(0, '595.470')] [2024-06-15 18:56:31,204][1651669] Updated weights for policy 0, policy_version 622032 (0.0022) [2024-06-15 18:56:32,976][1651669] Updated weights for policy 0, policy_version 622103 (0.0012) [2024-06-15 18:56:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 46967.4, 300 sec: 48207.8). Total num frames: 1274150912. Throughput: 0: 11606.3. Samples: 318606848. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:35,767][1648981] Avg episode reward: [(0, '587.390')] [2024-06-15 18:56:39,715][1651669] Updated weights for policy 0, policy_version 622177 (0.0012) [2024-06-15 18:56:40,766][1648981] Fps is (10 sec: 39388.2, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1274281984. Throughput: 0: 11480.3. Samples: 318652928. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:40,767][1648981] Avg episode reward: [(0, '612.410')] [2024-06-15 18:56:41,169][1651274] Saving new best policy, reward=612.410! 
[2024-06-15 18:56:41,690][1651669] Updated weights for policy 0, policy_version 622256 (0.0011) [2024-06-15 18:56:42,715][1651669] Updated weights for policy 0, policy_version 622304 (0.0012) [2024-06-15 18:56:44,054][1651669] Updated weights for policy 0, policy_version 622352 (0.0012) [2024-06-15 18:56:44,862][1651669] Updated weights for policy 0, policy_version 622397 (0.0012) [2024-06-15 18:56:45,767][1648981] Fps is (10 sec: 52426.0, 60 sec: 48075.2, 300 sec: 48429.9). Total num frames: 1274675200. Throughput: 0: 11502.8. Samples: 318708224. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:45,768][1648981] Avg episode reward: [(0, '618.650')] [2024-06-15 18:56:45,769][1651274] Saving new best policy, reward=618.650! [2024-06-15 18:56:50,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 44785.8, 300 sec: 47652.5). Total num frames: 1274740736. Throughput: 0: 11446.1. Samples: 318794240. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:50,767][1648981] Avg episode reward: [(0, '621.350')] [2024-06-15 18:56:51,435][1651274] Saving new best policy, reward=621.350! [2024-06-15 18:56:52,235][1651274] Signal inference workers to stop experience collection... (32600 times) [2024-06-15 18:56:52,330][1651669] InferenceWorker_p0-w0: stopping experience collection (32600 times) [2024-06-15 18:56:52,545][1651274] Signal inference workers to resume experience collection... (32600 times) [2024-06-15 18:56:52,546][1651669] InferenceWorker_p0-w0: resuming experience collection (32600 times) [2024-06-15 18:56:52,757][1651669] Updated weights for policy 0, policy_version 622500 (0.0014) [2024-06-15 18:56:54,724][1651669] Updated weights for policy 0, policy_version 622576 (0.0012) [2024-06-15 18:56:55,767][1648981] Fps is (10 sec: 39322.5, 60 sec: 46967.4, 300 sec: 47986.8). Total num frames: 1275068416. Throughput: 0: 11502.8. Samples: 318814208. 
Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:56:55,767][1648981] Avg episode reward: [(0, '616.460')] [2024-06-15 18:56:56,163][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000622624_1275133952.pth... [2024-06-15 18:56:56,174][1651669] Updated weights for policy 0, policy_version 622624 (0.0014) [2024-06-15 18:56:56,291][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000617024_1263665152.pth [2024-06-15 18:57:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 47541.5). Total num frames: 1275199488. Throughput: 0: 11320.9. Samples: 318885376. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:00,767][1648981] Avg episode reward: [(0, '609.160')] [2024-06-15 18:57:02,351][1651669] Updated weights for policy 0, policy_version 622704 (0.0095) [2024-06-15 18:57:03,987][1651669] Updated weights for policy 0, policy_version 622768 (0.0013) [2024-06-15 18:57:05,525][1651669] Updated weights for policy 0, policy_version 622819 (0.0012) [2024-06-15 18:57:05,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 47513.6, 300 sec: 47874.7). Total num frames: 1275559936. Throughput: 0: 11380.8. Samples: 318949888. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:05,767][1648981] Avg episode reward: [(0, '594.680')] [2024-06-15 18:57:07,457][1651669] Updated weights for policy 0, policy_version 622882 (0.0013) [2024-06-15 18:57:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 43690.5, 300 sec: 47543.3). Total num frames: 1275723776. Throughput: 0: 11559.8. Samples: 318986240. 
Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:10,767][1648981] Avg episode reward: [(0, '609.250')] [2024-06-15 18:57:13,471][1651669] Updated weights for policy 0, policy_version 622944 (0.0013) [2024-06-15 18:57:15,119][1651669] Updated weights for policy 0, policy_version 623009 (0.0011) [2024-06-15 18:57:15,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 1275953152. Throughput: 0: 11552.8. Samples: 319067648. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:15,767][1648981] Avg episode reward: [(0, '624.620')] [2024-06-15 18:57:16,203][1651274] Saving new best policy, reward=624.620! [2024-06-15 18:57:16,973][1651669] Updated weights for policy 0, policy_version 623088 (0.0011) [2024-06-15 18:57:18,567][1651669] Updated weights for policy 0, policy_version 623168 (0.0012) [2024-06-15 18:57:20,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 45884.3, 300 sec: 47652.5). Total num frames: 1276248064. Throughput: 0: 11594.0. Samples: 319128576. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:20,767][1648981] Avg episode reward: [(0, '604.610')] [2024-06-15 18:57:25,022][1651669] Updated weights for policy 0, policy_version 623218 (0.0012) [2024-06-15 18:57:25,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1276411904. Throughput: 0: 11650.8. Samples: 319177216. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:25,767][1648981] Avg episode reward: [(0, '580.350')] [2024-06-15 18:57:26,557][1651669] Updated weights for policy 0, policy_version 623281 (0.0012) [2024-06-15 18:57:28,878][1651669] Updated weights for policy 0, policy_version 623376 (0.0013) [2024-06-15 18:57:28,989][1651274] Signal inference workers to stop experience collection... 
(32650 times) [2024-06-15 18:57:29,035][1651669] InferenceWorker_p0-w0: stopping experience collection (32650 times) [2024-06-15 18:57:29,220][1651274] Signal inference workers to resume experience collection... (32650 times) [2024-06-15 18:57:29,221][1651669] InferenceWorker_p0-w0: resuming experience collection (32650 times) [2024-06-15 18:57:29,766][1651669] Updated weights for policy 0, policy_version 623421 (0.0011) [2024-06-15 18:57:30,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48073.2, 300 sec: 47985.7). Total num frames: 1276772352. Throughput: 0: 11719.3. Samples: 319235584. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:30,767][1648981] Avg episode reward: [(0, '588.640')] [2024-06-15 18:57:34,714][1651669] Updated weights for policy 0, policy_version 623458 (0.0012) [2024-06-15 18:57:35,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 46421.4, 300 sec: 47652.4). Total num frames: 1276936192. Throughput: 0: 11764.6. Samples: 319323648. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:35,767][1648981] Avg episode reward: [(0, '570.740')] [2024-06-15 18:57:36,339][1651669] Updated weights for policy 0, policy_version 623536 (0.0012) [2024-06-15 18:57:38,360][1651669] Updated weights for policy 0, policy_version 623604 (0.0013) [2024-06-15 18:57:40,618][1651669] Updated weights for policy 0, policy_version 623664 (0.0012) [2024-06-15 18:57:40,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 49698.0, 300 sec: 47875.9). Total num frames: 1277263872. Throughput: 0: 11832.9. Samples: 319346688. Policy #0 lag: (min: 143.0, avg: 200.5, max: 383.0) [2024-06-15 18:57:40,767][1648981] Avg episode reward: [(0, '555.810')] [2024-06-15 18:57:45,774][1648981] Fps is (10 sec: 45838.9, 60 sec: 45323.5, 300 sec: 47429.0). Total num frames: 1277394944. Throughput: 0: 12103.8. Samples: 319430144. 
Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:57:45,775][1648981] Avg episode reward: [(0, '565.060')] [2024-06-15 18:57:45,887][1651669] Updated weights for policy 0, policy_version 623738 (0.0064) [2024-06-15 18:57:46,987][1651669] Updated weights for policy 0, policy_version 623779 (0.0023) [2024-06-15 18:57:48,775][1651669] Updated weights for policy 0, policy_version 623841 (0.0091) [2024-06-15 18:57:50,767][1648981] Fps is (10 sec: 42598.1, 60 sec: 49151.8, 300 sec: 47541.6). Total num frames: 1277689856. Throughput: 0: 12128.7. Samples: 319495680. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:57:50,767][1648981] Avg episode reward: [(0, '562.370')] [2024-06-15 18:57:51,054][1651669] Updated weights for policy 0, policy_version 623893 (0.0029) [2024-06-15 18:57:55,769][1648981] Fps is (10 sec: 45900.5, 60 sec: 46419.8, 300 sec: 47430.3). Total num frames: 1277853696. Throughput: 0: 12128.1. Samples: 319532032. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:57:55,770][1648981] Avg episode reward: [(0, '568.490')] [2024-06-15 18:57:55,988][1651669] Updated weights for policy 0, policy_version 623971 (0.0012) [2024-06-15 18:57:56,623][1651669] Updated weights for policy 0, policy_version 623999 (0.0010) [2024-06-15 18:57:58,061][1651669] Updated weights for policy 0, policy_version 624052 (0.0012) [2024-06-15 18:57:59,554][1651669] Updated weights for policy 0, policy_version 624119 (0.0061) [2024-06-15 18:58:00,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 50244.2, 300 sec: 47541.4). Total num frames: 1278214144. Throughput: 0: 11855.6. Samples: 319601152. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:00,767][1648981] Avg episode reward: [(0, '552.660')] [2024-06-15 18:58:01,741][1651669] Updated weights for policy 0, policy_version 624161 (0.0013) [2024-06-15 18:58:05,773][1648981] Fps is (10 sec: 49133.1, 60 sec: 46416.5, 300 sec: 47429.5). Total num frames: 1278345216. 
Throughput: 0: 12309.0. Samples: 319682560. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:05,784][1648981] Avg episode reward: [(0, '557.140')] [2024-06-15 18:58:06,802][1651669] Updated weights for policy 0, policy_version 624193 (0.0012) [2024-06-15 18:58:08,271][1651669] Updated weights for policy 0, policy_version 624245 (0.0012) [2024-06-15 18:58:09,715][1651669] Updated weights for policy 0, policy_version 624304 (0.0014) [2024-06-15 18:58:09,816][1651274] Signal inference workers to stop experience collection... (32700 times) [2024-06-15 18:58:09,868][1651669] InferenceWorker_p0-w0: stopping experience collection (32700 times) [2024-06-15 18:58:10,045][1651274] Signal inference workers to resume experience collection... (32700 times) [2024-06-15 18:58:10,046][1651669] InferenceWorker_p0-w0: resuming experience collection (32700 times) [2024-06-15 18:58:10,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1278672896. Throughput: 0: 12071.8. Samples: 319720448. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:10,767][1648981] Avg episode reward: [(0, '539.510')] [2024-06-15 18:58:10,951][1651669] Updated weights for policy 0, policy_version 624359 (0.0020) [2024-06-15 18:58:12,200][1651669] Updated weights for policy 0, policy_version 624417 (0.0013) [2024-06-15 18:58:15,766][1648981] Fps is (10 sec: 52461.2, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 1278869504. Throughput: 0: 12310.7. Samples: 319789568. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:15,767][1648981] Avg episode reward: [(0, '543.310')] [2024-06-15 18:58:17,894][1651669] Updated weights for policy 0, policy_version 624469 (0.0013) [2024-06-15 18:58:19,346][1651669] Updated weights for policy 0, policy_version 624528 (0.0012) [2024-06-15 18:58:20,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 48059.7, 300 sec: 47319.3). Total num frames: 1279131648. Throughput: 0: 11923.9. 
Samples: 319860224. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:20,767][1648981] Avg episode reward: [(0, '563.440')] [2024-06-15 18:58:21,634][1651669] Updated weights for policy 0, policy_version 624624 (0.0103) [2024-06-15 18:58:23,150][1651669] Updated weights for policy 0, policy_version 624688 (0.0014) [2024-06-15 18:58:25,790][1648981] Fps is (10 sec: 52304.3, 60 sec: 49678.5, 300 sec: 47759.7). Total num frames: 1279393792. Throughput: 0: 12042.7. Samples: 319888896. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:25,791][1648981] Avg episode reward: [(0, '536.090')] [2024-06-15 18:58:29,242][1651669] Updated weights for policy 0, policy_version 624739 (0.0039) [2024-06-15 18:58:30,701][1651669] Updated weights for policy 0, policy_version 624800 (0.0013) [2024-06-15 18:58:30,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46967.5, 300 sec: 47208.2). Total num frames: 1279590400. Throughput: 0: 12108.1. Samples: 319974912. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:30,767][1648981] Avg episode reward: [(0, '540.350')] [2024-06-15 18:58:32,408][1651669] Updated weights for policy 0, policy_version 624867 (0.0133) [2024-06-15 18:58:34,151][1651669] Updated weights for policy 0, policy_version 624960 (0.0052) [2024-06-15 18:58:35,766][1648981] Fps is (10 sec: 52554.3, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1279918080. Throughput: 0: 11969.5. Samples: 320034304. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:35,767][1648981] Avg episode reward: [(0, '558.730')] [2024-06-15 18:58:40,684][1651669] Updated weights for policy 0, policy_version 625019 (0.0013) [2024-06-15 18:58:40,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 45875.3, 300 sec: 47097.4). Total num frames: 1280016384. Throughput: 0: 12106.6. Samples: 320076800. 
Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:40,767][1648981] Avg episode reward: [(0, '563.390')] [2024-06-15 18:58:42,021][1651669] Updated weights for policy 0, policy_version 625072 (0.0021) [2024-06-15 18:58:43,592][1651669] Updated weights for policy 0, policy_version 625144 (0.0013) [2024-06-15 18:58:45,242][1651669] Updated weights for policy 0, policy_version 625200 (0.0012) [2024-06-15 18:58:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50797.0, 300 sec: 47985.7). Total num frames: 1280442368. Throughput: 0: 12071.8. Samples: 320144384. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:45,767][1648981] Avg episode reward: [(0, '576.690')] [2024-06-15 18:58:50,308][1651274] Signal inference workers to stop experience collection... (32750 times) [2024-06-15 18:58:50,411][1651669] InferenceWorker_p0-w0: stopping experience collection (32750 times) [2024-06-15 18:58:50,611][1651274] Signal inference workers to resume experience collection... (32750 times) [2024-06-15 18:58:50,613][1651669] InferenceWorker_p0-w0: resuming experience collection (32750 times) [2024-06-15 18:58:50,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 46421.4, 300 sec: 47430.4). Total num frames: 1280475136. Throughput: 0: 12027.9. Samples: 320223744. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:50,768][1648981] Avg episode reward: [(0, '581.770')] [2024-06-15 18:58:51,263][1651669] Updated weights for policy 0, policy_version 625249 (0.0016) [2024-06-15 18:58:52,752][1651669] Updated weights for policy 0, policy_version 625314 (0.0010) [2024-06-15 18:58:54,307][1651669] Updated weights for policy 0, policy_version 625377 (0.0022) [2024-06-15 18:58:55,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 50246.2, 300 sec: 47654.4). Total num frames: 1280868352. Throughput: 0: 11764.6. Samples: 320249856. 
Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:58:55,767][1648981] Avg episode reward: [(0, '577.980')] [2024-06-15 18:58:55,840][1651669] Updated weights for policy 0, policy_version 625426 (0.0023) [2024-06-15 18:58:56,012][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000625440_1280901120.pth... [2024-06-15 18:58:56,166][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000619840_1269432320.pth [2024-06-15 18:59:00,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 45875.2, 300 sec: 47544.9). Total num frames: 1280966656. Throughput: 0: 11901.1. Samples: 320325120. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:59:00,767][1648981] Avg episode reward: [(0, '577.170')] [2024-06-15 18:59:02,435][1651669] Updated weights for policy 0, policy_version 625489 (0.0013) [2024-06-15 18:59:03,901][1651669] Updated weights for policy 0, policy_version 625557 (0.0088) [2024-06-15 18:59:05,204][1651669] Updated weights for policy 0, policy_version 625621 (0.0012) [2024-06-15 18:59:05,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49703.3, 300 sec: 47430.5). Total num frames: 1281327104. Throughput: 0: 11821.5. Samples: 320392192. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:59:05,767][1648981] Avg episode reward: [(0, '572.290')] [2024-06-15 18:59:05,953][1651669] Updated weights for policy 0, policy_version 625664 (0.0012) [2024-06-15 18:59:08,088][1651669] Updated weights for policy 0, policy_version 625728 (0.0016) [2024-06-15 18:59:10,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1281490944. Throughput: 0: 11941.6. Samples: 320425984. 
Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:59:10,767][1648981] Avg episode reward: [(0, '589.200')] [2024-06-15 18:59:14,287][1651669] Updated weights for policy 0, policy_version 625808 (0.0013) [2024-06-15 18:59:15,596][1651669] Updated weights for policy 0, policy_version 625872 (0.0012) [2024-06-15 18:59:15,768][1648981] Fps is (10 sec: 45866.1, 60 sec: 48604.3, 300 sec: 47763.2). Total num frames: 1281785856. Throughput: 0: 11786.8. Samples: 320505344. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:59:15,769][1648981] Avg episode reward: [(0, '574.180')] [2024-06-15 18:59:16,554][1651669] Updated weights for policy 0, policy_version 625916 (0.0013) [2024-06-15 18:59:19,051][1651669] Updated weights for policy 0, policy_version 625981 (0.0105) [2024-06-15 18:59:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1282015232. Throughput: 0: 11935.3. Samples: 320571392. Policy #0 lag: (min: 15.0, avg: 87.9, max: 271.0) [2024-06-15 18:59:20,767][1648981] Avg episode reward: [(0, '583.120')] [2024-06-15 18:59:25,113][1651669] Updated weights for policy 0, policy_version 626064 (0.0108) [2024-06-15 18:59:25,767][1648981] Fps is (10 sec: 42606.1, 60 sec: 46986.0, 300 sec: 47652.4). Total num frames: 1282211840. Throughput: 0: 12037.6. Samples: 320618496. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 18:59:25,767][1648981] Avg episode reward: [(0, '583.840')] [2024-06-15 18:59:26,313][1651274] Signal inference workers to stop experience collection... (32800 times) [2024-06-15 18:59:26,426][1651669] InferenceWorker_p0-w0: stopping experience collection (32800 times) [2024-06-15 18:59:26,446][1651669] Updated weights for policy 0, policy_version 626120 (0.0071) [2024-06-15 18:59:26,518][1651274] Signal inference workers to resume experience collection... 
(32800 times) [2024-06-15 18:59:26,530][1651669] InferenceWorker_p0-w0: resuming experience collection (32800 times) [2024-06-15 18:59:27,552][1651669] Updated weights for policy 0, policy_version 626175 (0.0029) [2024-06-15 18:59:29,886][1651669] Updated weights for policy 0, policy_version 626230 (0.0011) [2024-06-15 18:59:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49151.9, 300 sec: 47985.7). Total num frames: 1282539520. Throughput: 0: 11901.2. Samples: 320679936. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 18:59:30,767][1648981] Avg episode reward: [(0, '584.550')] [2024-06-15 18:59:34,646][1651669] Updated weights for policy 0, policy_version 626273 (0.0012) [2024-06-15 18:59:35,782][1648981] Fps is (10 sec: 49075.5, 60 sec: 46409.1, 300 sec: 47761.0). Total num frames: 1282703360. Throughput: 0: 11885.6. Samples: 320758784. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 18:59:35,783][1648981] Avg episode reward: [(0, '561.080')] [2024-06-15 18:59:36,311][1651669] Updated weights for policy 0, policy_version 626352 (0.0013) [2024-06-15 18:59:38,642][1651669] Updated weights for policy 0, policy_version 626432 (0.0114) [2024-06-15 18:59:40,570][1651669] Updated weights for policy 0, policy_version 626493 (0.0014) [2024-06-15 18:59:40,792][1648981] Fps is (10 sec: 52295.7, 60 sec: 50768.8, 300 sec: 47981.5). Total num frames: 1283063808. Throughput: 0: 11826.2. Samples: 320782336. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 18:59:40,792][1648981] Avg episode reward: [(0, '561.100')] [2024-06-15 18:59:45,782][1648981] Fps is (10 sec: 36044.9, 60 sec: 43679.2, 300 sec: 47427.8). Total num frames: 1283063808. Throughput: 0: 11942.5. Samples: 320862720. 
Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 18:59:45,783][1648981] Avg episode reward: [(0, '565.450')] [2024-06-15 18:59:46,747][1651669] Updated weights for policy 0, policy_version 626547 (0.0011) [2024-06-15 18:59:48,329][1651669] Updated weights for policy 0, policy_version 626615 (0.0012) [2024-06-15 18:59:49,755][1651669] Updated weights for policy 0, policy_version 626641 (0.0041) [2024-06-15 18:59:50,771][1648981] Fps is (10 sec: 39405.5, 60 sec: 49694.8, 300 sec: 47540.7). Total num frames: 1283457024. Throughput: 0: 11888.7. Samples: 320927232. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 18:59:50,771][1648981] Avg episode reward: [(0, '537.630')] [2024-06-15 18:59:51,495][1651669] Updated weights for policy 0, policy_version 626720 (0.0238) [2024-06-15 18:59:55,767][1648981] Fps is (10 sec: 52510.7, 60 sec: 45329.0, 300 sec: 47763.5). Total num frames: 1283588096. Throughput: 0: 11923.9. Samples: 320962560. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 18:59:55,767][1648981] Avg episode reward: [(0, '526.980')] [2024-06-15 18:59:57,177][1651669] Updated weights for policy 0, policy_version 626784 (0.0011) [2024-06-15 18:59:58,676][1651669] Updated weights for policy 0, policy_version 626851 (0.0012) [2024-06-15 19:00:00,264][1651669] Updated weights for policy 0, policy_version 626882 (0.0011) [2024-06-15 19:00:00,766][1648981] Fps is (10 sec: 42616.1, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1283883008. Throughput: 0: 11787.9. Samples: 321035776. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:00,767][1648981] Avg episode reward: [(0, '523.320')] [2024-06-15 19:00:01,946][1651669] Updated weights for policy 0, policy_version 626962 (0.0012) [2024-06-15 19:00:05,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 46421.4, 300 sec: 47985.7). Total num frames: 1284112384. Throughput: 0: 12128.7. Samples: 321117184. 
Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:05,767][1648981] Avg episode reward: [(0, '516.650')] [2024-06-15 19:00:07,054][1651669] Updated weights for policy 0, policy_version 627009 (0.0033) [2024-06-15 19:00:08,314][1651274] Signal inference workers to stop experience collection... (32850 times) [2024-06-15 19:00:08,361][1651669] InferenceWorker_p0-w0: stopping experience collection (32850 times) [2024-06-15 19:00:08,490][1651274] Signal inference workers to resume experience collection... (32850 times) [2024-06-15 19:00:08,491][1651669] InferenceWorker_p0-w0: resuming experience collection (32850 times) [2024-06-15 19:00:08,640][1651669] Updated weights for policy 0, policy_version 627073 (0.0010) [2024-06-15 19:00:09,993][1651669] Updated weights for policy 0, policy_version 627132 (0.0010) [2024-06-15 19:00:10,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 48059.5, 300 sec: 47430.3). Total num frames: 1284374528. Throughput: 0: 11832.9. Samples: 321150976. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:10,768][1648981] Avg episode reward: [(0, '486.210')] [2024-06-15 19:00:12,387][1651669] Updated weights for policy 0, policy_version 627190 (0.0012) [2024-06-15 19:00:13,804][1651669] Updated weights for policy 0, policy_version 627250 (0.0147) [2024-06-15 19:00:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47515.2, 300 sec: 47985.7). Total num frames: 1284636672. Throughput: 0: 11889.8. Samples: 321214976. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:15,767][1648981] Avg episode reward: [(0, '496.940')] [2024-06-15 19:00:18,040][1651669] Updated weights for policy 0, policy_version 627272 (0.0011) [2024-06-15 19:00:19,451][1651669] Updated weights for policy 0, policy_version 627344 (0.0011) [2024-06-15 19:00:20,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1284898816. Throughput: 0: 11882.6. Samples: 321293312. 
Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:20,767][1648981] Avg episode reward: [(0, '495.350')] [2024-06-15 19:00:22,162][1651669] Updated weights for policy 0, policy_version 627410 (0.0012) [2024-06-15 19:00:23,905][1651669] Updated weights for policy 0, policy_version 627480 (0.0012) [2024-06-15 19:00:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49152.1, 300 sec: 47986.0). Total num frames: 1285160960. Throughput: 0: 12090.0. Samples: 321326080. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:25,767][1648981] Avg episode reward: [(0, '496.960')] [2024-06-15 19:00:29,340][1651669] Updated weights for policy 0, policy_version 627536 (0.0016) [2024-06-15 19:00:30,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46421.4, 300 sec: 47430.3). Total num frames: 1285324800. Throughput: 0: 12133.0. Samples: 321408512. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:30,767][1648981] Avg episode reward: [(0, '475.970')] [2024-06-15 19:00:30,932][1651669] Updated weights for policy 0, policy_version 627605 (0.0185) [2024-06-15 19:00:31,790][1651669] Updated weights for policy 0, policy_version 627647 (0.0013) [2024-06-15 19:00:33,608][1651669] Updated weights for policy 0, policy_version 627696 (0.0030) [2024-06-15 19:00:34,781][1651669] Updated weights for policy 0, policy_version 627747 (0.0013) [2024-06-15 19:00:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49711.2, 300 sec: 47985.7). Total num frames: 1285685248. Throughput: 0: 12016.1. Samples: 321467904. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:35,767][1648981] Avg episode reward: [(0, '468.540')] [2024-06-15 19:00:40,135][1651669] Updated weights for policy 0, policy_version 627792 (0.0014) [2024-06-15 19:00:40,774][1648981] Fps is (10 sec: 42565.3, 60 sec: 44796.2, 300 sec: 47321.1). Total num frames: 1285750784. Throughput: 0: 12251.8. Samples: 321513984. 
Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:40,775][1648981] Avg episode reward: [(0, '455.570')] [2024-06-15 19:00:41,717][1651669] Updated weights for policy 0, policy_version 627872 (0.0024) [2024-06-15 19:00:44,026][1651669] Updated weights for policy 0, policy_version 627925 (0.0011) [2024-06-15 19:00:45,179][1651274] Signal inference workers to stop experience collection... (32900 times) [2024-06-15 19:00:45,262][1651669] InferenceWorker_p0-w0: stopping experience collection (32900 times) [2024-06-15 19:00:45,357][1651274] Signal inference workers to resume experience collection... (32900 times) [2024-06-15 19:00:45,360][1651669] InferenceWorker_p0-w0: resuming experience collection (32900 times) [2024-06-15 19:00:45,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 51349.8, 300 sec: 47764.1). Total num frames: 1286144000. Throughput: 0: 12185.5. Samples: 321584128. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:45,767][1648981] Avg episode reward: [(0, '441.980')] [2024-06-15 19:00:45,892][1651669] Updated weights for policy 0, policy_version 628016 (0.0094) [2024-06-15 19:00:50,766][1648981] Fps is (10 sec: 52469.9, 60 sec: 46970.8, 300 sec: 47541.4). Total num frames: 1286275072. Throughput: 0: 12094.6. Samples: 321661440. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:50,767][1648981] Avg episode reward: [(0, '423.860')] [2024-06-15 19:00:50,944][1651669] Updated weights for policy 0, policy_version 628080 (0.0012) [2024-06-15 19:00:52,149][1651669] Updated weights for policy 0, policy_version 628144 (0.0012) [2024-06-15 19:00:55,562][1651669] Updated weights for policy 0, policy_version 628192 (0.0093) [2024-06-15 19:00:55,783][1648981] Fps is (10 sec: 39257.5, 60 sec: 49138.5, 300 sec: 47316.5). Total num frames: 1286537216. Throughput: 0: 12101.5. Samples: 321695744. 
Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:00:55,783][1648981] Avg episode reward: [(0, '426.150')] [2024-06-15 19:00:56,286][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000628224_1286602752.pth... [2024-06-15 19:00:56,453][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000622624_1275133952.pth [2024-06-15 19:00:57,612][1651669] Updated weights for policy 0, policy_version 628279 (0.0151) [2024-06-15 19:01:00,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1286733824. Throughput: 0: 12242.5. Samples: 321765888. Policy #0 lag: (min: 10.0, avg: 76.8, max: 266.0) [2024-06-15 19:01:00,767][1648981] Avg episode reward: [(0, '426.660')] [2024-06-15 19:01:01,121][1651669] Updated weights for policy 0, policy_version 628309 (0.0019) [2024-06-15 19:01:02,635][1651669] Updated weights for policy 0, policy_version 628385 (0.0020) [2024-06-15 19:01:05,766][1648981] Fps is (10 sec: 49234.1, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1287028736. Throughput: 0: 12231.1. Samples: 321843712. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:05,767][1648981] Avg episode reward: [(0, '435.050')] [2024-06-15 19:01:06,106][1651669] Updated weights for policy 0, policy_version 628464 (0.0012) [2024-06-15 19:01:07,199][1651669] Updated weights for policy 0, policy_version 628499 (0.0060) [2024-06-15 19:01:10,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.9, 300 sec: 47985.7). Total num frames: 1287258112. Throughput: 0: 12117.3. Samples: 321871360. 
Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:10,767][1648981] Avg episode reward: [(0, '469.830')] [2024-06-15 19:01:11,934][1651669] Updated weights for policy 0, policy_version 628547 (0.0012) [2024-06-15 19:01:13,042][1651669] Updated weights for policy 0, policy_version 628615 (0.0013) [2024-06-15 19:01:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.9, 300 sec: 47654.4). Total num frames: 1287553024. Throughput: 0: 12037.7. Samples: 321950208. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:15,767][1648981] Avg episode reward: [(0, '469.530')] [2024-06-15 19:01:15,914][1651669] Updated weights for policy 0, policy_version 628693 (0.0048) [2024-06-15 19:01:17,440][1651669] Updated weights for policy 0, policy_version 628755 (0.0103) [2024-06-15 19:01:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1287782400. Throughput: 0: 12367.6. Samples: 322024448. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:20,767][1648981] Avg episode reward: [(0, '463.140')] [2024-06-15 19:01:21,919][1651669] Updated weights for policy 0, policy_version 628816 (0.0056) [2024-06-15 19:01:22,918][1651669] Updated weights for policy 0, policy_version 628864 (0.0014) [2024-06-15 19:01:23,761][1651274] Signal inference workers to stop experience collection... (32950 times) [2024-06-15 19:01:23,785][1651669] InferenceWorker_p0-w0: stopping experience collection (32950 times) [2024-06-15 19:01:23,980][1651274] Signal inference workers to resume experience collection... (32950 times) [2024-06-15 19:01:23,981][1651669] InferenceWorker_p0-w0: resuming experience collection (32950 times) [2024-06-15 19:01:25,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.8, 300 sec: 47988.4). Total num frames: 1288044544. Throughput: 0: 12187.7. Samples: 322062336. 
Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:25,767][1648981] Avg episode reward: [(0, '471.870')] [2024-06-15 19:01:26,684][1651669] Updated weights for policy 0, policy_version 628929 (0.0011) [2024-06-15 19:01:27,800][1651669] Updated weights for policy 0, policy_version 628989 (0.0073) [2024-06-15 19:01:29,370][1651669] Updated weights for policy 0, policy_version 629050 (0.0011) [2024-06-15 19:01:30,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 49697.9, 300 sec: 47985.6). Total num frames: 1288306688. Throughput: 0: 12208.4. Samples: 322133504. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:30,768][1648981] Avg episode reward: [(0, '461.150')] [2024-06-15 19:01:32,621][1651669] Updated weights for policy 0, policy_version 629104 (0.0012) [2024-06-15 19:01:34,457][1651669] Updated weights for policy 0, policy_version 629157 (0.0011) [2024-06-15 19:01:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1288568832. Throughput: 0: 12140.1. Samples: 322207744. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:35,767][1648981] Avg episode reward: [(0, '464.800')] [2024-06-15 19:01:38,279][1651669] Updated weights for policy 0, policy_version 629221 (0.0086) [2024-06-15 19:01:39,637][1651669] Updated weights for policy 0, policy_version 629264 (0.0011) [2024-06-15 19:01:40,790][1648981] Fps is (10 sec: 49037.5, 60 sec: 50777.0, 300 sec: 47870.9). Total num frames: 1288798208. Throughput: 0: 12206.5. Samples: 322245120. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:40,791][1648981] Avg episode reward: [(0, '464.840')] [2024-06-15 19:01:42,479][1651669] Updated weights for policy 0, policy_version 629328 (0.0159) [2024-06-15 19:01:43,677][1651669] Updated weights for policy 0, policy_version 629375 (0.0015) [2024-06-15 19:01:45,838][1648981] Fps is (10 sec: 52055.1, 60 sec: 49093.5, 300 sec: 48640.3). Total num frames: 1289093120. 
Throughput: 0: 12166.2. Samples: 322314240. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:45,839][1648981] Avg episode reward: [(0, '485.450')] [2024-06-15 19:01:48,227][1651669] Updated weights for policy 0, policy_version 629441 (0.0046) [2024-06-15 19:01:50,798][1648981] Fps is (10 sec: 42564.0, 60 sec: 49126.0, 300 sec: 47980.6). Total num frames: 1289224192. Throughput: 0: 12097.4. Samples: 322388480. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:50,799][1648981] Avg episode reward: [(0, '497.210')] [2024-06-15 19:01:51,264][1651669] Updated weights for policy 0, policy_version 629521 (0.0021) [2024-06-15 19:01:53,453][1651669] Updated weights for policy 0, policy_version 629584 (0.0014) [2024-06-15 19:01:54,526][1651669] Updated weights for policy 0, policy_version 629629 (0.0138) [2024-06-15 19:01:55,767][1648981] Fps is (10 sec: 39605.0, 60 sec: 49165.4, 300 sec: 48429.9). Total num frames: 1289486336. Throughput: 0: 12174.2. Samples: 322419200. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:01:55,767][1648981] Avg episode reward: [(0, '494.800')] [2024-06-15 19:01:57,257][1651669] Updated weights for policy 0, policy_version 629692 (0.0015) [2024-06-15 19:02:00,766][1648981] Fps is (10 sec: 46021.1, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 1289682944. Throughput: 0: 12117.3. Samples: 322495488. 
Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:00,767][1648981] Avg episode reward: [(0, '498.630')] [2024-06-15 19:02:00,938][1651669] Updated weights for policy 0, policy_version 629744 (0.0011) [2024-06-15 19:02:02,397][1651669] Updated weights for policy 0, policy_version 629808 (0.0013) [2024-06-15 19:02:03,943][1651669] Updated weights for policy 0, policy_version 629842 (0.0045) [2024-06-15 19:02:04,808][1651669] Updated weights for policy 0, policy_version 629884 (0.0011) [2024-06-15 19:02:05,774][1648981] Fps is (10 sec: 52389.0, 60 sec: 49691.6, 300 sec: 48428.7). Total num frames: 1290010624. Throughput: 0: 12115.2. Samples: 322569728. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:05,775][1648981] Avg episode reward: [(0, '496.680')] [2024-06-15 19:02:10,724][1651669] Updated weights for policy 0, policy_version 629953 (0.0012) [2024-06-15 19:02:10,732][1651274] Signal inference workers to stop experience collection... (33000 times) [2024-06-15 19:02:10,757][1651669] InferenceWorker_p0-w0: stopping experience collection (33000 times) [2024-06-15 19:02:10,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 48096.7). Total num frames: 1290141696. Throughput: 0: 12060.4. Samples: 322605056. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:10,767][1648981] Avg episode reward: [(0, '512.720')] [2024-06-15 19:02:10,980][1651274] Signal inference workers to resume experience collection... (33000 times) [2024-06-15 19:02:10,982][1651669] InferenceWorker_p0-w0: resuming experience collection (33000 times) [2024-06-15 19:02:12,262][1651669] Updated weights for policy 0, policy_version 630016 (0.0013) [2024-06-15 19:02:13,570][1651669] Updated weights for policy 0, policy_version 630079 (0.0032) [2024-06-15 19:02:15,767][1648981] Fps is (10 sec: 45910.2, 60 sec: 48605.7, 300 sec: 48207.8). Total num frames: 1290469376. Throughput: 0: 12037.7. Samples: 322675200. 
Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:15,769][1648981] Avg episode reward: [(0, '536.040')] [2024-06-15 19:02:16,203][1651669] Updated weights for policy 0, policy_version 630141 (0.0027) [2024-06-15 19:02:18,402][1651669] Updated weights for policy 0, policy_version 630192 (0.0013) [2024-06-15 19:02:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1290665984. Throughput: 0: 11992.2. Samples: 322747392. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:20,767][1648981] Avg episode reward: [(0, '560.510')] [2024-06-15 19:02:22,865][1651669] Updated weights for policy 0, policy_version 630240 (0.0014) [2024-06-15 19:02:24,626][1651669] Updated weights for policy 0, policy_version 630304 (0.0013) [2024-06-15 19:02:25,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1290928128. Throughput: 0: 12066.8. Samples: 322787840. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:25,767][1648981] Avg episode reward: [(0, '555.590')] [2024-06-15 19:02:25,909][1651669] Updated weights for policy 0, policy_version 630338 (0.0012) [2024-06-15 19:02:29,085][1651669] Updated weights for policy 0, policy_version 630416 (0.0012) [2024-06-15 19:02:30,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.9, 300 sec: 48318.9). Total num frames: 1291190272. Throughput: 0: 11851.8. Samples: 322846720. Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:30,767][1648981] Avg episode reward: [(0, '534.460')] [2024-06-15 19:02:33,480][1651669] Updated weights for policy 0, policy_version 630468 (0.0012) [2024-06-15 19:02:35,650][1651669] Updated weights for policy 0, policy_version 630545 (0.0011) [2024-06-15 19:02:35,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46421.4, 300 sec: 47763.5). Total num frames: 1291354112. Throughput: 0: 11818.5. Samples: 322919936. 
Policy #0 lag: (min: 15.0, avg: 103.4, max: 271.0) [2024-06-15 19:02:35,767][1648981] Avg episode reward: [(0, '531.060')] [2024-06-15 19:02:36,525][1651669] Updated weights for policy 0, policy_version 630591 (0.0035) [2024-06-15 19:02:38,362][1651669] Updated weights for policy 0, policy_version 630640 (0.0013) [2024-06-15 19:02:40,307][1651669] Updated weights for policy 0, policy_version 630672 (0.0011) [2024-06-15 19:02:40,768][1648981] Fps is (10 sec: 45869.1, 60 sec: 47531.2, 300 sec: 48320.0). Total num frames: 1291649024. Throughput: 0: 11775.7. Samples: 322949120. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:02:40,768][1648981] Avg episode reward: [(0, '561.080')] [2024-06-15 19:02:41,300][1651669] Updated weights for policy 0, policy_version 630715 (0.0026) [2024-06-15 19:02:45,696][1651669] Updated weights for policy 0, policy_version 630768 (0.0011) [2024-06-15 19:02:45,771][1648981] Fps is (10 sec: 45855.5, 60 sec: 45380.1, 300 sec: 47873.9). Total num frames: 1291812864. Throughput: 0: 11979.7. Samples: 323034624. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:02:45,771][1648981] Avg episode reward: [(0, '546.150')] [2024-06-15 19:02:47,591][1651669] Updated weights for policy 0, policy_version 630836 (0.0030) [2024-06-15 19:02:49,442][1651669] Updated weights for policy 0, policy_version 630896 (0.0021) [2024-06-15 19:02:50,766][1648981] Fps is (10 sec: 45881.1, 60 sec: 48085.1, 300 sec: 48319.3). Total num frames: 1292107776. Throughput: 0: 11550.4. Samples: 323089408. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:02:50,767][1648981] Avg episode reward: [(0, '544.960')] [2024-06-15 19:02:51,659][1651274] Signal inference workers to stop experience collection... (33050 times) [2024-06-15 19:02:51,713][1651669] InferenceWorker_p0-w0: stopping experience collection (33050 times) [2024-06-15 19:02:51,940][1651274] Signal inference workers to resume experience collection... 
(33050 times) [2024-06-15 19:02:51,940][1651669] InferenceWorker_p0-w0: resuming experience collection (33050 times) [2024-06-15 19:02:52,519][1651669] Updated weights for policy 0, policy_version 630971 (0.0013) [2024-06-15 19:02:55,766][1648981] Fps is (10 sec: 42616.4, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1292238848. Throughput: 0: 11525.7. Samples: 323123712. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:02:55,767][1648981] Avg episode reward: [(0, '561.150')] [2024-06-15 19:02:55,830][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000630976_1292238848.pth... [2024-06-15 19:02:55,933][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000625440_1280901120.pth [2024-06-15 19:02:57,337][1651669] Updated weights for policy 0, policy_version 631024 (0.0032) [2024-06-15 19:02:59,293][1651669] Updated weights for policy 0, policy_version 631097 (0.0013) [2024-06-15 19:03:00,767][1648981] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 48097.8). Total num frames: 1292533760. Throughput: 0: 11503.0. Samples: 323192832. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:00,769][1648981] Avg episode reward: [(0, '551.240')] [2024-06-15 19:03:01,421][1651669] Updated weights for policy 0, policy_version 631152 (0.0010) [2024-06-15 19:03:03,295][1651669] Updated weights for policy 0, policy_version 631201 (0.0017) [2024-06-15 19:03:05,776][1648981] Fps is (10 sec: 52380.8, 60 sec: 45874.2, 300 sec: 47762.0). Total num frames: 1292763136. Throughput: 0: 11568.8. Samples: 323268096. 
Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:05,776][1648981] Avg episode reward: [(0, '573.170')] [2024-06-15 19:03:07,160][1651669] Updated weights for policy 0, policy_version 631251 (0.0013) [2024-06-15 19:03:08,208][1651669] Updated weights for policy 0, policy_version 631298 (0.0012) [2024-06-15 19:03:09,671][1651669] Updated weights for policy 0, policy_version 631359 (0.0122) [2024-06-15 19:03:10,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1293025280. Throughput: 0: 11502.9. Samples: 323305472. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:10,767][1648981] Avg episode reward: [(0, '532.550')] [2024-06-15 19:03:12,839][1651669] Updated weights for policy 0, policy_version 631411 (0.0011) [2024-06-15 19:03:14,083][1651669] Updated weights for policy 0, policy_version 631457 (0.0011) [2024-06-15 19:03:15,766][1648981] Fps is (10 sec: 52476.8, 60 sec: 46967.6, 300 sec: 47985.7). Total num frames: 1293287424. Throughput: 0: 11662.2. Samples: 323371520. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:15,767][1648981] Avg episode reward: [(0, '534.170')] [2024-06-15 19:03:18,085][1651669] Updated weights for policy 0, policy_version 631522 (0.0023) [2024-06-15 19:03:19,792][1651669] Updated weights for policy 0, policy_version 631600 (0.0012) [2024-06-15 19:03:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47989.6). Total num frames: 1293549568. Throughput: 0: 11787.4. Samples: 323450368. 
Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:20,767][1648981] Avg episode reward: [(0, '532.560')] [2024-06-15 19:03:23,499][1651669] Updated weights for policy 0, policy_version 631636 (0.0015) [2024-06-15 19:03:24,667][1651669] Updated weights for policy 0, policy_version 631699 (0.0013) [2024-06-15 19:03:25,589][1651669] Updated weights for policy 0, policy_version 631741 (0.0011) [2024-06-15 19:03:25,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 48059.6, 300 sec: 48207.8). Total num frames: 1293811712. Throughput: 0: 12049.4. Samples: 323491328. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:25,768][1648981] Avg episode reward: [(0, '546.760')] [2024-06-15 19:03:29,176][1651669] Updated weights for policy 0, policy_version 631793 (0.0020) [2024-06-15 19:03:30,291][1651669] Updated weights for policy 0, policy_version 631843 (0.0011) [2024-06-15 19:03:30,767][1648981] Fps is (10 sec: 49148.8, 60 sec: 47513.1, 300 sec: 47874.5). Total num frames: 1294041088. Throughput: 0: 11890.7. Samples: 323569664. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:30,768][1648981] Avg episode reward: [(0, '565.770')] [2024-06-15 19:03:32,769][1651669] Updated weights for policy 0, policy_version 631888 (0.0012) [2024-06-15 19:03:32,852][1651274] Signal inference workers to stop experience collection... (33100 times) [2024-06-15 19:03:32,894][1651669] InferenceWorker_p0-w0: stopping experience collection (33100 times) [2024-06-15 19:03:33,055][1651274] Signal inference workers to resume experience collection... (33100 times) [2024-06-15 19:03:33,056][1651669] InferenceWorker_p0-w0: resuming experience collection (33100 times) [2024-06-15 19:03:34,107][1651669] Updated weights for policy 0, policy_version 631952 (0.0013) [2024-06-15 19:03:35,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 1294336000. Throughput: 0: 12208.4. Samples: 323638784. 
Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:35,767][1648981] Avg episode reward: [(0, '575.510')] [2024-06-15 19:03:38,813][1651669] Updated weights for policy 0, policy_version 632003 (0.0011) [2024-06-15 19:03:40,766][1648981] Fps is (10 sec: 42601.2, 60 sec: 46968.6, 300 sec: 47541.4). Total num frames: 1294467072. Throughput: 0: 12618.0. Samples: 323691520. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:40,767][1648981] Avg episode reward: [(0, '546.360')] [2024-06-15 19:03:40,919][1651669] Updated weights for policy 0, policy_version 632082 (0.0010) [2024-06-15 19:03:41,707][1651669] Updated weights for policy 0, policy_version 632127 (0.0012) [2024-06-15 19:03:43,775][1651669] Updated weights for policy 0, policy_version 632192 (0.0012) [2024-06-15 19:03:45,446][1651669] Updated weights for policy 0, policy_version 632256 (0.0012) [2024-06-15 19:03:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50794.0, 300 sec: 48763.3). Total num frames: 1294860288. Throughput: 0: 12413.2. Samples: 323751424. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:45,767][1648981] Avg episode reward: [(0, '541.190')] [2024-06-15 19:03:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 47513.7, 300 sec: 47763.5). Total num frames: 1294958592. Throughput: 0: 12461.2. Samples: 323828736. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:50,767][1648981] Avg episode reward: [(0, '552.550')] [2024-06-15 19:03:51,816][1651669] Updated weights for policy 0, policy_version 632337 (0.0030) [2024-06-15 19:03:53,965][1651669] Updated weights for policy 0, policy_version 632385 (0.0011) [2024-06-15 19:03:55,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 50790.2, 300 sec: 48541.0). Total num frames: 1295286272. Throughput: 0: 12310.7. Samples: 323859456. 
Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:03:55,767][1648981] Avg episode reward: [(0, '548.140')] [2024-06-15 19:03:55,983][1651669] Updated weights for policy 0, policy_version 632481 (0.0110) [2024-06-15 19:04:00,421][1651669] Updated weights for policy 0, policy_version 632532 (0.0015) [2024-06-15 19:04:00,771][1648981] Fps is (10 sec: 49128.7, 60 sec: 48602.1, 300 sec: 47873.8). Total num frames: 1295450112. Throughput: 0: 12616.6. Samples: 323939328. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:04:00,772][1648981] Avg episode reward: [(0, '544.090')] [2024-06-15 19:04:02,413][1651669] Updated weights for policy 0, policy_version 632611 (0.0011) [2024-06-15 19:04:04,800][1651669] Updated weights for policy 0, policy_version 632657 (0.0012) [2024-06-15 19:04:05,771][1648981] Fps is (10 sec: 45852.9, 60 sec: 49701.5, 300 sec: 48318.1). Total num frames: 1295745024. Throughput: 0: 12582.4. Samples: 324016640. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:04:05,772][1648981] Avg episode reward: [(0, '551.520')] [2024-06-15 19:04:06,668][1651669] Updated weights for policy 0, policy_version 632752 (0.0056) [2024-06-15 19:04:10,766][1648981] Fps is (10 sec: 45897.1, 60 sec: 48059.8, 300 sec: 47874.9). Total num frames: 1295908864. Throughput: 0: 12367.7. Samples: 324047872. Policy #0 lag: (min: 15.0, avg: 150.3, max: 271.0) [2024-06-15 19:04:10,767][1648981] Avg episode reward: [(0, '552.100')] [2024-06-15 19:04:11,577][1651669] Updated weights for policy 0, policy_version 632805 (0.0097) [2024-06-15 19:04:11,824][1651274] Signal inference workers to stop experience collection... (33150 times) [2024-06-15 19:04:11,877][1651669] InferenceWorker_p0-w0: stopping experience collection (33150 times) [2024-06-15 19:04:12,108][1651274] Signal inference workers to resume experience collection... 
(33150 times) [2024-06-15 19:04:12,109][1651669] InferenceWorker_p0-w0: resuming experience collection (33150 times) [2024-06-15 19:04:13,086][1651669] Updated weights for policy 0, policy_version 632866 (0.0116) [2024-06-15 19:04:13,771][1651669] Updated weights for policy 0, policy_version 632896 (0.0011) [2024-06-15 19:04:15,766][1648981] Fps is (10 sec: 49177.2, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1296236544. Throughput: 0: 12288.2. Samples: 324122624. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:15,767][1648981] Avg episode reward: [(0, '539.980')] [2024-06-15 19:04:16,888][1651669] Updated weights for policy 0, policy_version 632976 (0.0090) [2024-06-15 19:04:20,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 48207.9). Total num frames: 1296433152. Throughput: 0: 12219.7. Samples: 324188672. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:20,767][1648981] Avg episode reward: [(0, '540.410')] [2024-06-15 19:04:22,609][1651669] Updated weights for policy 0, policy_version 633027 (0.0011) [2024-06-15 19:04:24,751][1651669] Updated weights for policy 0, policy_version 633107 (0.0015) [2024-06-15 19:04:25,652][1651669] Updated weights for policy 0, policy_version 633148 (0.0010) [2024-06-15 19:04:25,767][1648981] Fps is (10 sec: 42598.0, 60 sec: 47513.7, 300 sec: 47874.6). Total num frames: 1296662528. Throughput: 0: 11855.6. Samples: 324225024. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:25,767][1648981] Avg episode reward: [(0, '537.420')] [2024-06-15 19:04:27,710][1651669] Updated weights for policy 0, policy_version 633209 (0.0010) [2024-06-15 19:04:28,949][1651669] Updated weights for policy 0, policy_version 633268 (0.0011) [2024-06-15 19:04:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48606.4, 300 sec: 48321.5). Total num frames: 1296957440. Throughput: 0: 11992.2. Samples: 324291072. 
Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:30,767][1648981] Avg episode reward: [(0, '579.520')] [2024-06-15 19:04:34,412][1651669] Updated weights for policy 0, policy_version 633313 (0.0161) [2024-06-15 19:04:35,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 46421.4, 300 sec: 47656.6). Total num frames: 1297121280. Throughput: 0: 11935.3. Samples: 324365824. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:35,767][1648981] Avg episode reward: [(0, '582.490')] [2024-06-15 19:04:36,357][1651669] Updated weights for policy 0, policy_version 633377 (0.0047) [2024-06-15 19:04:36,961][1651669] Updated weights for policy 0, policy_version 633408 (0.0033) [2024-06-15 19:04:38,860][1651669] Updated weights for policy 0, policy_version 633472 (0.0010) [2024-06-15 19:04:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 48876.9). Total num frames: 1297481728. Throughput: 0: 11878.5. Samples: 324393984. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:40,767][1648981] Avg episode reward: [(0, '603.040')] [2024-06-15 19:04:45,328][1651669] Updated weights for policy 0, policy_version 633538 (0.0012) [2024-06-15 19:04:45,782][1648981] Fps is (10 sec: 39259.5, 60 sec: 44225.2, 300 sec: 47650.6). Total num frames: 1297514496. Throughput: 0: 11818.6. Samples: 324471296. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:45,783][1648981] Avg episode reward: [(0, '587.840')] [2024-06-15 19:04:46,983][1651669] Updated weights for policy 0, policy_version 633601 (0.0013) [2024-06-15 19:04:48,476][1651669] Updated weights for policy 0, policy_version 633662 (0.0011) [2024-06-15 19:04:50,444][1651669] Updated weights for policy 0, policy_version 633729 (0.0010) [2024-06-15 19:04:50,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 49152.0, 300 sec: 48541.1). Total num frames: 1297907712. Throughput: 0: 11345.0. Samples: 324527104. 
Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:50,767][1648981] Avg episode reward: [(0, '565.600')] [2024-06-15 19:04:50,837][1651274] Signal inference workers to stop experience collection... (33200 times) [2024-06-15 19:04:50,887][1651669] InferenceWorker_p0-w0: stopping experience collection (33200 times) [2024-06-15 19:04:51,178][1651274] Signal inference workers to resume experience collection... (33200 times) [2024-06-15 19:04:51,184][1651669] InferenceWorker_p0-w0: resuming experience collection (33200 times) [2024-06-15 19:04:55,767][1648981] Fps is (10 sec: 49229.0, 60 sec: 45329.2, 300 sec: 47874.6). Total num frames: 1298006016. Throughput: 0: 11389.1. Samples: 324560384. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:04:55,767][1648981] Avg episode reward: [(0, '555.530')] [2024-06-15 19:04:55,779][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000633792_1298006016.pth... [2024-06-15 19:04:55,899][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000628224_1286602752.pth [2024-06-15 19:04:57,172][1651669] Updated weights for policy 0, policy_version 633795 (0.0019) [2024-06-15 19:04:58,376][1651669] Updated weights for policy 0, policy_version 633856 (0.0012) [2024-06-15 19:05:00,226][1651669] Updated weights for policy 0, policy_version 633909 (0.0011) [2024-06-15 19:05:00,767][1648981] Fps is (10 sec: 36044.0, 60 sec: 46971.1, 300 sec: 47985.7). Total num frames: 1298268160. Throughput: 0: 11446.0. Samples: 324637696. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:00,768][1648981] Avg episode reward: [(0, '580.410')] [2024-06-15 19:05:02,038][1651669] Updated weights for policy 0, policy_version 633984 (0.0017) [2024-06-15 19:05:03,392][1651669] Updated weights for policy 0, policy_version 634048 (0.0012) [2024-06-15 19:05:05,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 46425.3, 300 sec: 47985.7). 
Total num frames: 1298530304. Throughput: 0: 11411.9. Samples: 324702208. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:05,767][1648981] Avg episode reward: [(0, '591.630')] [2024-06-15 19:05:09,492][1651669] Updated weights for policy 0, policy_version 634112 (0.0016) [2024-06-15 19:05:10,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1298694144. Throughput: 0: 11548.5. Samples: 324744704. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:10,767][1648981] Avg episode reward: [(0, '552.500')] [2024-06-15 19:05:12,233][1651669] Updated weights for policy 0, policy_version 634192 (0.0011) [2024-06-15 19:05:13,682][1651669] Updated weights for policy 0, policy_version 634245 (0.0013) [2024-06-15 19:05:15,774][1648981] Fps is (10 sec: 52387.7, 60 sec: 46961.4, 300 sec: 47984.4). Total num frames: 1299054592. Throughput: 0: 11318.9. Samples: 324800512. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:15,775][1648981] Avg episode reward: [(0, '554.400')] [2024-06-15 19:05:19,858][1651669] Updated weights for policy 0, policy_version 634306 (0.0033) [2024-06-15 19:05:20,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 1299152896. Throughput: 0: 11468.8. Samples: 324881920. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:20,767][1648981] Avg episode reward: [(0, '551.650')] [2024-06-15 19:05:21,009][1651669] Updated weights for policy 0, policy_version 634364 (0.0013) [2024-06-15 19:05:21,966][1651669] Updated weights for policy 0, policy_version 634405 (0.0011) [2024-06-15 19:05:23,270][1651669] Updated weights for policy 0, policy_version 634454 (0.0014) [2024-06-15 19:05:24,754][1651669] Updated weights for policy 0, policy_version 634528 (0.0013) [2024-06-15 19:05:25,767][1648981] Fps is (10 sec: 52469.3, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1299578880. Throughput: 0: 11639.5. 
Samples: 324917760. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:25,769][1648981] Avg episode reward: [(0, '565.790')] [2024-06-15 19:05:29,917][1651669] Updated weights for policy 0, policy_version 634562 (0.0013) [2024-06-15 19:05:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 44782.9, 300 sec: 47319.2). Total num frames: 1299644416. Throughput: 0: 11643.6. Samples: 324995072. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:30,767][1648981] Avg episode reward: [(0, '555.430')] [2024-06-15 19:05:31,124][1651669] Updated weights for policy 0, policy_version 634624 (0.0013) [2024-06-15 19:05:33,731][1651274] Signal inference workers to stop experience collection... (33250 times) [2024-06-15 19:05:33,743][1651669] Updated weights for policy 0, policy_version 634689 (0.0020) [2024-06-15 19:05:33,797][1651669] InferenceWorker_p0-w0: stopping experience collection (33250 times) [2024-06-15 19:05:33,930][1651274] Signal inference workers to resume experience collection... (33250 times) [2024-06-15 19:05:33,931][1651669] InferenceWorker_p0-w0: resuming experience collection (33250 times) [2024-06-15 19:05:35,513][1651669] Updated weights for policy 0, policy_version 634770 (0.0129) [2024-06-15 19:05:35,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 48059.7, 300 sec: 48320.2). Total num frames: 1300004864. Throughput: 0: 11787.4. Samples: 325057536. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:35,767][1648981] Avg episode reward: [(0, '556.220')] [2024-06-15 19:05:36,395][1651669] Updated weights for policy 0, policy_version 634815 (0.0022) [2024-06-15 19:05:40,770][1648981] Fps is (10 sec: 49133.9, 60 sec: 44234.1, 300 sec: 47429.8). Total num frames: 1300135936. Throughput: 0: 11979.9. Samples: 325099520. 
Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:40,771][1648981] Avg episode reward: [(0, '580.280')] [2024-06-15 19:05:41,635][1651669] Updated weights for policy 0, policy_version 634869 (0.0012) [2024-06-15 19:05:43,422][1651669] Updated weights for policy 0, policy_version 634917 (0.0014) [2024-06-15 19:05:45,063][1651669] Updated weights for policy 0, policy_version 634992 (0.0093) [2024-06-15 19:05:45,768][1648981] Fps is (10 sec: 52422.5, 60 sec: 50256.5, 300 sec: 48318.7). Total num frames: 1300529152. Throughput: 0: 11957.8. Samples: 325175808. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:45,768][1648981] Avg episode reward: [(0, '582.130')] [2024-06-15 19:05:46,507][1651669] Updated weights for policy 0, policy_version 635069 (0.0020) [2024-06-15 19:05:50,766][1648981] Fps is (10 sec: 49170.2, 60 sec: 45329.0, 300 sec: 47766.2). Total num frames: 1300627456. Throughput: 0: 12174.2. Samples: 325250048. Policy #0 lag: (min: 95.0, avg: 194.0, max: 335.0) [2024-06-15 19:05:50,767][1648981] Avg episode reward: [(0, '585.070')] [2024-06-15 19:05:52,077][1651669] Updated weights for policy 0, policy_version 635106 (0.0011) [2024-06-15 19:05:53,666][1651669] Updated weights for policy 0, policy_version 635154 (0.0016) [2024-06-15 19:05:54,951][1651669] Updated weights for policy 0, policy_version 635216 (0.0012) [2024-06-15 19:05:55,766][1648981] Fps is (10 sec: 45880.2, 60 sec: 49698.2, 300 sec: 48318.9). Total num frames: 1300987904. Throughput: 0: 12094.6. Samples: 325288960. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:05:55,767][1648981] Avg episode reward: [(0, '587.130')] [2024-06-15 19:05:56,054][1651669] Updated weights for policy 0, policy_version 635266 (0.0012) [2024-06-15 19:05:57,178][1651669] Updated weights for policy 0, policy_version 635326 (0.0012) [2024-06-15 19:06:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.9, 300 sec: 47874.6). Total num frames: 1301151744. 
Throughput: 0: 12415.3. Samples: 325359104. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:00,767][1648981] Avg episode reward: [(0, '591.530')] [2024-06-15 19:06:03,561][1651669] Updated weights for policy 0, policy_version 635376 (0.0011) [2024-06-15 19:06:04,627][1651669] Updated weights for policy 0, policy_version 635409 (0.0011) [2024-06-15 19:06:05,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1301413888. Throughput: 0: 12253.9. Samples: 325433344. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:05,767][1648981] Avg episode reward: [(0, '591.300')] [2024-06-15 19:06:05,794][1651669] Updated weights for policy 0, policy_version 635472 (0.0030) [2024-06-15 19:06:07,250][1651669] Updated weights for policy 0, policy_version 635541 (0.0013) [2024-06-15 19:06:10,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 1301676032. Throughput: 0: 12299.4. Samples: 325471232. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:10,767][1648981] Avg episode reward: [(0, '564.760')] [2024-06-15 19:06:13,116][1651669] Updated weights for policy 0, policy_version 635585 (0.0012) [2024-06-15 19:06:13,426][1651274] Signal inference workers to stop experience collection... (33300 times) [2024-06-15 19:06:13,514][1651669] InferenceWorker_p0-w0: stopping experience collection (33300 times) [2024-06-15 19:06:13,683][1651274] Signal inference workers to resume experience collection... (33300 times) [2024-06-15 19:06:13,684][1651669] InferenceWorker_p0-w0: resuming experience collection (33300 times) [2024-06-15 19:06:14,541][1651669] Updated weights for policy 0, policy_version 635650 (0.0046) [2024-06-15 19:06:15,723][1651669] Updated weights for policy 0, policy_version 635698 (0.0012) [2024-06-15 19:06:15,770][1648981] Fps is (10 sec: 49133.2, 60 sec: 47516.8, 300 sec: 47874.0). Total num frames: 1301905408. Throughput: 0: 12332.5. 
Samples: 325550080. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:15,771][1648981] Avg episode reward: [(0, '550.430')] [2024-06-15 19:06:17,522][1651669] Updated weights for policy 0, policy_version 635776 (0.0081) [2024-06-15 19:06:18,547][1651669] Updated weights for policy 0, policy_version 635835 (0.0011) [2024-06-15 19:06:20,797][1648981] Fps is (10 sec: 52271.6, 60 sec: 50765.0, 300 sec: 47980.8). Total num frames: 1302200320. Throughput: 0: 12541.3. Samples: 325622272. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:20,799][1648981] Avg episode reward: [(0, '564.820')] [2024-06-15 19:06:25,368][1651669] Updated weights for policy 0, policy_version 635889 (0.0011) [2024-06-15 19:06:25,766][1648981] Fps is (10 sec: 42614.5, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1302331392. Throughput: 0: 12516.6. Samples: 325662720. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:25,767][1648981] Avg episode reward: [(0, '601.820')] [2024-06-15 19:06:26,859][1651669] Updated weights for policy 0, policy_version 635952 (0.0010) [2024-06-15 19:06:28,430][1651669] Updated weights for policy 0, policy_version 636018 (0.0012) [2024-06-15 19:06:29,700][1651669] Updated weights for policy 0, policy_version 636089 (0.0017) [2024-06-15 19:06:30,766][1648981] Fps is (10 sec: 52587.1, 60 sec: 51336.6, 300 sec: 47985.7). Total num frames: 1302724608. Throughput: 0: 12242.8. Samples: 325726720. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:30,767][1648981] Avg episode reward: [(0, '598.160')] [2024-06-15 19:06:35,770][1648981] Fps is (10 sec: 49133.7, 60 sec: 46964.5, 300 sec: 47544.6). Total num frames: 1302822912. Throughput: 0: 12400.7. Samples: 325808128. 
Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:35,771][1648981] Avg episode reward: [(0, '596.060')] [2024-06-15 19:06:35,827][1651669] Updated weights for policy 0, policy_version 636150 (0.0058) [2024-06-15 19:06:37,073][1651669] Updated weights for policy 0, policy_version 636195 (0.0016) [2024-06-15 19:06:38,657][1651669] Updated weights for policy 0, policy_version 636256 (0.0180) [2024-06-15 19:06:40,172][1651669] Updated weights for policy 0, policy_version 636323 (0.0011) [2024-06-15 19:06:40,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 51885.8, 300 sec: 47997.4). Total num frames: 1303248896. Throughput: 0: 12128.7. Samples: 325834752. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:40,767][1648981] Avg episode reward: [(0, '596.230')] [2024-06-15 19:06:45,766][1648981] Fps is (10 sec: 42614.4, 60 sec: 45330.0, 300 sec: 47546.5). Total num frames: 1303248896. Throughput: 0: 12356.3. Samples: 325915136. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:45,767][1648981] Avg episode reward: [(0, '639.680')] [2024-06-15 19:06:46,357][1651274] Saving new best policy, reward=639.680! [2024-06-15 19:06:47,529][1651669] Updated weights for policy 0, policy_version 636406 (0.0014) [2024-06-15 19:06:48,592][1651274] Signal inference workers to stop experience collection... (33350 times) [2024-06-15 19:06:48,662][1651669] InferenceWorker_p0-w0: stopping experience collection (33350 times) [2024-06-15 19:06:48,843][1651274] Signal inference workers to resume experience collection... (33350 times) [2024-06-15 19:06:48,844][1651669] InferenceWorker_p0-w0: resuming experience collection (33350 times) [2024-06-15 19:06:48,845][1651669] Updated weights for policy 0, policy_version 636464 (0.0011) [2024-06-15 19:06:50,766][1648981] Fps is (10 sec: 36045.0, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 1303609344. Throughput: 0: 11923.9. Samples: 325969920. 
Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:50,767][1648981] Avg episode reward: [(0, '639.340')] [2024-06-15 19:06:50,950][1651669] Updated weights for policy 0, policy_version 636546 (0.0013) [2024-06-15 19:06:51,864][1651669] Updated weights for policy 0, policy_version 636604 (0.0012) [2024-06-15 19:06:55,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 46421.3, 300 sec: 47763.5). Total num frames: 1303773184. Throughput: 0: 12117.3. Samples: 326016512. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:06:55,767][1648981] Avg episode reward: [(0, '598.390')] [2024-06-15 19:06:55,793][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000636608_1303773184.pth... [2024-06-15 19:06:55,847][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000630976_1292238848.pth [2024-06-15 19:06:57,899][1651669] Updated weights for policy 0, policy_version 636658 (0.0012) [2024-06-15 19:06:59,041][1651669] Updated weights for policy 0, policy_version 636704 (0.0110) [2024-06-15 19:07:00,563][1651669] Updated weights for policy 0, policy_version 636768 (0.0013) [2024-06-15 19:07:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 47764.8). Total num frames: 1304100864. Throughput: 0: 11902.2. Samples: 326085632. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:07:00,767][1648981] Avg episode reward: [(0, '609.260')] [2024-06-15 19:07:02,224][1651669] Updated weights for policy 0, policy_version 636848 (0.0012) [2024-06-15 19:07:05,770][1648981] Fps is (10 sec: 52410.1, 60 sec: 48056.7, 300 sec: 47985.1). Total num frames: 1304297472. Throughput: 0: 12238.3. Samples: 326172672. 
Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:07:05,771][1648981] Avg episode reward: [(0, '593.800')] [2024-06-15 19:07:07,650][1651669] Updated weights for policy 0, policy_version 636896 (0.0014) [2024-06-15 19:07:09,327][1651669] Updated weights for policy 0, policy_version 636947 (0.0020) [2024-06-15 19:07:10,774][1648981] Fps is (10 sec: 45839.5, 60 sec: 48053.5, 300 sec: 47762.3). Total num frames: 1304559616. Throughput: 0: 12115.2. Samples: 326208000. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:07:10,775][1648981] Avg episode reward: [(0, '589.670')] [2024-06-15 19:07:11,449][1651669] Updated weights for policy 0, policy_version 637026 (0.0120) [2024-06-15 19:07:12,841][1651669] Updated weights for policy 0, policy_version 637090 (0.0145) [2024-06-15 19:07:15,766][1648981] Fps is (10 sec: 52448.4, 60 sec: 48609.0, 300 sec: 47985.7). Total num frames: 1304821760. Throughput: 0: 12037.7. Samples: 326268416. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:07:15,767][1648981] Avg episode reward: [(0, '568.580')] [2024-06-15 19:07:18,149][1651669] Updated weights for policy 0, policy_version 637138 (0.0011) [2024-06-15 19:07:19,541][1651669] Updated weights for policy 0, policy_version 637187 (0.0014) [2024-06-15 19:07:20,796][1648981] Fps is (10 sec: 49043.7, 60 sec: 47513.7, 300 sec: 47869.8). Total num frames: 1305051136. Throughput: 0: 12064.8. Samples: 326351360. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:07:20,797][1648981] Avg episode reward: [(0, '541.710')] [2024-06-15 19:07:20,983][1651669] Updated weights for policy 0, policy_version 637248 (0.0035) [2024-06-15 19:07:22,208][1651274] Signal inference workers to stop experience collection... (33400 times) [2024-06-15 19:07:22,265][1651669] InferenceWorker_p0-w0: stopping experience collection (33400 times) [2024-06-15 19:07:22,497][1651274] Signal inference workers to resume experience collection... 
(33400 times) [2024-06-15 19:07:22,497][1651669] InferenceWorker_p0-w0: resuming experience collection (33400 times) [2024-06-15 19:07:22,898][1651669] Updated weights for policy 0, policy_version 637328 (0.0222) [2024-06-15 19:07:25,802][1648981] Fps is (10 sec: 52240.5, 60 sec: 50214.1, 300 sec: 47979.8). Total num frames: 1305346048. Throughput: 0: 12005.3. Samples: 326375424. Policy #0 lag: (min: 8.0, avg: 85.2, max: 264.0) [2024-06-15 19:07:25,803][1648981] Avg episode reward: [(0, '535.950')] [2024-06-15 19:07:28,499][1651669] Updated weights for policy 0, policy_version 637378 (0.0013) [2024-06-15 19:07:30,768][1648981] Fps is (10 sec: 46004.0, 60 sec: 46419.8, 300 sec: 47985.4). Total num frames: 1305509888. Throughput: 0: 12230.6. Samples: 326465536. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:07:30,769][1648981] Avg episode reward: [(0, '551.980')] [2024-06-15 19:07:30,868][1651669] Updated weights for policy 0, policy_version 637459 (0.0015) [2024-06-15 19:07:32,506][1651669] Updated weights for policy 0, policy_version 637522 (0.0013) [2024-06-15 19:07:33,694][1651669] Updated weights for policy 0, policy_version 637584 (0.0013) [2024-06-15 19:07:34,746][1651669] Updated weights for policy 0, policy_version 637632 (0.0012) [2024-06-15 19:07:35,766][1648981] Fps is (10 sec: 52618.6, 60 sec: 50793.6, 300 sec: 48208.1). Total num frames: 1305870336. Throughput: 0: 12253.9. Samples: 326521344. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:07:35,767][1648981] Avg episode reward: [(0, '567.520')] [2024-06-15 19:07:40,770][1648981] Fps is (10 sec: 45866.4, 60 sec: 45326.2, 300 sec: 47985.8). Total num frames: 1305968640. Throughput: 0: 12252.9. Samples: 326567936. 
Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:07:40,771][1648981] Avg episode reward: [(0, '537.110')] [2024-06-15 19:07:41,273][1651669] Updated weights for policy 0, policy_version 637698 (0.0120) [2024-06-15 19:07:43,002][1651669] Updated weights for policy 0, policy_version 637761 (0.0013) [2024-06-15 19:07:44,591][1651669] Updated weights for policy 0, policy_version 637840 (0.0112) [2024-06-15 19:07:45,782][1648981] Fps is (10 sec: 52345.6, 60 sec: 52415.0, 300 sec: 48427.4). Total num frames: 1306394624. Throughput: 0: 12033.5. Samples: 326627328. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:07:45,783][1648981] Avg episode reward: [(0, '547.540')] [2024-06-15 19:07:50,660][1651669] Updated weights for policy 0, policy_version 637891 (0.0060) [2024-06-15 19:07:50,766][1648981] Fps is (10 sec: 42614.2, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1306394624. Throughput: 0: 12004.5. Samples: 326712832. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:07:50,767][1648981] Avg episode reward: [(0, '567.920')] [2024-06-15 19:07:52,193][1651669] Updated weights for policy 0, policy_version 637956 (0.0024) [2024-06-15 19:07:54,510][1651669] Updated weights for policy 0, policy_version 638037 (0.0213) [2024-06-15 19:07:55,797][1648981] Fps is (10 sec: 42537.0, 60 sec: 50764.9, 300 sec: 48425.0). Total num frames: 1306820608. Throughput: 0: 11804.2. Samples: 326739456. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:07:55,797][1648981] Avg episode reward: [(0, '544.730')] [2024-06-15 19:07:55,870][1651669] Updated weights for policy 0, policy_version 638097 (0.0011) [2024-06-15 19:08:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.4, 300 sec: 47987.2). Total num frames: 1306918912. Throughput: 0: 11958.0. Samples: 326806528. 
Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:00,767][1648981] Avg episode reward: [(0, '549.720')] [2024-06-15 19:08:02,193][1651669] Updated weights for policy 0, policy_version 638147 (0.0014) [2024-06-15 19:08:02,462][1651274] Signal inference workers to stop experience collection... (33450 times) [2024-06-15 19:08:02,511][1651669] InferenceWorker_p0-w0: stopping experience collection (33450 times) [2024-06-15 19:08:02,774][1651274] Signal inference workers to resume experience collection... (33450 times) [2024-06-15 19:08:02,777][1651669] InferenceWorker_p0-w0: resuming experience collection (33450 times) [2024-06-15 19:08:03,356][1651669] Updated weights for policy 0, policy_version 638204 (0.0011) [2024-06-15 19:08:04,809][1651669] Updated weights for policy 0, policy_version 638257 (0.0011) [2024-06-15 19:08:05,766][1648981] Fps is (10 sec: 39440.9, 60 sec: 48608.9, 300 sec: 48096.8). Total num frames: 1307213824. Throughput: 0: 11704.1. Samples: 326877696. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:05,767][1648981] Avg episode reward: [(0, '553.390')] [2024-06-15 19:08:06,443][1651669] Updated weights for policy 0, policy_version 638323 (0.0014) [2024-06-15 19:08:07,946][1651669] Updated weights for policy 0, policy_version 638390 (0.0012) [2024-06-15 19:08:10,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48065.9, 300 sec: 47985.7). Total num frames: 1307443200. Throughput: 0: 11831.0. Samples: 326907392. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:10,767][1648981] Avg episode reward: [(0, '549.600')] [2024-06-15 19:08:13,618][1651669] Updated weights for policy 0, policy_version 638434 (0.0012) [2024-06-15 19:08:14,664][1651669] Updated weights for policy 0, policy_version 638480 (0.0012) [2024-06-15 19:08:15,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1307705344. Throughput: 0: 11787.9. Samples: 326995968. 
Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:15,767][1648981] Avg episode reward: [(0, '545.340')] [2024-06-15 19:08:16,224][1651669] Updated weights for policy 0, policy_version 638549 (0.0016) [2024-06-15 19:08:18,363][1651669] Updated weights for policy 0, policy_version 638651 (0.0011) [2024-06-15 19:08:20,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48630.1, 300 sec: 47985.7). Total num frames: 1307967488. Throughput: 0: 12037.7. Samples: 327063040. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:20,767][1648981] Avg episode reward: [(0, '555.900')] [2024-06-15 19:08:23,915][1651669] Updated weights for policy 0, policy_version 638706 (0.0039) [2024-06-15 19:08:25,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46449.2, 300 sec: 47763.6). Total num frames: 1308131328. Throughput: 0: 11845.3. Samples: 327100928. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:25,767][1648981] Avg episode reward: [(0, '582.720')] [2024-06-15 19:08:26,792][1651669] Updated weights for policy 0, policy_version 638784 (0.0023) [2024-06-15 19:08:28,526][1651669] Updated weights for policy 0, policy_version 638851 (0.0012) [2024-06-15 19:08:29,698][1651669] Updated weights for policy 0, policy_version 638904 (0.0011) [2024-06-15 19:08:30,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49699.6, 300 sec: 47985.7). Total num frames: 1308491776. Throughput: 0: 11837.0. Samples: 327159808. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:30,767][1648981] Avg episode reward: [(0, '586.630')] [2024-06-15 19:08:34,129][1651669] Updated weights for policy 0, policy_version 638944 (0.0013) [2024-06-15 19:08:35,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 45875.1, 300 sec: 47985.7). Total num frames: 1308622848. Throughput: 0: 11753.2. Samples: 327241728. 
Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:35,767][1648981] Avg episode reward: [(0, '604.190')] [2024-06-15 19:08:37,504][1651669] Updated weights for policy 0, policy_version 638997 (0.0012) [2024-06-15 19:08:39,007][1651274] Signal inference workers to stop experience collection... (33500 times) [2024-06-15 19:08:39,057][1651669] InferenceWorker_p0-w0: stopping experience collection (33500 times) [2024-06-15 19:08:39,330][1651274] Signal inference workers to resume experience collection... (33500 times) [2024-06-15 19:08:39,331][1651669] InferenceWorker_p0-w0: resuming experience collection (33500 times) [2024-06-15 19:08:39,499][1651669] Updated weights for policy 0, policy_version 639073 (0.0013) [2024-06-15 19:08:40,676][1651669] Updated weights for policy 0, policy_version 639123 (0.0011) [2024-06-15 19:08:40,770][1648981] Fps is (10 sec: 42582.2, 60 sec: 49151.9, 300 sec: 47651.8). Total num frames: 1308917760. Throughput: 0: 11862.6. Samples: 327272960. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:40,771][1648981] Avg episode reward: [(0, '615.130')] [2024-06-15 19:08:41,470][1651669] Updated weights for policy 0, policy_version 639165 (0.0010) [2024-06-15 19:08:45,522][1651669] Updated weights for policy 0, policy_version 639220 (0.0013) [2024-06-15 19:08:45,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 45887.3, 300 sec: 48096.8). Total num frames: 1309147136. Throughput: 0: 12049.1. Samples: 327348736. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:45,767][1648981] Avg episode reward: [(0, '593.250')] [2024-06-15 19:08:49,014][1651669] Updated weights for policy 0, policy_version 639266 (0.0016) [2024-06-15 19:08:50,172][1651669] Updated weights for policy 0, policy_version 639312 (0.0012) [2024-06-15 19:08:50,766][1648981] Fps is (10 sec: 42615.0, 60 sec: 49152.1, 300 sec: 47652.5). Total num frames: 1309343744. Throughput: 0: 11969.4. Samples: 327416320. 
Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:50,767][1648981] Avg episode reward: [(0, '587.550')] [2024-06-15 19:08:51,853][1651669] Updated weights for policy 0, policy_version 639376 (0.0014) [2024-06-15 19:08:55,555][1651669] Updated weights for policy 0, policy_version 639428 (0.0026) [2024-06-15 19:08:55,767][1648981] Fps is (10 sec: 42597.0, 60 sec: 45898.1, 300 sec: 47875.3). Total num frames: 1309573120. Throughput: 0: 11901.1. Samples: 327442944. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:08:55,767][1648981] Avg episode reward: [(0, '590.660')] [2024-06-15 19:08:56,095][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000639456_1309605888.pth... [2024-06-15 19:08:56,271][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000633792_1298006016.pth [2024-06-15 19:08:57,106][1651669] Updated weights for policy 0, policy_version 639488 (0.0012) [2024-06-15 19:09:00,768][1648981] Fps is (10 sec: 45865.8, 60 sec: 48058.2, 300 sec: 47653.0). Total num frames: 1309802496. Throughput: 0: 11616.2. Samples: 327518720. Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:09:00,769][1648981] Avg episode reward: [(0, '593.460')] [2024-06-15 19:09:01,849][1651669] Updated weights for policy 0, policy_version 639569 (0.0011) [2024-06-15 19:09:03,220][1651669] Updated weights for policy 0, policy_version 639619 (0.0013) [2024-06-15 19:09:04,521][1651669] Updated weights for policy 0, policy_version 639673 (0.0011) [2024-06-15 19:09:05,788][1648981] Fps is (10 sec: 49045.4, 60 sec: 47496.1, 300 sec: 47982.1). Total num frames: 1310064640. Throughput: 0: 11645.1. Samples: 327587328. 
Policy #0 lag: (min: 14.0, avg: 76.2, max: 270.0) [2024-06-15 19:09:05,789][1648981] Avg episode reward: [(0, '623.040')] [2024-06-15 19:09:07,997][1651669] Updated weights for policy 0, policy_version 639728 (0.0013) [2024-06-15 19:09:10,496][1651669] Updated weights for policy 0, policy_version 639776 (0.0016) [2024-06-15 19:09:10,768][1648981] Fps is (10 sec: 45875.4, 60 sec: 46966.0, 300 sec: 47541.1). Total num frames: 1310261248. Throughput: 0: 11627.6. Samples: 327624192. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:10,769][1648981] Avg episode reward: [(0, '584.820')] [2024-06-15 19:09:12,827][1651669] Updated weights for policy 0, policy_version 639811 (0.0025) [2024-06-15 19:09:14,446][1651669] Updated weights for policy 0, policy_version 639874 (0.0013) [2024-06-15 19:09:15,766][1648981] Fps is (10 sec: 49260.9, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 1310556160. Throughput: 0: 11855.7. Samples: 327693312. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:15,767][1648981] Avg episode reward: [(0, '591.590')] [2024-06-15 19:09:18,151][1651669] Updated weights for policy 0, policy_version 639939 (0.0013) [2024-06-15 19:09:19,411][1651669] Updated weights for policy 0, policy_version 639996 (0.0012) [2024-06-15 19:09:20,771][1648981] Fps is (10 sec: 45864.9, 60 sec: 45872.0, 300 sec: 47651.8). Total num frames: 1310720000. Throughput: 0: 11695.3. Samples: 327768064. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:20,771][1648981] Avg episode reward: [(0, '593.240')] [2024-06-15 19:09:20,887][1651274] Signal inference workers to stop experience collection... (33550 times) [2024-06-15 19:09:20,939][1651669] InferenceWorker_p0-w0: stopping experience collection (33550 times) [2024-06-15 19:09:21,091][1651274] Signal inference workers to resume experience collection... 
(33550 times) [2024-06-15 19:09:21,093][1651669] InferenceWorker_p0-w0: resuming experience collection (33550 times) [2024-06-15 19:09:21,292][1651669] Updated weights for policy 0, policy_version 640039 (0.0012) [2024-06-15 19:09:23,655][1651669] Updated weights for policy 0, policy_version 640080 (0.0012) [2024-06-15 19:09:25,511][1651669] Updated weights for policy 0, policy_version 640145 (0.0012) [2024-06-15 19:09:25,777][1648981] Fps is (10 sec: 45824.6, 60 sec: 48050.9, 300 sec: 47650.7). Total num frames: 1311014912. Throughput: 0: 11796.9. Samples: 327803904. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:25,778][1648981] Avg episode reward: [(0, '602.880')] [2024-06-15 19:09:26,516][1651669] Updated weights for policy 0, policy_version 640188 (0.0012) [2024-06-15 19:09:30,184][1651669] Updated weights for policy 0, policy_version 640249 (0.0016) [2024-06-15 19:09:30,766][1648981] Fps is (10 sec: 52450.8, 60 sec: 45875.2, 300 sec: 47874.6). Total num frames: 1311244288. Throughput: 0: 11776.0. Samples: 327878656. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:30,767][1648981] Avg episode reward: [(0, '594.930')] [2024-06-15 19:09:32,513][1651669] Updated weights for policy 0, policy_version 640315 (0.0011) [2024-06-15 19:09:35,634][1651669] Updated weights for policy 0, policy_version 640384 (0.0011) [2024-06-15 19:09:35,766][1648981] Fps is (10 sec: 49206.3, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1311506432. Throughput: 0: 11776.0. Samples: 327946240. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:35,767][1648981] Avg episode reward: [(0, '570.940')] [2024-06-15 19:09:37,032][1651669] Updated weights for policy 0, policy_version 640439 (0.0010) [2024-06-15 19:09:40,324][1651669] Updated weights for policy 0, policy_version 640483 (0.0012) [2024-06-15 19:09:40,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 47516.5, 300 sec: 48321.5). Total num frames: 1311768576. 
Throughput: 0: 12049.1. Samples: 327985152. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:40,767][1648981] Avg episode reward: [(0, '578.970')] [2024-06-15 19:09:43,145][1651669] Updated weights for policy 0, policy_version 640560 (0.0013) [2024-06-15 19:09:45,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46967.5, 300 sec: 47652.4). Total num frames: 1311965184. Throughput: 0: 12026.8. Samples: 328059904. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:45,767][1648981] Avg episode reward: [(0, '588.230')] [2024-06-15 19:09:46,068][1651669] Updated weights for policy 0, policy_version 640624 (0.0012) [2024-06-15 19:09:47,766][1651669] Updated weights for policy 0, policy_version 640696 (0.0013) [2024-06-15 19:09:50,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1312161792. Throughput: 0: 12020.8. Samples: 328128000. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:50,767][1648981] Avg episode reward: [(0, '577.590')] [2024-06-15 19:09:51,664][1651669] Updated weights for policy 0, policy_version 640764 (0.0013) [2024-06-15 19:09:54,334][1651669] Updated weights for policy 0, policy_version 640823 (0.0021) [2024-06-15 19:09:55,774][1648981] Fps is (10 sec: 45839.5, 60 sec: 47507.7, 300 sec: 47984.4). Total num frames: 1312423936. Throughput: 0: 12058.9. Samples: 328166912. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:09:55,775][1648981] Avg episode reward: [(0, '569.800')] [2024-06-15 19:09:56,761][1651669] Updated weights for policy 0, policy_version 640864 (0.0016) [2024-06-15 19:09:58,366][1651669] Updated weights for policy 0, policy_version 640931 (0.0010) [2024-06-15 19:09:59,179][1651669] Updated weights for policy 0, policy_version 640960 (0.0011) [2024-06-15 19:10:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48061.3, 300 sec: 47985.7). Total num frames: 1312686080. Throughput: 0: 11980.8. Samples: 328232448. 
Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:00,767][1648981] Avg episode reward: [(0, '582.100')] [2024-06-15 19:10:02,152][1651274] Signal inference workers to stop experience collection... (33600 times) [2024-06-15 19:10:02,287][1651669] InferenceWorker_p0-w0: stopping experience collection (33600 times) [2024-06-15 19:10:02,422][1651274] Signal inference workers to resume experience collection... (33600 times) [2024-06-15 19:10:02,424][1651669] InferenceWorker_p0-w0: resuming experience collection (33600 times) [2024-06-15 19:10:02,598][1651669] Updated weights for policy 0, policy_version 641017 (0.0135) [2024-06-15 19:10:04,702][1651669] Updated weights for policy 0, policy_version 641062 (0.0011) [2024-06-15 19:10:05,766][1648981] Fps is (10 sec: 52469.4, 60 sec: 48077.4, 300 sec: 48318.9). Total num frames: 1312948224. Throughput: 0: 12095.7. Samples: 328312320. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:05,767][1648981] Avg episode reward: [(0, '596.630')] [2024-06-15 19:10:07,620][1651669] Updated weights for policy 0, policy_version 641105 (0.0013) [2024-06-15 19:10:09,798][1651669] Updated weights for policy 0, policy_version 641187 (0.0012) [2024-06-15 19:10:10,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 49153.5, 300 sec: 47986.9). Total num frames: 1313210368. Throughput: 0: 12029.2. Samples: 328345088. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:10,767][1648981] Avg episode reward: [(0, '589.790')] [2024-06-15 19:10:13,183][1651669] Updated weights for policy 0, policy_version 641248 (0.0012) [2024-06-15 19:10:15,767][1648981] Fps is (10 sec: 39320.8, 60 sec: 46421.1, 300 sec: 48096.7). Total num frames: 1313341440. Throughput: 0: 11832.8. Samples: 328411136. 
Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:15,767][1648981] Avg episode reward: [(0, '568.260')] [2024-06-15 19:10:16,116][1651669] Updated weights for policy 0, policy_version 641298 (0.0013) [2024-06-15 19:10:17,986][1651669] Updated weights for policy 0, policy_version 641345 (0.0016) [2024-06-15 19:10:19,983][1651669] Updated weights for policy 0, policy_version 641424 (0.0025) [2024-06-15 19:10:20,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 49155.5, 300 sec: 47763.6). Total num frames: 1313669120. Throughput: 0: 11798.8. Samples: 328477184. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:20,767][1648981] Avg episode reward: [(0, '549.920')] [2024-06-15 19:10:24,887][1651669] Updated weights for policy 0, policy_version 641489 (0.0113) [2024-06-15 19:10:25,766][1648981] Fps is (10 sec: 49153.3, 60 sec: 46976.1, 300 sec: 48096.8). Total num frames: 1313832960. Throughput: 0: 11844.3. Samples: 328518144. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:25,767][1648981] Avg episode reward: [(0, '551.430')] [2024-06-15 19:10:25,983][1651669] Updated weights for policy 0, policy_version 641535 (0.0091) [2024-06-15 19:10:27,952][1651669] Updated weights for policy 0, policy_version 641573 (0.0013) [2024-06-15 19:10:29,372][1651669] Updated weights for policy 0, policy_version 641616 (0.0011) [2024-06-15 19:10:30,770][1648981] Fps is (10 sec: 45856.1, 60 sec: 48056.5, 300 sec: 47873.9). Total num frames: 1314127872. Throughput: 0: 11831.8. Samples: 328592384. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:30,771][1648981] Avg episode reward: [(0, '549.140')] [2024-06-15 19:10:31,214][1651669] Updated weights for policy 0, policy_version 641696 (0.0129) [2024-06-15 19:10:32,022][1651669] Updated weights for policy 0, policy_version 641725 (0.0013) [2024-06-15 19:10:35,770][1648981] Fps is (10 sec: 49133.2, 60 sec: 46964.5, 300 sec: 48096.7). Total num frames: 1314324480. 
Throughput: 0: 11888.8. Samples: 328663040. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:35,771][1648981] Avg episode reward: [(0, '540.410')] [2024-06-15 19:10:36,215][1651669] Updated weights for policy 0, policy_version 641782 (0.0168) [2024-06-15 19:10:38,721][1651669] Updated weights for policy 0, policy_version 641829 (0.0013) [2024-06-15 19:10:40,101][1651669] Updated weights for policy 0, policy_version 641876 (0.0014) [2024-06-15 19:10:40,766][1648981] Fps is (10 sec: 49172.0, 60 sec: 47513.7, 300 sec: 47763.7). Total num frames: 1314619392. Throughput: 0: 11846.3. Samples: 328699904. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:40,767][1648981] Avg episode reward: [(0, '528.970')] [2024-06-15 19:10:42,101][1651669] Updated weights for policy 0, policy_version 641955 (0.0011) [2024-06-15 19:10:45,539][1651274] Signal inference workers to stop experience collection... (33650 times) [2024-06-15 19:10:45,619][1651669] InferenceWorker_p0-w0: stopping experience collection (33650 times) [2024-06-15 19:10:45,766][1648981] Fps is (10 sec: 45892.6, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 1314783232. Throughput: 0: 11878.4. Samples: 328766976. Policy #0 lag: (min: 14.0, avg: 129.3, max: 270.0) [2024-06-15 19:10:45,767][1648981] Avg episode reward: [(0, '526.870')] [2024-06-15 19:10:45,911][1651274] Signal inference workers to resume experience collection... (33650 times) [2024-06-15 19:10:45,912][1651669] InferenceWorker_p0-w0: resuming experience collection (33650 times) [2024-06-15 19:10:46,095][1651669] Updated weights for policy 0, policy_version 642004 (0.0012) [2024-06-15 19:10:49,751][1651669] Updated weights for policy 0, policy_version 642080 (0.0013) [2024-06-15 19:10:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1315045376. Throughput: 0: 11787.4. Samples: 328842752. 
Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:10:50,767][1648981] Avg episode reward: [(0, '522.790')] [2024-06-15 19:10:51,931][1651669] Updated weights for policy 0, policy_version 642147 (0.0133) [2024-06-15 19:10:53,807][1651669] Updated weights for policy 0, policy_version 642224 (0.0093) [2024-06-15 19:10:55,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 48065.7, 300 sec: 47985.6). Total num frames: 1315307520. Throughput: 0: 11616.7. Samples: 328867840. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:10:55,767][1648981] Avg episode reward: [(0, '515.560')] [2024-06-15 19:10:55,781][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000642240_1315307520.pth... [2024-06-15 19:10:55,935][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000636608_1303773184.pth [2024-06-15 19:10:55,943][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000642240_1315307520.pth [2024-06-15 19:10:57,604][1651669] Updated weights for policy 0, policy_version 642261 (0.0092) [2024-06-15 19:10:58,500][1651669] Updated weights for policy 0, policy_version 642302 (0.0011) [2024-06-15 19:11:00,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1315471360. Throughput: 0: 11878.4. Samples: 328945664. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:00,767][1648981] Avg episode reward: [(0, '496.160')] [2024-06-15 19:11:01,999][1651669] Updated weights for policy 0, policy_version 642370 (0.0027) [2024-06-15 19:11:04,534][1651669] Updated weights for policy 0, policy_version 642464 (0.0012) [2024-06-15 19:11:05,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.5, 300 sec: 47985.6). Total num frames: 1315831808. Throughput: 0: 11764.5. Samples: 329006592. 
Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:05,768][1648981] Avg episode reward: [(0, '519.840')] [2024-06-15 19:11:08,342][1651669] Updated weights for policy 0, policy_version 642512 (0.0012) [2024-06-15 19:11:09,616][1651669] Updated weights for policy 0, policy_version 642560 (0.0012) [2024-06-15 19:11:10,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 45875.3, 300 sec: 47653.1). Total num frames: 1315962880. Throughput: 0: 11889.8. Samples: 329053184. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:10,767][1648981] Avg episode reward: [(0, '517.830')] [2024-06-15 19:11:12,590][1651669] Updated weights for policy 0, policy_version 642608 (0.0013) [2024-06-15 19:11:14,494][1651669] Updated weights for policy 0, policy_version 642678 (0.0012) [2024-06-15 19:11:15,773][1648981] Fps is (10 sec: 45848.3, 60 sec: 49147.1, 300 sec: 47767.4). Total num frames: 1316290560. Throughput: 0: 11786.8. Samples: 329122816. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:15,773][1648981] Avg episode reward: [(0, '514.810')] [2024-06-15 19:11:16,099][1651669] Updated weights for policy 0, policy_version 642747 (0.0014) [2024-06-15 19:11:20,294][1651669] Updated weights for policy 0, policy_version 642805 (0.0010) [2024-06-15 19:11:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1316487168. Throughput: 0: 11833.9. Samples: 329195520. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:20,767][1648981] Avg episode reward: [(0, '508.880')] [2024-06-15 19:11:22,681][1651669] Updated weights for policy 0, policy_version 642852 (0.0013) [2024-06-15 19:11:24,284][1651669] Updated weights for policy 0, policy_version 642882 (0.0011) [2024-06-15 19:11:25,712][1651669] Updated weights for policy 0, policy_version 642951 (0.0012) [2024-06-15 19:11:25,767][1648981] Fps is (10 sec: 45903.0, 60 sec: 48605.7, 300 sec: 47541.3). Total num frames: 1316749312. 
Throughput: 0: 11878.4. Samples: 329234432. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:25,767][1648981] Avg episode reward: [(0, '508.070')] [2024-06-15 19:11:25,943][1651274] Signal inference workers to stop experience collection... (33700 times) [2024-06-15 19:11:25,990][1651669] InferenceWorker_p0-w0: stopping experience collection (33700 times) [2024-06-15 19:11:26,114][1651274] Signal inference workers to resume experience collection... (33700 times) [2024-06-15 19:11:26,122][1651669] InferenceWorker_p0-w0: resuming experience collection (33700 times) [2024-06-15 19:11:26,751][1651669] Updated weights for policy 0, policy_version 642999 (0.0013) [2024-06-15 19:11:30,155][1651669] Updated weights for policy 0, policy_version 643042 (0.0011) [2024-06-15 19:11:30,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 47516.8, 300 sec: 47986.3). Total num frames: 1316978688. Throughput: 0: 12151.5. Samples: 329313792. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:30,767][1648981] Avg episode reward: [(0, '506.990')] [2024-06-15 19:11:32,849][1651669] Updated weights for policy 0, policy_version 643104 (0.0059) [2024-06-15 19:11:35,475][1651669] Updated weights for policy 0, policy_version 643157 (0.0013) [2024-06-15 19:11:35,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 48062.8, 300 sec: 47319.2). Total num frames: 1317208064. Throughput: 0: 12037.7. Samples: 329384448. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:35,767][1648981] Avg episode reward: [(0, '504.280')] [2024-06-15 19:11:37,101][1651669] Updated weights for policy 0, policy_version 643217 (0.0012) [2024-06-15 19:11:37,766][1651669] Updated weights for policy 0, policy_version 643259 (0.0017) [2024-06-15 19:11:40,770][1648981] Fps is (10 sec: 52407.7, 60 sec: 48056.5, 300 sec: 48318.2). Total num frames: 1317502976. Throughput: 0: 12207.3. Samples: 329417216. 
Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:40,771][1648981] Avg episode reward: [(0, '500.380')] [2024-06-15 19:11:40,826][1651669] Updated weights for policy 0, policy_version 643322 (0.0094) [2024-06-15 19:11:44,504][1651669] Updated weights for policy 0, policy_version 643385 (0.0016) [2024-06-15 19:11:45,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 1317666816. Throughput: 0: 12106.0. Samples: 329490432. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:45,767][1648981] Avg episode reward: [(0, '512.760')] [2024-06-15 19:11:46,818][1651669] Updated weights for policy 0, policy_version 643428 (0.0011) [2024-06-15 19:11:48,825][1651669] Updated weights for policy 0, policy_version 643504 (0.0011) [2024-06-15 19:11:50,766][1648981] Fps is (10 sec: 42615.7, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1317928960. Throughput: 0: 12162.9. Samples: 329553920. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:50,767][1648981] Avg episode reward: [(0, '524.680')] [2024-06-15 19:11:52,358][1651669] Updated weights for policy 0, policy_version 643577 (0.0015) [2024-06-15 19:11:55,505][1651669] Updated weights for policy 0, policy_version 643635 (0.0013) [2024-06-15 19:11:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48060.0, 300 sec: 47763.5). Total num frames: 1318191104. Throughput: 0: 11946.7. Samples: 329590784. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:11:55,767][1648981] Avg episode reward: [(0, '526.090')] [2024-06-15 19:11:57,566][1651669] Updated weights for policy 0, policy_version 643668 (0.0012) [2024-06-15 19:11:59,112][1651669] Updated weights for policy 0, policy_version 643713 (0.0013) [2024-06-15 19:12:00,434][1651669] Updated weights for policy 0, policy_version 643771 (0.0018) [2024-06-15 19:12:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 47986.3). Total num frames: 1318453248. 
Throughput: 0: 12062.1. Samples: 329665536. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:12:00,767][1648981] Avg episode reward: [(0, '523.530')] [2024-06-15 19:12:03,159][1651669] Updated weights for policy 0, policy_version 643830 (0.0072) [2024-06-15 19:12:05,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46421.6, 300 sec: 47653.7). Total num frames: 1318617088. Throughput: 0: 12014.9. Samples: 329736192. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:12:05,767][1648981] Avg episode reward: [(0, '501.280')] [2024-06-15 19:12:06,231][1651669] Updated weights for policy 0, policy_version 643873 (0.0015) [2024-06-15 19:12:08,459][1651669] Updated weights for policy 0, policy_version 643923 (0.0026) [2024-06-15 19:12:10,612][1651669] Updated weights for policy 0, policy_version 643971 (0.0012) [2024-06-15 19:12:10,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 48059.8, 300 sec: 47541.4). Total num frames: 1318846464. Throughput: 0: 11924.0. Samples: 329771008. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:12:10,767][1648981] Avg episode reward: [(0, '497.420')] [2024-06-15 19:12:10,831][1651274] Signal inference workers to stop experience collection... (33750 times) [2024-06-15 19:12:10,950][1651669] InferenceWorker_p0-w0: stopping experience collection (33750 times) [2024-06-15 19:12:11,041][1651274] Signal inference workers to resume experience collection... (33750 times) [2024-06-15 19:12:11,041][1651669] InferenceWorker_p0-w0: resuming experience collection (33750 times) [2024-06-15 19:12:11,636][1651669] Updated weights for policy 0, policy_version 644028 (0.0043) [2024-06-15 19:12:13,497][1651669] Updated weights for policy 0, policy_version 644080 (0.0079) [2024-06-15 19:12:15,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46972.3, 300 sec: 47657.3). Total num frames: 1319108608. Throughput: 0: 11707.7. Samples: 329840640. 
Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:12:15,767][1648981] Avg episode reward: [(0, '526.610')] [2024-06-15 19:12:16,425][1651669] Updated weights for policy 0, policy_version 644128 (0.0012) [2024-06-15 19:12:19,135][1651669] Updated weights for policy 0, policy_version 644178 (0.0011) [2024-06-15 19:12:20,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 48059.6, 300 sec: 47547.1). Total num frames: 1319370752. Throughput: 0: 11867.0. Samples: 329918464. Policy #0 lag: (min: 2.0, avg: 105.7, max: 258.0) [2024-06-15 19:12:20,767][1648981] Avg episode reward: [(0, '530.920')] [2024-06-15 19:12:21,417][1651669] Updated weights for policy 0, policy_version 644229 (0.0010) [2024-06-15 19:12:22,346][1651669] Updated weights for policy 0, policy_version 644279 (0.0013) [2024-06-15 19:12:23,893][1651669] Updated weights for policy 0, policy_version 644348 (0.0011) [2024-06-15 19:12:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.9, 300 sec: 47874.9). Total num frames: 1319632896. Throughput: 0: 11947.7. Samples: 329954816. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:12:25,767][1648981] Avg episode reward: [(0, '535.780')] [2024-06-15 19:12:27,187][1651669] Updated weights for policy 0, policy_version 644400 (0.0126) [2024-06-15 19:12:30,112][1651669] Updated weights for policy 0, policy_version 644448 (0.0012) [2024-06-15 19:12:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.6, 300 sec: 47430.3). Total num frames: 1319862272. Throughput: 0: 12162.8. Samples: 330037760. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:12:30,767][1648981] Avg episode reward: [(0, '537.050')] [2024-06-15 19:12:32,122][1651669] Updated weights for policy 0, policy_version 644498 (0.0013) [2024-06-15 19:12:33,716][1651669] Updated weights for policy 0, policy_version 644560 (0.0012) [2024-06-15 19:12:35,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49151.9, 300 sec: 48097.4). Total num frames: 1320157184. 
Throughput: 0: 12105.9. Samples: 330098688. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:12:35,767][1648981] Avg episode reward: [(0, '537.810')] [2024-06-15 19:12:37,733][1651669] Updated weights for policy 0, policy_version 644626 (0.0028) [2024-06-15 19:12:38,833][1651669] Updated weights for policy 0, policy_version 644671 (0.0010) [2024-06-15 19:12:40,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 46424.5, 300 sec: 47099.6). Total num frames: 1320288256. Throughput: 0: 12140.1. Samples: 330137088. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:12:40,767][1648981] Avg episode reward: [(0, '556.760')] [2024-06-15 19:12:41,858][1651669] Updated weights for policy 0, policy_version 644722 (0.0011) [2024-06-15 19:12:43,090][1651669] Updated weights for policy 0, policy_version 644768 (0.0113) [2024-06-15 19:12:44,479][1651669] Updated weights for policy 0, policy_version 644805 (0.0029) [2024-06-15 19:12:45,761][1651669] Updated weights for policy 0, policy_version 644861 (0.0013) [2024-06-15 19:12:45,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1320648704. Throughput: 0: 12094.6. Samples: 330209792. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:12:45,767][1648981] Avg episode reward: [(0, '558.040')] [2024-06-15 19:12:49,180][1651669] Updated weights for policy 0, policy_version 644923 (0.0014) [2024-06-15 19:12:50,786][1648981] Fps is (10 sec: 52324.6, 60 sec: 48043.8, 300 sec: 47432.0). Total num frames: 1320812544. Throughput: 0: 12248.5. Samples: 330287616. 
Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:12:50,787][1648981] Avg episode reward: [(0, '547.620')] [2024-06-15 19:12:52,511][1651669] Updated weights for policy 0, policy_version 644976 (0.0134) [2024-06-15 19:12:53,553][1651669] Updated weights for policy 0, policy_version 645024 (0.0011) [2024-06-15 19:12:53,652][1651274] Signal inference workers to stop experience collection... (33800 times) [2024-06-15 19:12:53,752][1651669] InferenceWorker_p0-w0: stopping experience collection (33800 times) [2024-06-15 19:12:53,934][1651274] Signal inference workers to resume experience collection... (33800 times) [2024-06-15 19:12:53,935][1651669] InferenceWorker_p0-w0: resuming experience collection (33800 times) [2024-06-15 19:12:55,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 49698.0, 300 sec: 48318.9). Total num frames: 1321172992. Throughput: 0: 12344.8. Samples: 330326528. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:12:55,767][1648981] Avg episode reward: [(0, '564.580')] [2024-06-15 19:12:55,932][1651669] Updated weights for policy 0, policy_version 645112 (0.0012) [2024-06-15 19:12:56,020][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000645120_1321205760.pth... [2024-06-15 19:12:56,078][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000639456_1309605888.pth [2024-06-15 19:13:00,256][1651669] Updated weights for policy 0, policy_version 645172 (0.0012) [2024-06-15 19:13:00,766][1648981] Fps is (10 sec: 52532.9, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1321336832. Throughput: 0: 12299.4. Samples: 330394112. 
Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:00,767][1648981] Avg episode reward: [(0, '564.290')] [2024-06-15 19:13:03,536][1651669] Updated weights for policy 0, policy_version 645201 (0.0011) [2024-06-15 19:13:04,746][1651669] Updated weights for policy 0, policy_version 645252 (0.0013) [2024-06-15 19:13:05,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 1321566208. Throughput: 0: 12128.8. Samples: 330464256. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:05,767][1648981] Avg episode reward: [(0, '560.390')] [2024-06-15 19:13:06,407][1651669] Updated weights for policy 0, policy_version 645315 (0.0011) [2024-06-15 19:13:10,395][1651669] Updated weights for policy 0, policy_version 645378 (0.0014) [2024-06-15 19:13:10,768][1648981] Fps is (10 sec: 42592.4, 60 sec: 48604.7, 300 sec: 47652.2). Total num frames: 1321762816. Throughput: 0: 12037.3. Samples: 330496512. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:10,768][1648981] Avg episode reward: [(0, '558.230')] [2024-06-15 19:13:11,474][1651669] Updated weights for policy 0, policy_version 645440 (0.0012) [2024-06-15 19:13:15,274][1651669] Updated weights for policy 0, policy_version 645508 (0.0017) [2024-06-15 19:13:15,785][1648981] Fps is (10 sec: 45792.2, 60 sec: 48591.2, 300 sec: 47649.5). Total num frames: 1322024960. Throughput: 0: 11987.4. Samples: 330577408. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:15,785][1648981] Avg episode reward: [(0, '550.960')] [2024-06-15 19:13:17,560][1651669] Updated weights for policy 0, policy_version 645600 (0.0012) [2024-06-15 19:13:20,766][1648981] Fps is (10 sec: 49158.9, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 1322254336. Throughput: 0: 11889.8. Samples: 330633728. 
Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:20,767][1648981] Avg episode reward: [(0, '536.260')] [2024-06-15 19:13:22,724][1651669] Updated weights for policy 0, policy_version 645664 (0.0013) [2024-06-15 19:13:25,730][1651669] Updated weights for policy 0, policy_version 645698 (0.0011) [2024-06-15 19:13:25,766][1648981] Fps is (10 sec: 36110.1, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1322385408. Throughput: 0: 11867.0. Samples: 330671104. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:25,767][1648981] Avg episode reward: [(0, '576.990')] [2024-06-15 19:13:27,625][1651669] Updated weights for policy 0, policy_version 645781 (0.0012) [2024-06-15 19:13:29,365][1651669] Updated weights for policy 0, policy_version 645841 (0.0013) [2024-06-15 19:13:30,137][1651669] Updated weights for policy 0, policy_version 645882 (0.0012) [2024-06-15 19:13:30,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1322778624. Throughput: 0: 11798.7. Samples: 330740736. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:30,767][1648981] Avg episode reward: [(0, '578.730')] [2024-06-15 19:13:33,167][1651669] Updated weights for policy 0, policy_version 645936 (0.0014) [2024-06-15 19:13:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47430.9). Total num frames: 1322909696. Throughput: 0: 11951.9. Samples: 330825216. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:35,767][1648981] Avg episode reward: [(0, '593.430')] [2024-06-15 19:13:36,339][1651274] Signal inference workers to stop experience collection... (33850 times) [2024-06-15 19:13:36,387][1651669] InferenceWorker_p0-w0: stopping experience collection (33850 times) [2024-06-15 19:13:36,595][1651274] Signal inference workers to resume experience collection... 
(33850 times) [2024-06-15 19:13:36,596][1651669] InferenceWorker_p0-w0: resuming experience collection (33850 times) [2024-06-15 19:13:36,597][1651669] Updated weights for policy 0, policy_version 645984 (0.0011) [2024-06-15 19:13:38,542][1651669] Updated weights for policy 0, policy_version 646068 (0.0014) [2024-06-15 19:13:40,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 1323270144. Throughput: 0: 11673.6. Samples: 330851840. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:40,767][1648981] Avg episode reward: [(0, '618.440')] [2024-06-15 19:13:40,976][1651669] Updated weights for policy 0, policy_version 646141 (0.0136) [2024-06-15 19:13:43,868][1651669] Updated weights for policy 0, policy_version 646181 (0.0013) [2024-06-15 19:13:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46421.3, 300 sec: 47763.5). Total num frames: 1323433984. Throughput: 0: 11810.1. Samples: 330925568. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:45,767][1648981] Avg episode reward: [(0, '635.620')] [2024-06-15 19:13:47,018][1651669] Updated weights for policy 0, policy_version 646209 (0.0010) [2024-06-15 19:13:48,907][1651669] Updated weights for policy 0, policy_version 646290 (0.0012) [2024-06-15 19:13:49,942][1651669] Updated weights for policy 0, policy_version 646336 (0.0013) [2024-06-15 19:13:50,774][1648981] Fps is (10 sec: 49113.3, 60 sec: 49161.8, 300 sec: 48095.5). Total num frames: 1323761664. Throughput: 0: 11899.1. Samples: 330999808. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:50,775][1648981] Avg episode reward: [(0, '607.940')] [2024-06-15 19:13:51,268][1651669] Updated weights for policy 0, policy_version 646390 (0.0016) [2024-06-15 19:13:54,256][1651669] Updated weights for policy 0, policy_version 646448 (0.0014) [2024-06-15 19:13:55,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46421.4, 300 sec: 47986.0). Total num frames: 1323958272. 
Throughput: 0: 12003.9. Samples: 331036672. Policy #0 lag: (min: 47.0, avg: 155.8, max: 303.0) [2024-06-15 19:13:55,767][1648981] Avg episode reward: [(0, '628.220')] [2024-06-15 19:13:59,327][1651669] Updated weights for policy 0, policy_version 646513 (0.0060) [2024-06-15 19:14:00,766][1648981] Fps is (10 sec: 39352.7, 60 sec: 46967.5, 300 sec: 47767.1). Total num frames: 1324154880. Throughput: 0: 11906.0. Samples: 331112960. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:00,767][1648981] Avg episode reward: [(0, '663.260')] [2024-06-15 19:14:01,324][1651274] Saving new best policy, reward=663.260! [2024-06-15 19:14:02,218][1651669] Updated weights for policy 0, policy_version 646624 (0.0011) [2024-06-15 19:14:05,043][1651669] Updated weights for policy 0, policy_version 646676 (0.0013) [2024-06-15 19:14:05,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48059.8, 300 sec: 48097.1). Total num frames: 1324449792. Throughput: 0: 11855.7. Samples: 331167232. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:05,767][1648981] Avg episode reward: [(0, '679.450')] [2024-06-15 19:14:05,861][1651274] Saving new best policy, reward=679.450! [2024-06-15 19:14:10,245][1651669] Updated weights for policy 0, policy_version 646721 (0.0014) [2024-06-15 19:14:10,774][1648981] Fps is (10 sec: 36016.7, 60 sec: 45870.3, 300 sec: 47318.0). Total num frames: 1324515328. Throughput: 0: 11910.5. Samples: 331207168. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:10,775][1648981] Avg episode reward: [(0, '664.190')] [2024-06-15 19:14:12,151][1651669] Updated weights for policy 0, policy_version 646800 (0.0011) [2024-06-15 19:14:13,708][1651669] Updated weights for policy 0, policy_version 646864 (0.0011) [2024-06-15 19:14:14,280][1651274] Signal inference workers to stop experience collection... 
(33900 times) [2024-06-15 19:14:14,319][1651669] InferenceWorker_p0-w0: stopping experience collection (33900 times) [2024-06-15 19:14:14,541][1651274] Signal inference workers to resume experience collection... (33900 times) [2024-06-15 19:14:14,541][1651669] InferenceWorker_p0-w0: resuming experience collection (33900 times) [2024-06-15 19:14:15,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 47528.0, 300 sec: 47986.4). Total num frames: 1324875776. Throughput: 0: 11730.5. Samples: 331268608. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:15,767][1648981] Avg episode reward: [(0, '652.090')] [2024-06-15 19:14:16,650][1651669] Updated weights for policy 0, policy_version 646917 (0.0014) [2024-06-15 19:14:20,767][1648981] Fps is (10 sec: 49185.5, 60 sec: 45874.5, 300 sec: 47431.9). Total num frames: 1325006848. Throughput: 0: 11525.4. Samples: 331343872. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:20,768][1648981] Avg episode reward: [(0, '688.490')] [2024-06-15 19:14:20,769][1651274] Saving new best policy, reward=688.490! [2024-06-15 19:14:22,088][1651669] Updated weights for policy 0, policy_version 646993 (0.0037) [2024-06-15 19:14:23,575][1651669] Updated weights for policy 0, policy_version 647056 (0.0013) [2024-06-15 19:14:24,855][1651669] Updated weights for policy 0, policy_version 647105 (0.0011) [2024-06-15 19:14:25,767][1648981] Fps is (10 sec: 45873.1, 60 sec: 49151.7, 300 sec: 47763.5). Total num frames: 1325334528. Throughput: 0: 11662.1. Samples: 331376640. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:25,768][1648981] Avg episode reward: [(0, '732.620')] [2024-06-15 19:14:26,088][1651669] Updated weights for policy 0, policy_version 647163 (0.0011) [2024-06-15 19:14:26,144][1651274] Saving new best policy, reward=732.620! 
[2024-06-15 19:14:28,252][1651669] Updated weights for policy 0, policy_version 647200 (0.0011) [2024-06-15 19:14:30,766][1648981] Fps is (10 sec: 52434.1, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1325531136. Throughput: 0: 11594.0. Samples: 331447296. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:30,767][1648981] Avg episode reward: [(0, '736.330')] [2024-06-15 19:14:30,768][1651274] Saving new best policy, reward=736.330! [2024-06-15 19:14:33,669][1651669] Updated weights for policy 0, policy_version 647264 (0.0016) [2024-06-15 19:14:35,078][1651669] Updated weights for policy 0, policy_version 647328 (0.0013) [2024-06-15 19:14:35,766][1648981] Fps is (10 sec: 42600.4, 60 sec: 47513.7, 300 sec: 47430.3). Total num frames: 1325760512. Throughput: 0: 11550.5. Samples: 331519488. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:35,767][1648981] Avg episode reward: [(0, '727.280')] [2024-06-15 19:14:36,471][1651669] Updated weights for policy 0, policy_version 647377 (0.0014) [2024-06-15 19:14:39,368][1651669] Updated weights for policy 0, policy_version 647442 (0.0013) [2024-06-15 19:14:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46421.4, 300 sec: 47763.5). Total num frames: 1326055424. Throughput: 0: 11468.8. Samples: 331552768. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:40,767][1648981] Avg episode reward: [(0, '727.720')] [2024-06-15 19:14:44,136][1651669] Updated weights for policy 0, policy_version 647506 (0.0012) [2024-06-15 19:14:45,578][1651669] Updated weights for policy 0, policy_version 647570 (0.0012) [2024-06-15 19:14:45,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1326252032. Throughput: 0: 11548.4. Samples: 331632640. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:45,767][1648981] Avg episode reward: [(0, '768.200')] [2024-06-15 19:14:45,991][1651274] Saving new best policy, reward=768.200! 
[2024-06-15 19:14:46,806][1651669] Updated weights for policy 0, policy_version 647632 (0.0011) [2024-06-15 19:14:48,086][1651669] Updated weights for policy 0, policy_version 647679 (0.0012) [2024-06-15 19:14:50,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 45881.2, 300 sec: 47764.8). Total num frames: 1326514176. Throughput: 0: 11764.6. Samples: 331696640. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:50,767][1648981] Avg episode reward: [(0, '766.710')] [2024-06-15 19:14:54,915][1651669] Updated weights for policy 0, policy_version 647750 (0.0012) [2024-06-15 19:14:55,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 44783.0, 300 sec: 47319.2). Total num frames: 1326645248. Throughput: 0: 11721.1. Samples: 331734528. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:14:55,767][1648981] Avg episode reward: [(0, '795.700')] [2024-06-15 19:14:56,277][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000647808_1326710784.pth... [2024-06-15 19:14:56,375][1651274] Signal inference workers to stop experience collection... (33950 times) [2024-06-15 19:14:56,408][1651669] InferenceWorker_p0-w0: stopping experience collection (33950 times) [2024-06-15 19:14:56,403][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000642240_1315307520.pth [2024-06-15 19:14:56,452][1651274] Saving new best policy, reward=795.700! [2024-06-15 19:14:56,786][1651274] Signal inference workers to resume experience collection... 
(33950 times) [2024-06-15 19:14:56,787][1651669] InferenceWorker_p0-w0: resuming experience collection (33950 times) [2024-06-15 19:14:56,976][1651669] Updated weights for policy 0, policy_version 647826 (0.0012) [2024-06-15 19:14:58,459][1651669] Updated weights for policy 0, policy_version 647888 (0.0011) [2024-06-15 19:14:59,668][1651669] Updated weights for policy 0, policy_version 647936 (0.0012) [2024-06-15 19:15:00,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1326972928. Throughput: 0: 11707.7. Samples: 331795456. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:00,767][1648981] Avg episode reward: [(0, '788.310')] [2024-06-15 19:15:02,967][1651669] Updated weights for policy 0, policy_version 647994 (0.0015) [2024-06-15 19:15:05,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 44236.7, 300 sec: 47097.1). Total num frames: 1327104000. Throughput: 0: 12015.2. Samples: 331884544. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:05,767][1648981] Avg episode reward: [(0, '774.660')] [2024-06-15 19:15:07,531][1651669] Updated weights for policy 0, policy_version 648086 (0.0032) [2024-06-15 19:15:09,152][1651669] Updated weights for policy 0, policy_version 648160 (0.0010) [2024-06-15 19:15:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49704.5, 300 sec: 47985.7). Total num frames: 1327497216. Throughput: 0: 11798.8. Samples: 331907584. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:10,767][1648981] Avg episode reward: [(0, '787.950')] [2024-06-15 19:15:13,439][1651669] Updated weights for policy 0, policy_version 648208 (0.0012) [2024-06-15 19:15:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1327628288. Throughput: 0: 11935.3. Samples: 331984384. 
Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:15,767][1648981] Avg episode reward: [(0, '797.950')] [2024-06-15 19:15:15,768][1651274] Saving new best policy, reward=797.950! [2024-06-15 19:15:16,341][1651669] Updated weights for policy 0, policy_version 648257 (0.0012) [2024-06-15 19:15:18,075][1651669] Updated weights for policy 0, policy_version 648336 (0.0112) [2024-06-15 19:15:19,567][1651669] Updated weights for policy 0, policy_version 648387 (0.0011) [2024-06-15 19:15:20,767][1648981] Fps is (10 sec: 49151.9, 60 sec: 49698.9, 300 sec: 47985.7). Total num frames: 1327988736. Throughput: 0: 11810.1. Samples: 332050944. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:20,768][1648981] Avg episode reward: [(0, '799.520')] [2024-06-15 19:15:20,910][1651274] Saving new best policy, reward=799.520! [2024-06-15 19:15:20,945][1651669] Updated weights for policy 0, policy_version 648448 (0.0015) [2024-06-15 19:15:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 46967.7, 300 sec: 47542.0). Total num frames: 1328152576. Throughput: 0: 12174.2. Samples: 332100608. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:25,767][1648981] Avg episode reward: [(0, '819.760')] [2024-06-15 19:15:25,772][1651274] Saving new best policy, reward=819.760! [2024-06-15 19:15:26,962][1651669] Updated weights for policy 0, policy_version 648513 (0.0013) [2024-06-15 19:15:28,734][1651669] Updated weights for policy 0, policy_version 648608 (0.0012) [2024-06-15 19:15:30,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49152.0, 300 sec: 47986.3). Total num frames: 1328480256. Throughput: 0: 11923.9. Samples: 332169216. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:30,767][1648981] Avg episode reward: [(0, '791.280')] [2024-06-15 19:15:30,806][1651669] Updated weights for policy 0, policy_version 648673 (0.0014) [2024-06-15 19:15:34,312][1651274] Signal inference workers to stop experience collection... 
(34000 times) [2024-06-15 19:15:34,357][1651669] InferenceWorker_p0-w0: stopping experience collection (34000 times) [2024-06-15 19:15:34,506][1651274] Signal inference workers to resume experience collection... (34000 times) [2024-06-15 19:15:34,507][1651669] InferenceWorker_p0-w0: resuming experience collection (34000 times) [2024-06-15 19:15:34,509][1651669] Updated weights for policy 0, policy_version 648736 (0.0011) [2024-06-15 19:15:35,778][1648981] Fps is (10 sec: 52367.3, 60 sec: 48596.2, 300 sec: 47650.5). Total num frames: 1328676864. Throughput: 0: 12171.0. Samples: 332244480. Policy #0 lag: (min: 47.0, avg: 109.0, max: 287.0) [2024-06-15 19:15:35,779][1648981] Avg episode reward: [(0, '788.310')] [2024-06-15 19:15:38,200][1651669] Updated weights for policy 0, policy_version 648800 (0.0082) [2024-06-15 19:15:40,079][1651669] Updated weights for policy 0, policy_version 648889 (0.0122) [2024-06-15 19:15:40,792][1648981] Fps is (10 sec: 45759.7, 60 sec: 48039.5, 300 sec: 47981.6). Total num frames: 1328939008. Throughput: 0: 12099.2. Samples: 332279296. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:15:40,792][1648981] Avg episode reward: [(0, '786.480')] [2024-06-15 19:15:41,567][1651669] Updated weights for policy 0, policy_version 648932 (0.0015) [2024-06-15 19:15:45,121][1651669] Updated weights for policy 0, policy_version 648967 (0.0012) [2024-06-15 19:15:45,770][1648981] Fps is (10 sec: 45910.9, 60 sec: 48056.5, 300 sec: 47762.9). Total num frames: 1329135616. Throughput: 0: 12264.1. Samples: 332347392. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:15:45,771][1648981] Avg episode reward: [(0, '778.020')] [2024-06-15 19:15:46,464][1651669] Updated weights for policy 0, policy_version 649024 (0.0014) [2024-06-15 19:15:49,680][1651669] Updated weights for policy 0, policy_version 649074 (0.0010) [2024-06-15 19:15:50,766][1648981] Fps is (10 sec: 45991.5, 60 sec: 48059.8, 300 sec: 47763.6). 
Total num frames: 1329397760. Throughput: 0: 12015.0. Samples: 332425216. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:15:50,767][1648981] Avg episode reward: [(0, '745.640')] [2024-06-15 19:15:51,019][1651669] Updated weights for policy 0, policy_version 649136 (0.0112) [2024-06-15 19:15:52,828][1651669] Updated weights for policy 0, policy_version 649205 (0.0014) [2024-06-15 19:15:55,757][1651669] Updated weights for policy 0, policy_version 649248 (0.0037) [2024-06-15 19:15:55,766][1648981] Fps is (10 sec: 52450.0, 60 sec: 50244.2, 300 sec: 48096.8). Total num frames: 1329659904. Throughput: 0: 12117.3. Samples: 332452864. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:15:55,767][1648981] Avg episode reward: [(0, '715.730')] [2024-06-15 19:15:59,914][1651669] Updated weights for policy 0, policy_version 649299 (0.0056) [2024-06-15 19:16:00,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 47513.6, 300 sec: 47430.4). Total num frames: 1329823744. Throughput: 0: 12310.8. Samples: 332538368. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:00,767][1648981] Avg episode reward: [(0, '696.760')] [2024-06-15 19:16:01,464][1651669] Updated weights for policy 0, policy_version 649376 (0.0011) [2024-06-15 19:16:03,987][1651669] Updated weights for policy 0, policy_version 649441 (0.0031) [2024-06-15 19:16:05,778][1648981] Fps is (10 sec: 45821.1, 60 sec: 50234.4, 300 sec: 47983.8). Total num frames: 1330118656. Throughput: 0: 12216.5. Samples: 332600832. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:05,779][1648981] Avg episode reward: [(0, '680.270')] [2024-06-15 19:16:06,151][1651669] Updated weights for policy 0, policy_version 649493 (0.0014) [2024-06-15 19:16:10,220][1651669] Updated weights for policy 0, policy_version 649538 (0.0011) [2024-06-15 19:16:10,766][1648981] Fps is (10 sec: 49151.2, 60 sec: 46967.4, 300 sec: 47542.4). Total num frames: 1330315264. Throughput: 0: 11958.0. 
Samples: 332638720. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:10,767][1648981] Avg episode reward: [(0, '676.250')] [2024-06-15 19:16:11,393][1651669] Updated weights for policy 0, policy_version 649592 (0.0013) [2024-06-15 19:16:12,286][1651669] Updated weights for policy 0, policy_version 649617 (0.0010) [2024-06-15 19:16:13,941][1651669] Updated weights for policy 0, policy_version 649680 (0.0121) [2024-06-15 19:16:14,394][1651274] Signal inference workers to stop experience collection... (34050 times) [2024-06-15 19:16:14,446][1651669] InferenceWorker_p0-w0: stopping experience collection (34050 times) [2024-06-15 19:16:14,588][1651274] Signal inference workers to resume experience collection... (34050 times) [2024-06-15 19:16:14,589][1651669] InferenceWorker_p0-w0: resuming experience collection (34050 times) [2024-06-15 19:16:14,894][1651669] Updated weights for policy 0, policy_version 649728 (0.0012) [2024-06-15 19:16:15,766][1648981] Fps is (10 sec: 52490.7, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 1330642944. Throughput: 0: 11992.2. Samples: 332708864. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:15,767][1648981] Avg episode reward: [(0, '650.120')] [2024-06-15 19:16:17,508][1651669] Updated weights for policy 0, policy_version 649785 (0.0012) [2024-06-15 19:16:20,781][1648981] Fps is (10 sec: 45810.8, 60 sec: 46410.4, 300 sec: 47539.1). Total num frames: 1330774016. Throughput: 0: 12116.7. Samples: 332789760. 
Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:20,781][1648981] Avg episode reward: [(0, '668.020')] [2024-06-15 19:16:21,649][1651669] Updated weights for policy 0, policy_version 649829 (0.0012) [2024-06-15 19:16:23,729][1651669] Updated weights for policy 0, policy_version 649873 (0.0012) [2024-06-15 19:16:25,653][1651669] Updated weights for policy 0, policy_version 649955 (0.0091) [2024-06-15 19:16:25,774][1648981] Fps is (10 sec: 45839.5, 60 sec: 49145.7, 300 sec: 47873.3). Total num frames: 1331101696. Throughput: 0: 12133.4. Samples: 332825088. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:25,775][1648981] Avg episode reward: [(0, '694.840')] [2024-06-15 19:16:26,196][1651669] Updated weights for policy 0, policy_version 649984 (0.0012) [2024-06-15 19:16:29,004][1651669] Updated weights for policy 0, policy_version 650039 (0.0030) [2024-06-15 19:16:30,766][1648981] Fps is (10 sec: 52503.2, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1331298304. Throughput: 0: 12016.0. Samples: 332888064. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:30,767][1648981] Avg episode reward: [(0, '689.900')] [2024-06-15 19:16:33,427][1651669] Updated weights for policy 0, policy_version 650096 (0.0011) [2024-06-15 19:16:35,689][1651669] Updated weights for policy 0, policy_version 650162 (0.0012) [2024-06-15 19:16:35,767][1648981] Fps is (10 sec: 42627.1, 60 sec: 47522.1, 300 sec: 47541.9). Total num frames: 1331527680. Throughput: 0: 11844.0. Samples: 332958208. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:35,768][1648981] Avg episode reward: [(0, '672.090')] [2024-06-15 19:16:37,021][1651669] Updated weights for policy 0, policy_version 650240 (0.0013) [2024-06-15 19:16:40,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 47533.4, 300 sec: 47874.6). Total num frames: 1331789824. Throughput: 0: 12003.5. Samples: 332993024. 
Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:40,767][1648981] Avg episode reward: [(0, '672.540')] [2024-06-15 19:16:40,828][1651669] Updated weights for policy 0, policy_version 650296 (0.0010) [2024-06-15 19:16:45,766][1648981] Fps is (10 sec: 39325.7, 60 sec: 46424.4, 300 sec: 47430.3). Total num frames: 1331920896. Throughput: 0: 11821.5. Samples: 333070336. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:45,767][1648981] Avg episode reward: [(0, '689.080')] [2024-06-15 19:16:46,063][1651669] Updated weights for policy 0, policy_version 650369 (0.0012) [2024-06-15 19:16:48,349][1651669] Updated weights for policy 0, policy_version 650469 (0.0011) [2024-06-15 19:16:50,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1332215808. Throughput: 0: 11756.3. Samples: 333129728. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:50,767][1648981] Avg episode reward: [(0, '668.630')] [2024-06-15 19:16:51,457][1651669] Updated weights for policy 0, policy_version 650512 (0.0013) [2024-06-15 19:16:55,718][1651669] Updated weights for policy 0, policy_version 650561 (0.0014) [2024-06-15 19:16:55,770][1648981] Fps is (10 sec: 42582.0, 60 sec: 44780.0, 300 sec: 47096.4). Total num frames: 1332346880. Throughput: 0: 11740.9. Samples: 333167104. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:16:55,771][1648981] Avg episode reward: [(0, '623.170')] [2024-06-15 19:16:56,275][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000650592_1332412416.pth... [2024-06-15 19:16:56,402][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000645120_1321205760.pth [2024-06-15 19:16:57,464][1651669] Updated weights for policy 0, policy_version 650640 (0.0014) [2024-06-15 19:16:57,910][1651274] Signal inference workers to stop experience collection... 
(34100 times) [2024-06-15 19:16:57,996][1651669] InferenceWorker_p0-w0: stopping experience collection (34100 times) [2024-06-15 19:16:58,093][1651274] Signal inference workers to resume experience collection... (34100 times) [2024-06-15 19:16:58,093][1651669] InferenceWorker_p0-w0: resuming experience collection (34100 times) [2024-06-15 19:16:58,599][1651669] Updated weights for policy 0, policy_version 650689 (0.0011) [2024-06-15 19:16:59,903][1651669] Updated weights for policy 0, policy_version 650747 (0.0012) [2024-06-15 19:17:00,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48605.8, 300 sec: 47874.6). Total num frames: 1332740096. Throughput: 0: 11730.5. Samples: 333236736. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:17:00,767][1648981] Avg episode reward: [(0, '634.770')] [2024-06-15 19:17:02,791][1651669] Updated weights for policy 0, policy_version 650803 (0.0045) [2024-06-15 19:17:05,767][1648981] Fps is (10 sec: 52447.8, 60 sec: 45884.0, 300 sec: 47541.3). Total num frames: 1332871168. Throughput: 0: 11677.2. Samples: 333315072. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:17:05,767][1648981] Avg episode reward: [(0, '624.350')] [2024-06-15 19:17:07,169][1651669] Updated weights for policy 0, policy_version 650838 (0.0012) [2024-06-15 19:17:08,608][1651669] Updated weights for policy 0, policy_version 650912 (0.0023) [2024-06-15 19:17:10,000][1651669] Updated weights for policy 0, policy_version 650964 (0.0015) [2024-06-15 19:17:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 47874.6). Total num frames: 1333231616. Throughput: 0: 11709.8. Samples: 333351936. 
Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:17:10,767][1648981] Avg episode reward: [(0, '628.220')] [2024-06-15 19:17:12,827][1651669] Updated weights for policy 0, policy_version 651009 (0.0012) [2024-06-15 19:17:14,192][1651669] Updated weights for policy 0, policy_version 651065 (0.0018) [2024-06-15 19:17:15,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1333395456. Throughput: 0: 11776.0. Samples: 333417984. Policy #0 lag: (min: 15.0, avg: 98.6, max: 271.0) [2024-06-15 19:17:15,767][1648981] Avg episode reward: [(0, '607.020')] [2024-06-15 19:17:18,182][1651669] Updated weights for policy 0, policy_version 651120 (0.0122) [2024-06-15 19:17:19,463][1651669] Updated weights for policy 0, policy_version 651185 (0.0011) [2024-06-15 19:17:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49709.9, 300 sec: 47874.6). Total num frames: 1333755904. Throughput: 0: 11833.2. Samples: 333490688. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:20,767][1648981] Avg episode reward: [(0, '613.040')] [2024-06-15 19:17:20,912][1651669] Updated weights for policy 0, policy_version 651257 (0.0013) [2024-06-15 19:17:24,136][1651669] Updated weights for policy 0, policy_version 651312 (0.0013) [2024-06-15 19:17:25,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 46973.4, 300 sec: 47652.4). Total num frames: 1333919744. Throughput: 0: 12117.4. Samples: 333538304. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:25,767][1648981] Avg episode reward: [(0, '581.870')] [2024-06-15 19:17:28,153][1651669] Updated weights for policy 0, policy_version 651360 (0.0041) [2024-06-15 19:17:30,017][1651669] Updated weights for policy 0, policy_version 651431 (0.0063) [2024-06-15 19:17:30,779][1648981] Fps is (10 sec: 45816.7, 60 sec: 48595.5, 300 sec: 47650.4). Total num frames: 1334214656. Throughput: 0: 11920.5. Samples: 333606912. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:30,780][1648981] Avg episode reward: [(0, '587.160')] [2024-06-15 19:17:31,358][1651669] Updated weights for policy 0, policy_version 651504 (0.0025) [2024-06-15 19:17:35,594][1651669] Updated weights for policy 0, policy_version 651576 (0.0016) [2024-06-15 19:17:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48606.6, 300 sec: 47985.7). Total num frames: 1334444032. Throughput: 0: 12174.2. Samples: 333677568. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:35,767][1648981] Avg episode reward: [(0, '626.180')] [2024-06-15 19:17:38,454][1651274] Signal inference workers to stop experience collection... (34150 times) [2024-06-15 19:17:38,569][1651669] InferenceWorker_p0-w0: stopping experience collection (34150 times) [2024-06-15 19:17:38,769][1651274] Signal inference workers to resume experience collection... (34150 times) [2024-06-15 19:17:38,770][1651669] InferenceWorker_p0-w0: resuming experience collection (34150 times) [2024-06-15 19:17:39,245][1651669] Updated weights for policy 0, policy_version 651616 (0.0011) [2024-06-15 19:17:40,766][1648981] Fps is (10 sec: 39371.9, 60 sec: 46967.7, 300 sec: 47319.2). Total num frames: 1334607872. Throughput: 0: 12311.8. Samples: 333721088. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:40,767][1648981] Avg episode reward: [(0, '633.980')] [2024-06-15 19:17:41,260][1651669] Updated weights for policy 0, policy_version 651696 (0.0124) [2024-06-15 19:17:42,436][1651669] Updated weights for policy 0, policy_version 651746 (0.0011) [2024-06-15 19:17:45,745][1651669] Updated weights for policy 0, policy_version 651809 (0.0012) [2024-06-15 19:17:45,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 49698.2, 300 sec: 47766.8). Total num frames: 1334902784. Throughput: 0: 12128.7. Samples: 333782528. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:45,767][1648981] Avg episode reward: [(0, '632.340')] [2024-06-15 19:17:50,570][1651669] Updated weights for policy 0, policy_version 651859 (0.0012) [2024-06-15 19:17:50,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46967.4, 300 sec: 46986.0). Total num frames: 1335033856. Throughput: 0: 12106.0. Samples: 333859840. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:50,767][1648981] Avg episode reward: [(0, '640.650')] [2024-06-15 19:17:52,306][1651669] Updated weights for policy 0, policy_version 651940 (0.0015) [2024-06-15 19:17:53,616][1651669] Updated weights for policy 0, policy_version 652000 (0.0012) [2024-06-15 19:17:55,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 50247.3, 300 sec: 47541.3). Total num frames: 1335361536. Throughput: 0: 11867.0. Samples: 333885952. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:17:55,767][1648981] Avg episode reward: [(0, '648.540')] [2024-06-15 19:17:55,946][1651669] Updated weights for policy 0, policy_version 652049 (0.0077) [2024-06-15 19:17:56,845][1651669] Updated weights for policy 0, policy_version 652096 (0.0011) [2024-06-15 19:18:00,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46421.4, 300 sec: 47319.2). Total num frames: 1335525376. Throughput: 0: 12435.9. Samples: 333977600. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:00,767][1648981] Avg episode reward: [(0, '680.170')] [2024-06-15 19:18:02,014][1651669] Updated weights for policy 0, policy_version 652176 (0.0012) [2024-06-15 19:18:04,162][1651669] Updated weights for policy 0, policy_version 652260 (0.0013) [2024-06-15 19:18:05,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 50244.5, 300 sec: 47874.8). Total num frames: 1335885824. Throughput: 0: 12151.5. Samples: 334037504. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:05,767][1648981] Avg episode reward: [(0, '682.210')] [2024-06-15 19:18:06,386][1651669] Updated weights for policy 0, policy_version 652320 (0.0013) [2024-06-15 19:18:10,746][1651669] Updated weights for policy 0, policy_version 652354 (0.0014) [2024-06-15 19:18:10,768][1648981] Fps is (10 sec: 49142.4, 60 sec: 46419.9, 300 sec: 47432.9). Total num frames: 1336016896. Throughput: 0: 11934.8. Samples: 334075392. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:10,769][1648981] Avg episode reward: [(0, '634.090')] [2024-06-15 19:18:12,301][1651669] Updated weights for policy 0, policy_version 652421 (0.0012) [2024-06-15 19:18:13,614][1651669] Updated weights for policy 0, policy_version 652483 (0.0012) [2024-06-15 19:18:13,620][1651274] Signal inference workers to stop experience collection... (34200 times) [2024-06-15 19:18:13,668][1651669] InferenceWorker_p0-w0: stopping experience collection (34200 times) [2024-06-15 19:18:13,882][1651274] Signal inference workers to resume experience collection... (34200 times) [2024-06-15 19:18:13,882][1651669] InferenceWorker_p0-w0: resuming experience collection (34200 times) [2024-06-15 19:18:14,845][1651669] Updated weights for policy 0, policy_version 652539 (0.0013) [2024-06-15 19:18:15,775][1648981] Fps is (10 sec: 52382.7, 60 sec: 50236.9, 300 sec: 47984.3). Total num frames: 1336410112. Throughput: 0: 12141.2. Samples: 334153216. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:15,776][1648981] Avg episode reward: [(0, '607.290')] [2024-06-15 19:18:17,204][1651669] Updated weights for policy 0, policy_version 652597 (0.0010) [2024-06-15 19:18:20,766][1648981] Fps is (10 sec: 52438.7, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1336541184. Throughput: 0: 12504.2. Samples: 334240256. 
Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:20,767][1648981] Avg episode reward: [(0, '584.960')] [2024-06-15 19:18:21,578][1651669] Updated weights for policy 0, policy_version 652642 (0.0012) [2024-06-15 19:18:22,931][1651669] Updated weights for policy 0, policy_version 652704 (0.0099) [2024-06-15 19:18:25,061][1651669] Updated weights for policy 0, policy_version 652797 (0.0013) [2024-06-15 19:18:25,770][1648981] Fps is (10 sec: 52455.2, 60 sec: 50241.3, 300 sec: 47985.1). Total num frames: 1336934400. Throughput: 0: 12230.1. Samples: 334271488. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:25,771][1648981] Avg episode reward: [(0, '601.540')] [2024-06-15 19:18:27,977][1651669] Updated weights for policy 0, policy_version 652837 (0.0012) [2024-06-15 19:18:30,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47523.7, 300 sec: 47985.7). Total num frames: 1337065472. Throughput: 0: 12344.9. Samples: 334338048. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:30,767][1648981] Avg episode reward: [(0, '585.500')] [2024-06-15 19:18:32,285][1651669] Updated weights for policy 0, policy_version 652882 (0.0016) [2024-06-15 19:18:34,264][1651669] Updated weights for policy 0, policy_version 652960 (0.0016) [2024-06-15 19:18:35,502][1651669] Updated weights for policy 0, policy_version 653013 (0.0012) [2024-06-15 19:18:35,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 49152.1, 300 sec: 47874.6). Total num frames: 1337393152. Throughput: 0: 12128.7. Samples: 334405632. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:35,767][1648981] Avg episode reward: [(0, '604.400')] [2024-06-15 19:18:36,244][1651669] Updated weights for policy 0, policy_version 653056 (0.0050) [2024-06-15 19:18:38,963][1651669] Updated weights for policy 0, policy_version 653119 (0.0011) [2024-06-15 19:18:40,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 1337589760. 
Throughput: 0: 12413.2. Samples: 334444544. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:40,767][1648981] Avg episode reward: [(0, '621.200')] [2024-06-15 19:18:44,108][1651669] Updated weights for policy 0, policy_version 653171 (0.0030) [2024-06-15 19:18:45,577][1651669] Updated weights for policy 0, policy_version 653243 (0.0017) [2024-06-15 19:18:45,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49152.0, 300 sec: 47764.8). Total num frames: 1337851904. Throughput: 0: 12231.1. Samples: 334528000. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:45,767][1648981] Avg episode reward: [(0, '607.710')] [2024-06-15 19:18:46,556][1651669] Updated weights for policy 0, policy_version 653286 (0.0047) [2024-06-15 19:18:49,284][1651669] Updated weights for policy 0, policy_version 653360 (0.0017) [2024-06-15 19:18:50,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 51336.5, 300 sec: 47985.7). Total num frames: 1338114048. Throughput: 0: 12344.9. Samples: 334593024. Policy #0 lag: (min: 15.0, avg: 92.9, max: 271.0) [2024-06-15 19:18:50,767][1648981] Avg episode reward: [(0, '611.010')] [2024-06-15 19:18:54,659][1651669] Updated weights for policy 0, policy_version 653408 (0.0017) [2024-06-15 19:18:54,792][1651274] Signal inference workers to stop experience collection... (34250 times) [2024-06-15 19:18:54,834][1651669] InferenceWorker_p0-w0: stopping experience collection (34250 times) [2024-06-15 19:18:54,990][1651274] Signal inference workers to resume experience collection... (34250 times) [2024-06-15 19:18:54,991][1651669] InferenceWorker_p0-w0: resuming experience collection (34250 times) [2024-06-15 19:18:55,767][1648981] Fps is (10 sec: 42596.6, 60 sec: 48605.8, 300 sec: 47874.5). Total num frames: 1338277888. Throughput: 0: 12504.6. Samples: 334638080. 
Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:18:55,767][1648981] Avg episode reward: [(0, '642.860')] [2024-06-15 19:18:56,114][1651669] Updated weights for policy 0, policy_version 653475 (0.0126) [2024-06-15 19:18:56,284][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000653488_1338343424.pth... [2024-06-15 19:18:56,405][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000647808_1326710784.pth [2024-06-15 19:18:58,128][1651669] Updated weights for policy 0, policy_version 653565 (0.0012) [2024-06-15 19:19:00,075][1651669] Updated weights for policy 0, policy_version 653616 (0.0015) [2024-06-15 19:19:00,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 51882.7, 300 sec: 48096.8). Total num frames: 1338638336. Throughput: 0: 12210.8. Samples: 334702592. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:00,767][1648981] Avg episode reward: [(0, '634.260')] [2024-06-15 19:19:05,278][1651669] Updated weights for policy 0, policy_version 653665 (0.0048) [2024-06-15 19:19:05,766][1648981] Fps is (10 sec: 45876.4, 60 sec: 47513.5, 300 sec: 48209.1). Total num frames: 1338736640. Throughput: 0: 12003.5. Samples: 334780416. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:05,767][1648981] Avg episode reward: [(0, '633.860')] [2024-06-15 19:19:06,395][1651669] Updated weights for policy 0, policy_version 653728 (0.0011) [2024-06-15 19:19:07,832][1651669] Updated weights for policy 0, policy_version 653797 (0.0010) [2024-06-15 19:19:10,395][1651669] Updated weights for policy 0, policy_version 653827 (0.0011) [2024-06-15 19:19:10,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 50792.0, 300 sec: 48096.7). Total num frames: 1339064320. Throughput: 0: 12050.1. Samples: 334813696. 
Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:10,767][1648981] Avg episode reward: [(0, '639.500')] [2024-06-15 19:19:11,545][1651669] Updated weights for policy 0, policy_version 653882 (0.0011) [2024-06-15 19:19:15,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 46974.4, 300 sec: 48208.0). Total num frames: 1339228160. Throughput: 0: 12299.4. Samples: 334891520. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:15,767][1648981] Avg episode reward: [(0, '683.120')] [2024-06-15 19:19:16,358][1651669] Updated weights for policy 0, policy_version 653942 (0.0016) [2024-06-15 19:19:17,898][1651669] Updated weights for policy 0, policy_version 654000 (0.0018) [2024-06-15 19:19:18,971][1651669] Updated weights for policy 0, policy_version 654052 (0.0012) [2024-06-15 19:19:20,768][1648981] Fps is (10 sec: 49145.4, 60 sec: 50243.2, 300 sec: 48207.7). Total num frames: 1339555840. Throughput: 0: 12253.5. Samples: 334957056. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:20,768][1648981] Avg episode reward: [(0, '683.240')] [2024-06-15 19:19:21,174][1651669] Updated weights for policy 0, policy_version 654087 (0.0014) [2024-06-15 19:19:22,378][1651669] Updated weights for policy 0, policy_version 654142 (0.0013) [2024-06-15 19:19:25,778][1648981] Fps is (10 sec: 45819.6, 60 sec: 45868.9, 300 sec: 47983.7). Total num frames: 1339686912. Throughput: 0: 12227.8. Samples: 334994944. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:25,779][1648981] Avg episode reward: [(0, '675.010')] [2024-06-15 19:19:27,421][1651669] Updated weights for policy 0, policy_version 654201 (0.0131) [2024-06-15 19:19:29,429][1651669] Updated weights for policy 0, policy_version 654283 (0.0015) [2024-06-15 19:19:30,397][1651669] Updated weights for policy 0, policy_version 654330 (0.0017) [2024-06-15 19:19:30,770][1648981] Fps is (10 sec: 52416.3, 60 sec: 50241.1, 300 sec: 48540.4). Total num frames: 1340080128. 
Throughput: 0: 11831.9. Samples: 335060480. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:30,770][1648981] Avg episode reward: [(0, '668.970')] [2024-06-15 19:19:32,343][1651274] Signal inference workers to stop experience collection... (34300 times) [2024-06-15 19:19:32,417][1651669] InferenceWorker_p0-w0: stopping experience collection (34300 times) [2024-06-15 19:19:32,564][1651274] Signal inference workers to resume experience collection... (34300 times) [2024-06-15 19:19:32,567][1651669] InferenceWorker_p0-w0: resuming experience collection (34300 times) [2024-06-15 19:19:32,982][1651669] Updated weights for policy 0, policy_version 654384 (0.0151) [2024-06-15 19:19:35,766][1648981] Fps is (10 sec: 52491.6, 60 sec: 46967.4, 300 sec: 47985.7). Total num frames: 1340211200. Throughput: 0: 12117.3. Samples: 335138304. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:35,767][1648981] Avg episode reward: [(0, '681.670')] [2024-06-15 19:19:37,316][1651669] Updated weights for policy 0, policy_version 654431 (0.0016) [2024-06-15 19:19:38,029][1651669] Updated weights for policy 0, policy_version 654460 (0.0012) [2024-06-15 19:19:40,027][1651669] Updated weights for policy 0, policy_version 654528 (0.0112) [2024-06-15 19:19:40,766][1648981] Fps is (10 sec: 45892.5, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1340538880. Throughput: 0: 11992.3. Samples: 335177728. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:40,767][1648981] Avg episode reward: [(0, '692.330')] [2024-06-15 19:19:41,264][1651669] Updated weights for policy 0, policy_version 654588 (0.0011) [2024-06-15 19:19:43,751][1651669] Updated weights for policy 0, policy_version 654627 (0.0010) [2024-06-15 19:19:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.6, 300 sec: 48207.8). Total num frames: 1340735488. Throughput: 0: 12185.6. Samples: 335250944. 
Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:45,767][1648981] Avg episode reward: [(0, '682.810')] [2024-06-15 19:19:47,792][1651669] Updated weights for policy 0, policy_version 654688 (0.0091) [2024-06-15 19:19:49,348][1651669] Updated weights for policy 0, policy_version 654739 (0.0012) [2024-06-15 19:19:50,768][1648981] Fps is (10 sec: 49141.4, 60 sec: 48604.2, 300 sec: 48762.9). Total num frames: 1341030400. Throughput: 0: 12003.0. Samples: 335320576. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:50,769][1648981] Avg episode reward: [(0, '673.270')] [2024-06-15 19:19:50,816][1651669] Updated weights for policy 0, policy_version 654803 (0.0013) [2024-06-15 19:19:54,216][1651669] Updated weights for policy 0, policy_version 654850 (0.0011) [2024-06-15 19:19:55,425][1651669] Updated weights for policy 0, policy_version 654910 (0.0011) [2024-06-15 19:19:55,787][1648981] Fps is (10 sec: 52321.2, 60 sec: 49681.3, 300 sec: 48426.6). Total num frames: 1341259776. Throughput: 0: 12145.9. Samples: 335360512. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:19:55,792][1648981] Avg episode reward: [(0, '655.430')] [2024-06-15 19:19:58,903][1651669] Updated weights for policy 0, policy_version 654967 (0.0013) [2024-06-15 19:20:00,162][1651669] Updated weights for policy 0, policy_version 655012 (0.0011) [2024-06-15 19:20:00,766][1648981] Fps is (10 sec: 49162.6, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1341521920. Throughput: 0: 12071.8. Samples: 335434752. 
Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:20:00,767][1648981] Avg episode reward: [(0, '654.540')] [2024-06-15 19:20:01,554][1651669] Updated weights for policy 0, policy_version 655075 (0.0014) [2024-06-15 19:20:02,160][1651669] Updated weights for policy 0, policy_version 655104 (0.0012) [2024-06-15 19:20:05,435][1651669] Updated weights for policy 0, policy_version 655139 (0.0011) [2024-06-15 19:20:05,766][1648981] Fps is (10 sec: 49253.9, 60 sec: 50244.4, 300 sec: 48318.9). Total num frames: 1341751296. Throughput: 0: 12220.1. Samples: 335506944. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:20:05,767][1648981] Avg episode reward: [(0, '644.880')] [2024-06-15 19:20:08,281][1651669] Updated weights for policy 0, policy_version 655184 (0.0017) [2024-06-15 19:20:10,774][1648981] Fps is (10 sec: 39289.8, 60 sec: 47507.2, 300 sec: 48428.7). Total num frames: 1341915136. Throughput: 0: 12198.1. Samples: 335543808. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:20:10,775][1648981] Avg episode reward: [(0, '663.550')] [2024-06-15 19:20:10,784][1651669] Updated weights for policy 0, policy_version 655248 (0.0013) [2024-06-15 19:20:11,518][1651669] Updated weights for policy 0, policy_version 655296 (0.0012) [2024-06-15 19:20:12,799][1651669] Updated weights for policy 0, policy_version 655350 (0.0010) [2024-06-15 19:20:15,175][1651274] Signal inference workers to stop experience collection... (34350 times) [2024-06-15 19:20:15,262][1651669] InferenceWorker_p0-w0: stopping experience collection (34350 times) [2024-06-15 19:20:15,556][1651274] Signal inference workers to resume experience collection... (34350 times) [2024-06-15 19:20:15,557][1651669] InferenceWorker_p0-w0: resuming experience collection (34350 times) [2024-06-15 19:20:15,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 49697.8, 300 sec: 48207.8). Total num frames: 1342210048. Throughput: 0: 12436.9. Samples: 335620096. 
Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:20:15,768][1648981] Avg episode reward: [(0, '675.650')] [2024-06-15 19:20:16,555][1651669] Updated weights for policy 0, policy_version 655415 (0.0011) [2024-06-15 19:20:18,892][1651669] Updated weights for policy 0, policy_version 655458 (0.0013) [2024-06-15 19:20:20,766][1648981] Fps is (10 sec: 52471.1, 60 sec: 48060.8, 300 sec: 48430.0). Total num frames: 1342439424. Throughput: 0: 12413.2. Samples: 335696896. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:20:20,767][1648981] Avg episode reward: [(0, '663.980')] [2024-06-15 19:20:21,135][1651669] Updated weights for policy 0, policy_version 655509 (0.0012) [2024-06-15 19:20:22,331][1651669] Updated weights for policy 0, policy_version 655568 (0.0173) [2024-06-15 19:20:23,020][1651669] Updated weights for policy 0, policy_version 655615 (0.0034) [2024-06-15 19:20:25,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 50800.6, 300 sec: 48318.9). Total num frames: 1342734336. Throughput: 0: 12367.6. Samples: 335734272. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:20:25,767][1648981] Avg episode reward: [(0, '641.770')] [2024-06-15 19:20:26,839][1651669] Updated weights for policy 0, policy_version 655674 (0.0015) [2024-06-15 19:20:30,207][1651669] Updated weights for policy 0, policy_version 655736 (0.0015) [2024-06-15 19:20:30,773][1648981] Fps is (10 sec: 52395.6, 60 sec: 48057.7, 300 sec: 48430.9). Total num frames: 1342963712. Throughput: 0: 12309.0. Samples: 335804928. Policy #0 lag: (min: 31.0, avg: 92.5, max: 271.0) [2024-06-15 19:20:30,773][1648981] Avg episode reward: [(0, '635.800')] [2024-06-15 19:20:33,361][1651669] Updated weights for policy 0, policy_version 655808 (0.0099) [2024-06-15 19:20:34,462][1651669] Updated weights for policy 0, policy_version 655866 (0.0013) [2024-06-15 19:20:35,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.4, 300 sec: 48434.1). Total num frames: 1343225856. 
Throughput: 0: 12345.5. Samples: 335876096. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:20:35,767][1648981] Avg episode reward: [(0, '627.010')] [2024-06-15 19:20:37,651][1651669] Updated weights for policy 0, policy_version 655920 (0.0012) [2024-06-15 19:20:40,766][1648981] Fps is (10 sec: 42625.3, 60 sec: 47513.6, 300 sec: 48319.6). Total num frames: 1343389696. Throughput: 0: 12270.9. Samples: 335912448. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:20:40,767][1648981] Avg episode reward: [(0, '585.460')] [2024-06-15 19:20:40,855][1651669] Updated weights for policy 0, policy_version 655968 (0.0053) [2024-06-15 19:20:42,969][1651669] Updated weights for policy 0, policy_version 656017 (0.0012) [2024-06-15 19:20:44,559][1651669] Updated weights for policy 0, policy_version 656085 (0.0019) [2024-06-15 19:20:45,346][1651669] Updated weights for policy 0, policy_version 656128 (0.0012) [2024-06-15 19:20:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 48652.1). Total num frames: 1343750144. Throughput: 0: 12151.5. Samples: 335981568. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:20:45,767][1648981] Avg episode reward: [(0, '544.150')] [2024-06-15 19:20:48,611][1651669] Updated weights for policy 0, policy_version 656183 (0.0039) [2024-06-15 19:20:50,779][1648981] Fps is (10 sec: 49089.4, 60 sec: 47505.2, 300 sec: 48205.8). Total num frames: 1343881216. Throughput: 0: 12409.6. Samples: 336065536. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:20:50,780][1648981] Avg episode reward: [(0, '529.120')] [2024-06-15 19:20:52,161][1651669] Updated weights for policy 0, policy_version 656227 (0.0112) [2024-06-15 19:20:53,186][1651669] Updated weights for policy 0, policy_version 656280 (0.0026) [2024-06-15 19:20:53,975][1651274] Signal inference workers to stop experience collection... 
(34400 times) [2024-06-15 19:20:54,019][1651669] InferenceWorker_p0-w0: stopping experience collection (34400 times) [2024-06-15 19:20:54,123][1651274] Signal inference workers to resume experience collection... (34400 times) [2024-06-15 19:20:54,123][1651669] InferenceWorker_p0-w0: resuming experience collection (34400 times) [2024-06-15 19:20:54,802][1651669] Updated weights for policy 0, policy_version 656368 (0.0025) [2024-06-15 19:20:55,774][1648981] Fps is (10 sec: 52386.9, 60 sec: 50254.9, 300 sec: 48984.1). Total num frames: 1344274432. Throughput: 0: 12333.5. Samples: 336098816. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:20:55,775][1648981] Avg episode reward: [(0, '518.420')] [2024-06-15 19:20:55,794][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000656384_1344274432.pth... [2024-06-15 19:20:55,835][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000650592_1332412416.pth [2024-06-15 19:20:59,134][1651669] Updated weights for policy 0, policy_version 656420 (0.0011) [2024-06-15 19:21:00,790][1648981] Fps is (10 sec: 52373.7, 60 sec: 48041.1, 300 sec: 48428.1). Total num frames: 1344405504. Throughput: 0: 12315.8. Samples: 336174592. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:00,791][1648981] Avg episode reward: [(0, '517.550')] [2024-06-15 19:21:01,279][1651669] Updated weights for policy 0, policy_version 656464 (0.0011) [2024-06-15 19:21:02,194][1651669] Updated weights for policy 0, policy_version 656512 (0.0013) [2024-06-15 19:21:04,263][1651669] Updated weights for policy 0, policy_version 656578 (0.0013) [2024-06-15 19:21:05,766][1648981] Fps is (10 sec: 52471.3, 60 sec: 50790.5, 300 sec: 49096.5). Total num frames: 1344798720. Throughput: 0: 12185.6. Samples: 336245248. 
Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:05,767][1648981] Avg episode reward: [(0, '475.020')] [2024-06-15 19:21:08,981][1651669] Updated weights for policy 0, policy_version 656656 (0.0014) [2024-06-15 19:21:10,766][1648981] Fps is (10 sec: 52551.4, 60 sec: 50251.1, 300 sec: 48430.0). Total num frames: 1344929792. Throughput: 0: 12344.9. Samples: 336289792. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:10,767][1648981] Avg episode reward: [(0, '484.130')] [2024-06-15 19:21:11,955][1651669] Updated weights for policy 0, policy_version 656723 (0.0011) [2024-06-15 19:21:12,652][1651669] Updated weights for policy 0, policy_version 656760 (0.0013) [2024-06-15 19:21:14,189][1651669] Updated weights for policy 0, policy_version 656800 (0.0012) [2024-06-15 19:21:15,767][1648981] Fps is (10 sec: 45872.4, 60 sec: 50790.2, 300 sec: 49098.7). Total num frames: 1345257472. Throughput: 0: 12471.7. Samples: 336366080. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:15,768][1648981] Avg episode reward: [(0, '489.170')] [2024-06-15 19:21:15,870][1651669] Updated weights for policy 0, policy_version 656880 (0.0015) [2024-06-15 19:21:20,476][1651669] Updated weights for policy 0, policy_version 656949 (0.0014) [2024-06-15 19:21:20,775][1648981] Fps is (10 sec: 52384.8, 60 sec: 50237.3, 300 sec: 48652.1). Total num frames: 1345454080. Throughput: 0: 12331.2. Samples: 336431104. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:20,775][1648981] Avg episode reward: [(0, '496.090')] [2024-06-15 19:21:23,177][1651669] Updated weights for policy 0, policy_version 656993 (0.0012) [2024-06-15 19:21:24,134][1651669] Updated weights for policy 0, policy_version 657027 (0.0016) [2024-06-15 19:21:25,543][1651669] Updated weights for policy 0, policy_version 657089 (0.0011) [2024-06-15 19:21:25,766][1648981] Fps is (10 sec: 49154.5, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1345748992. 
Throughput: 0: 12401.8. Samples: 336470528. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:25,767][1648981] Avg episode reward: [(0, '497.690')] [2024-06-15 19:21:26,682][1651669] Updated weights for policy 0, policy_version 657146 (0.0015) [2024-06-15 19:21:30,798][1648981] Fps is (10 sec: 45767.9, 60 sec: 49131.1, 300 sec: 48758.2). Total num frames: 1345912832. Throughput: 0: 12643.2. Samples: 336550912. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:30,799][1648981] Avg episode reward: [(0, '498.210')] [2024-06-15 19:21:31,130][1651669] Updated weights for policy 0, policy_version 657210 (0.0088) [2024-06-15 19:21:33,086][1651669] Updated weights for policy 0, policy_version 657250 (0.0012) [2024-06-15 19:21:35,283][1651274] Signal inference workers to stop experience collection... (34450 times) [2024-06-15 19:21:35,322][1651669] InferenceWorker_p0-w0: stopping experience collection (34450 times) [2024-06-15 19:21:35,511][1651274] Signal inference workers to resume experience collection... (34450 times) [2024-06-15 19:21:35,511][1651669] InferenceWorker_p0-w0: resuming experience collection (34450 times) [2024-06-15 19:21:35,647][1651669] Updated weights for policy 0, policy_version 657329 (0.0013) [2024-06-15 19:21:35,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49698.1, 300 sec: 48874.3). Total num frames: 1346207744. Throughput: 0: 12507.7. Samples: 336628224. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:35,767][1648981] Avg episode reward: [(0, '459.320')] [2024-06-15 19:21:37,285][1651669] Updated weights for policy 0, policy_version 657396 (0.0010) [2024-06-15 19:21:40,767][1648981] Fps is (10 sec: 46018.8, 60 sec: 49697.7, 300 sec: 48985.3). Total num frames: 1346371584. Throughput: 0: 12460.7. Samples: 336659456. 
Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:40,768][1648981] Avg episode reward: [(0, '443.930')] [2024-06-15 19:21:41,572][1651669] Updated weights for policy 0, policy_version 657444 (0.0014) [2024-06-15 19:21:44,002][1651669] Updated weights for policy 0, policy_version 657506 (0.0020) [2024-06-15 19:21:45,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 1346666496. Throughput: 0: 12317.1. Samples: 336728576. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:45,767][1648981] Avg episode reward: [(0, '459.120')] [2024-06-15 19:21:46,637][1651669] Updated weights for policy 0, policy_version 657586 (0.0071) [2024-06-15 19:21:48,077][1651669] Updated weights for policy 0, policy_version 657654 (0.0012) [2024-06-15 19:21:50,779][1648981] Fps is (10 sec: 52365.1, 60 sec: 50244.3, 300 sec: 49317.1). Total num frames: 1346895872. Throughput: 0: 12466.5. Samples: 336806400. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:50,780][1648981] Avg episode reward: [(0, '473.620')] [2024-06-15 19:21:52,315][1651669] Updated weights for policy 0, policy_version 657697 (0.0013) [2024-06-15 19:21:52,912][1651669] Updated weights for policy 0, policy_version 657728 (0.0013) [2024-06-15 19:21:54,962][1651669] Updated weights for policy 0, policy_version 657776 (0.0012) [2024-06-15 19:21:55,767][1648981] Fps is (10 sec: 49147.2, 60 sec: 48065.4, 300 sec: 48874.2). Total num frames: 1347158016. Throughput: 0: 12344.6. Samples: 336845312. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:21:55,768][1648981] Avg episode reward: [(0, '478.600')] [2024-06-15 19:21:56,449][1651669] Updated weights for policy 0, policy_version 657811 (0.0012) [2024-06-15 19:21:58,061][1651669] Updated weights for policy 0, policy_version 657873 (0.0011) [2024-06-15 19:22:00,769][1648981] Fps is (10 sec: 52482.3, 60 sec: 50261.6, 300 sec: 49318.2). Total num frames: 1347420160. 
Throughput: 0: 12162.3. Samples: 336913408. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:22:00,770][1648981] Avg episode reward: [(0, '486.350')] [2024-06-15 19:22:02,242][1651669] Updated weights for policy 0, policy_version 657936 (0.0013) [2024-06-15 19:22:03,363][1651669] Updated weights for policy 0, policy_version 657984 (0.0043) [2024-06-15 19:22:05,798][1648981] Fps is (10 sec: 52267.0, 60 sec: 48034.1, 300 sec: 48980.1). Total num frames: 1347682304. Throughput: 0: 12440.8. Samples: 336991232. Policy #0 lag: (min: 15.0, avg: 119.9, max: 271.0) [2024-06-15 19:22:05,799][1648981] Avg episode reward: [(0, '485.230')] [2024-06-15 19:22:07,018][1651669] Updated weights for policy 0, policy_version 658064 (0.0014) [2024-06-15 19:22:08,840][1651669] Updated weights for policy 0, policy_version 658129 (0.0017) [2024-06-15 19:22:09,984][1651669] Updated weights for policy 0, policy_version 658169 (0.0012) [2024-06-15 19:22:10,766][1648981] Fps is (10 sec: 52442.2, 60 sec: 50244.2, 300 sec: 49318.6). Total num frames: 1347944448. Throughput: 0: 12379.0. Samples: 337027584. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:10,767][1648981] Avg episode reward: [(0, '489.670')] [2024-06-15 19:22:14,331][1651669] Updated weights for policy 0, policy_version 658232 (0.0012) [2024-06-15 19:22:15,766][1648981] Fps is (10 sec: 46021.8, 60 sec: 48060.1, 300 sec: 48763.2). Total num frames: 1348141056. Throughput: 0: 12251.1. Samples: 337101824. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:15,767][1648981] Avg episode reward: [(0, '509.830')] [2024-06-15 19:22:16,047][1651669] Updated weights for policy 0, policy_version 658298 (0.0010) [2024-06-15 19:22:17,799][1651274] Signal inference workers to stop experience collection... 
(34500 times) [2024-06-15 19:22:17,846][1651669] InferenceWorker_p0-w0: stopping experience collection (34500 times) [2024-06-15 19:22:17,988][1651274] Signal inference workers to resume experience collection... (34500 times) [2024-06-15 19:22:17,990][1651669] InferenceWorker_p0-w0: resuming experience collection (34500 times) [2024-06-15 19:22:18,571][1651669] Updated weights for policy 0, policy_version 658352 (0.0094) [2024-06-15 19:22:20,291][1651669] Updated weights for policy 0, policy_version 658416 (0.0016) [2024-06-15 19:22:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50251.2, 300 sec: 49318.6). Total num frames: 1348468736. Throughput: 0: 11946.7. Samples: 337165824. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:20,767][1648981] Avg episode reward: [(0, '554.490')] [2024-06-15 19:22:24,998][1651669] Updated weights for policy 0, policy_version 658480 (0.0015) [2024-06-15 19:22:25,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 47513.5, 300 sec: 48765.3). Total num frames: 1348599808. Throughput: 0: 12265.4. Samples: 337211392. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:25,767][1648981] Avg episode reward: [(0, '560.550')] [2024-06-15 19:22:26,205][1651669] Updated weights for policy 0, policy_version 658528 (0.0013) [2024-06-15 19:22:28,794][1651669] Updated weights for policy 0, policy_version 658576 (0.0011) [2024-06-15 19:22:30,175][1651669] Updated weights for policy 0, policy_version 658640 (0.0117) [2024-06-15 19:22:30,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 50270.9, 300 sec: 49096.5). Total num frames: 1348927488. Throughput: 0: 12242.5. Samples: 337279488. 
Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:30,767][1648981] Avg episode reward: [(0, '561.250')] [2024-06-15 19:22:34,634][1651669] Updated weights for policy 0, policy_version 658708 (0.0013) [2024-06-15 19:22:35,603][1651669] Updated weights for policy 0, policy_version 658755 (0.0039) [2024-06-15 19:22:35,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48605.9, 300 sec: 49207.5). Total num frames: 1349124096. Throughput: 0: 12337.0. Samples: 337361408. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:35,767][1648981] Avg episode reward: [(0, '571.330')] [2024-06-15 19:22:37,061][1651669] Updated weights for policy 0, policy_version 658809 (0.0009) [2024-06-15 19:22:39,651][1651669] Updated weights for policy 0, policy_version 658850 (0.0015) [2024-06-15 19:22:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50790.9, 300 sec: 49207.5). Total num frames: 1349419008. Throughput: 0: 12333.8. Samples: 337400320. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:40,767][1648981] Avg episode reward: [(0, '589.260')] [2024-06-15 19:22:41,147][1651669] Updated weights for policy 0, policy_version 658928 (0.0112) [2024-06-15 19:22:45,013][1651669] Updated weights for policy 0, policy_version 658963 (0.0047) [2024-06-15 19:22:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 49540.8). Total num frames: 1349648384. Throughput: 0: 12493.5. Samples: 337475584. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:45,767][1648981] Avg episode reward: [(0, '605.240')] [2024-06-15 19:22:46,610][1651669] Updated weights for policy 0, policy_version 659024 (0.0023) [2024-06-15 19:22:47,376][1651669] Updated weights for policy 0, policy_version 659071 (0.0011) [2024-06-15 19:22:50,277][1651669] Updated weights for policy 0, policy_version 659126 (0.0011) [2024-06-15 19:22:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 50254.9, 300 sec: 49318.7). Total num frames: 1349910528. 
Throughput: 0: 12410.6. Samples: 337549312. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:50,767][1648981] Avg episode reward: [(0, '609.340')] [2024-06-15 19:22:51,668][1651669] Updated weights for policy 0, policy_version 659184 (0.0027) [2024-06-15 19:22:55,237][1651669] Updated weights for policy 0, policy_version 659248 (0.0014) [2024-06-15 19:22:55,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50245.0, 300 sec: 49651.8). Total num frames: 1350172672. Throughput: 0: 12401.8. Samples: 337585664. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:22:55,767][1648981] Avg episode reward: [(0, '607.110')] [2024-06-15 19:22:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000659264_1350172672.pth... [2024-06-15 19:22:55,815][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000653488_1338343424.pth [2024-06-15 19:22:56,687][1651274] Signal inference workers to stop experience collection... (34550 times) [2024-06-15 19:22:56,733][1651669] InferenceWorker_p0-w0: stopping experience collection (34550 times) [2024-06-15 19:22:56,942][1651274] Signal inference workers to resume experience collection... (34550 times) [2024-06-15 19:22:56,943][1651669] InferenceWorker_p0-w0: resuming experience collection (34550 times) [2024-06-15 19:22:56,945][1651669] Updated weights for policy 0, policy_version 659280 (0.0012) [2024-06-15 19:22:57,798][1651669] Updated weights for policy 0, policy_version 659327 (0.0014) [2024-06-15 19:23:00,767][1648981] Fps is (10 sec: 45873.2, 60 sec: 49153.7, 300 sec: 49096.4). Total num frames: 1350369280. Throughput: 0: 12413.0. Samples: 337660416. 
Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:00,768][1648981] Avg episode reward: [(0, '626.650')] [2024-06-15 19:23:01,234][1651669] Updated weights for policy 0, policy_version 659390 (0.0013) [2024-06-15 19:23:03,509][1651669] Updated weights for policy 0, policy_version 659453 (0.0013) [2024-06-15 19:23:05,631][1651669] Updated weights for policy 0, policy_version 659518 (0.0017) [2024-06-15 19:23:05,768][1648981] Fps is (10 sec: 52419.1, 60 sec: 50269.4, 300 sec: 49762.9). Total num frames: 1350696960. Throughput: 0: 12526.4. Samples: 337729536. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:05,769][1648981] Avg episode reward: [(0, '679.500')] [2024-06-15 19:23:08,061][1651669] Updated weights for policy 0, policy_version 659575 (0.0015) [2024-06-15 19:23:10,766][1648981] Fps is (10 sec: 49154.3, 60 sec: 48605.9, 300 sec: 48986.8). Total num frames: 1350860800. Throughput: 0: 12310.8. Samples: 337765376. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:10,767][1648981] Avg episode reward: [(0, '661.890')] [2024-06-15 19:23:11,513][1651669] Updated weights for policy 0, policy_version 659632 (0.0043) [2024-06-15 19:23:13,526][1651669] Updated weights for policy 0, policy_version 659683 (0.0010) [2024-06-15 19:23:15,222][1651669] Updated weights for policy 0, policy_version 659728 (0.0019) [2024-06-15 19:23:15,768][1648981] Fps is (10 sec: 45875.4, 60 sec: 50242.7, 300 sec: 49540.5). Total num frames: 1351155712. Throughput: 0: 12549.2. Samples: 337844224. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:15,769][1648981] Avg episode reward: [(0, '660.530')] [2024-06-15 19:23:17,825][1651669] Updated weights for policy 0, policy_version 659808 (0.0013) [2024-06-15 19:23:20,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48874.9). Total num frames: 1351352320. Throughput: 0: 12413.1. Samples: 337920000. 
Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:20,767][1648981] Avg episode reward: [(0, '664.740')] [2024-06-15 19:23:21,389][1651669] Updated weights for policy 0, policy_version 659872 (0.0014) [2024-06-15 19:23:24,040][1651669] Updated weights for policy 0, policy_version 659936 (0.0040) [2024-06-15 19:23:25,767][1648981] Fps is (10 sec: 45879.1, 60 sec: 50243.5, 300 sec: 49318.4). Total num frames: 1351614464. Throughput: 0: 12469.8. Samples: 337961472. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:25,768][1648981] Avg episode reward: [(0, '679.480')] [2024-06-15 19:23:26,432][1651669] Updated weights for policy 0, policy_version 660000 (0.0015) [2024-06-15 19:23:28,074][1651669] Updated weights for policy 0, policy_version 660048 (0.0011) [2024-06-15 19:23:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 49096.5). Total num frames: 1351876608. Throughput: 0: 12401.8. Samples: 338033664. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:30,767][1648981] Avg episode reward: [(0, '698.340')] [2024-06-15 19:23:31,936][1651669] Updated weights for policy 0, policy_version 660114 (0.0010) [2024-06-15 19:23:35,213][1651669] Updated weights for policy 0, policy_version 660192 (0.0012) [2024-06-15 19:23:35,767][1648981] Fps is (10 sec: 49154.3, 60 sec: 49697.7, 300 sec: 49207.4). Total num frames: 1352105984. Throughput: 0: 12401.6. Samples: 338107392. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:35,767][1648981] Avg episode reward: [(0, '691.170')] [2024-06-15 19:23:36,528][1651669] Updated weights for policy 0, policy_version 660240 (0.0011) [2024-06-15 19:23:37,635][1651669] Updated weights for policy 0, policy_version 660287 (0.0019) [2024-06-15 19:23:38,720][1651274] Signal inference workers to stop experience collection... 
(34600 times) [2024-06-15 19:23:38,779][1651669] InferenceWorker_p0-w0: stopping experience collection (34600 times) [2024-06-15 19:23:38,962][1651274] Signal inference workers to resume experience collection... (34600 times) [2024-06-15 19:23:38,967][1651669] InferenceWorker_p0-w0: resuming experience collection (34600 times) [2024-06-15 19:23:39,371][1651669] Updated weights for policy 0, policy_version 660336 (0.0011) [2024-06-15 19:23:40,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 1352400896. Throughput: 0: 12310.8. Samples: 338139648. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:40,767][1648981] Avg episode reward: [(0, '689.960')] [2024-06-15 19:23:43,127][1651669] Updated weights for policy 0, policy_version 660400 (0.0117) [2024-06-15 19:23:45,766][1648981] Fps is (10 sec: 45877.8, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 1352564736. Throughput: 0: 12413.3. Samples: 338219008. Policy #0 lag: (min: 117.0, avg: 221.5, max: 357.0) [2024-06-15 19:23:45,767][1648981] Avg episode reward: [(0, '683.260')] [2024-06-15 19:23:45,894][1651669] Updated weights for policy 0, policy_version 660448 (0.0013) [2024-06-15 19:23:47,347][1651669] Updated weights for policy 0, policy_version 660496 (0.0012) [2024-06-15 19:23:48,569][1651669] Updated weights for policy 0, policy_version 660542 (0.0012) [2024-06-15 19:23:50,784][1648981] Fps is (10 sec: 52334.2, 60 sec: 50229.1, 300 sec: 49648.9). Total num frames: 1352925184. Throughput: 0: 12306.3. Samples: 338283520. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:23:50,785][1648981] Avg episode reward: [(0, '695.030')] [2024-06-15 19:23:53,039][1651669] Updated weights for policy 0, policy_version 660611 (0.0069) [2024-06-15 19:23:54,046][1651669] Updated weights for policy 0, policy_version 660662 (0.0014) [2024-06-15 19:23:55,794][1648981] Fps is (10 sec: 49014.8, 60 sec: 48037.4, 300 sec: 48869.7). 
Total num frames: 1353056256. Throughput: 0: 12553.3. Samples: 338330624. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:23:55,795][1648981] Avg episode reward: [(0, '685.210')] [2024-06-15 19:23:56,459][1651669] Updated weights for policy 0, policy_version 660694 (0.0014) [2024-06-15 19:23:58,334][1651669] Updated weights for policy 0, policy_version 660771 (0.0012) [2024-06-15 19:24:00,782][1648981] Fps is (10 sec: 45885.5, 60 sec: 50231.3, 300 sec: 49649.2). Total num frames: 1353383936. Throughput: 0: 12284.2. Samples: 338397184. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:00,783][1648981] Avg episode reward: [(0, '686.210')] [2024-06-15 19:24:00,807][1651669] Updated weights for policy 0, policy_version 660835 (0.0014) [2024-06-15 19:24:03,716][1651669] Updated weights for policy 0, policy_version 660871 (0.0015) [2024-06-15 19:24:05,767][1648981] Fps is (10 sec: 52575.0, 60 sec: 48061.1, 300 sec: 49207.5). Total num frames: 1353580544. Throughput: 0: 12458.6. Samples: 338480640. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:05,767][1648981] Avg episode reward: [(0, '707.100')] [2024-06-15 19:24:06,239][1651669] Updated weights for policy 0, policy_version 660944 (0.0015) [2024-06-15 19:24:07,446][1651669] Updated weights for policy 0, policy_version 660992 (0.0023) [2024-06-15 19:24:09,055][1651669] Updated weights for policy 0, policy_version 661051 (0.0052) [2024-06-15 19:24:10,766][1648981] Fps is (10 sec: 49229.9, 60 sec: 50244.2, 300 sec: 49651.8). Total num frames: 1353875456. Throughput: 0: 12197.2. Samples: 338510336. 
Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:10,767][1648981] Avg episode reward: [(0, '742.760')] [2024-06-15 19:24:11,117][1651669] Updated weights for policy 0, policy_version 661107 (0.0109) [2024-06-15 19:24:14,968][1651669] Updated weights for policy 0, policy_version 661154 (0.0011) [2024-06-15 19:24:15,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 49153.5, 300 sec: 49318.8). Total num frames: 1354104832. Throughput: 0: 12413.1. Samples: 338592256. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:15,767][1648981] Avg episode reward: [(0, '764.950')] [2024-06-15 19:24:17,200][1651669] Updated weights for policy 0, policy_version 661200 (0.0013) [2024-06-15 19:24:18,354][1651669] Updated weights for policy 0, policy_version 661248 (0.0012) [2024-06-15 19:24:20,231][1651669] Updated weights for policy 0, policy_version 661309 (0.0013) [2024-06-15 19:24:20,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.2, 300 sec: 49765.0). Total num frames: 1354366976. Throughput: 0: 12197.1. Samples: 338656256. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:20,767][1648981] Avg episode reward: [(0, '733.380')] [2024-06-15 19:24:21,269][1651274] Signal inference workers to stop experience collection... (34650 times) [2024-06-15 19:24:21,343][1651669] InferenceWorker_p0-w0: stopping experience collection (34650 times) [2024-06-15 19:24:21,434][1651274] Signal inference workers to resume experience collection... (34650 times) [2024-06-15 19:24:21,435][1651669] InferenceWorker_p0-w0: resuming experience collection (34650 times) [2024-06-15 19:24:21,799][1651669] Updated weights for policy 0, policy_version 661361 (0.0130) [2024-06-15 19:24:25,005][1651669] Updated weights for policy 0, policy_version 661395 (0.0015) [2024-06-15 19:24:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.9, 300 sec: 49208.2). Total num frames: 1354596352. Throughput: 0: 12470.0. Samples: 338700800. 
Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:25,767][1648981] Avg episode reward: [(0, '699.800')] [2024-06-15 19:24:27,218][1651669] Updated weights for policy 0, policy_version 661456 (0.0028) [2024-06-15 19:24:28,070][1651669] Updated weights for policy 0, policy_version 661504 (0.0012) [2024-06-15 19:24:29,984][1651669] Updated weights for policy 0, policy_version 661566 (0.0012) [2024-06-15 19:24:30,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 50244.1, 300 sec: 49762.9). Total num frames: 1354891264. Throughput: 0: 12447.2. Samples: 338779136. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:30,767][1648981] Avg episode reward: [(0, '696.010')] [2024-06-15 19:24:35,290][1651669] Updated weights for policy 0, policy_version 661638 (0.0045) [2024-06-15 19:24:35,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 49152.3, 300 sec: 49207.5). Total num frames: 1355055104. Throughput: 0: 12759.6. Samples: 338857472. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:35,768][1648981] Avg episode reward: [(0, '695.150')] [2024-06-15 19:24:36,344][1651669] Updated weights for policy 0, policy_version 661684 (0.0013) [2024-06-15 19:24:37,357][1651669] Updated weights for policy 0, policy_version 661712 (0.0094) [2024-06-15 19:24:38,397][1651669] Updated weights for policy 0, policy_version 661759 (0.0011) [2024-06-15 19:24:40,644][1651669] Updated weights for policy 0, policy_version 661820 (0.0012) [2024-06-15 19:24:40,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50244.3, 300 sec: 49762.9). Total num frames: 1355415552. Throughput: 0: 12455.0. Samples: 338890752. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:40,767][1648981] Avg episode reward: [(0, '670.020')] [2024-06-15 19:24:42,401][1651669] Updated weights for policy 0, policy_version 661885 (0.0015) [2024-06-15 19:24:45,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 50244.3, 300 sec: 49319.0). Total num frames: 1355579392. 
Throughput: 0: 12611.0. Samples: 338964480. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:45,767][1648981] Avg episode reward: [(0, '703.990')] [2024-06-15 19:24:46,638][1651669] Updated weights for policy 0, policy_version 661945 (0.0109) [2024-06-15 19:24:48,676][1651669] Updated weights for policy 0, policy_version 662008 (0.0014) [2024-06-15 19:24:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48620.5, 300 sec: 49433.1). Total num frames: 1355841536. Throughput: 0: 12561.1. Samples: 339045888. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:50,767][1648981] Avg episode reward: [(0, '694.100')] [2024-06-15 19:24:51,211][1651669] Updated weights for policy 0, policy_version 662054 (0.0011) [2024-06-15 19:24:52,908][1651669] Updated weights for policy 0, policy_version 662136 (0.0013) [2024-06-15 19:24:55,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 50267.5, 300 sec: 49318.6). Total num frames: 1356070912. Throughput: 0: 12504.1. Samples: 339073024. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:24:55,767][1648981] Avg episode reward: [(0, '683.250')] [2024-06-15 19:24:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000662144_1356070912.pth... [2024-06-15 19:24:55,848][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000656384_1344274432.pth [2024-06-15 19:24:57,883][1651669] Updated weights for policy 0, policy_version 662192 (0.0116) [2024-06-15 19:24:59,338][1651669] Updated weights for policy 0, policy_version 662270 (0.0014) [2024-06-15 19:25:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49165.0, 300 sec: 49429.7). Total num frames: 1356333056. Throughput: 0: 12333.5. Samples: 339147264. 
Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:25:00,768][1648981] Avg episode reward: [(0, '696.810')] [2024-06-15 19:25:01,793][1651669] Updated weights for policy 0, policy_version 662312 (0.0011) [2024-06-15 19:25:02,470][1651274] Signal inference workers to stop experience collection... (34700 times) [2024-06-15 19:25:02,521][1651669] InferenceWorker_p0-w0: stopping experience collection (34700 times) [2024-06-15 19:25:02,664][1651274] Signal inference workers to resume experience collection... (34700 times) [2024-06-15 19:25:02,664][1651669] InferenceWorker_p0-w0: resuming experience collection (34700 times) [2024-06-15 19:25:03,033][1651669] Updated weights for policy 0, policy_version 662368 (0.0033) [2024-06-15 19:25:05,787][1648981] Fps is (10 sec: 52324.5, 60 sec: 50227.5, 300 sec: 49760.9). Total num frames: 1356595200. Throughput: 0: 12521.3. Samples: 339219968. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:25:05,788][1648981] Avg episode reward: [(0, '712.610')] [2024-06-15 19:25:08,519][1651669] Updated weights for policy 0, policy_version 662433 (0.0012) [2024-06-15 19:25:09,980][1651669] Updated weights for policy 0, policy_version 662480 (0.0043) [2024-06-15 19:25:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 49540.8). Total num frames: 1356824576. Throughput: 0: 12435.9. Samples: 339260416. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:25:10,767][1648981] Avg episode reward: [(0, '668.170')] [2024-06-15 19:25:11,621][1651669] Updated weights for policy 0, policy_version 662544 (0.0019) [2024-06-15 19:25:13,467][1651669] Updated weights for policy 0, policy_version 662608 (0.0012) [2024-06-15 19:25:14,813][1651669] Updated weights for policy 0, policy_version 662656 (0.0020) [2024-06-15 19:25:15,766][1648981] Fps is (10 sec: 52535.2, 60 sec: 50244.4, 300 sec: 49762.9). Total num frames: 1357119488. Throughput: 0: 12071.9. Samples: 339322368. 
Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:25:15,767][1648981] Avg episode reward: [(0, '670.620')] [2024-06-15 19:25:20,395][1651669] Updated weights for policy 0, policy_version 662720 (0.0026) [2024-06-15 19:25:20,767][1648981] Fps is (10 sec: 42597.4, 60 sec: 48059.5, 300 sec: 49207.5). Total num frames: 1357250560. Throughput: 0: 12151.4. Samples: 339404288. Policy #0 lag: (min: 31.0, avg: 156.2, max: 296.0) [2024-06-15 19:25:20,768][1648981] Avg episode reward: [(0, '667.480')] [2024-06-15 19:25:22,856][1651669] Updated weights for policy 0, policy_version 662808 (0.0016) [2024-06-15 19:25:25,357][1651669] Updated weights for policy 0, policy_version 662896 (0.0014) [2024-06-15 19:25:25,778][1648981] Fps is (10 sec: 52366.0, 60 sec: 50780.3, 300 sec: 49762.0). Total num frames: 1357643776. Throughput: 0: 12045.9. Samples: 339432960. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:25:25,779][1648981] Avg episode reward: [(0, '668.670')] [2024-06-15 19:25:29,680][1651669] Updated weights for policy 0, policy_version 662929 (0.0011) [2024-06-15 19:25:30,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 48059.9, 300 sec: 49318.6). Total num frames: 1357774848. Throughput: 0: 12310.7. Samples: 339518464. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:25:30,767][1648981] Avg episode reward: [(0, '654.150')] [2024-06-15 19:25:31,468][1651669] Updated weights for policy 0, policy_version 662992 (0.0016) [2024-06-15 19:25:33,826][1651669] Updated weights for policy 0, policy_version 663046 (0.0013) [2024-06-15 19:25:35,109][1651669] Updated weights for policy 0, policy_version 663104 (0.0011) [2024-06-15 19:25:35,766][1648981] Fps is (10 sec: 42649.4, 60 sec: 50244.4, 300 sec: 49762.9). Total num frames: 1358069760. Throughput: 0: 11798.8. Samples: 339576832. 
Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:25:35,767][1648981] Avg episode reward: [(0, '627.610')] [2024-06-15 19:25:40,132][1651669] Updated weights for policy 0, policy_version 663184 (0.0013) [2024-06-15 19:25:40,778][1648981] Fps is (10 sec: 45821.2, 60 sec: 46958.3, 300 sec: 49094.5). Total num frames: 1358233600. Throughput: 0: 12159.7. Samples: 339620352. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:25:40,779][1648981] Avg episode reward: [(0, '633.790')] [2024-06-15 19:25:42,060][1651669] Updated weights for policy 0, policy_version 663239 (0.0012) [2024-06-15 19:25:43,512][1651669] Updated weights for policy 0, policy_version 663296 (0.0102) [2024-06-15 19:25:44,858][1651669] Updated weights for policy 0, policy_version 663353 (0.0112) [2024-06-15 19:25:45,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 49765.1). Total num frames: 1358561280. Throughput: 0: 12026.3. Samples: 339688448. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:25:45,767][1648981] Avg episode reward: [(0, '634.280')] [2024-06-15 19:25:45,835][1651274] Signal inference workers to stop experience collection... (34750 times) [2024-06-15 19:25:45,863][1651669] InferenceWorker_p0-w0: stopping experience collection (34750 times) [2024-06-15 19:25:46,053][1651274] Signal inference workers to resume experience collection... (34750 times) [2024-06-15 19:25:46,053][1651669] InferenceWorker_p0-w0: resuming experience collection (34750 times) [2024-06-15 19:25:46,844][1651669] Updated weights for policy 0, policy_version 663410 (0.0127) [2024-06-15 19:25:50,766][1648981] Fps is (10 sec: 49210.5, 60 sec: 48059.9, 300 sec: 48986.7). Total num frames: 1358725120. Throughput: 0: 12384.6. Samples: 339777024. 
Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:25:50,767][1648981] Avg episode reward: [(0, '632.760')] [2024-06-15 19:25:51,526][1651669] Updated weights for policy 0, policy_version 663472 (0.0014) [2024-06-15 19:25:53,809][1651669] Updated weights for policy 0, policy_version 663521 (0.0012) [2024-06-15 19:25:55,321][1651669] Updated weights for policy 0, policy_version 663586 (0.0092) [2024-06-15 19:25:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49698.4, 300 sec: 49655.8). Total num frames: 1359052800. Throughput: 0: 12231.1. Samples: 339810816. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:25:55,767][1648981] Avg episode reward: [(0, '628.130')] [2024-06-15 19:25:56,537][1651669] Updated weights for policy 0, policy_version 663632 (0.0014) [2024-06-15 19:26:00,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1359216640. Throughput: 0: 12253.9. Samples: 339873792. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:00,767][1648981] Avg episode reward: [(0, '636.040')] [2024-06-15 19:26:01,428][1651669] Updated weights for policy 0, policy_version 663684 (0.0020) [2024-06-15 19:26:02,686][1651669] Updated weights for policy 0, policy_version 663742 (0.0012) [2024-06-15 19:26:05,205][1651669] Updated weights for policy 0, policy_version 663796 (0.0054) [2024-06-15 19:26:05,778][1648981] Fps is (10 sec: 42547.8, 60 sec: 48066.4, 300 sec: 49316.6). Total num frames: 1359478784. Throughput: 0: 12148.3. Samples: 339951104. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:05,779][1648981] Avg episode reward: [(0, '650.970')] [2024-06-15 19:26:06,878][1651669] Updated weights for policy 0, policy_version 663869 (0.0129) [2024-06-15 19:26:08,539][1651669] Updated weights for policy 0, policy_version 663934 (0.0026) [2024-06-15 19:26:10,788][1648981] Fps is (10 sec: 52313.2, 60 sec: 48588.0, 300 sec: 49092.9). Total num frames: 1359740928. 
Throughput: 0: 12103.2. Samples: 339977728. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:10,789][1648981] Avg episode reward: [(0, '643.530')] [2024-06-15 19:26:13,794][1651669] Updated weights for policy 0, policy_version 663986 (0.0011) [2024-06-15 19:26:15,400][1651669] Updated weights for policy 0, policy_version 664020 (0.0023) [2024-06-15 19:26:15,767][1648981] Fps is (10 sec: 45928.1, 60 sec: 46967.2, 300 sec: 49097.8). Total num frames: 1359937536. Throughput: 0: 12049.0. Samples: 340060672. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:15,767][1648981] Avg episode reward: [(0, '632.410')] [2024-06-15 19:26:16,769][1651669] Updated weights for policy 0, policy_version 664081 (0.0011) [2024-06-15 19:26:18,318][1651669] Updated weights for policy 0, policy_version 664144 (0.0011) [2024-06-15 19:26:19,438][1651669] Updated weights for policy 0, policy_version 664187 (0.0011) [2024-06-15 19:26:20,766][1648981] Fps is (10 sec: 52544.3, 60 sec: 50244.4, 300 sec: 49207.5). Total num frames: 1360265216. Throughput: 0: 12265.2. Samples: 340128768. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:20,767][1648981] Avg episode reward: [(0, '633.890')] [2024-06-15 19:26:24,729][1651669] Updated weights for policy 0, policy_version 664240 (0.0011) [2024-06-15 19:26:25,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 45884.3, 300 sec: 49101.7). Total num frames: 1360396288. Throughput: 0: 12291.2. Samples: 340173312. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:25,767][1648981] Avg episode reward: [(0, '633.200')] [2024-06-15 19:26:26,119][1651669] Updated weights for policy 0, policy_version 664275 (0.0025) [2024-06-15 19:26:27,033][1651274] Signal inference workers to stop experience collection... 
(34800 times) [2024-06-15 19:26:27,098][1651669] InferenceWorker_p0-w0: stopping experience collection (34800 times) [2024-06-15 19:26:27,286][1651274] Signal inference workers to resume experience collection... (34800 times) [2024-06-15 19:26:27,298][1651669] InferenceWorker_p0-w0: resuming experience collection (34800 times) [2024-06-15 19:26:27,704][1651669] Updated weights for policy 0, policy_version 664352 (0.0012) [2024-06-15 19:26:29,574][1651669] Updated weights for policy 0, policy_version 664417 (0.0012) [2024-06-15 19:26:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.2, 300 sec: 49429.7). Total num frames: 1360789504. Throughput: 0: 12026.3. Samples: 340229632. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:30,767][1648981] Avg episode reward: [(0, '650.090')] [2024-06-15 19:26:35,753][1651669] Updated weights for policy 0, policy_version 664483 (0.0023) [2024-06-15 19:26:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46421.4, 300 sec: 49096.6). Total num frames: 1360855040. Throughput: 0: 11889.8. Samples: 340312064. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:35,767][1648981] Avg episode reward: [(0, '644.610')] [2024-06-15 19:26:36,959][1651669] Updated weights for policy 0, policy_version 664528 (0.0022) [2024-06-15 19:26:38,292][1651669] Updated weights for policy 0, policy_version 664576 (0.0049) [2024-06-15 19:26:39,918][1651669] Updated weights for policy 0, policy_version 664640 (0.0012) [2024-06-15 19:26:40,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 50254.2, 300 sec: 49429.7). Total num frames: 1361248256. Throughput: 0: 11798.8. Samples: 340341760. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:40,767][1648981] Avg episode reward: [(0, '619.640')] [2024-06-15 19:26:45,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 45875.1, 300 sec: 48876.4). Total num frames: 1361313792. Throughput: 0: 12071.8. Samples: 340417024. 
Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:45,767][1648981] Avg episode reward: [(0, '640.480')] [2024-06-15 19:26:46,118][1651669] Updated weights for policy 0, policy_version 664720 (0.0011) [2024-06-15 19:26:48,226][1651669] Updated weights for policy 0, policy_version 664784 (0.0037) [2024-06-15 19:26:49,377][1651669] Updated weights for policy 0, policy_version 664822 (0.0011) [2024-06-15 19:26:50,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48605.8, 300 sec: 49096.6). Total num frames: 1361641472. Throughput: 0: 11767.7. Samples: 340480512. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:50,767][1648981] Avg episode reward: [(0, '635.930')] [2024-06-15 19:26:50,787][1651669] Updated weights for policy 0, policy_version 664880 (0.0011) [2024-06-15 19:26:52,229][1651669] Updated weights for policy 0, policy_version 664957 (0.0013) [2024-06-15 19:26:55,772][1648981] Fps is (10 sec: 52398.7, 60 sec: 46416.8, 300 sec: 48873.8). Total num frames: 1361838080. Throughput: 0: 11973.7. Samples: 340516352. Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:26:55,773][1648981] Avg episode reward: [(0, '624.630')] [2024-06-15 19:26:55,789][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000664960_1361838080.pth... [2024-06-15 19:26:55,866][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000659264_1350172672.pth [2024-06-15 19:26:58,300][1651669] Updated weights for policy 0, policy_version 665019 (0.0017) [2024-06-15 19:27:00,310][1651669] Updated weights for policy 0, policy_version 665082 (0.0011) [2024-06-15 19:27:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 48879.6). Total num frames: 1362100224. Throughput: 0: 11810.2. Samples: 340592128. 
Policy #0 lag: (min: 53.0, avg: 214.2, max: 309.0) [2024-06-15 19:27:00,767][1648981] Avg episode reward: [(0, '633.030')] [2024-06-15 19:27:01,774][1651669] Updated weights for policy 0, policy_version 665136 (0.0024) [2024-06-15 19:27:03,361][1651669] Updated weights for policy 0, policy_version 665207 (0.0016) [2024-06-15 19:27:05,766][1648981] Fps is (10 sec: 52459.6, 60 sec: 48069.3, 300 sec: 48874.3). Total num frames: 1362362368. Throughput: 0: 11946.7. Samples: 340666368. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:05,767][1648981] Avg episode reward: [(0, '653.480')] [2024-06-15 19:27:08,945][1651669] Updated weights for policy 0, policy_version 665264 (0.0013) [2024-06-15 19:27:09,984][1651274] Signal inference workers to stop experience collection... (34850 times) [2024-06-15 19:27:10,029][1651669] InferenceWorker_p0-w0: stopping experience collection (34850 times) [2024-06-15 19:27:10,238][1651274] Signal inference workers to resume experience collection... (34850 times) [2024-06-15 19:27:10,239][1651669] InferenceWorker_p0-w0: resuming experience collection (34850 times) [2024-06-15 19:27:10,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 47531.2, 300 sec: 48985.4). Total num frames: 1362591744. Throughput: 0: 11878.4. Samples: 340707840. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:10,767][1648981] Avg episode reward: [(0, '652.010')] [2024-06-15 19:27:11,006][1651669] Updated weights for policy 0, policy_version 665330 (0.0013) [2024-06-15 19:27:12,957][1651669] Updated weights for policy 0, policy_version 665405 (0.0144) [2024-06-15 19:27:13,910][1651669] Updated weights for policy 0, policy_version 665442 (0.0025) [2024-06-15 19:27:15,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.3, 300 sec: 48874.3). Total num frames: 1362886656. Throughput: 0: 11867.0. Samples: 340763648. 
Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:15,767][1648981] Avg episode reward: [(0, '654.330')] [2024-06-15 19:27:20,774][1648981] Fps is (10 sec: 39290.2, 60 sec: 45323.2, 300 sec: 48762.0). Total num frames: 1362984960. Throughput: 0: 11864.9. Samples: 340846080. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:20,775][1648981] Avg episode reward: [(0, '644.770')] [2024-06-15 19:27:20,844][1651669] Updated weights for policy 0, policy_version 665522 (0.0010) [2024-06-15 19:27:21,720][1651669] Updated weights for policy 0, policy_version 665568 (0.0016) [2024-06-15 19:27:24,226][1651669] Updated weights for policy 0, policy_version 665658 (0.0013) [2024-06-15 19:27:25,444][1651669] Updated weights for policy 0, policy_version 665702 (0.0043) [2024-06-15 19:27:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.3, 300 sec: 49096.5). Total num frames: 1363410944. Throughput: 0: 11730.5. Samples: 340869632. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:25,767][1648981] Avg episode reward: [(0, '626.300')] [2024-06-15 19:27:30,184][1651669] Updated weights for policy 0, policy_version 665747 (0.0047) [2024-06-15 19:27:30,786][1648981] Fps is (10 sec: 52365.6, 60 sec: 45314.1, 300 sec: 48759.9). Total num frames: 1363509248. Throughput: 0: 12009.6. Samples: 340957696. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:30,787][1648981] Avg episode reward: [(0, '614.570')] [2024-06-15 19:27:31,418][1651669] Updated weights for policy 0, policy_version 665794 (0.0011) [2024-06-15 19:27:33,117][1651669] Updated weights for policy 0, policy_version 665857 (0.0010) [2024-06-15 19:27:34,462][1651669] Updated weights for policy 0, policy_version 665911 (0.0014) [2024-06-15 19:27:35,446][1651669] Updated weights for policy 0, policy_version 665941 (0.0012) [2024-06-15 19:27:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1363869696. 
Throughput: 0: 12015.0. Samples: 341021184. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:35,767][1648981] Avg episode reward: [(0, '622.840')] [2024-06-15 19:27:36,500][1651669] Updated weights for policy 0, policy_version 665984 (0.0012) [2024-06-15 19:27:40,766][1648981] Fps is (10 sec: 45966.9, 60 sec: 45329.1, 300 sec: 48541.1). Total num frames: 1363968000. Throughput: 0: 12096.2. Samples: 341060608. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:40,767][1648981] Avg episode reward: [(0, '607.360')] [2024-06-15 19:27:41,387][1651669] Updated weights for policy 0, policy_version 666039 (0.0012) [2024-06-15 19:27:43,324][1651669] Updated weights for policy 0, policy_version 666098 (0.0014) [2024-06-15 19:27:44,787][1651669] Updated weights for policy 0, policy_version 666160 (0.0012) [2024-06-15 19:27:45,774][1648981] Fps is (10 sec: 45838.8, 60 sec: 50237.8, 300 sec: 48873.0). Total num frames: 1364328448. Throughput: 0: 12047.0. Samples: 341134336. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:45,775][1648981] Avg episode reward: [(0, '621.060')] [2024-06-15 19:27:46,216][1651669] Updated weights for policy 0, policy_version 666208 (0.0012) [2024-06-15 19:27:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 46967.4, 300 sec: 48430.0). Total num frames: 1364459520. Throughput: 0: 12105.9. Samples: 341211136. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:50,767][1648981] Avg episode reward: [(0, '628.440')] [2024-06-15 19:27:51,482][1651274] Signal inference workers to stop experience collection... (34900 times) [2024-06-15 19:27:51,544][1651669] InferenceWorker_p0-w0: stopping experience collection (34900 times) [2024-06-15 19:27:51,874][1651274] Signal inference workers to resume experience collection... 
(34900 times) [2024-06-15 19:27:51,886][1651669] InferenceWorker_p0-w0: resuming experience collection (34900 times) [2024-06-15 19:27:52,059][1651669] Updated weights for policy 0, policy_version 666257 (0.0011) [2024-06-15 19:27:53,580][1651669] Updated weights for policy 0, policy_version 666322 (0.0012) [2024-06-15 19:27:55,025][1651669] Updated weights for policy 0, policy_version 666388 (0.0012) [2024-06-15 19:27:55,767][1648981] Fps is (10 sec: 52468.4, 60 sec: 50248.9, 300 sec: 49096.5). Total num frames: 1364852736. Throughput: 0: 11923.8. Samples: 341244416. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:27:55,768][1648981] Avg episode reward: [(0, '634.710')] [2024-06-15 19:27:56,612][1651669] Updated weights for policy 0, policy_version 666449 (0.0014) [2024-06-15 19:28:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 48430.3). Total num frames: 1364983808. Throughput: 0: 12288.0. Samples: 341316608. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:00,767][1648981] Avg episode reward: [(0, '641.160')] [2024-06-15 19:28:02,005][1651669] Updated weights for policy 0, policy_version 666501 (0.0012) [2024-06-15 19:28:04,140][1651669] Updated weights for policy 0, policy_version 666577 (0.0139) [2024-06-15 19:28:04,871][1651669] Updated weights for policy 0, policy_version 666624 (0.0014) [2024-06-15 19:28:05,766][1648981] Fps is (10 sec: 39322.8, 60 sec: 48059.7, 300 sec: 48763.2). Total num frames: 1365245952. Throughput: 0: 11982.9. Samples: 341385216. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:05,767][1648981] Avg episode reward: [(0, '651.160')] [2024-06-15 19:28:06,878][1651669] Updated weights for policy 0, policy_version 666680 (0.0012) [2024-06-15 19:28:08,453][1651669] Updated weights for policy 0, policy_version 666743 (0.0012) [2024-06-15 19:28:10,782][1648981] Fps is (10 sec: 52348.6, 60 sec: 48593.3, 300 sec: 48649.9). Total num frames: 1365508096. 
Throughput: 0: 12113.2. Samples: 341414912. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:10,782][1648981] Avg episode reward: [(0, '651.160')] [2024-06-15 19:28:14,472][1651669] Updated weights for policy 0, policy_version 666802 (0.0012) [2024-06-15 19:28:15,769][1648981] Fps is (10 sec: 49139.8, 60 sec: 47511.7, 300 sec: 48762.8). Total num frames: 1365737472. Throughput: 0: 11996.8. Samples: 341497344. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:15,769][1648981] Avg episode reward: [(0, '646.930')] [2024-06-15 19:28:15,971][1651669] Updated weights for policy 0, policy_version 666880 (0.0012) [2024-06-15 19:28:18,402][1651669] Updated weights for policy 0, policy_version 666937 (0.0090) [2024-06-15 19:28:19,788][1651669] Updated weights for policy 0, policy_version 667005 (0.0013) [2024-06-15 19:28:20,766][1648981] Fps is (10 sec: 52509.5, 60 sec: 50797.1, 300 sec: 48874.5). Total num frames: 1366032384. Throughput: 0: 11992.2. Samples: 341560832. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:20,767][1648981] Avg episode reward: [(0, '646.990')] [2024-06-15 19:28:25,311][1651669] Updated weights for policy 0, policy_version 667058 (0.0016) [2024-06-15 19:28:25,766][1648981] Fps is (10 sec: 42608.6, 60 sec: 45875.1, 300 sec: 48430.0). Total num frames: 1366163456. Throughput: 0: 12060.4. Samples: 341603328. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:25,767][1648981] Avg episode reward: [(0, '622.910')] [2024-06-15 19:28:26,800][1651669] Updated weights for policy 0, policy_version 667132 (0.0012) [2024-06-15 19:28:28,638][1651274] Signal inference workers to stop experience collection... (34950 times) [2024-06-15 19:28:28,689][1651669] InferenceWorker_p0-w0: stopping experience collection (34950 times) [2024-06-15 19:28:28,808][1651274] Signal inference workers to resume experience collection... 
(34950 times) [2024-06-15 19:28:28,808][1651669] InferenceWorker_p0-w0: resuming experience collection (34950 times) [2024-06-15 19:28:29,435][1651669] Updated weights for policy 0, policy_version 667188 (0.0013) [2024-06-15 19:28:30,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49714.6, 300 sec: 48763.3). Total num frames: 1366491136. Throughput: 0: 12051.2. Samples: 341676544. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:30,767][1648981] Avg episode reward: [(0, '602.400')] [2024-06-15 19:28:30,954][1651669] Updated weights for policy 0, policy_version 667257 (0.0011) [2024-06-15 19:28:35,472][1651669] Updated weights for policy 0, policy_version 667296 (0.0018) [2024-06-15 19:28:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 45875.1, 300 sec: 48207.8). Total num frames: 1366622208. Throughput: 0: 12071.8. Samples: 341754368. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:35,767][1648981] Avg episode reward: [(0, '608.840')] [2024-06-15 19:28:36,941][1651669] Updated weights for policy 0, policy_version 667363 (0.0012) [2024-06-15 19:28:38,744][1651669] Updated weights for policy 0, policy_version 667393 (0.0011) [2024-06-15 19:28:40,306][1651669] Updated weights for policy 0, policy_version 667457 (0.0011) [2024-06-15 19:28:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 1366982656. Throughput: 0: 12140.2. Samples: 341790720. Policy #0 lag: (min: 127.0, avg: 249.2, max: 383.0) [2024-06-15 19:28:40,767][1648981] Avg episode reward: [(0, '636.380')] [2024-06-15 19:28:41,496][1651669] Updated weights for policy 0, policy_version 667515 (0.0011) [2024-06-15 19:28:45,768][1648981] Fps is (10 sec: 52421.8, 60 sec: 46972.5, 300 sec: 48210.6). Total num frames: 1367146496. Throughput: 0: 12321.8. Samples: 341871104. 
Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:28:45,768][1648981] Avg episode reward: [(0, '674.960')] [2024-06-15 19:28:46,350][1651669] Updated weights for policy 0, policy_version 667584 (0.0012) [2024-06-15 19:28:47,428][1651669] Updated weights for policy 0, policy_version 667633 (0.0013) [2024-06-15 19:28:49,714][1651669] Updated weights for policy 0, policy_version 667684 (0.0012) [2024-06-15 19:28:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50790.5, 300 sec: 48990.0). Total num frames: 1367506944. Throughput: 0: 12276.6. Samples: 341937664. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:28:50,767][1648981] Avg episode reward: [(0, '680.790')] [2024-06-15 19:28:51,318][1651669] Updated weights for policy 0, policy_version 667748 (0.0010) [2024-06-15 19:28:55,637][1651669] Updated weights for policy 0, policy_version 667792 (0.0012) [2024-06-15 19:28:55,767][1648981] Fps is (10 sec: 49156.0, 60 sec: 46421.1, 300 sec: 48321.4). Total num frames: 1367638016. Throughput: 0: 12474.2. Samples: 341976064. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:28:55,767][1648981] Avg episode reward: [(0, '710.030')] [2024-06-15 19:28:56,276][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000667824_1367703552.pth... [2024-06-15 19:28:56,320][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000662144_1356070912.pth [2024-06-15 19:28:57,343][1651669] Updated weights for policy 0, policy_version 667857 (0.0012) [2024-06-15 19:28:58,251][1651669] Updated weights for policy 0, policy_version 667901 (0.0126) [2024-06-15 19:29:00,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1367900160. Throughput: 0: 12254.5. Samples: 342048768. 
Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:00,767][1648981] Avg episode reward: [(0, '673.460')] [2024-06-15 19:29:01,898][1651669] Updated weights for policy 0, policy_version 667971 (0.0023) [2024-06-15 19:29:03,127][1651669] Updated weights for policy 0, policy_version 668023 (0.0122) [2024-06-15 19:29:05,774][1648981] Fps is (10 sec: 49115.1, 60 sec: 48053.3, 300 sec: 48317.6). Total num frames: 1368129536. Throughput: 0: 12638.4. Samples: 342129664. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:05,775][1648981] Avg episode reward: [(0, '685.090')] [2024-06-15 19:29:06,091][1651669] Updated weights for policy 0, policy_version 668064 (0.0009) [2024-06-15 19:29:06,196][1651274] Signal inference workers to stop experience collection... (35000 times) [2024-06-15 19:29:06,250][1651669] InferenceWorker_p0-w0: stopping experience collection (35000 times) [2024-06-15 19:29:06,423][1651274] Signal inference workers to resume experience collection... (35000 times) [2024-06-15 19:29:06,424][1651669] InferenceWorker_p0-w0: resuming experience collection (35000 times) [2024-06-15 19:29:06,903][1651669] Updated weights for policy 0, policy_version 668101 (0.0013) [2024-06-15 19:29:10,299][1651669] Updated weights for policy 0, policy_version 668176 (0.0012) [2024-06-15 19:29:10,766][1648981] Fps is (10 sec: 55705.5, 60 sec: 49164.6, 300 sec: 48652.2). Total num frames: 1368457216. Throughput: 0: 12549.7. Samples: 342168064. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:10,767][1648981] Avg episode reward: [(0, '695.420')] [2024-06-15 19:29:11,663][1651669] Updated weights for policy 0, policy_version 668225 (0.0012) [2024-06-15 19:29:13,136][1651669] Updated weights for policy 0, policy_version 668283 (0.0011) [2024-06-15 19:29:15,767][1648981] Fps is (10 sec: 52469.7, 60 sec: 48607.6, 300 sec: 48430.0). Total num frames: 1368653824. Throughput: 0: 12492.7. Samples: 342238720. 
Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:15,768][1648981] Avg episode reward: [(0, '752.930')] [2024-06-15 19:29:17,136][1651669] Updated weights for policy 0, policy_version 668351 (0.0014) [2024-06-15 19:29:18,167][1651669] Updated weights for policy 0, policy_version 668390 (0.0011) [2024-06-15 19:29:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 48763.2). Total num frames: 1368981504. Throughput: 0: 12606.6. Samples: 342321664. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:20,767][1648981] Avg episode reward: [(0, '741.020')] [2024-06-15 19:29:21,114][1651669] Updated weights for policy 0, policy_version 668464 (0.0017) [2024-06-15 19:29:22,724][1651669] Updated weights for policy 0, policy_version 668515 (0.0014) [2024-06-15 19:29:23,376][1651669] Updated weights for policy 0, policy_version 668544 (0.0012) [2024-06-15 19:29:25,785][1648981] Fps is (10 sec: 52331.8, 60 sec: 50228.6, 300 sec: 48426.9). Total num frames: 1369178112. Throughput: 0: 12430.7. Samples: 342350336. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:25,786][1648981] Avg episode reward: [(0, '758.930')] [2024-06-15 19:29:26,797][1651669] Updated weights for policy 0, policy_version 668601 (0.0011) [2024-06-15 19:29:28,595][1651669] Updated weights for policy 0, policy_version 668662 (0.0012) [2024-06-15 19:29:30,767][1648981] Fps is (10 sec: 45872.2, 60 sec: 49151.5, 300 sec: 48763.2). Total num frames: 1369440256. Throughput: 0: 12493.0. Samples: 342433280. 
Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:30,768][1648981] Avg episode reward: [(0, '767.470')] [2024-06-15 19:29:31,802][1651669] Updated weights for policy 0, policy_version 668706 (0.0013) [2024-06-15 19:29:33,342][1651669] Updated weights for policy 0, policy_version 668771 (0.0017) [2024-06-15 19:29:33,905][1651669] Updated weights for policy 0, policy_version 668800 (0.0012) [2024-06-15 19:29:35,766][1648981] Fps is (10 sec: 52527.9, 60 sec: 51336.6, 300 sec: 48430.0). Total num frames: 1369702400. Throughput: 0: 12686.2. Samples: 342508544. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:35,767][1648981] Avg episode reward: [(0, '734.290')] [2024-06-15 19:29:37,525][1651669] Updated weights for policy 0, policy_version 668864 (0.0020) [2024-06-15 19:29:39,136][1651669] Updated weights for policy 0, policy_version 668928 (0.0025) [2024-06-15 19:29:40,766][1648981] Fps is (10 sec: 52431.9, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1369964544. Throughput: 0: 12515.7. Samples: 342539264. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:40,767][1648981] Avg episode reward: [(0, '737.800')] [2024-06-15 19:29:42,492][1651669] Updated weights for policy 0, policy_version 668987 (0.0014) [2024-06-15 19:29:44,062][1651669] Updated weights for policy 0, policy_version 669046 (0.0013) [2024-06-15 19:29:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 51337.7, 300 sec: 48763.2). Total num frames: 1370226688. Throughput: 0: 12640.7. Samples: 342617600. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:45,767][1648981] Avg episode reward: [(0, '776.050')] [2024-06-15 19:29:46,389][1651274] Signal inference workers to stop experience collection... (35050 times) [2024-06-15 19:29:46,427][1651669] InferenceWorker_p0-w0: stopping experience collection (35050 times) [2024-06-15 19:29:46,668][1651274] Signal inference workers to resume experience collection... 
(35050 times) [2024-06-15 19:29:46,669][1651669] InferenceWorker_p0-w0: resuming experience collection (35050 times) [2024-06-15 19:29:47,304][1651669] Updated weights for policy 0, policy_version 669104 (0.0073) [2024-06-15 19:29:47,799][1651669] Updated weights for policy 0, policy_version 669120 (0.0124) [2024-06-15 19:29:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49698.1, 300 sec: 48874.4). Total num frames: 1370488832. Throughput: 0: 12449.5. Samples: 342689792. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:50,767][1648981] Avg episode reward: [(0, '806.630')] [2024-06-15 19:29:52,909][1651669] Updated weights for policy 0, policy_version 669201 (0.0011) [2024-06-15 19:29:54,435][1651669] Updated weights for policy 0, policy_version 669266 (0.0119) [2024-06-15 19:29:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 51883.1, 300 sec: 48874.3). Total num frames: 1370750976. Throughput: 0: 12561.1. Samples: 342733312. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:29:55,767][1648981] Avg episode reward: [(0, '806.180')] [2024-06-15 19:29:57,205][1651669] Updated weights for policy 0, policy_version 669313 (0.0020) [2024-06-15 19:29:58,706][1651669] Updated weights for policy 0, policy_version 669376 (0.0092) [2024-06-15 19:30:00,767][1648981] Fps is (10 sec: 52425.7, 60 sec: 51882.2, 300 sec: 48877.6). Total num frames: 1371013120. Throughput: 0: 12322.1. Samples: 342793216. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:30:00,768][1648981] Avg episode reward: [(0, '810.720')] [2024-06-15 19:30:04,576][1651669] Updated weights for policy 0, policy_version 669458 (0.0130) [2024-06-15 19:30:05,782][1648981] Fps is (10 sec: 39259.4, 60 sec: 50237.7, 300 sec: 48538.5). Total num frames: 1371144192. Throughput: 0: 12249.5. Samples: 342873088. 
Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:30:05,783][1648981] Avg episode reward: [(0, '828.070')] [2024-06-15 19:30:06,409][1651274] Saving new best policy, reward=828.070! [2024-06-15 19:30:06,828][1651669] Updated weights for policy 0, policy_version 669552 (0.0014) [2024-06-15 19:30:10,245][1651669] Updated weights for policy 0, policy_version 669617 (0.0015) [2024-06-15 19:30:10,772][1648981] Fps is (10 sec: 39302.9, 60 sec: 49147.7, 300 sec: 48429.1). Total num frames: 1371406336. Throughput: 0: 12246.2. Samples: 342901248. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:30:10,772][1648981] Avg episode reward: [(0, '804.760')] [2024-06-15 19:30:11,764][1651669] Updated weights for policy 0, policy_version 669686 (0.0011) [2024-06-15 19:30:15,766][1648981] Fps is (10 sec: 45948.0, 60 sec: 49152.2, 300 sec: 48652.2). Total num frames: 1371602944. Throughput: 0: 12288.2. Samples: 342986240. Policy #0 lag: (min: 127.0, avg: 232.1, max: 367.0) [2024-06-15 19:30:15,767][1648981] Avg episode reward: [(0, '775.130')] [2024-06-15 19:30:16,160][1651669] Updated weights for policy 0, policy_version 669747 (0.0137) [2024-06-15 19:30:17,472][1651669] Updated weights for policy 0, policy_version 669818 (0.0043) [2024-06-15 19:30:20,419][1651669] Updated weights for policy 0, policy_version 669862 (0.0012) [2024-06-15 19:30:20,767][1648981] Fps is (10 sec: 49177.7, 60 sec: 48605.8, 300 sec: 48320.9). Total num frames: 1371897856. Throughput: 0: 11946.6. Samples: 343046144. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:20,767][1648981] Avg episode reward: [(0, '778.350')] [2024-06-15 19:30:21,698][1651669] Updated weights for policy 0, policy_version 669922 (0.0146) [2024-06-15 19:30:25,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48074.7, 300 sec: 48430.0). Total num frames: 1372061696. Throughput: 0: 12105.9. Samples: 343084032. 
Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:25,767][1648981] Avg episode reward: [(0, '794.990')] [2024-06-15 19:30:25,882][1651274] Signal inference workers to stop experience collection... (35100 times) [2024-06-15 19:30:25,929][1651669] InferenceWorker_p0-w0: stopping experience collection (35100 times) [2024-06-15 19:30:26,253][1651274] Signal inference workers to resume experience collection... (35100 times) [2024-06-15 19:30:26,253][1651669] InferenceWorker_p0-w0: resuming experience collection (35100 times) [2024-06-15 19:30:26,477][1651669] Updated weights for policy 0, policy_version 669987 (0.0011) [2024-06-15 19:30:27,804][1651669] Updated weights for policy 0, policy_version 670040 (0.0015) [2024-06-15 19:30:30,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 49152.5, 300 sec: 48541.1). Total num frames: 1372389376. Throughput: 0: 12071.8. Samples: 343160832. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:30,767][1648981] Avg episode reward: [(0, '829.720')] [2024-06-15 19:30:30,912][1651669] Updated weights for policy 0, policy_version 670128 (0.0014) [2024-06-15 19:30:31,177][1651274] Saving new best policy, reward=829.720! [2024-06-15 19:30:32,574][1651669] Updated weights for policy 0, policy_version 670208 (0.0132) [2024-06-15 19:30:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 48654.1). Total num frames: 1372585984. Throughput: 0: 12162.8. Samples: 343237120. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:35,767][1648981] Avg episode reward: [(0, '860.750')] [2024-06-15 19:30:35,768][1651274] Saving new best policy, reward=860.750! 
[2024-06-15 19:30:37,448][1651669] Updated weights for policy 0, policy_version 670265 (0.0016) [2024-06-15 19:30:39,067][1651669] Updated weights for policy 0, policy_version 670320 (0.0012) [2024-06-15 19:30:40,433][1651669] Updated weights for policy 0, policy_version 670368 (0.0012) [2024-06-15 19:30:40,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49151.9, 300 sec: 48652.1). Total num frames: 1372913664. Throughput: 0: 12049.0. Samples: 343275520. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:40,767][1648981] Avg episode reward: [(0, '842.930')] [2024-06-15 19:30:42,106][1651669] Updated weights for policy 0, policy_version 670416 (0.0012) [2024-06-15 19:30:42,919][1651669] Updated weights for policy 0, policy_version 670463 (0.0012) [2024-06-15 19:30:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 48763.2). Total num frames: 1373110272. Throughput: 0: 12288.1. Samples: 343346176. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:45,767][1648981] Avg episode reward: [(0, '837.410')] [2024-06-15 19:30:48,457][1651669] Updated weights for policy 0, policy_version 670533 (0.0013) [2024-06-15 19:30:49,945][1651669] Updated weights for policy 0, policy_version 670599 (0.0013) [2024-06-15 19:30:50,782][1648981] Fps is (10 sec: 55619.0, 60 sec: 49685.1, 300 sec: 48871.7). Total num frames: 1373470720. Throughput: 0: 12208.4. Samples: 343422464. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:50,783][1648981] Avg episode reward: [(0, '852.550')] [2024-06-15 19:30:51,023][1651669] Updated weights for policy 0, policy_version 670655 (0.0013) [2024-06-15 19:30:52,723][1651669] Updated weights for policy 0, policy_version 670708 (0.0014) [2024-06-15 19:30:55,767][1648981] Fps is (10 sec: 52427.1, 60 sec: 48059.5, 300 sec: 48874.2). Total num frames: 1373634560. Throughput: 0: 12414.5. Samples: 343459840. 
Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:30:55,767][1648981] Avg episode reward: [(0, '836.290')] [2024-06-15 19:30:55,784][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000670720_1373634560.pth... [2024-06-15 19:30:55,866][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000664960_1361838080.pth [2024-06-15 19:30:55,872][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000670720_1373634560.pth [2024-06-15 19:30:58,564][1651669] Updated weights for policy 0, policy_version 670768 (0.0012) [2024-06-15 19:31:00,318][1651669] Updated weights for policy 0, policy_version 670845 (0.0092) [2024-06-15 19:31:00,766][1648981] Fps is (10 sec: 42665.8, 60 sec: 48060.2, 300 sec: 48876.3). Total num frames: 1373896704. Throughput: 0: 12174.2. Samples: 343534080. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:00,767][1648981] Avg episode reward: [(0, '867.780')] [2024-06-15 19:31:00,768][1651274] Saving new best policy, reward=867.780! [2024-06-15 19:31:02,363][1651669] Updated weights for policy 0, policy_version 670904 (0.0011) [2024-06-15 19:31:03,175][1651274] Signal inference workers to stop experience collection... (35150 times) [2024-06-15 19:31:03,237][1651669] InferenceWorker_p0-w0: stopping experience collection (35150 times) [2024-06-15 19:31:03,388][1651274] Signal inference workers to resume experience collection... (35150 times) [2024-06-15 19:31:03,389][1651669] InferenceWorker_p0-w0: resuming experience collection (35150 times) [2024-06-15 19:31:03,391][1651669] Updated weights for policy 0, policy_version 670944 (0.0011) [2024-06-15 19:31:05,768][1648981] Fps is (10 sec: 52423.7, 60 sec: 50256.4, 300 sec: 48877.7). Total num frames: 1374158848. Throughput: 0: 12367.3. Samples: 343602688. 
Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:05,768][1648981] Avg episode reward: [(0, '841.180')] [2024-06-15 19:31:09,721][1651669] Updated weights for policy 0, policy_version 671009 (0.0011) [2024-06-15 19:31:10,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 48064.1, 300 sec: 48652.2). Total num frames: 1374289920. Throughput: 0: 12595.3. Samples: 343650816. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:10,767][1648981] Avg episode reward: [(0, '858.080')] [2024-06-15 19:31:11,442][1651669] Updated weights for policy 0, policy_version 671074 (0.0011) [2024-06-15 19:31:13,397][1651669] Updated weights for policy 0, policy_version 671152 (0.0013) [2024-06-15 19:31:15,240][1651669] Updated weights for policy 0, policy_version 671229 (0.0104) [2024-06-15 19:31:15,766][1648981] Fps is (10 sec: 52436.0, 60 sec: 51336.6, 300 sec: 48874.3). Total num frames: 1374683136. Throughput: 0: 12014.9. Samples: 343701504. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:15,767][1648981] Avg episode reward: [(0, '860.960')] [2024-06-15 19:31:20,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 47513.7, 300 sec: 48652.2). Total num frames: 1374748672. Throughput: 0: 12174.2. Samples: 343784960. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:20,767][1648981] Avg episode reward: [(0, '858.370')] [2024-06-15 19:31:21,261][1651669] Updated weights for policy 0, policy_version 671286 (0.0013) [2024-06-15 19:31:22,293][1651669] Updated weights for policy 0, policy_version 671317 (0.0026) [2024-06-15 19:31:23,955][1651669] Updated weights for policy 0, policy_version 671392 (0.0021) [2024-06-15 19:31:25,656][1651669] Updated weights for policy 0, policy_version 671456 (0.0011) [2024-06-15 19:31:25,771][1648981] Fps is (10 sec: 45855.1, 60 sec: 51332.9, 300 sec: 48651.4). Total num frames: 1375141888. Throughput: 0: 11991.1. Samples: 343815168. 
Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:25,771][1648981] Avg episode reward: [(0, '872.880')] [2024-06-15 19:31:26,353][1651274] Saving new best policy, reward=872.880! [2024-06-15 19:31:30,782][1648981] Fps is (10 sec: 45801.5, 60 sec: 46954.9, 300 sec: 48649.5). Total num frames: 1375207424. Throughput: 0: 11999.3. Samples: 343886336. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:30,783][1648981] Avg episode reward: [(0, '883.150')] [2024-06-15 19:31:30,784][1651274] Saving new best policy, reward=883.150! [2024-06-15 19:31:31,345][1651669] Updated weights for policy 0, policy_version 671504 (0.0011) [2024-06-15 19:31:33,362][1651669] Updated weights for policy 0, policy_version 671568 (0.0014) [2024-06-15 19:31:35,695][1651669] Updated weights for policy 0, policy_version 671664 (0.0017) [2024-06-15 19:31:35,766][1648981] Fps is (10 sec: 42616.9, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 1375567872. Throughput: 0: 11757.4. Samples: 343951360. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:35,767][1648981] Avg episode reward: [(0, '844.030')] [2024-06-15 19:31:37,625][1651669] Updated weights for policy 0, policy_version 671738 (0.0118) [2024-06-15 19:31:40,766][1648981] Fps is (10 sec: 52512.7, 60 sec: 46967.5, 300 sec: 48874.3). Total num frames: 1375731712. Throughput: 0: 11559.9. Samples: 343980032. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:40,767][1648981] Avg episode reward: [(0, '828.840')] [2024-06-15 19:31:42,822][1651669] Updated weights for policy 0, policy_version 671780 (0.0012) [2024-06-15 19:31:44,715][1651669] Updated weights for policy 0, policy_version 671827 (0.0014) [2024-06-15 19:31:45,095][1651274] Signal inference workers to stop experience collection... 
(35200 times) [2024-06-15 19:31:45,201][1651669] InferenceWorker_p0-w0: stopping experience collection (35200 times) [2024-06-15 19:31:45,457][1651274] Signal inference workers to resume experience collection... (35200 times) [2024-06-15 19:31:45,458][1651669] InferenceWorker_p0-w0: resuming experience collection (35200 times) [2024-06-15 19:31:45,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 1375961088. Throughput: 0: 11867.0. Samples: 344068096. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:45,767][1648981] Avg episode reward: [(0, '822.390')] [2024-06-15 19:31:46,404][1651669] Updated weights for policy 0, policy_version 671889 (0.0012) [2024-06-15 19:31:48,036][1651669] Updated weights for policy 0, policy_version 671955 (0.0090) [2024-06-15 19:31:50,769][1648981] Fps is (10 sec: 52418.0, 60 sec: 46431.9, 300 sec: 48874.9). Total num frames: 1376256000. Throughput: 0: 11673.4. Samples: 344128000. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:50,769][1648981] Avg episode reward: [(0, '823.110')] [2024-06-15 19:31:53,691][1651669] Updated weights for policy 0, policy_version 672022 (0.0014) [2024-06-15 19:31:55,617][1651669] Updated weights for policy 0, policy_version 672080 (0.0020) [2024-06-15 19:31:55,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46421.6, 300 sec: 48541.1). Total num frames: 1376419840. Throughput: 0: 11650.8. Samples: 344175104. Policy #0 lag: (min: 31.0, avg: 143.3, max: 287.0) [2024-06-15 19:31:55,767][1648981] Avg episode reward: [(0, '796.500')] [2024-06-15 19:31:57,152][1651669] Updated weights for policy 0, policy_version 672144 (0.0012) [2024-06-15 19:31:58,449][1651669] Updated weights for policy 0, policy_version 672194 (0.0011) [2024-06-15 19:32:00,766][1648981] Fps is (10 sec: 52440.1, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1376780288. Throughput: 0: 11798.8. Samples: 344232448. 
Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:00,767][1648981] Avg episode reward: [(0, '813.170')] [2024-06-15 19:32:04,080][1651669] Updated weights for policy 0, policy_version 672259 (0.0013) [2024-06-15 19:32:05,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45876.2, 300 sec: 48541.0). Total num frames: 1376911360. Throughput: 0: 11753.2. Samples: 344313856. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:05,767][1648981] Avg episode reward: [(0, '780.280')] [2024-06-15 19:32:06,565][1651669] Updated weights for policy 0, policy_version 672321 (0.0013) [2024-06-15 19:32:08,194][1651669] Updated weights for policy 0, policy_version 672390 (0.0012) [2024-06-15 19:32:09,606][1651669] Updated weights for policy 0, policy_version 672448 (0.0010) [2024-06-15 19:32:10,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49698.0, 300 sec: 48763.2). Total num frames: 1377271808. Throughput: 0: 11822.7. Samples: 344347136. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:10,767][1648981] Avg episode reward: [(0, '785.780')] [2024-06-15 19:32:11,031][1651669] Updated weights for policy 0, policy_version 672512 (0.0016) [2024-06-15 19:32:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 44782.9, 300 sec: 48764.5). Total num frames: 1377370112. Throughput: 0: 11939.6. Samples: 344423424. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:15,767][1648981] Avg episode reward: [(0, '766.140')] [2024-06-15 19:32:17,420][1651669] Updated weights for policy 0, policy_version 672592 (0.0016) [2024-06-15 19:32:18,742][1651669] Updated weights for policy 0, policy_version 672644 (0.0013) [2024-06-15 19:32:20,098][1651669] Updated weights for policy 0, policy_version 672697 (0.0012) [2024-06-15 19:32:20,814][1648981] Fps is (10 sec: 45657.0, 60 sec: 49658.6, 300 sec: 48533.2). Total num frames: 1377730560. Throughput: 0: 11968.1. Samples: 344490496. 
Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:20,815][1648981] Avg episode reward: [(0, '776.430')] [2024-06-15 19:32:21,256][1651274] Signal inference workers to stop experience collection... (35250 times) [2024-06-15 19:32:21,292][1651669] InferenceWorker_p0-w0: stopping experience collection (35250 times) [2024-06-15 19:32:21,519][1651274] Signal inference workers to resume experience collection... (35250 times) [2024-06-15 19:32:21,520][1651669] InferenceWorker_p0-w0: resuming experience collection (35250 times) [2024-06-15 19:32:21,665][1651669] Updated weights for policy 0, policy_version 672758 (0.0013) [2024-06-15 19:32:25,770][1648981] Fps is (10 sec: 45856.3, 60 sec: 44783.1, 300 sec: 48543.7). Total num frames: 1377828864. Throughput: 0: 12127.6. Samples: 344525824. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:25,771][1648981] Avg episode reward: [(0, '743.550')] [2024-06-15 19:32:26,376][1651669] Updated weights for policy 0, policy_version 672800 (0.0014) [2024-06-15 19:32:27,219][1651669] Updated weights for policy 0, policy_version 672832 (0.0013) [2024-06-15 19:32:28,984][1651669] Updated weights for policy 0, policy_version 672896 (0.0012) [2024-06-15 19:32:30,766][1648981] Fps is (10 sec: 49387.8, 60 sec: 50257.7, 300 sec: 48652.1). Total num frames: 1378222080. Throughput: 0: 12014.9. Samples: 344608768. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:30,767][1648981] Avg episode reward: [(0, '725.540')] [2024-06-15 19:32:30,974][1651669] Updated weights for policy 0, policy_version 672976 (0.0012) [2024-06-15 19:32:32,049][1651669] Updated weights for policy 0, policy_version 673018 (0.0011) [2024-06-15 19:32:35,766][1648981] Fps is (10 sec: 52450.4, 60 sec: 46421.4, 300 sec: 48763.2). Total num frames: 1378353152. Throughput: 0: 12277.2. Samples: 344680448. 
Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:35,767][1648981] Avg episode reward: [(0, '726.880')] [2024-06-15 19:32:37,624][1651669] Updated weights for policy 0, policy_version 673077 (0.0035) [2024-06-15 19:32:39,575][1651669] Updated weights for policy 0, policy_version 673150 (0.0013) [2024-06-15 19:32:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.2, 300 sec: 48764.5). Total num frames: 1378713600. Throughput: 0: 12037.7. Samples: 344716800. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:40,767][1648981] Avg episode reward: [(0, '730.300')] [2024-06-15 19:32:40,859][1651669] Updated weights for policy 0, policy_version 673211 (0.0011) [2024-06-15 19:32:42,605][1651669] Updated weights for policy 0, policy_version 673270 (0.0011) [2024-06-15 19:32:45,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 48605.7, 300 sec: 48874.3). Total num frames: 1378877440. Throughput: 0: 12322.1. Samples: 344786944. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:45,768][1648981] Avg episode reward: [(0, '725.600')] [2024-06-15 19:32:47,565][1651669] Updated weights for policy 0, policy_version 673302 (0.0011) [2024-06-15 19:32:48,458][1651669] Updated weights for policy 0, policy_version 673342 (0.0010) [2024-06-15 19:32:50,567][1651669] Updated weights for policy 0, policy_version 673395 (0.0064) [2024-06-15 19:32:50,778][1648981] Fps is (10 sec: 42548.1, 60 sec: 48052.0, 300 sec: 48428.1). Total num frames: 1379139584. Throughput: 0: 12159.6. Samples: 344861184. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:50,779][1648981] Avg episode reward: [(0, '683.410')] [2024-06-15 19:32:52,326][1651669] Updated weights for policy 0, policy_version 673472 (0.0013) [2024-06-15 19:32:55,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49697.9, 300 sec: 48874.2). Total num frames: 1379401728. Throughput: 0: 12049.0. Samples: 344889344. 
Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:32:55,767][1648981] Avg episode reward: [(0, '679.930')] [2024-06-15 19:32:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000673536_1379401728.pth... [2024-06-15 19:32:55,855][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000667824_1367703552.pth [2024-06-15 19:32:58,217][1651669] Updated weights for policy 0, policy_version 673538 (0.0015) [2024-06-15 19:32:59,607][1651669] Updated weights for policy 0, policy_version 673596 (0.0013) [2024-06-15 19:33:00,767][1648981] Fps is (10 sec: 42648.2, 60 sec: 46421.2, 300 sec: 48541.0). Total num frames: 1379565568. Throughput: 0: 12242.4. Samples: 344974336. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:00,767][1648981] Avg episode reward: [(0, '666.590')] [2024-06-15 19:33:01,883][1651669] Updated weights for policy 0, policy_version 673665 (0.0016) [2024-06-15 19:33:02,203][1651274] Signal inference workers to stop experience collection... (35300 times) [2024-06-15 19:33:02,258][1651669] InferenceWorker_p0-w0: stopping experience collection (35300 times) [2024-06-15 19:33:02,557][1651274] Signal inference workers to resume experience collection... (35300 times) [2024-06-15 19:33:02,558][1651669] InferenceWorker_p0-w0: resuming experience collection (35300 times) [2024-06-15 19:33:03,278][1651669] Updated weights for policy 0, policy_version 673728 (0.0011) [2024-06-15 19:33:04,486][1651669] Updated weights for policy 0, policy_version 673783 (0.0013) [2024-06-15 19:33:05,766][1648981] Fps is (10 sec: 52431.1, 60 sec: 50244.3, 300 sec: 48876.9). Total num frames: 1379926016. Throughput: 0: 12210.0. Samples: 345039360. 
Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:05,767][1648981] Avg episode reward: [(0, '725.870')] [2024-06-15 19:33:10,005][1651669] Updated weights for policy 0, policy_version 673840 (0.0017) [2024-06-15 19:33:10,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 46421.3, 300 sec: 48541.5). Total num frames: 1380057088. Throughput: 0: 12448.4. Samples: 345085952. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:10,767][1648981] Avg episode reward: [(0, '737.340')] [2024-06-15 19:33:11,878][1651669] Updated weights for policy 0, policy_version 673890 (0.0014) [2024-06-15 19:33:13,022][1651669] Updated weights for policy 0, policy_version 673938 (0.0011) [2024-06-15 19:33:14,426][1651669] Updated weights for policy 0, policy_version 674000 (0.0014) [2024-06-15 19:33:15,490][1651669] Updated weights for policy 0, policy_version 674047 (0.0013) [2024-06-15 19:33:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 51336.5, 300 sec: 48874.3). Total num frames: 1380450304. Throughput: 0: 11912.5. Samples: 345144832. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:15,767][1648981] Avg episode reward: [(0, '745.750')] [2024-06-15 19:33:20,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46458.3, 300 sec: 48652.2). Total num frames: 1380515840. Throughput: 0: 12071.8. Samples: 345223680. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:20,767][1648981] Avg episode reward: [(0, '726.710')] [2024-06-15 19:33:21,116][1651669] Updated weights for policy 0, policy_version 674107 (0.0012) [2024-06-15 19:33:23,404][1651669] Updated weights for policy 0, policy_version 674150 (0.0011) [2024-06-15 19:33:25,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 50247.7, 300 sec: 48652.2). Total num frames: 1380843520. Throughput: 0: 12037.7. Samples: 345258496. 
Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:25,767][1648981] Avg episode reward: [(0, '768.130')] [2024-06-15 19:33:25,825][1651669] Updated weights for policy 0, policy_version 674256 (0.0013) [2024-06-15 19:33:26,946][1651669] Updated weights for policy 0, policy_version 674299 (0.0011) [2024-06-15 19:33:30,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 48652.1). Total num frames: 1380974592. Throughput: 0: 11878.4. Samples: 345321472. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:30,767][1648981] Avg episode reward: [(0, '798.790')] [2024-06-15 19:33:33,127][1651669] Updated weights for policy 0, policy_version 674352 (0.0012) [2024-06-15 19:33:34,400][1651669] Updated weights for policy 0, policy_version 674388 (0.0012) [2024-06-15 19:33:35,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 1381236736. Throughput: 0: 11824.6. Samples: 345393152. Policy #0 lag: (min: 47.0, avg: 186.5, max: 287.0) [2024-06-15 19:33:35,767][1648981] Avg episode reward: [(0, '786.710')] [2024-06-15 19:33:36,390][1651669] Updated weights for policy 0, policy_version 674466 (0.0011) [2024-06-15 19:33:37,732][1651669] Updated weights for policy 0, policy_version 674528 (0.0012) [2024-06-15 19:33:40,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 46421.4, 300 sec: 48652.4). Total num frames: 1381498880. Throughput: 0: 11810.2. Samples: 345420800. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:33:40,767][1648981] Avg episode reward: [(0, '816.310')] [2024-06-15 19:33:43,241][1651669] Updated weights for policy 0, policy_version 674561 (0.0014) [2024-06-15 19:33:44,009][1651274] Signal inference workers to stop experience collection... (35350 times) [2024-06-15 19:33:44,061][1651669] InferenceWorker_p0-w0: stopping experience collection (35350 times) [2024-06-15 19:33:44,247][1651274] Signal inference workers to resume experience collection... 
(35350 times) [2024-06-15 19:33:44,249][1651669] InferenceWorker_p0-w0: resuming experience collection (35350 times) [2024-06-15 19:33:44,668][1651669] Updated weights for policy 0, policy_version 674624 (0.0013) [2024-06-15 19:33:45,768][1648981] Fps is (10 sec: 45865.8, 60 sec: 46966.0, 300 sec: 48096.4). Total num frames: 1381695488. Throughput: 0: 11764.1. Samples: 345503744. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:33:45,769][1648981] Avg episode reward: [(0, '838.350')] [2024-06-15 19:33:46,190][1651669] Updated weights for policy 0, policy_version 674675 (0.0011) [2024-06-15 19:33:48,363][1651669] Updated weights for policy 0, policy_version 674768 (0.0015) [2024-06-15 19:33:49,293][1651669] Updated weights for policy 0, policy_version 674814 (0.0014) [2024-06-15 19:33:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48069.2, 300 sec: 48763.3). Total num frames: 1382023168. Throughput: 0: 11616.7. Samples: 345562112. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:33:50,767][1648981] Avg episode reward: [(0, '815.680')] [2024-06-15 19:33:55,766][1648981] Fps is (10 sec: 42607.2, 60 sec: 45329.4, 300 sec: 48207.8). Total num frames: 1382121472. Throughput: 0: 11571.2. Samples: 345606656. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:33:55,767][1648981] Avg episode reward: [(0, '812.780')] [2024-06-15 19:33:55,843][1651669] Updated weights for policy 0, policy_version 674866 (0.0017) [2024-06-15 19:33:57,510][1651669] Updated weights for policy 0, policy_version 674932 (0.0157) [2024-06-15 19:33:59,022][1651669] Updated weights for policy 0, policy_version 674992 (0.0093) [2024-06-15 19:34:00,523][1651669] Updated weights for policy 0, policy_version 675042 (0.0011) [2024-06-15 19:34:00,782][1648981] Fps is (10 sec: 49074.5, 60 sec: 49139.2, 300 sec: 48762.0). Total num frames: 1382514688. Throughput: 0: 11635.4. Samples: 345668608. 
Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:00,783][1648981] Avg episode reward: [(0, '832.730')] [2024-06-15 19:34:05,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 43690.7, 300 sec: 47763.5). Total num frames: 1382547456. Throughput: 0: 11548.5. Samples: 345743360. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:05,767][1648981] Avg episode reward: [(0, '837.340')] [2024-06-15 19:34:06,844][1651669] Updated weights for policy 0, policy_version 675104 (0.0012) [2024-06-15 19:34:08,517][1651669] Updated weights for policy 0, policy_version 675168 (0.0013) [2024-06-15 19:34:10,394][1651669] Updated weights for policy 0, policy_version 675232 (0.0012) [2024-06-15 19:34:10,770][1648981] Fps is (10 sec: 36088.2, 60 sec: 46964.6, 300 sec: 48207.3). Total num frames: 1382875136. Throughput: 0: 11570.2. Samples: 345779200. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:10,770][1648981] Avg episode reward: [(0, '827.460')] [2024-06-15 19:34:12,357][1651669] Updated weights for policy 0, policy_version 675312 (0.0111) [2024-06-15 19:34:15,770][1648981] Fps is (10 sec: 52410.6, 60 sec: 43688.2, 300 sec: 47763.0). Total num frames: 1383071744. Throughput: 0: 11422.4. Samples: 345835520. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:15,770][1648981] Avg episode reward: [(0, '798.930')] [2024-06-15 19:34:18,352][1651669] Updated weights for policy 0, policy_version 675344 (0.0012) [2024-06-15 19:34:19,834][1651669] Updated weights for policy 0, policy_version 675394 (0.0148) [2024-06-15 19:34:20,766][1648981] Fps is (10 sec: 39336.2, 60 sec: 45875.2, 300 sec: 47766.6). Total num frames: 1383268352. Throughput: 0: 11537.1. Samples: 345912320. 
Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:20,767][1648981] Avg episode reward: [(0, '847.420')] [2024-06-15 19:34:21,805][1651669] Updated weights for policy 0, policy_version 675472 (0.0013) [2024-06-15 19:34:21,920][1651274] Signal inference workers to stop experience collection... (35400 times) [2024-06-15 19:34:21,960][1651669] InferenceWorker_p0-w0: stopping experience collection (35400 times) [2024-06-15 19:34:22,280][1651274] Signal inference workers to resume experience collection... (35400 times) [2024-06-15 19:34:22,281][1651669] InferenceWorker_p0-w0: resuming experience collection (35400 times) [2024-06-15 19:34:24,331][1651669] Updated weights for policy 0, policy_version 675568 (0.0114) [2024-06-15 19:34:25,786][1648981] Fps is (10 sec: 52343.6, 60 sec: 45860.1, 300 sec: 47982.6). Total num frames: 1383596032. Throughput: 0: 11350.0. Samples: 345931776. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:25,787][1648981] Avg episode reward: [(0, '839.680')] [2024-06-15 19:34:30,774][1648981] Fps is (10 sec: 39291.1, 60 sec: 44777.2, 300 sec: 47318.0). Total num frames: 1383661568. Throughput: 0: 11171.6. Samples: 346006528. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:30,775][1648981] Avg episode reward: [(0, '838.240')] [2024-06-15 19:34:31,192][1651669] Updated weights for policy 0, policy_version 675642 (0.0064) [2024-06-15 19:34:32,463][1651669] Updated weights for policy 0, policy_version 675680 (0.0105) [2024-06-15 19:34:34,659][1651669] Updated weights for policy 0, policy_version 675776 (0.0014) [2024-06-15 19:34:35,770][1648981] Fps is (10 sec: 45948.2, 60 sec: 46964.5, 300 sec: 47762.9). Total num frames: 1384054784. Throughput: 0: 11240.3. Samples: 346067968. 
Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:35,771][1648981] Avg episode reward: [(0, '838.760')] [2024-06-15 19:34:36,411][1651669] Updated weights for policy 0, policy_version 675836 (0.0181) [2024-06-15 19:34:40,790][1648981] Fps is (10 sec: 45801.6, 60 sec: 43673.3, 300 sec: 47093.2). Total num frames: 1384120320. Throughput: 0: 11007.8. Samples: 346102272. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:40,791][1648981] Avg episode reward: [(0, '839.910')] [2024-06-15 19:34:43,638][1651669] Updated weights for policy 0, policy_version 675904 (0.0013) [2024-06-15 19:34:45,195][1651669] Updated weights for policy 0, policy_version 675968 (0.0010) [2024-06-15 19:34:45,774][1648981] Fps is (10 sec: 36032.5, 60 sec: 45325.1, 300 sec: 47207.0). Total num frames: 1384415232. Throughput: 0: 11186.5. Samples: 346171904. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:45,777][1648981] Avg episode reward: [(0, '849.310')] [2024-06-15 19:34:47,611][1651669] Updated weights for policy 0, policy_version 676051 (0.0012) [2024-06-15 19:34:48,516][1651669] Updated weights for policy 0, policy_version 676095 (0.0013) [2024-06-15 19:34:50,772][1648981] Fps is (10 sec: 52524.6, 60 sec: 43686.6, 300 sec: 47096.2). Total num frames: 1384644608. Throughput: 0: 10807.5. Samples: 346229760. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:50,773][1648981] Avg episode reward: [(0, '777.490')] [2024-06-15 19:34:55,466][1651669] Updated weights for policy 0, policy_version 676160 (0.0117) [2024-06-15 19:34:55,766][1648981] Fps is (10 sec: 39349.5, 60 sec: 44782.8, 300 sec: 46763.9). Total num frames: 1384808448. Throughput: 0: 10969.0. Samples: 346272768. 
Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:34:55,767][1648981] Avg episode reward: [(0, '795.570')] [2024-06-15 19:34:56,223][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000676192_1384841216.pth... [2024-06-15 19:34:56,335][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000670720_1373634560.pth [2024-06-15 19:34:57,606][1651669] Updated weights for policy 0, policy_version 676240 (0.0011) [2024-06-15 19:34:59,741][1651669] Updated weights for policy 0, policy_version 676309 (0.0013) [2024-06-15 19:35:00,766][1648981] Fps is (10 sec: 49179.8, 60 sec: 43702.2, 300 sec: 47432.8). Total num frames: 1385136128. Throughput: 0: 10821.1. Samples: 346322432. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:35:00,767][1648981] Avg episode reward: [(0, '787.470')] [2024-06-15 19:35:05,766][1648981] Fps is (10 sec: 36045.3, 60 sec: 43690.6, 300 sec: 46653.6). Total num frames: 1385168896. Throughput: 0: 10820.3. Samples: 346399232. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:35:05,767][1648981] Avg episode reward: [(0, '794.730')] [2024-06-15 19:35:06,842][1651274] Signal inference workers to stop experience collection... (35450 times) [2024-06-15 19:35:06,883][1651669] InferenceWorker_p0-w0: stopping experience collection (35450 times) [2024-06-15 19:35:07,030][1651274] Signal inference workers to resume experience collection... (35450 times) [2024-06-15 19:35:07,031][1651669] InferenceWorker_p0-w0: resuming experience collection (35450 times) [2024-06-15 19:35:07,224][1651669] Updated weights for policy 0, policy_version 676373 (0.0029) [2024-06-15 19:35:08,718][1651669] Updated weights for policy 0, policy_version 676435 (0.0011) [2024-06-15 19:35:10,767][1648981] Fps is (10 sec: 36043.7, 60 sec: 43693.2, 300 sec: 47097.0). Total num frames: 1385496576. Throughput: 0: 11109.5. Samples: 346431488. 
Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:35:10,768][1648981] Avg episode reward: [(0, '810.480')] [2024-06-15 19:35:10,816][1651669] Updated weights for policy 0, policy_version 676518 (0.0075) [2024-06-15 19:35:12,136][1651669] Updated weights for policy 0, policy_version 676578 (0.0013) [2024-06-15 19:35:15,768][1648981] Fps is (10 sec: 52418.7, 60 sec: 43691.7, 300 sec: 46763.5). Total num frames: 1385693184. Throughput: 0: 10889.9. Samples: 346496512. Policy #0 lag: (min: 132.0, avg: 217.9, max: 356.0) [2024-06-15 19:35:15,769][1648981] Avg episode reward: [(0, '751.720')] [2024-06-15 19:35:18,674][1651669] Updated weights for policy 0, policy_version 676640 (0.0088) [2024-06-15 19:35:20,379][1651669] Updated weights for policy 0, policy_version 676708 (0.0076) [2024-06-15 19:35:20,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 44236.8, 300 sec: 46986.0). Total num frames: 1385922560. Throughput: 0: 11173.9. Samples: 346570752. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:20,767][1648981] Avg episode reward: [(0, '736.250')] [2024-06-15 19:35:21,969][1651669] Updated weights for policy 0, policy_version 676784 (0.0111) [2024-06-15 19:35:23,729][1651669] Updated weights for policy 0, policy_version 676864 (0.0021) [2024-06-15 19:35:25,766][1648981] Fps is (10 sec: 52439.0, 60 sec: 43705.0, 300 sec: 46874.9). Total num frames: 1386217472. Throughput: 0: 10939.8. Samples: 346594304. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:25,767][1648981] Avg episode reward: [(0, '745.150')] [2024-06-15 19:35:30,585][1651669] Updated weights for policy 0, policy_version 676929 (0.0012) [2024-06-15 19:35:30,767][1648981] Fps is (10 sec: 42594.6, 60 sec: 44788.1, 300 sec: 46652.6). Total num frames: 1386348544. Throughput: 0: 11265.6. Samples: 346678784. 
Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:30,768][1648981] Avg episode reward: [(0, '762.480')] [2024-06-15 19:35:32,182][1651669] Updated weights for policy 0, policy_version 677008 (0.0033) [2024-06-15 19:35:33,808][1651669] Updated weights for policy 0, policy_version 677073 (0.0011) [2024-06-15 19:35:34,899][1651669] Updated weights for policy 0, policy_version 677118 (0.0026) [2024-06-15 19:35:35,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 44785.6, 300 sec: 46874.9). Total num frames: 1386741760. Throughput: 0: 11219.8. Samples: 346734592. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:35,768][1648981] Avg episode reward: [(0, '743.570')] [2024-06-15 19:35:40,773][1648981] Fps is (10 sec: 45847.5, 60 sec: 44795.5, 300 sec: 46429.5). Total num frames: 1386807296. Throughput: 0: 11387.4. Samples: 346785280. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:40,774][1648981] Avg episode reward: [(0, '747.490')] [2024-06-15 19:35:40,965][1651669] Updated weights for policy 0, policy_version 677169 (0.0015) [2024-06-15 19:35:41,567][1651274] Signal inference workers to stop experience collection... (35500 times) [2024-06-15 19:35:41,637][1651669] InferenceWorker_p0-w0: stopping experience collection (35500 times) [2024-06-15 19:35:41,775][1651274] Signal inference workers to resume experience collection... (35500 times) [2024-06-15 19:35:41,776][1651669] InferenceWorker_p0-w0: resuming experience collection (35500 times) [2024-06-15 19:35:42,506][1651669] Updated weights for policy 0, policy_version 677236 (0.0027) [2024-06-15 19:35:44,805][1651669] Updated weights for policy 0, policy_version 677328 (0.0012) [2024-06-15 19:35:45,766][1648981] Fps is (10 sec: 49153.4, 60 sec: 46973.2, 300 sec: 46655.2). Total num frames: 1387233280. Throughput: 0: 11559.8. Samples: 346842624. 
Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:45,767][1648981] Avg episode reward: [(0, '749.520')] [2024-06-15 19:35:45,880][1651669] Updated weights for policy 0, policy_version 677376 (0.0011) [2024-06-15 19:35:50,766][1648981] Fps is (10 sec: 49186.0, 60 sec: 44240.9, 300 sec: 46319.6). Total num frames: 1387298816. Throughput: 0: 11821.5. Samples: 346931200. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:50,767][1648981] Avg episode reward: [(0, '727.990')] [2024-06-15 19:35:52,204][1651669] Updated weights for policy 0, policy_version 677456 (0.0012) [2024-06-15 19:35:53,919][1651669] Updated weights for policy 0, policy_version 677523 (0.0096) [2024-06-15 19:35:55,586][1651669] Updated weights for policy 0, policy_version 677588 (0.0014) [2024-06-15 19:35:55,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 48059.9, 300 sec: 46763.8). Total num frames: 1387692032. Throughput: 0: 11707.8. Samples: 346958336. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:35:55,767][1648981] Avg episode reward: [(0, '751.310')] [2024-06-15 19:36:00,768][1648981] Fps is (10 sec: 49144.3, 60 sec: 44235.6, 300 sec: 46208.4). Total num frames: 1387790336. Throughput: 0: 11935.4. Samples: 347033600. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:00,769][1648981] Avg episode reward: [(0, '731.240')] [2024-06-15 19:36:00,804][1651669] Updated weights for policy 0, policy_version 677634 (0.0012) [2024-06-15 19:36:01,991][1651669] Updated weights for policy 0, policy_version 677694 (0.0029) [2024-06-15 19:36:03,214][1651669] Updated weights for policy 0, policy_version 677745 (0.0012) [2024-06-15 19:36:04,746][1651669] Updated weights for policy 0, policy_version 677811 (0.0012) [2024-06-15 19:36:05,767][1648981] Fps is (10 sec: 58982.0, 60 sec: 51882.6, 300 sec: 47430.3). Total num frames: 1388281856. Throughput: 0: 11980.8. Samples: 347109888. 
Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:05,769][1648981] Avg episode reward: [(0, '712.990')] [2024-06-15 19:36:10,766][1648981] Fps is (10 sec: 52437.2, 60 sec: 46967.7, 300 sec: 46208.4). Total num frames: 1388314624. Throughput: 0: 12356.3. Samples: 347150336. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:10,767][1648981] Avg episode reward: [(0, '719.170')] [2024-06-15 19:36:11,124][1651669] Updated weights for policy 0, policy_version 677890 (0.0091) [2024-06-15 19:36:12,174][1651669] Updated weights for policy 0, policy_version 677949 (0.0095) [2024-06-15 19:36:13,986][1651669] Updated weights for policy 0, policy_version 678017 (0.0013) [2024-06-15 19:36:15,103][1651669] Updated weights for policy 0, policy_version 678075 (0.0014) [2024-06-15 19:36:15,775][1648981] Fps is (10 sec: 45837.8, 60 sec: 50785.1, 300 sec: 47429.0). Total num frames: 1388740608. Throughput: 0: 12172.2. Samples: 347226624. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:15,775][1648981] Avg episode reward: [(0, '739.250')] [2024-06-15 19:36:16,090][1651274] Signal inference workers to stop experience collection... (35550 times) [2024-06-15 19:36:16,134][1651669] InferenceWorker_p0-w0: stopping experience collection (35550 times) [2024-06-15 19:36:16,243][1651274] Signal inference workers to resume experience collection... (35550 times) [2024-06-15 19:36:16,244][1651669] InferenceWorker_p0-w0: resuming experience collection (35550 times) [2024-06-15 19:36:16,353][1651669] Updated weights for policy 0, policy_version 678136 (0.0013) [2024-06-15 19:36:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48605.9, 300 sec: 46431.3). Total num frames: 1388838912. Throughput: 0: 12709.0. Samples: 347306496. 
Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:20,767][1648981] Avg episode reward: [(0, '735.440')] [2024-06-15 19:36:22,347][1651669] Updated weights for policy 0, policy_version 678192 (0.0012) [2024-06-15 19:36:24,026][1651669] Updated weights for policy 0, policy_version 678244 (0.0012) [2024-06-15 19:36:25,161][1651669] Updated weights for policy 0, policy_version 678305 (0.0012) [2024-06-15 19:36:25,606][1651669] Updated weights for policy 0, policy_version 678336 (0.0019) [2024-06-15 19:36:25,766][1648981] Fps is (10 sec: 49192.5, 60 sec: 50244.3, 300 sec: 47544.0). Total num frames: 1389232128. Throughput: 0: 12335.4. Samples: 347340288. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:25,767][1648981] Avg episode reward: [(0, '727.560')] [2024-06-15 19:36:27,181][1651669] Updated weights for policy 0, policy_version 678394 (0.0161) [2024-06-15 19:36:30,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50245.0, 300 sec: 46763.8). Total num frames: 1389363200. Throughput: 0: 12743.1. Samples: 347416064. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:30,767][1648981] Avg episode reward: [(0, '725.460')] [2024-06-15 19:36:32,667][1651669] Updated weights for policy 0, policy_version 678434 (0.0013) [2024-06-15 19:36:34,038][1651669] Updated weights for policy 0, policy_version 678496 (0.0011) [2024-06-15 19:36:35,676][1651669] Updated weights for policy 0, policy_version 678550 (0.0012) [2024-06-15 19:36:35,767][1648981] Fps is (10 sec: 42594.7, 60 sec: 48605.3, 300 sec: 47208.0). Total num frames: 1389658112. Throughput: 0: 12378.8. Samples: 347488256. 
Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:35,768][1648981] Avg episode reward: [(0, '720.390')] [2024-06-15 19:36:36,822][1651669] Updated weights for policy 0, policy_version 678609 (0.0013) [2024-06-15 19:36:37,664][1651669] Updated weights for policy 0, policy_version 678649 (0.0011) [2024-06-15 19:36:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 51342.5, 300 sec: 47208.1). Total num frames: 1389887488. Throughput: 0: 12697.6. Samples: 347529728. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:40,767][1648981] Avg episode reward: [(0, '713.910')] [2024-06-15 19:36:41,568][1651669] Updated weights for policy 0, policy_version 678674 (0.0013) [2024-06-15 19:36:42,561][1651669] Updated weights for policy 0, policy_version 678720 (0.0012) [2024-06-15 19:36:44,622][1651669] Updated weights for policy 0, policy_version 678772 (0.0033) [2024-06-15 19:36:45,767][1648981] Fps is (10 sec: 49155.3, 60 sec: 48605.6, 300 sec: 47097.4). Total num frames: 1390149632. Throughput: 0: 12652.5. Samples: 347602944. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:45,767][1648981] Avg episode reward: [(0, '685.250')] [2024-06-15 19:36:46,401][1651669] Updated weights for policy 0, policy_version 678800 (0.0013) [2024-06-15 19:36:47,645][1651669] Updated weights for policy 0, policy_version 678866 (0.0010) [2024-06-15 19:36:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 51882.7, 300 sec: 47430.3). Total num frames: 1390411776. Throughput: 0: 12777.3. Samples: 347684864. 
Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:50,767][1648981] Avg episode reward: [(0, '695.350')] [2024-06-15 19:36:51,212][1651669] Updated weights for policy 0, policy_version 678918 (0.0028) [2024-06-15 19:36:53,427][1651669] Updated weights for policy 0, policy_version 678979 (0.0013) [2024-06-15 19:36:54,800][1651669] Updated weights for policy 0, policy_version 679032 (0.0011) [2024-06-15 19:36:55,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 49698.2, 300 sec: 47097.1). Total num frames: 1390673920. Throughput: 0: 12754.5. Samples: 347724288. Policy #0 lag: (min: 13.0, avg: 71.4, max: 269.0) [2024-06-15 19:36:55,767][1648981] Avg episode reward: [(0, '683.880')] [2024-06-15 19:36:55,796][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000679040_1390673920.pth... [2024-06-15 19:36:55,842][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000673536_1379401728.pth [2024-06-15 19:36:57,020][1651669] Updated weights for policy 0, policy_version 679072 (0.0011) [2024-06-15 19:36:58,080][1651274] Signal inference workers to stop experience collection... (35600 times) [2024-06-15 19:36:58,132][1651669] InferenceWorker_p0-w0: stopping experience collection (35600 times) [2024-06-15 19:36:58,229][1651274] Signal inference workers to resume experience collection... (35600 times) [2024-06-15 19:36:58,229][1651669] InferenceWorker_p0-w0: resuming experience collection (35600 times) [2024-06-15 19:36:58,987][1651669] Updated weights for policy 0, policy_version 679157 (0.0030) [2024-06-15 19:37:00,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 52430.1, 300 sec: 47541.3). Total num frames: 1390936064. Throughput: 0: 12586.1. Samples: 347792896. 
Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:00,769][1648981] Avg episode reward: [(0, '702.760')] [2024-06-15 19:37:02,658][1651669] Updated weights for policy 0, policy_version 679200 (0.0012) [2024-06-15 19:37:05,262][1651669] Updated weights for policy 0, policy_version 679284 (0.0141) [2024-06-15 19:37:05,788][1648981] Fps is (10 sec: 52314.0, 60 sec: 48588.2, 300 sec: 47204.6). Total num frames: 1391198208. Throughput: 0: 12304.8. Samples: 347860480. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:05,789][1648981] Avg episode reward: [(0, '706.490')] [2024-06-15 19:37:07,496][1651669] Updated weights for policy 0, policy_version 679312 (0.0014) [2024-06-15 19:37:09,511][1651669] Updated weights for policy 0, policy_version 679379 (0.0013) [2024-06-15 19:37:10,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 52428.9, 300 sec: 47763.5). Total num frames: 1391460352. Throughput: 0: 12504.2. Samples: 347902976. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:10,767][1648981] Avg episode reward: [(0, '685.490')] [2024-06-15 19:37:13,423][1651669] Updated weights for policy 0, policy_version 679444 (0.0011) [2024-06-15 19:37:15,622][1651669] Updated weights for policy 0, policy_version 679520 (0.0012) [2024-06-15 19:37:15,766][1648981] Fps is (10 sec: 45976.2, 60 sec: 48612.6, 300 sec: 47215.8). Total num frames: 1391656960. Throughput: 0: 12447.3. Samples: 347976192. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:15,767][1648981] Avg episode reward: [(0, '700.480')] [2024-06-15 19:37:18,580][1651669] Updated weights for policy 0, policy_version 679570 (0.0014) [2024-06-15 19:37:20,553][1651669] Updated weights for policy 0, policy_version 679648 (0.0125) [2024-06-15 19:37:20,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 51336.4, 300 sec: 47764.2). Total num frames: 1391919104. Throughput: 0: 12322.3. Samples: 348042752. 
Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:20,767][1648981] Avg episode reward: [(0, '702.410')] [2024-06-15 19:37:21,375][1651669] Updated weights for policy 0, policy_version 679680 (0.0016) [2024-06-15 19:37:24,666][1651669] Updated weights for policy 0, policy_version 679741 (0.0013) [2024-06-15 19:37:25,767][1648981] Fps is (10 sec: 45873.6, 60 sec: 48059.5, 300 sec: 47097.0). Total num frames: 1392115712. Throughput: 0: 12310.7. Samples: 348083712. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:25,767][1648981] Avg episode reward: [(0, '711.260')] [2024-06-15 19:37:27,680][1651669] Updated weights for policy 0, policy_version 679803 (0.0014) [2024-06-15 19:37:29,798][1651669] Updated weights for policy 0, policy_version 679856 (0.0011) [2024-06-15 19:37:30,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 50790.2, 300 sec: 47652.4). Total num frames: 1392410624. Throughput: 0: 12299.4. Samples: 348156416. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:30,767][1648981] Avg episode reward: [(0, '710.590')] [2024-06-15 19:37:31,428][1651669] Updated weights for policy 0, policy_version 679920 (0.0029) [2024-06-15 19:37:34,701][1651669] Updated weights for policy 0, policy_version 679952 (0.0129) [2024-06-15 19:37:35,499][1651669] Updated weights for policy 0, policy_version 679997 (0.0011) [2024-06-15 19:37:35,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 49698.9, 300 sec: 47208.1). Total num frames: 1392640000. Throughput: 0: 12151.5. Samples: 348231680. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:35,767][1648981] Avg episode reward: [(0, '727.310')] [2024-06-15 19:37:38,760][1651669] Updated weights for policy 0, policy_version 680055 (0.0015) [2024-06-15 19:37:40,468][1651274] Signal inference workers to stop experience collection... 
(35650 times) [2024-06-15 19:37:40,504][1651669] InferenceWorker_p0-w0: stopping experience collection (35650 times) [2024-06-15 19:37:40,670][1651274] Signal inference workers to resume experience collection... (35650 times) [2024-06-15 19:37:40,670][1651669] InferenceWorker_p0-w0: resuming experience collection (35650 times) [2024-06-15 19:37:40,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 49698.1, 300 sec: 47430.3). Total num frames: 1392869376. Throughput: 0: 12014.9. Samples: 348264960. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:40,767][1648981] Avg episode reward: [(0, '723.000')] [2024-06-15 19:37:40,926][1651669] Updated weights for policy 0, policy_version 680119 (0.0029) [2024-06-15 19:37:42,394][1651669] Updated weights for policy 0, policy_version 680181 (0.0022) [2024-06-15 19:37:45,203][1651669] Updated weights for policy 0, policy_version 680224 (0.0011) [2024-06-15 19:37:45,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 49698.1, 300 sec: 47432.2). Total num frames: 1393131520. Throughput: 0: 12185.6. Samples: 348341248. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:45,767][1648981] Avg episode reward: [(0, '736.640')] [2024-06-15 19:37:50,147][1651669] Updated weights for policy 0, policy_version 680320 (0.0013) [2024-06-15 19:37:50,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1393295360. Throughput: 0: 12180.2. Samples: 348408320. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:50,767][1648981] Avg episode reward: [(0, '759.380')] [2024-06-15 19:37:52,013][1651669] Updated weights for policy 0, policy_version 680391 (0.0012) [2024-06-15 19:37:55,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1393557504. Throughput: 0: 12037.7. Samples: 348444672. 
Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:37:55,767][1648981] Avg episode reward: [(0, '709.630')] [2024-06-15 19:37:56,123][1651669] Updated weights for policy 0, policy_version 680464 (0.0012) [2024-06-15 19:38:00,178][1651669] Updated weights for policy 0, policy_version 680532 (0.0163) [2024-06-15 19:38:00,778][1648981] Fps is (10 sec: 49094.0, 60 sec: 47504.4, 300 sec: 46984.1). Total num frames: 1393786880. Throughput: 0: 12205.1. Samples: 348525568. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:00,779][1648981] Avg episode reward: [(0, '718.830')] [2024-06-15 19:38:01,167][1651669] Updated weights for policy 0, policy_version 680579 (0.0123) [2024-06-15 19:38:03,248][1651669] Updated weights for policy 0, policy_version 680672 (0.0076) [2024-06-15 19:38:05,770][1648981] Fps is (10 sec: 52409.4, 60 sec: 48074.3, 300 sec: 47540.8). Total num frames: 1394081792. Throughput: 0: 12025.3. Samples: 348583936. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:05,771][1648981] Avg episode reward: [(0, '736.200')] [2024-06-15 19:38:08,719][1651669] Updated weights for policy 0, policy_version 680752 (0.0011) [2024-06-15 19:38:10,766][1648981] Fps is (10 sec: 42648.9, 60 sec: 45875.1, 300 sec: 46652.7). Total num frames: 1394212864. Throughput: 0: 11980.9. Samples: 348622848. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:10,767][1648981] Avg episode reward: [(0, '723.930')] [2024-06-15 19:38:12,463][1651669] Updated weights for policy 0, policy_version 680816 (0.0012) [2024-06-15 19:38:14,923][1651669] Updated weights for policy 0, policy_version 680912 (0.0104) [2024-06-15 19:38:15,776][1648981] Fps is (10 sec: 49124.5, 60 sec: 48598.3, 300 sec: 47650.9). Total num frames: 1394573312. Throughput: 0: 11694.0. Samples: 348682752. 
Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:15,777][1648981] Avg episode reward: [(0, '723.930')] [2024-06-15 19:38:15,822][1651669] Updated weights for policy 0, policy_version 680949 (0.0011) [2024-06-15 19:38:20,077][1651669] Updated weights for policy 0, policy_version 680983 (0.0118) [2024-06-15 19:38:20,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 46421.2, 300 sec: 46985.9). Total num frames: 1394704384. Throughput: 0: 11764.6. Samples: 348761088. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:20,767][1648981] Avg episode reward: [(0, '791.870')] [2024-06-15 19:38:20,775][1651669] Updated weights for policy 0, policy_version 681020 (0.0012) [2024-06-15 19:38:23,176][1651274] Signal inference workers to stop experience collection... (35700 times) [2024-06-15 19:38:23,228][1651669] InferenceWorker_p0-w0: stopping experience collection (35700 times) [2024-06-15 19:38:23,378][1651274] Signal inference workers to resume experience collection... (35700 times) [2024-06-15 19:38:23,379][1651669] InferenceWorker_p0-w0: resuming experience collection (35700 times) [2024-06-15 19:38:23,753][1651669] Updated weights for policy 0, policy_version 681072 (0.0013) [2024-06-15 19:38:25,009][1651669] Updated weights for policy 0, policy_version 681120 (0.0020) [2024-06-15 19:38:25,766][1648981] Fps is (10 sec: 42638.0, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1394999296. Throughput: 0: 11844.3. Samples: 348797952. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:25,767][1648981] Avg episode reward: [(0, '804.030')] [2024-06-15 19:38:26,492][1651669] Updated weights for policy 0, policy_version 681184 (0.0012) [2024-06-15 19:38:30,456][1651669] Updated weights for policy 0, policy_version 681248 (0.0012) [2024-06-15 19:38:30,766][1648981] Fps is (10 sec: 49153.8, 60 sec: 46421.6, 300 sec: 47319.2). Total num frames: 1395195904. Throughput: 0: 11867.1. Samples: 348875264. 
Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:30,767][1648981] Avg episode reward: [(0, '809.320')] [2024-06-15 19:38:34,597][1651669] Updated weights for policy 0, policy_version 681317 (0.0023) [2024-06-15 19:38:35,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46421.3, 300 sec: 47208.1). Total num frames: 1395425280. Throughput: 0: 11844.3. Samples: 348941312. Policy #0 lag: (min: 34.0, avg: 136.9, max: 290.0) [2024-06-15 19:38:35,767][1648981] Avg episode reward: [(0, '832.640')] [2024-06-15 19:38:36,069][1651669] Updated weights for policy 0, policy_version 681376 (0.0024) [2024-06-15 19:38:38,184][1651669] Updated weights for policy 0, policy_version 681465 (0.0013) [2024-06-15 19:38:40,770][1648981] Fps is (10 sec: 45857.0, 60 sec: 46418.4, 300 sec: 47318.9). Total num frames: 1395654656. Throughput: 0: 11638.5. Samples: 348968448. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:38:40,771][1648981] Avg episode reward: [(0, '839.170')] [2024-06-15 19:38:41,105][1651669] Updated weights for policy 0, policy_version 681490 (0.0011) [2024-06-15 19:38:42,103][1651669] Updated weights for policy 0, policy_version 681536 (0.0013) [2024-06-15 19:38:45,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 45875.4, 300 sec: 46986.0). Total num frames: 1395884032. Throughput: 0: 11870.2. Samples: 349059584. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:38:45,767][1648981] Avg episode reward: [(0, '809.320')] [2024-06-15 19:38:46,169][1651669] Updated weights for policy 0, policy_version 681616 (0.0013) [2024-06-15 19:38:47,851][1651669] Updated weights for policy 0, policy_version 681680 (0.0034) [2024-06-15 19:38:50,766][1648981] Fps is (10 sec: 52449.3, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1396178944. Throughput: 0: 12015.9. Samples: 349124608. 
Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:38:50,767][1648981] Avg episode reward: [(0, '768.980')] [2024-06-15 19:38:51,665][1651669] Updated weights for policy 0, policy_version 681744 (0.0014) [2024-06-15 19:38:52,727][1651669] Updated weights for policy 0, policy_version 681788 (0.0011) [2024-06-15 19:38:55,774][1648981] Fps is (10 sec: 49111.8, 60 sec: 46961.1, 300 sec: 46987.2). Total num frames: 1396375552. Throughput: 0: 12069.6. Samples: 349166080. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:38:55,775][1648981] Avg episode reward: [(0, '779.260')] [2024-06-15 19:38:56,063][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000681840_1396408320.pth... [2024-06-15 19:38:56,275][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000676192_1384841216.pth [2024-06-15 19:38:56,816][1651669] Updated weights for policy 0, policy_version 681872 (0.0012) [2024-06-15 19:38:58,803][1651274] Signal inference workers to stop experience collection... (35750 times) [2024-06-15 19:38:58,842][1651669] InferenceWorker_p0-w0: stopping experience collection (35750 times) [2024-06-15 19:38:58,852][1651669] Updated weights for policy 0, policy_version 681937 (0.0013) [2024-06-15 19:38:59,100][1651274] Signal inference workers to resume experience collection... (35750 times) [2024-06-15 19:38:59,102][1651669] InferenceWorker_p0-w0: resuming experience collection (35750 times) [2024-06-15 19:39:00,770][1648981] Fps is (10 sec: 52408.2, 60 sec: 48612.3, 300 sec: 47985.0). Total num frames: 1396703232. Throughput: 0: 11959.5. Samples: 349220864. 
Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:00,773][1648981] Avg episode reward: [(0, '781.610')] [2024-06-15 19:39:02,384][1651669] Updated weights for policy 0, policy_version 681986 (0.0014) [2024-06-15 19:39:03,513][1651669] Updated weights for policy 0, policy_version 682038 (0.0036) [2024-06-15 19:39:05,767][1648981] Fps is (10 sec: 45908.5, 60 sec: 45877.4, 300 sec: 47319.7). Total num frames: 1396834304. Throughput: 0: 12310.6. Samples: 349315072. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:05,768][1648981] Avg episode reward: [(0, '770.780')] [2024-06-15 19:39:06,802][1651669] Updated weights for policy 0, policy_version 682096 (0.0011) [2024-06-15 19:39:08,771][1651669] Updated weights for policy 0, policy_version 682176 (0.0013) [2024-06-15 19:39:10,201][1651669] Updated weights for policy 0, policy_version 682235 (0.0013) [2024-06-15 19:39:10,790][1648981] Fps is (10 sec: 52324.2, 60 sec: 50224.3, 300 sec: 47982.4). Total num frames: 1397227520. Throughput: 0: 12065.4. Samples: 349341184. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:10,791][1648981] Avg episode reward: [(0, '787.550')] [2024-06-15 19:39:13,840][1651669] Updated weights for policy 0, policy_version 682294 (0.0122) [2024-06-15 19:39:15,766][1648981] Fps is (10 sec: 52433.4, 60 sec: 46428.6, 300 sec: 47763.5). Total num frames: 1397358592. Throughput: 0: 11935.3. Samples: 349412352. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:15,767][1648981] Avg episode reward: [(0, '807.050')] [2024-06-15 19:39:17,516][1651669] Updated weights for policy 0, policy_version 682338 (0.0012) [2024-06-15 19:39:19,699][1651669] Updated weights for policy 0, policy_version 682421 (0.0079) [2024-06-15 19:39:20,766][1648981] Fps is (10 sec: 45984.8, 60 sec: 49698.3, 300 sec: 47766.7). Total num frames: 1397686272. Throughput: 0: 11958.0. Samples: 349479424. 
Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:20,767][1648981] Avg episode reward: [(0, '761.670')] [2024-06-15 19:39:21,246][1651669] Updated weights for policy 0, policy_version 682496 (0.0095) [2024-06-15 19:39:25,037][1651669] Updated weights for policy 0, policy_version 682549 (0.0014) [2024-06-15 19:39:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 48209.1). Total num frames: 1397882880. Throughput: 0: 12209.4. Samples: 349517824. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:25,767][1648981] Avg episode reward: [(0, '767.570')] [2024-06-15 19:39:28,499][1651669] Updated weights for policy 0, policy_version 682595 (0.0027) [2024-06-15 19:39:29,889][1651669] Updated weights for policy 0, policy_version 682656 (0.0011) [2024-06-15 19:39:30,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48605.8, 300 sec: 47653.1). Total num frames: 1398112256. Throughput: 0: 11923.9. Samples: 349596160. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:30,767][1648981] Avg episode reward: [(0, '725.710')] [2024-06-15 19:39:32,132][1651669] Updated weights for policy 0, policy_version 682736 (0.0013) [2024-06-15 19:39:35,742][1651669] Updated weights for policy 0, policy_version 682800 (0.0012) [2024-06-15 19:39:35,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49152.0, 300 sec: 48322.8). Total num frames: 1398374400. Throughput: 0: 11753.2. Samples: 349653504. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:35,767][1648981] Avg episode reward: [(0, '752.730')] [2024-06-15 19:39:39,659][1651669] Updated weights for policy 0, policy_version 682850 (0.0012) [2024-06-15 19:39:40,774][1648981] Fps is (10 sec: 42564.0, 60 sec: 48056.3, 300 sec: 47874.5). Total num frames: 1398538240. Throughput: 0: 11753.3. Samples: 349694976. 
Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:40,775][1648981] Avg episode reward: [(0, '758.390')] [2024-06-15 19:39:41,023][1651274] Signal inference workers to stop experience collection... (35800 times) [2024-06-15 19:39:41,081][1651669] InferenceWorker_p0-w0: stopping experience collection (35800 times) [2024-06-15 19:39:41,322][1651274] Signal inference workers to resume experience collection... (35800 times) [2024-06-15 19:39:41,323][1651669] InferenceWorker_p0-w0: resuming experience collection (35800 times) [2024-06-15 19:39:41,560][1651669] Updated weights for policy 0, policy_version 682902 (0.0014) [2024-06-15 19:39:43,437][1651669] Updated weights for policy 0, policy_version 682976 (0.0012) [2024-06-15 19:39:44,237][1651669] Updated weights for policy 0, policy_version 683008 (0.0011) [2024-06-15 19:39:45,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 48097.7). Total num frames: 1398833152. Throughput: 0: 12016.0. Samples: 349761536. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:45,767][1648981] Avg episode reward: [(0, '733.790')] [2024-06-15 19:39:46,521][1651669] Updated weights for policy 0, policy_version 683064 (0.0061) [2024-06-15 19:39:50,124][1651669] Updated weights for policy 0, policy_version 683107 (0.0010) [2024-06-15 19:39:50,766][1648981] Fps is (10 sec: 52471.2, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 1399062528. Throughput: 0: 11776.2. Samples: 349844992. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:50,767][1648981] Avg episode reward: [(0, '737.880')] [2024-06-15 19:39:53,031][1651669] Updated weights for policy 0, policy_version 683186 (0.0104) [2024-06-15 19:39:54,638][1651669] Updated weights for policy 0, policy_version 683248 (0.0011) [2024-06-15 19:39:55,090][1651669] Updated weights for policy 0, policy_version 683264 (0.0009) [2024-06-15 19:39:55,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49158.6, 300 sec: 48096.7). 
Total num frames: 1399324672. Throughput: 0: 11873.3. Samples: 349875200. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:39:55,767][1648981] Avg episode reward: [(0, '745.470')] [2024-06-15 19:39:57,077][1651669] Updated weights for policy 0, policy_version 683328 (0.0014) [2024-06-15 19:40:00,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 46970.6, 300 sec: 48652.2). Total num frames: 1399521280. Throughput: 0: 11980.8. Samples: 349951488. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:40:00,767][1648981] Avg episode reward: [(0, '787.150')] [2024-06-15 19:40:01,221][1651669] Updated weights for policy 0, policy_version 683387 (0.0012) [2024-06-15 19:40:03,102][1651669] Updated weights for policy 0, policy_version 683440 (0.0011) [2024-06-15 19:40:04,660][1651669] Updated weights for policy 0, policy_version 683504 (0.0015) [2024-06-15 19:40:05,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 50244.8, 300 sec: 48652.1). Total num frames: 1399848960. Throughput: 0: 12151.4. Samples: 350026240. Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:40:05,769][1648981] Avg episode reward: [(0, '761.260')] [2024-06-15 19:40:06,967][1651669] Updated weights for policy 0, policy_version 683552 (0.0134) [2024-06-15 19:40:10,716][1651669] Updated weights for policy 0, policy_version 683586 (0.0015) [2024-06-15 19:40:10,772][1648981] Fps is (10 sec: 45847.2, 60 sec: 45888.9, 300 sec: 48429.3). Total num frames: 1399980032. Throughput: 0: 12127.1. Samples: 350063616. 
Policy #0 lag: (min: 104.0, avg: 184.5, max: 360.0) [2024-06-15 19:40:10,773][1648981] Avg episode reward: [(0, '771.450')] [2024-06-15 19:40:11,932][1651669] Updated weights for policy 0, policy_version 683647 (0.0013) [2024-06-15 19:40:14,453][1651669] Updated weights for policy 0, policy_version 683713 (0.0057) [2024-06-15 19:40:15,702][1651669] Updated weights for policy 0, policy_version 683776 (0.0012) [2024-06-15 19:40:15,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1400373248. Throughput: 0: 12037.7. Samples: 350137856. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:15,767][1648981] Avg episode reward: [(0, '770.540')] [2024-06-15 19:40:18,171][1651669] Updated weights for policy 0, policy_version 683840 (0.0084) [2024-06-15 19:40:20,778][1648981] Fps is (10 sec: 52398.0, 60 sec: 46958.2, 300 sec: 48428.1). Total num frames: 1400504320. Throughput: 0: 12444.0. Samples: 350213632. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:20,779][1648981] Avg episode reward: [(0, '761.250')] [2024-06-15 19:40:21,836][1651274] Signal inference workers to stop experience collection... (35850 times) [2024-06-15 19:40:21,874][1651669] InferenceWorker_p0-w0: stopping experience collection (35850 times) [2024-06-15 19:40:22,096][1651274] Signal inference workers to resume experience collection... (35850 times) [2024-06-15 19:40:22,097][1651669] InferenceWorker_p0-w0: resuming experience collection (35850 times) [2024-06-15 19:40:22,284][1651669] Updated weights for policy 0, policy_version 683898 (0.0016) [2024-06-15 19:40:23,860][1651669] Updated weights for policy 0, policy_version 683952 (0.0020) [2024-06-15 19:40:25,778][1648981] Fps is (10 sec: 45820.7, 60 sec: 49142.3, 300 sec: 49094.6). Total num frames: 1400832000. Throughput: 0: 12343.8. Samples: 350250496. 
Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:25,779][1648981] Avg episode reward: [(0, '765.310')] [2024-06-15 19:40:25,788][1651669] Updated weights for policy 0, policy_version 684000 (0.0011) [2024-06-15 19:40:28,261][1651669] Updated weights for policy 0, policy_version 684080 (0.0014) [2024-06-15 19:40:30,766][1648981] Fps is (10 sec: 52490.7, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 1401028608. Throughput: 0: 12481.4. Samples: 350323200. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:30,767][1648981] Avg episode reward: [(0, '746.230')] [2024-06-15 19:40:31,774][1651669] Updated weights for policy 0, policy_version 684114 (0.0011) [2024-06-15 19:40:33,917][1651669] Updated weights for policy 0, policy_version 684167 (0.0027) [2024-06-15 19:40:34,793][1651669] Updated weights for policy 0, policy_version 684212 (0.0034) [2024-06-15 19:40:35,766][1648981] Fps is (10 sec: 45929.8, 60 sec: 48605.8, 300 sec: 49097.6). Total num frames: 1401290752. Throughput: 0: 12299.4. Samples: 350398464. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:35,767][1648981] Avg episode reward: [(0, '744.020')] [2024-06-15 19:40:36,333][1651669] Updated weights for policy 0, policy_version 684256 (0.0024) [2024-06-15 19:40:38,404][1651669] Updated weights for policy 0, policy_version 684304 (0.0014) [2024-06-15 19:40:39,459][1651669] Updated weights for policy 0, policy_version 684352 (0.0016) [2024-06-15 19:40:40,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50251.1, 300 sec: 48541.1). Total num frames: 1401552896. Throughput: 0: 12470.1. Samples: 350436352. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:40,767][1648981] Avg episode reward: [(0, '757.520')] [2024-06-15 19:40:43,055][1651669] Updated weights for policy 0, policy_version 684416 (0.0012) [2024-06-15 19:40:45,774][1648981] Fps is (10 sec: 49115.2, 60 sec: 49145.8, 300 sec: 49095.2). Total num frames: 1401782272. 
Throughput: 0: 12467.9. Samples: 350512640. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:45,774][1648981] Avg episode reward: [(0, '805.290')] [2024-06-15 19:40:46,732][1651669] Updated weights for policy 0, policy_version 684496 (0.0131) [2024-06-15 19:40:49,236][1651669] Updated weights for policy 0, policy_version 684550 (0.0019) [2024-06-15 19:40:50,515][1651669] Updated weights for policy 0, policy_version 684606 (0.0035) [2024-06-15 19:40:50,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 1402077184. Throughput: 0: 12140.1. Samples: 350572544. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:50,767][1648981] Avg episode reward: [(0, '814.530')] [2024-06-15 19:40:54,116][1651669] Updated weights for policy 0, policy_version 684666 (0.0012) [2024-06-15 19:40:55,768][1648981] Fps is (10 sec: 42625.9, 60 sec: 48058.9, 300 sec: 48874.4). Total num frames: 1402208256. Throughput: 0: 12186.9. Samples: 350611968. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:40:55,769][1648981] Avg episode reward: [(0, '812.470')] [2024-06-15 19:40:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000684672_1402208256.pth... [2024-06-15 19:40:55,805][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000679040_1390673920.pth [2024-06-15 19:40:56,963][1651669] Updated weights for policy 0, policy_version 684709 (0.0012) [2024-06-15 19:40:58,682][1651669] Updated weights for policy 0, policy_version 684790 (0.0024) [2024-06-15 19:41:00,538][1651669] Updated weights for policy 0, policy_version 684832 (0.0013) [2024-06-15 19:41:00,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 50244.1, 300 sec: 48318.9). Total num frames: 1402535936. Throughput: 0: 12231.1. Samples: 350688256. 
Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:00,767][1648981] Avg episode reward: [(0, '779.030')] [2024-06-15 19:41:03,642][1651669] Updated weights for policy 0, policy_version 684866 (0.0012) [2024-06-15 19:41:05,766][1648981] Fps is (10 sec: 52434.4, 60 sec: 48060.0, 300 sec: 48874.3). Total num frames: 1402732544. Throughput: 0: 12245.7. Samples: 350764544. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:05,767][1648981] Avg episode reward: [(0, '790.890')] [2024-06-15 19:41:06,232][1651274] Signal inference workers to stop experience collection... (35900 times) [2024-06-15 19:41:06,282][1651669] InferenceWorker_p0-w0: stopping experience collection (35900 times) [2024-06-15 19:41:06,307][1651669] Updated weights for policy 0, policy_version 684931 (0.0016) [2024-06-15 19:41:06,476][1651274] Signal inference workers to resume experience collection... (35900 times) [2024-06-15 19:41:06,477][1651669] InferenceWorker_p0-w0: resuming experience collection (35900 times) [2024-06-15 19:41:07,667][1651669] Updated weights for policy 0, policy_version 684987 (0.0012) [2024-06-15 19:41:08,892][1651669] Updated weights for policy 0, policy_version 685026 (0.0024) [2024-06-15 19:41:09,426][1651669] Updated weights for policy 0, policy_version 685055 (0.0010) [2024-06-15 19:41:10,773][1648981] Fps is (10 sec: 52392.2, 60 sec: 51335.7, 300 sec: 48541.3). Total num frames: 1403060224. Throughput: 0: 12232.5. Samples: 350800896. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:10,774][1648981] Avg episode reward: [(0, '856.710')] [2024-06-15 19:41:11,340][1651669] Updated weights for policy 0, policy_version 685110 (0.0010) [2024-06-15 19:41:15,388][1651669] Updated weights for policy 0, policy_version 685168 (0.0013) [2024-06-15 19:41:15,774][1648981] Fps is (10 sec: 52387.5, 60 sec: 48053.5, 300 sec: 48873.0). Total num frames: 1403256832. Throughput: 0: 12194.9. Samples: 350872064. 
Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:15,778][1648981] Avg episode reward: [(0, '859.170')] [2024-06-15 19:41:18,316][1651669] Updated weights for policy 0, policy_version 685232 (0.0019) [2024-06-15 19:41:19,589][1651669] Updated weights for policy 0, policy_version 685267 (0.0012) [2024-06-15 19:41:20,533][1651669] Updated weights for policy 0, policy_version 685312 (0.0017) [2024-06-15 19:41:20,767][1648981] Fps is (10 sec: 45906.5, 60 sec: 50254.0, 300 sec: 48430.0). Total num frames: 1403518976. Throughput: 0: 12060.4. Samples: 350941184. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:20,767][1648981] Avg episode reward: [(0, '858.880')] [2024-06-15 19:41:21,798][1651669] Updated weights for policy 0, policy_version 685365 (0.0014) [2024-06-15 19:41:25,431][1651669] Updated weights for policy 0, policy_version 685393 (0.0010) [2024-06-15 19:41:25,767][1648981] Fps is (10 sec: 45906.9, 60 sec: 48068.5, 300 sec: 48652.0). Total num frames: 1403715584. Throughput: 0: 12253.6. Samples: 350987776. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:25,768][1648981] Avg episode reward: [(0, '813.370')] [2024-06-15 19:41:27,750][1651669] Updated weights for policy 0, policy_version 685456 (0.0084) [2024-06-15 19:41:29,710][1651669] Updated weights for policy 0, policy_version 685507 (0.0147) [2024-06-15 19:41:30,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49698.1, 300 sec: 48652.3). Total num frames: 1404010496. Throughput: 0: 12130.7. Samples: 351058432. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:30,767][1648981] Avg episode reward: [(0, '797.750')] [2024-06-15 19:41:31,414][1651669] Updated weights for policy 0, policy_version 685572 (0.0014) [2024-06-15 19:41:32,627][1651669] Updated weights for policy 0, policy_version 685625 (0.0011) [2024-06-15 19:41:35,766][1648981] Fps is (10 sec: 49156.4, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1404207104. 
Throughput: 0: 12640.7. Samples: 351141376. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:35,767][1648981] Avg episode reward: [(0, '788.570')] [2024-06-15 19:41:36,178][1651669] Updated weights for policy 0, policy_version 685674 (0.0013) [2024-06-15 19:41:37,890][1651669] Updated weights for policy 0, policy_version 685718 (0.0040) [2024-06-15 19:41:38,552][1651669] Updated weights for policy 0, policy_version 685760 (0.0012) [2024-06-15 19:41:40,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1404502016. Throughput: 0: 12561.4. Samples: 351177216. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:40,767][1648981] Avg episode reward: [(0, '792.470')] [2024-06-15 19:41:41,782][1651669] Updated weights for policy 0, policy_version 685825 (0.0013) [2024-06-15 19:41:43,287][1651669] Updated weights for policy 0, policy_version 685888 (0.0013) [2024-06-15 19:41:45,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48612.0, 300 sec: 48430.0). Total num frames: 1404698624. Throughput: 0: 12458.7. Samples: 351248896. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:45,767][1648981] Avg episode reward: [(0, '746.600')] [2024-06-15 19:41:46,441][1651274] Signal inference workers to stop experience collection... (35950 times) [2024-06-15 19:41:46,494][1651669] InferenceWorker_p0-w0: stopping experience collection (35950 times) [2024-06-15 19:41:46,628][1651274] Signal inference workers to resume experience collection... (35950 times) [2024-06-15 19:41:46,629][1651669] InferenceWorker_p0-w0: resuming experience collection (35950 times) [2024-06-15 19:41:46,832][1651669] Updated weights for policy 0, policy_version 685946 (0.0014) [2024-06-15 19:41:49,051][1651669] Updated weights for policy 0, policy_version 686002 (0.0012) [2024-06-15 19:41:50,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1404960768. Throughput: 0: 12492.8. 
Samples: 351326720. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:50,767][1648981] Avg episode reward: [(0, '777.240')] [2024-06-15 19:41:51,172][1651669] Updated weights for policy 0, policy_version 686035 (0.0010) [2024-06-15 19:41:52,919][1651669] Updated weights for policy 0, policy_version 686114 (0.0185) [2024-06-15 19:41:55,767][1648981] Fps is (10 sec: 52423.6, 60 sec: 50244.4, 300 sec: 48429.9). Total num frames: 1405222912. Throughput: 0: 12369.3. Samples: 351357440. Policy #0 lag: (min: 63.0, avg: 196.4, max: 323.0) [2024-06-15 19:41:55,768][1648981] Avg episode reward: [(0, '803.020')] [2024-06-15 19:41:56,719][1651669] Updated weights for policy 0, policy_version 686161 (0.0015) [2024-06-15 19:41:57,575][1651669] Updated weights for policy 0, policy_version 686207 (0.0010) [2024-06-15 19:41:59,686][1651669] Updated weights for policy 0, policy_version 686272 (0.0013) [2024-06-15 19:42:00,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 49151.9, 300 sec: 48433.6). Total num frames: 1405485056. Throughput: 0: 12654.3. Samples: 351441408. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:00,767][1648981] Avg episode reward: [(0, '844.750')] [2024-06-15 19:42:01,578][1651669] Updated weights for policy 0, policy_version 686320 (0.0037) [2024-06-15 19:42:03,365][1651669] Updated weights for policy 0, policy_version 686394 (0.0015) [2024-06-15 19:42:05,776][1648981] Fps is (10 sec: 52382.6, 60 sec: 50236.1, 300 sec: 48428.4). Total num frames: 1405747200. Throughput: 0: 12694.9. Samples: 351512576. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:05,777][1648981] Avg episode reward: [(0, '863.370')] [2024-06-15 19:42:07,514][1651669] Updated weights for policy 0, policy_version 686448 (0.0033) [2024-06-15 19:42:10,321][1651669] Updated weights for policy 0, policy_version 686512 (0.0011) [2024-06-15 19:42:10,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 49157.7, 300 sec: 48652.1). 
Total num frames: 1406009344. Throughput: 0: 12584.1. Samples: 351554048. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:10,767][1648981] Avg episode reward: [(0, '890.230')] [2024-06-15 19:42:10,792][1651274] Saving new best policy, reward=890.230! [2024-06-15 19:42:12,263][1651669] Updated weights for policy 0, policy_version 686561 (0.0012) [2024-06-15 19:42:13,739][1651669] Updated weights for policy 0, policy_version 686624 (0.0012) [2024-06-15 19:42:15,766][1648981] Fps is (10 sec: 52479.9, 60 sec: 50250.8, 300 sec: 48652.2). Total num frames: 1406271488. Throughput: 0: 12276.6. Samples: 351610880. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:15,767][1648981] Avg episode reward: [(0, '895.070')] [2024-06-15 19:42:15,768][1651274] Saving new best policy, reward=895.070! [2024-06-15 19:42:18,938][1651669] Updated weights for policy 0, policy_version 686705 (0.0012) [2024-06-15 19:42:20,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1406402560. Throughput: 0: 12322.1. Samples: 351695872. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:20,767][1648981] Avg episode reward: [(0, '909.510')] [2024-06-15 19:42:21,516][1651274] Saving new best policy, reward=909.510! [2024-06-15 19:42:21,518][1651669] Updated weights for policy 0, policy_version 686752 (0.0011) [2024-06-15 19:42:23,573][1651669] Updated weights for policy 0, policy_version 686801 (0.0012) [2024-06-15 19:42:25,117][1651669] Updated weights for policy 0, policy_version 686865 (0.0013) [2024-06-15 19:42:25,778][1648981] Fps is (10 sec: 49093.9, 60 sec: 50781.2, 300 sec: 48650.2). Total num frames: 1406763008. Throughput: 0: 12171.0. Samples: 351725056. 
Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:25,779][1648981] Avg episode reward: [(0, '898.690')] [2024-06-15 19:42:26,011][1651669] Updated weights for policy 0, policy_version 686907 (0.0013) [2024-06-15 19:42:29,036][1651274] Signal inference workers to stop experience collection... (36000 times) [2024-06-15 19:42:29,081][1651669] InferenceWorker_p0-w0: stopping experience collection (36000 times) [2024-06-15 19:42:29,254][1651274] Signal inference workers to resume experience collection... (36000 times) [2024-06-15 19:42:29,254][1651669] InferenceWorker_p0-w0: resuming experience collection (36000 times) [2024-06-15 19:42:30,046][1651669] Updated weights for policy 0, policy_version 686960 (0.0012) [2024-06-15 19:42:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 1406926848. Throughput: 0: 12208.3. Samples: 351798272. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:30,767][1648981] Avg episode reward: [(0, '903.500')] [2024-06-15 19:42:32,509][1651669] Updated weights for policy 0, policy_version 687008 (0.0011) [2024-06-15 19:42:34,399][1651669] Updated weights for policy 0, policy_version 687060 (0.0012) [2024-06-15 19:42:35,767][1648981] Fps is (10 sec: 42647.4, 60 sec: 49697.9, 300 sec: 48541.0). Total num frames: 1407188992. Throughput: 0: 11992.1. Samples: 351866368. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:35,767][1648981] Avg episode reward: [(0, '908.750')] [2024-06-15 19:42:35,950][1651669] Updated weights for policy 0, policy_version 687121 (0.0011) [2024-06-15 19:42:40,216][1651669] Updated weights for policy 0, policy_version 687170 (0.0012) [2024-06-15 19:42:40,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 47513.5, 300 sec: 48207.9). Total num frames: 1407352832. Throughput: 0: 12083.4. Samples: 351901184. 
Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:40,767][1648981] Avg episode reward: [(0, '929.590')] [2024-06-15 19:42:41,132][1651274] Saving new best policy, reward=929.590! [2024-06-15 19:42:42,729][1651669] Updated weights for policy 0, policy_version 687235 (0.0012) [2024-06-15 19:42:44,985][1651669] Updated weights for policy 0, policy_version 687312 (0.0012) [2024-06-15 19:42:45,766][1648981] Fps is (10 sec: 45876.8, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1407647744. Throughput: 0: 11889.8. Samples: 351976448. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:45,767][1648981] Avg episode reward: [(0, '919.310')] [2024-06-15 19:42:47,160][1651669] Updated weights for policy 0, policy_version 687392 (0.0088) [2024-06-15 19:42:50,767][1648981] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1407844352. Throughput: 0: 11778.5. Samples: 352042496. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:50,767][1648981] Avg episode reward: [(0, '939.610')] [2024-06-15 19:42:50,770][1651274] Saving new best policy, reward=939.610! [2024-06-15 19:42:52,713][1651669] Updated weights for policy 0, policy_version 687461 (0.0089) [2024-06-15 19:42:54,366][1651669] Updated weights for policy 0, policy_version 687520 (0.0014) [2024-06-15 19:42:55,152][1651669] Updated weights for policy 0, policy_version 687552 (0.0013) [2024-06-15 19:42:55,767][1648981] Fps is (10 sec: 45873.9, 60 sec: 48060.3, 300 sec: 48543.0). Total num frames: 1408106496. Throughput: 0: 11662.2. Samples: 352078848. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:42:55,767][1648981] Avg episode reward: [(0, '951.890')] [2024-06-15 19:42:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000687552_1408106496.pth... 
[2024-06-15 19:42:55,961][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000681840_1396408320.pth [2024-06-15 19:42:55,965][1651274] Saving new best policy, reward=951.890! [2024-06-15 19:42:57,866][1651669] Updated weights for policy 0, policy_version 687617 (0.0012) [2024-06-15 19:42:59,418][1651669] Updated weights for policy 0, policy_version 687678 (0.0011) [2024-06-15 19:43:00,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48059.9, 300 sec: 48430.6). Total num frames: 1408368640. Throughput: 0: 11753.3. Samples: 352139776. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:00,767][1648981] Avg episode reward: [(0, '957.600')] [2024-06-15 19:43:00,767][1651274] Saving new best policy, reward=957.600! [2024-06-15 19:43:04,618][1651669] Updated weights for policy 0, policy_version 687744 (0.0046) [2024-06-15 19:43:05,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 46975.1, 300 sec: 48652.1). Total num frames: 1408565248. Throughput: 0: 11537.1. Samples: 352215040. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:05,767][1648981] Avg episode reward: [(0, '946.580')] [2024-06-15 19:43:06,271][1651669] Updated weights for policy 0, policy_version 687806 (0.0014) [2024-06-15 19:43:08,580][1651669] Updated weights for policy 0, policy_version 687856 (0.0011) [2024-06-15 19:43:09,849][1651669] Updated weights for policy 0, policy_version 687904 (0.0012) [2024-06-15 19:43:10,038][1651274] Signal inference workers to stop experience collection... (36050 times) [2024-06-15 19:43:10,115][1651669] InferenceWorker_p0-w0: stopping experience collection (36050 times) [2024-06-15 19:43:10,334][1651274] Signal inference workers to resume experience collection... (36050 times) [2024-06-15 19:43:10,335][1651669] InferenceWorker_p0-w0: resuming experience collection (36050 times) [2024-06-15 19:43:10,767][1648981] Fps is (10 sec: 52424.8, 60 sec: 48059.2, 300 sec: 48542.5). Total num frames: 1408892928. 
Throughput: 0: 11653.7. Samples: 352249344. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:10,767][1648981] Avg episode reward: [(0, '901.430')] [2024-06-15 19:43:14,253][1651669] Updated weights for policy 0, policy_version 687939 (0.0027) [2024-06-15 19:43:15,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 45875.1, 300 sec: 48541.1). Total num frames: 1409024000. Throughput: 0: 11821.5. Samples: 352330240. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:15,768][1648981] Avg episode reward: [(0, '928.260')] [2024-06-15 19:43:15,977][1651669] Updated weights for policy 0, policy_version 688002 (0.0014) [2024-06-15 19:43:16,853][1651669] Updated weights for policy 0, policy_version 688061 (0.0012) [2024-06-15 19:43:18,963][1651669] Updated weights for policy 0, policy_version 688120 (0.0011) [2024-06-15 19:43:19,980][1651669] Updated weights for policy 0, policy_version 688160 (0.0010) [2024-06-15 19:43:20,783][1648981] Fps is (10 sec: 49072.7, 60 sec: 49684.2, 300 sec: 48760.5). Total num frames: 1409384448. Throughput: 0: 11794.4. Samples: 352397312. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:20,784][1648981] Avg episode reward: [(0, '956.940')] [2024-06-15 19:43:20,826][1651669] Updated weights for policy 0, policy_version 688191 (0.0011) [2024-06-15 19:43:25,116][1651669] Updated weights for policy 0, policy_version 688246 (0.0013) [2024-06-15 19:43:25,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 46430.4, 300 sec: 48652.1). Total num frames: 1409548288. Throughput: 0: 12162.8. Samples: 352448512. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:25,767][1648981] Avg episode reward: [(0, '996.920')] [2024-06-15 19:43:25,771][1651274] Saving new best policy, reward=996.920! 
[2024-06-15 19:43:27,781][1651669] Updated weights for policy 0, policy_version 688309 (0.0010) [2024-06-15 19:43:30,069][1651669] Updated weights for policy 0, policy_version 688369 (0.0126) [2024-06-15 19:43:30,766][1648981] Fps is (10 sec: 45953.0, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 1409843200. Throughput: 0: 11855.7. Samples: 352509952. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:30,767][1648981] Avg episode reward: [(0, '933.810')] [2024-06-15 19:43:31,644][1651669] Updated weights for policy 0, policy_version 688437 (0.0011) [2024-06-15 19:43:35,632][1651669] Updated weights for policy 0, policy_version 688480 (0.0011) [2024-06-15 19:43:35,811][1648981] Fps is (10 sec: 45670.3, 60 sec: 46932.5, 300 sec: 48645.4). Total num frames: 1410007040. Throughput: 0: 12207.5. Samples: 352592384. Policy #0 lag: (min: 12.0, avg: 107.1, max: 268.0) [2024-06-15 19:43:35,812][1648981] Avg episode reward: [(0, '902.740')] [2024-06-15 19:43:38,245][1651669] Updated weights for policy 0, policy_version 688530 (0.0012) [2024-06-15 19:43:39,217][1651669] Updated weights for policy 0, policy_version 688571 (0.0011) [2024-06-15 19:43:40,776][1651669] Updated weights for policy 0, policy_version 688624 (0.0014) [2024-06-15 19:43:40,778][1648981] Fps is (10 sec: 42547.7, 60 sec: 48596.3, 300 sec: 48761.3). Total num frames: 1410269184. Throughput: 0: 12068.7. Samples: 352622080. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:43:40,779][1648981] Avg episode reward: [(0, '889.740')] [2024-06-15 19:43:42,398][1651669] Updated weights for policy 0, policy_version 688697 (0.0013) [2024-06-15 19:43:45,767][1648981] Fps is (10 sec: 46081.1, 60 sec: 46967.2, 300 sec: 48429.9). Total num frames: 1410465792. Throughput: 0: 12356.2. Samples: 352695808. 
Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:43:45,768][1648981] Avg episode reward: [(0, '848.350')] [2024-06-15 19:43:46,184][1651669] Updated weights for policy 0, policy_version 688724 (0.0043) [2024-06-15 19:43:49,236][1651669] Updated weights for policy 0, policy_version 688784 (0.0012) [2024-06-15 19:43:50,685][1651669] Updated weights for policy 0, policy_version 688848 (0.0011) [2024-06-15 19:43:50,766][1648981] Fps is (10 sec: 49210.6, 60 sec: 48606.0, 300 sec: 48764.6). Total num frames: 1410760704. Throughput: 0: 12242.5. Samples: 352765952. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:43:50,767][1648981] Avg episode reward: [(0, '841.080')] [2024-06-15 19:43:51,759][1651274] Signal inference workers to stop experience collection... (36100 times) [2024-06-15 19:43:51,796][1651669] InferenceWorker_p0-w0: stopping experience collection (36100 times) [2024-06-15 19:43:52,089][1651274] Signal inference workers to resume experience collection... (36100 times) [2024-06-15 19:43:52,090][1651669] InferenceWorker_p0-w0: resuming experience collection (36100 times) [2024-06-15 19:43:53,197][1651669] Updated weights for policy 0, policy_version 688935 (0.0099) [2024-06-15 19:43:55,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 48059.9, 300 sec: 48430.6). Total num frames: 1410990080. Throughput: 0: 11912.7. Samples: 352785408. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:43:55,767][1648981] Avg episode reward: [(0, '834.290')] [2024-06-15 19:43:57,432][1651669] Updated weights for policy 0, policy_version 688979 (0.0012) [2024-06-15 19:44:00,419][1651669] Updated weights for policy 0, policy_version 689043 (0.0014) [2024-06-15 19:44:00,768][1648981] Fps is (10 sec: 42592.5, 60 sec: 46966.4, 300 sec: 48652.1). Total num frames: 1411186688. Throughput: 0: 12105.6. Samples: 352875008. 
Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:00,768][1648981] Avg episode reward: [(0, '821.540')] [2024-06-15 19:44:02,009][1651669] Updated weights for policy 0, policy_version 689092 (0.0113) [2024-06-15 19:44:03,463][1651669] Updated weights for policy 0, policy_version 689152 (0.0018) [2024-06-15 19:44:04,890][1651669] Updated weights for policy 0, policy_version 689209 (0.0011) [2024-06-15 19:44:05,768][1648981] Fps is (10 sec: 52420.7, 60 sec: 49150.8, 300 sec: 48433.7). Total num frames: 1411514368. Throughput: 0: 11848.3. Samples: 352930304. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:05,768][1648981] Avg episode reward: [(0, '782.650')] [2024-06-15 19:44:09,596][1651669] Updated weights for policy 0, policy_version 689264 (0.0012) [2024-06-15 19:44:10,767][1648981] Fps is (10 sec: 45880.4, 60 sec: 45875.6, 300 sec: 48430.0). Total num frames: 1411645440. Throughput: 0: 11798.7. Samples: 352979456. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:10,767][1648981] Avg episode reward: [(0, '744.100')] [2024-06-15 19:44:12,052][1651669] Updated weights for policy 0, policy_version 689328 (0.0012) [2024-06-15 19:44:13,920][1651669] Updated weights for policy 0, policy_version 689397 (0.0010) [2024-06-15 19:44:15,696][1651669] Updated weights for policy 0, policy_version 689456 (0.0014) [2024-06-15 19:44:15,767][1648981] Fps is (10 sec: 49156.1, 60 sec: 49697.7, 300 sec: 48541.0). Total num frames: 1412005888. Throughput: 0: 11764.4. Samples: 353039360. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:15,768][1648981] Avg episode reward: [(0, '742.170')] [2024-06-15 19:44:20,556][1651669] Updated weights for policy 0, policy_version 689504 (0.0013) [2024-06-15 19:44:20,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 45341.6, 300 sec: 48207.8). Total num frames: 1412104192. Throughput: 0: 11673.8. Samples: 353117184. 
Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:20,767][1648981] Avg episode reward: [(0, '708.870')] [2024-06-15 19:44:22,758][1651669] Updated weights for policy 0, policy_version 689537 (0.0012) [2024-06-15 19:44:24,958][1651669] Updated weights for policy 0, policy_version 689620 (0.0013) [2024-06-15 19:44:25,766][1648981] Fps is (10 sec: 39324.6, 60 sec: 47513.7, 300 sec: 48430.0). Total num frames: 1412399104. Throughput: 0: 11745.0. Samples: 353150464. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:25,767][1648981] Avg episode reward: [(0, '756.730')] [2024-06-15 19:44:25,905][1651669] Updated weights for policy 0, policy_version 689664 (0.0010) [2024-06-15 19:44:27,540][1651669] Updated weights for policy 0, policy_version 689721 (0.0012) [2024-06-15 19:44:30,767][1648981] Fps is (10 sec: 45875.8, 60 sec: 45328.9, 300 sec: 48096.7). Total num frames: 1412562944. Throughput: 0: 11594.0. Samples: 353217536. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:30,767][1648981] Avg episode reward: [(0, '730.410')] [2024-06-15 19:44:31,883][1651669] Updated weights for policy 0, policy_version 689790 (0.0014) [2024-06-15 19:44:33,860][1651274] Signal inference workers to stop experience collection... (36150 times) [2024-06-15 19:44:33,894][1651669] InferenceWorker_p0-w0: stopping experience collection (36150 times) [2024-06-15 19:44:34,204][1651274] Signal inference workers to resume experience collection... (36150 times) [2024-06-15 19:44:34,205][1651669] InferenceWorker_p0-w0: resuming experience collection (36150 times) [2024-06-15 19:44:34,557][1651669] Updated weights for policy 0, policy_version 689854 (0.0011) [2024-06-15 19:44:35,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48095.8, 300 sec: 48653.5). Total num frames: 1412890624. Throughput: 0: 11639.4. Samples: 353289728. 
Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:35,767][1648981] Avg episode reward: [(0, '740.110')] [2024-06-15 19:44:36,105][1651669] Updated weights for policy 0, policy_version 689908 (0.0012) [2024-06-15 19:44:37,781][1651669] Updated weights for policy 0, policy_version 689952 (0.0012) [2024-06-15 19:44:38,632][1651669] Updated weights for policy 0, policy_version 689983 (0.0011) [2024-06-15 19:44:40,778][1648981] Fps is (10 sec: 52367.4, 60 sec: 46967.5, 300 sec: 48317.0). Total num frames: 1413087232. Throughput: 0: 11989.0. Samples: 353325056. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:40,779][1648981] Avg episode reward: [(0, '759.810')] [2024-06-15 19:44:42,606][1651669] Updated weights for policy 0, policy_version 690039 (0.0014) [2024-06-15 19:44:45,112][1651669] Updated weights for policy 0, policy_version 690083 (0.0012) [2024-06-15 19:44:45,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 48060.0, 300 sec: 48430.0). Total num frames: 1413349376. Throughput: 0: 11776.4. Samples: 353404928. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:45,767][1648981] Avg episode reward: [(0, '757.180')] [2024-06-15 19:44:46,477][1651669] Updated weights for policy 0, policy_version 690145 (0.0036) [2024-06-15 19:44:47,038][1651669] Updated weights for policy 0, policy_version 690176 (0.0025) [2024-06-15 19:44:48,886][1651669] Updated weights for policy 0, policy_version 690229 (0.0014) [2024-06-15 19:44:50,766][1648981] Fps is (10 sec: 52490.9, 60 sec: 47513.5, 300 sec: 48430.0). Total num frames: 1413611520. Throughput: 0: 12242.9. Samples: 353481216. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:50,767][1648981] Avg episode reward: [(0, '723.920')] [2024-06-15 19:44:52,206][1651669] Updated weights for policy 0, policy_version 690291 (0.0120) [2024-06-15 19:44:55,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 46967.3, 300 sec: 48429.9). Total num frames: 1413808128. 
Throughput: 0: 11992.2. Samples: 353519104. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:44:55,767][1648981] Avg episode reward: [(0, '687.720')] [2024-06-15 19:44:56,267][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000690368_1413873664.pth... [2024-06-15 19:44:56,436][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000684672_1402208256.pth [2024-06-15 19:44:56,687][1651669] Updated weights for policy 0, policy_version 690384 (0.0012) [2024-06-15 19:44:57,694][1651669] Updated weights for policy 0, policy_version 690432 (0.0013) [2024-06-15 19:44:59,853][1651669] Updated weights for policy 0, policy_version 690490 (0.0013) [2024-06-15 19:45:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49153.0, 300 sec: 48430.0). Total num frames: 1414135808. Throughput: 0: 12117.5. Samples: 353584640. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:45:00,767][1648981] Avg episode reward: [(0, '687.040')] [2024-06-15 19:45:03,330][1651669] Updated weights for policy 0, policy_version 690552 (0.0014) [2024-06-15 19:45:05,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 46422.5, 300 sec: 48542.0). Total num frames: 1414299648. Throughput: 0: 12242.5. Samples: 353668096. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:45:05,767][1648981] Avg episode reward: [(0, '667.110')] [2024-06-15 19:45:06,656][1651669] Updated weights for policy 0, policy_version 690611 (0.0011) [2024-06-15 19:45:08,017][1651669] Updated weights for policy 0, policy_version 690680 (0.0013) [2024-06-15 19:45:10,442][1651669] Updated weights for policy 0, policy_version 690736 (0.0012) [2024-06-15 19:45:10,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49698.3, 300 sec: 48318.9). Total num frames: 1414627328. Throughput: 0: 12105.9. Samples: 353695232. 
Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:45:10,767][1648981] Avg episode reward: [(0, '684.650')] [2024-06-15 19:45:13,762][1651669] Updated weights for policy 0, policy_version 690785 (0.0023) [2024-06-15 19:45:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 46421.9, 300 sec: 48431.9). Total num frames: 1414791168. Throughput: 0: 12310.8. Samples: 353771520. Policy #0 lag: (min: 47.0, avg: 138.6, max: 303.0) [2024-06-15 19:45:15,767][1648981] Avg episode reward: [(0, '729.010')] [2024-06-15 19:45:16,923][1651274] Signal inference workers to stop experience collection... (36200 times) [2024-06-15 19:45:16,968][1651669] Updated weights for policy 0, policy_version 690834 (0.0013) [2024-06-15 19:45:16,984][1651669] InferenceWorker_p0-w0: stopping experience collection (36200 times) [2024-06-15 19:45:17,291][1651274] Signal inference workers to resume experience collection... (36200 times) [2024-06-15 19:45:17,292][1651669] InferenceWorker_p0-w0: resuming experience collection (36200 times) [2024-06-15 19:45:18,763][1651669] Updated weights for policy 0, policy_version 690901 (0.0011) [2024-06-15 19:45:19,937][1651669] Updated weights for policy 0, policy_version 690945 (0.0021) [2024-06-15 19:45:20,767][1648981] Fps is (10 sec: 49150.8, 60 sec: 50244.3, 300 sec: 48431.9). Total num frames: 1415118848. Throughput: 0: 12151.4. Samples: 353836544. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:20,768][1648981] Avg episode reward: [(0, '736.920')] [2024-06-15 19:45:21,071][1651669] Updated weights for policy 0, policy_version 690998 (0.0101) [2024-06-15 19:45:24,936][1651669] Updated weights for policy 0, policy_version 691040 (0.0013) [2024-06-15 19:45:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 1415315456. Throughput: 0: 12257.1. Samples: 353876480. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:25,767][1648981] Avg episode reward: [(0, '729.120')] [2024-06-15 19:45:27,024][1651669] Updated weights for policy 0, policy_version 691073 (0.0012) [2024-06-15 19:45:28,468][1651669] Updated weights for policy 0, policy_version 691124 (0.0010) [2024-06-15 19:45:29,960][1651669] Updated weights for policy 0, policy_version 691191 (0.0012) [2024-06-15 19:45:30,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 50790.5, 300 sec: 48541.1). Total num frames: 1415610368. Throughput: 0: 12094.6. Samples: 353949184. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:30,767][1648981] Avg episode reward: [(0, '709.910')] [2024-06-15 19:45:31,265][1651669] Updated weights for policy 0, policy_version 691248 (0.0012) [2024-06-15 19:45:35,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 47513.7, 300 sec: 48096.8). Total num frames: 1415741440. Throughput: 0: 12185.6. Samples: 354029568. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:35,767][1648981] Avg episode reward: [(0, '711.670')] [2024-06-15 19:45:35,770][1651669] Updated weights for policy 0, policy_version 691296 (0.0011) [2024-06-15 19:45:38,445][1651669] Updated weights for policy 0, policy_version 691360 (0.0011) [2024-06-15 19:45:39,798][1651669] Updated weights for policy 0, policy_version 691410 (0.0037) [2024-06-15 19:45:40,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49707.9, 300 sec: 48431.2). Total num frames: 1416069120. Throughput: 0: 12208.4. Samples: 354068480. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:40,767][1648981] Avg episode reward: [(0, '716.880')] [2024-06-15 19:45:41,703][1651669] Updated weights for policy 0, policy_version 691492 (0.0012) [2024-06-15 19:45:45,771][1648981] Fps is (10 sec: 49129.1, 60 sec: 48056.0, 300 sec: 47984.9). Total num frames: 1416232960. Throughput: 0: 12104.7. Samples: 354129408. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:45,772][1648981] Avg episode reward: [(0, '753.750')] [2024-06-15 19:45:48,229][1651669] Updated weights for policy 0, policy_version 691569 (0.0012) [2024-06-15 19:45:49,492][1651669] Updated weights for policy 0, policy_version 691616 (0.0009) [2024-06-15 19:45:50,784][1648981] Fps is (10 sec: 45793.3, 60 sec: 48591.4, 300 sec: 48538.3). Total num frames: 1416527872. Throughput: 0: 11816.8. Samples: 354200064. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:50,788][1648981] Avg episode reward: [(0, '730.960')] [2024-06-15 19:45:51,338][1651669] Updated weights for policy 0, policy_version 691686 (0.0010) [2024-06-15 19:45:52,687][1651669] Updated weights for policy 0, policy_version 691728 (0.0014) [2024-06-15 19:45:55,766][1648981] Fps is (10 sec: 52452.8, 60 sec: 49152.2, 300 sec: 48207.8). Total num frames: 1416757248. Throughput: 0: 11832.9. Samples: 354227712. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:45:55,767][1648981] Avg episode reward: [(0, '701.640')] [2024-06-15 19:45:58,145][1651669] Updated weights for policy 0, policy_version 691792 (0.0011) [2024-06-15 19:45:58,250][1651274] Signal inference workers to stop experience collection... (36250 times) [2024-06-15 19:45:58,288][1651669] InferenceWorker_p0-w0: stopping experience collection (36250 times) [2024-06-15 19:45:58,509][1651274] Signal inference workers to resume experience collection... (36250 times) [2024-06-15 19:45:58,510][1651669] InferenceWorker_p0-w0: resuming experience collection (36250 times) [2024-06-15 19:45:59,348][1651669] Updated weights for policy 0, policy_version 691839 (0.0141) [2024-06-15 19:46:00,767][1648981] Fps is (10 sec: 42674.2, 60 sec: 46967.4, 300 sec: 48207.8). Total num frames: 1416953856. Throughput: 0: 11923.9. Samples: 354308096. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:00,768][1648981] Avg episode reward: [(0, '712.530')] [2024-06-15 19:46:01,273][1651669] Updated weights for policy 0, policy_version 691894 (0.0021) [2024-06-15 19:46:02,641][1651669] Updated weights for policy 0, policy_version 691952 (0.0013) [2024-06-15 19:46:04,634][1651669] Updated weights for policy 0, policy_version 692007 (0.0011) [2024-06-15 19:46:05,181][1651669] Updated weights for policy 0, policy_version 692029 (0.0010) [2024-06-15 19:46:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 48209.0). Total num frames: 1417281536. Throughput: 0: 11901.2. Samples: 354372096. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:05,767][1648981] Avg episode reward: [(0, '720.170')] [2024-06-15 19:46:09,532][1651669] Updated weights for policy 0, policy_version 692068 (0.0046) [2024-06-15 19:46:10,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 46421.4, 300 sec: 47987.0). Total num frames: 1417412608. Throughput: 0: 11958.1. Samples: 354414592. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:10,767][1648981] Avg episode reward: [(0, '729.990')] [2024-06-15 19:46:11,399][1651669] Updated weights for policy 0, policy_version 692113 (0.0011) [2024-06-15 19:46:12,582][1651669] Updated weights for policy 0, policy_version 692160 (0.0014) [2024-06-15 19:46:14,038][1651669] Updated weights for policy 0, policy_version 692219 (0.0010) [2024-06-15 19:46:15,682][1651669] Updated weights for policy 0, policy_version 692272 (0.0012) [2024-06-15 19:46:15,786][1648981] Fps is (10 sec: 49055.0, 60 sec: 49681.8, 300 sec: 48315.7). Total num frames: 1417773056. Throughput: 0: 11975.5. Samples: 354488320. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:15,787][1648981] Avg episode reward: [(0, '719.390')] [2024-06-15 19:46:19,709][1651669] Updated weights for policy 0, policy_version 692324 (0.0013) [2024-06-15 19:46:20,773][1648981] Fps is (10 sec: 52393.3, 60 sec: 46962.4, 300 sec: 48206.9). Total num frames: 1417936896. Throughput: 0: 11774.2. Samples: 354559488. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:20,777][1648981] Avg episode reward: [(0, '690.050')] [2024-06-15 19:46:22,630][1651669] Updated weights for policy 0, policy_version 692388 (0.0011) [2024-06-15 19:46:23,487][1651669] Updated weights for policy 0, policy_version 692422 (0.0010) [2024-06-15 19:46:24,610][1651669] Updated weights for policy 0, policy_version 692469 (0.0011) [2024-06-15 19:46:25,766][1648981] Fps is (10 sec: 49249.4, 60 sec: 49152.0, 300 sec: 48318.9). Total num frames: 1418264576. Throughput: 0: 11832.9. Samples: 354600960. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:25,767][1648981] Avg episode reward: [(0, '731.350')] [2024-06-15 19:46:26,178][1651669] Updated weights for policy 0, policy_version 692533 (0.0088) [2024-06-15 19:46:30,510][1651669] Updated weights for policy 0, policy_version 692581 (0.0012) [2024-06-15 19:46:30,766][1648981] Fps is (10 sec: 49185.1, 60 sec: 46967.4, 300 sec: 48207.8). Total num frames: 1418428416. Throughput: 0: 12073.0. Samples: 354672640. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:30,767][1648981] Avg episode reward: [(0, '728.050')] [2024-06-15 19:46:33,352][1651669] Updated weights for policy 0, policy_version 692633 (0.0089) [2024-06-15 19:46:34,719][1651669] Updated weights for policy 0, policy_version 692693 (0.0011) [2024-06-15 19:46:35,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49698.1, 300 sec: 48207.8). Total num frames: 1418723328. Throughput: 0: 12065.3. Samples: 354742784. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:35,767][1648981] Avg episode reward: [(0, '735.300')] [2024-06-15 19:46:36,060][1651274] Signal inference workers to stop experience collection... (36300 times) [2024-06-15 19:46:36,080][1651669] Updated weights for policy 0, policy_version 692753 (0.0087) [2024-06-15 19:46:36,099][1651669] InferenceWorker_p0-w0: stopping experience collection (36300 times) [2024-06-15 19:46:36,368][1651274] Signal inference workers to resume experience collection... (36300 times) [2024-06-15 19:46:36,368][1651669] InferenceWorker_p0-w0: resuming experience collection (36300 times) [2024-06-15 19:46:40,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1418854400. Throughput: 0: 12288.0. Samples: 354780672. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:40,767][1648981] Avg episode reward: [(0, '732.450')] [2024-06-15 19:46:41,094][1651669] Updated weights for policy 0, policy_version 692818 (0.0016) [2024-06-15 19:46:41,860][1651669] Updated weights for policy 0, policy_version 692861 (0.0056) [2024-06-15 19:46:44,128][1651669] Updated weights for policy 0, policy_version 692916 (0.0011) [2024-06-15 19:46:45,223][1651669] Updated weights for policy 0, policy_version 692964 (0.0013) [2024-06-15 19:46:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50248.1, 300 sec: 48430.0). Total num frames: 1419247616. Throughput: 0: 12185.6. Samples: 354856448. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:45,767][1648981] Avg episode reward: [(0, '718.350')] [2024-06-15 19:46:47,525][1651669] Updated weights for policy 0, policy_version 693040 (0.0097) [2024-06-15 19:46:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47527.8, 300 sec: 47985.8). Total num frames: 1419378688. Throughput: 0: 12435.9. Samples: 354931712. 
Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:50,767][1648981] Avg episode reward: [(0, '716.890')] [2024-06-15 19:46:51,397][1651669] Updated weights for policy 0, policy_version 693088 (0.0011) [2024-06-15 19:46:54,436][1651669] Updated weights for policy 0, policy_version 693136 (0.0013) [2024-06-15 19:46:55,767][1648981] Fps is (10 sec: 39320.4, 60 sec: 48059.5, 300 sec: 47985.7). Total num frames: 1419640832. Throughput: 0: 12231.0. Samples: 354964992. Policy #0 lag: (min: 15.0, avg: 101.0, max: 271.0) [2024-06-15 19:46:55,768][1648981] Avg episode reward: [(0, '713.630')] [2024-06-15 19:46:56,013][1651669] Updated weights for policy 0, policy_version 693200 (0.0206) [2024-06-15 19:46:56,201][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000693216_1419706368.pth... [2024-06-15 19:46:56,383][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000687552_1408106496.pth [2024-06-15 19:46:58,064][1651669] Updated weights for policy 0, policy_version 693280 (0.0013) [2024-06-15 19:47:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 47987.3). Total num frames: 1419902976. Throughput: 0: 12259.3. Samples: 355039744. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:00,767][1648981] Avg episode reward: [(0, '713.630')] [2024-06-15 19:47:01,612][1651669] Updated weights for policy 0, policy_version 693344 (0.0013) [2024-06-15 19:47:02,266][1651669] Updated weights for policy 0, policy_version 693372 (0.0023) [2024-06-15 19:47:05,149][1651669] Updated weights for policy 0, policy_version 693424 (0.0011) [2024-06-15 19:47:05,787][1648981] Fps is (10 sec: 52324.1, 60 sec: 48043.5, 300 sec: 47982.4). Total num frames: 1420165120. Throughput: 0: 12307.0. Samples: 355113472. 
Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:05,787][1648981] Avg episode reward: [(0, '713.630')] [2024-06-15 19:47:07,488][1651669] Updated weights for policy 0, policy_version 693488 (0.0069) [2024-06-15 19:47:08,648][1651669] Updated weights for policy 0, policy_version 693524 (0.0048) [2024-06-15 19:47:09,561][1651669] Updated weights for policy 0, policy_version 693565 (0.0011) [2024-06-15 19:47:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 1420427264. Throughput: 0: 12208.4. Samples: 355150336. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:10,767][1648981] Avg episode reward: [(0, '710.650')] [2024-06-15 19:47:12,569][1651669] Updated weights for policy 0, policy_version 693625 (0.0133) [2024-06-15 19:47:15,766][1648981] Fps is (10 sec: 49251.9, 60 sec: 48075.6, 300 sec: 48318.9). Total num frames: 1420656640. Throughput: 0: 12379.0. Samples: 355229696. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:15,767][1648981] Avg episode reward: [(0, '712.240')] [2024-06-15 19:47:15,877][1651669] Updated weights for policy 0, policy_version 693691 (0.0098) [2024-06-15 19:47:18,898][1651669] Updated weights for policy 0, policy_version 693749 (0.0013) [2024-06-15 19:47:19,752][1651274] Signal inference workers to stop experience collection... (36350 times) [2024-06-15 19:47:19,777][1651669] InferenceWorker_p0-w0: stopping experience collection (36350 times) [2024-06-15 19:47:20,014][1651274] Signal inference workers to resume experience collection... (36350 times) [2024-06-15 19:47:20,015][1651669] InferenceWorker_p0-w0: resuming experience collection (36350 times) [2024-06-15 19:47:20,640][1651669] Updated weights for policy 0, policy_version 693814 (0.0011) [2024-06-15 19:47:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49703.7, 300 sec: 47987.6). Total num frames: 1420918784. Throughput: 0: 12310.7. Samples: 355296768. 
Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:20,767][1648981] Avg episode reward: [(0, '745.860')] [2024-06-15 19:47:22,912][1651669] Updated weights for policy 0, policy_version 693875 (0.0011) [2024-06-15 19:47:25,014][1651669] Updated weights for policy 0, policy_version 693893 (0.0011) [2024-06-15 19:47:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1421148160. Throughput: 0: 12288.0. Samples: 355333632. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:25,767][1648981] Avg episode reward: [(0, '746.050')] [2024-06-15 19:47:29,404][1651669] Updated weights for policy 0, policy_version 693956 (0.0012) [2024-06-15 19:47:30,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1421344768. Throughput: 0: 12367.7. Samples: 355412992. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:30,767][1648981] Avg episode reward: [(0, '733.190')] [2024-06-15 19:47:31,121][1651669] Updated weights for policy 0, policy_version 694037 (0.0011) [2024-06-15 19:47:32,203][1651669] Updated weights for policy 0, policy_version 694080 (0.0012) [2024-06-15 19:47:33,605][1651669] Updated weights for policy 0, policy_version 694139 (0.0013) [2024-06-15 19:47:35,806][1648981] Fps is (10 sec: 48957.4, 60 sec: 48573.7, 300 sec: 48423.5). Total num frames: 1421639680. Throughput: 0: 12118.0. Samples: 355477504. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:35,807][1648981] Avg episode reward: [(0, '739.590')] [2024-06-15 19:47:36,841][1651669] Updated weights for policy 0, policy_version 694200 (0.0021) [2024-06-15 19:47:40,767][1648981] Fps is (10 sec: 39320.9, 60 sec: 48059.6, 300 sec: 47763.5). Total num frames: 1421737984. Throughput: 0: 12094.6. Samples: 355509248. 
Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:40,767][1648981] Avg episode reward: [(0, '699.490')] [2024-06-15 19:47:41,817][1651669] Updated weights for policy 0, policy_version 694256 (0.0013) [2024-06-15 19:47:43,917][1651669] Updated weights for policy 0, policy_version 694336 (0.0016) [2024-06-15 19:47:45,354][1651669] Updated weights for policy 0, policy_version 694396 (0.0013) [2024-06-15 19:47:45,767][1648981] Fps is (10 sec: 49347.3, 60 sec: 48059.6, 300 sec: 48430.0). Total num frames: 1422131200. Throughput: 0: 11958.0. Samples: 355577856. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:45,767][1648981] Avg episode reward: [(0, '729.240')] [2024-06-15 19:47:48,282][1651669] Updated weights for policy 0, policy_version 694448 (0.0011) [2024-06-15 19:47:50,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.6, 300 sec: 47985.7). Total num frames: 1422262272. Throughput: 0: 11826.8. Samples: 355645440. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:50,767][1648981] Avg episode reward: [(0, '731.450')] [2024-06-15 19:47:52,482][1651669] Updated weights for policy 0, policy_version 694485 (0.0059) [2024-06-15 19:47:53,619][1651669] Updated weights for policy 0, policy_version 694533 (0.0027) [2024-06-15 19:47:54,859][1651669] Updated weights for policy 0, policy_version 694592 (0.0013) [2024-06-15 19:47:55,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.2, 300 sec: 48207.8). Total num frames: 1422589952. Throughput: 0: 11946.6. Samples: 355687936. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:47:55,767][1648981] Avg episode reward: [(0, '710.680')] [2024-06-15 19:47:56,326][1651669] Updated weights for policy 0, policy_version 694649 (0.0011) [2024-06-15 19:47:59,064][1651669] Updated weights for policy 0, policy_version 694704 (0.0013) [2024-06-15 19:48:00,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 48059.8, 300 sec: 48207.9). Total num frames: 1422786560. 
Throughput: 0: 11639.5. Samples: 355753472. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:00,767][1648981] Avg episode reward: [(0, '723.250')] [2024-06-15 19:48:03,337][1651274] Signal inference workers to stop experience collection... (36400 times) [2024-06-15 19:48:03,368][1651669] InferenceWorker_p0-w0: stopping experience collection (36400 times) [2024-06-15 19:48:03,558][1651274] Signal inference workers to resume experience collection... (36400 times) [2024-06-15 19:48:03,559][1651669] InferenceWorker_p0-w0: resuming experience collection (36400 times) [2024-06-15 19:48:03,561][1651669] Updated weights for policy 0, policy_version 694752 (0.0011) [2024-06-15 19:48:05,433][1651669] Updated weights for policy 0, policy_version 694836 (0.0013) [2024-06-15 19:48:05,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48076.0, 300 sec: 47985.8). Total num frames: 1423048704. Throughput: 0: 11832.9. Samples: 355829248. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:05,767][1648981] Avg episode reward: [(0, '715.550')] [2024-06-15 19:48:06,773][1651669] Updated weights for policy 0, policy_version 694883 (0.0044) [2024-06-15 19:48:08,347][1651669] Updated weights for policy 0, policy_version 694916 (0.0036) [2024-06-15 19:48:09,546][1651669] Updated weights for policy 0, policy_version 694976 (0.0012) [2024-06-15 19:48:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1423310848. Throughput: 0: 11855.6. Samples: 355867136. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:10,767][1648981] Avg episode reward: [(0, '711.660')] [2024-06-15 19:48:15,397][1651669] Updated weights for policy 0, policy_version 695040 (0.0130) [2024-06-15 19:48:15,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46967.5, 300 sec: 47766.3). Total num frames: 1423474688. Throughput: 0: 11798.8. Samples: 355943936. 
Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:15,767][1648981] Avg episode reward: [(0, '744.780')] [2024-06-15 19:48:16,676][1651669] Updated weights for policy 0, policy_version 695104 (0.0012) [2024-06-15 19:48:18,813][1651669] Updated weights for policy 0, policy_version 695160 (0.0011) [2024-06-15 19:48:20,108][1651669] Updated weights for policy 0, policy_version 695216 (0.0014) [2024-06-15 19:48:20,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 1423835136. Throughput: 0: 11661.1. Samples: 356001792. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:20,767][1648981] Avg episode reward: [(0, '776.870')] [2024-06-15 19:48:25,767][1648981] Fps is (10 sec: 36044.0, 60 sec: 44782.8, 300 sec: 47430.2). Total num frames: 1423835136. Throughput: 0: 11776.0. Samples: 356039168. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:25,767][1648981] Avg episode reward: [(0, '752.830')] [2024-06-15 19:48:26,639][1651669] Updated weights for policy 0, policy_version 695283 (0.0076) [2024-06-15 19:48:27,908][1651669] Updated weights for policy 0, policy_version 695344 (0.0011) [2024-06-15 19:48:28,471][1651669] Updated weights for policy 0, policy_version 695368 (0.0010) [2024-06-15 19:48:30,236][1651669] Updated weights for policy 0, policy_version 695440 (0.0014) [2024-06-15 19:48:30,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 48437.4). Total num frames: 1424293888. Throughput: 0: 11912.6. Samples: 356113920. Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:30,767][1648981] Avg episode reward: [(0, '795.020')] [2024-06-15 19:48:35,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 45359.1, 300 sec: 47765.4). Total num frames: 1424359424. Throughput: 0: 11992.2. Samples: 356185088. 
Policy #0 lag: (min: 35.0, avg: 157.9, max: 295.0) [2024-06-15 19:48:35,767][1648981] Avg episode reward: [(0, '791.910')] [2024-06-15 19:48:37,773][1651669] Updated weights for policy 0, policy_version 695520 (0.0012) [2024-06-15 19:48:38,951][1651669] Updated weights for policy 0, policy_version 695571 (0.0022) [2024-06-15 19:48:40,583][1651274] Signal inference workers to stop experience collection... (36450 times) [2024-06-15 19:48:40,640][1651669] InferenceWorker_p0-w0: stopping experience collection (36450 times) [2024-06-15 19:48:40,642][1651669] Updated weights for policy 0, policy_version 695636 (0.0010) [2024-06-15 19:48:40,766][1648981] Fps is (10 sec: 36044.6, 60 sec: 48606.0, 300 sec: 48096.8). Total num frames: 1424654336. Throughput: 0: 11798.8. Samples: 356218880. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:48:40,767][1648981] Avg episode reward: [(0, '815.320')] [2024-06-15 19:48:40,899][1651274] Signal inference workers to resume experience collection... (36450 times) [2024-06-15 19:48:40,901][1651669] InferenceWorker_p0-w0: resuming experience collection (36450 times) [2024-06-15 19:48:41,881][1651669] Updated weights for policy 0, policy_version 695687 (0.0010) [2024-06-15 19:48:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45875.3, 300 sec: 47874.6). Total num frames: 1424883712. Throughput: 0: 11844.3. Samples: 356286464. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:48:45,767][1648981] Avg episode reward: [(0, '806.530')] [2024-06-15 19:48:48,250][1651669] Updated weights for policy 0, policy_version 695749 (0.0012) [2024-06-15 19:48:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46967.6, 300 sec: 47763.5). Total num frames: 1425080320. Throughput: 0: 11628.1. Samples: 356352512. 
Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:48:50,767][1648981] Avg episode reward: [(0, '833.980')] [2024-06-15 19:48:50,906][1651669] Updated weights for policy 0, policy_version 695862 (0.0152) [2024-06-15 19:48:52,018][1651669] Updated weights for policy 0, policy_version 695910 (0.0009) [2024-06-15 19:48:53,296][1651669] Updated weights for policy 0, policy_version 695968 (0.0011) [2024-06-15 19:48:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 46967.5, 300 sec: 48208.0). Total num frames: 1425408000. Throughput: 0: 11480.2. Samples: 356383744. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:48:55,767][1648981] Avg episode reward: [(0, '875.470')] [2024-06-15 19:48:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000696000_1425408000.pth... [2024-06-15 19:48:55,859][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000690368_1413873664.pth [2024-06-15 19:49:00,019][1651669] Updated weights for policy 0, policy_version 696004 (0.0028) [2024-06-15 19:49:00,779][1648981] Fps is (10 sec: 39270.6, 60 sec: 44773.2, 300 sec: 47317.4). Total num frames: 1425473536. Throughput: 0: 11579.2. Samples: 356465152. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:00,780][1648981] Avg episode reward: [(0, '869.240')] [2024-06-15 19:49:01,522][1651669] Updated weights for policy 0, policy_version 696065 (0.0013) [2024-06-15 19:49:03,276][1651669] Updated weights for policy 0, policy_version 696144 (0.0014) [2024-06-15 19:49:04,723][1651669] Updated weights for policy 0, policy_version 696208 (0.0024) [2024-06-15 19:49:05,731][1651669] Updated weights for policy 0, policy_version 696256 (0.0011) [2024-06-15 19:49:05,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48059.9, 300 sec: 48430.0). Total num frames: 1425932288. Throughput: 0: 11616.7. Samples: 356524544. 
Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:05,767][1648981] Avg episode reward: [(0, '881.400')] [2024-06-15 19:49:10,766][1648981] Fps is (10 sec: 45934.8, 60 sec: 43690.7, 300 sec: 47208.2). Total num frames: 1425932288. Throughput: 0: 11855.7. Samples: 356572672. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:10,767][1648981] Avg episode reward: [(0, '919.060')] [2024-06-15 19:49:11,992][1651669] Updated weights for policy 0, policy_version 696320 (0.0025) [2024-06-15 19:49:13,932][1651669] Updated weights for policy 0, policy_version 696400 (0.0098) [2024-06-15 19:49:15,189][1651669] Updated weights for policy 0, policy_version 696455 (0.0018) [2024-06-15 19:49:15,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 1426391040. Throughput: 0: 11605.3. Samples: 356636160. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:15,767][1648981] Avg episode reward: [(0, '907.670')] [2024-06-15 19:49:16,226][1651669] Updated weights for policy 0, policy_version 696508 (0.0011) [2024-06-15 19:49:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 43690.7, 300 sec: 47652.4). Total num frames: 1426456576. Throughput: 0: 11980.8. Samples: 356724224. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:20,767][1648981] Avg episode reward: [(0, '855.130')] [2024-06-15 19:49:21,350][1651274] Signal inference workers to stop experience collection... (36500 times) [2024-06-15 19:49:21,393][1651669] InferenceWorker_p0-w0: stopping experience collection (36500 times) [2024-06-15 19:49:21,631][1651274] Signal inference workers to resume experience collection... 
(36500 times) [2024-06-15 19:49:21,631][1651669] InferenceWorker_p0-w0: resuming experience collection (36500 times) [2024-06-15 19:49:22,304][1651669] Updated weights for policy 0, policy_version 696576 (0.0115) [2024-06-15 19:49:23,576][1651669] Updated weights for policy 0, policy_version 696633 (0.0094) [2024-06-15 19:49:25,021][1651669] Updated weights for policy 0, policy_version 696677 (0.0011) [2024-06-15 19:49:25,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 50790.6, 300 sec: 48541.1). Total num frames: 1426882560. Throughput: 0: 11832.9. Samples: 356751360. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:25,767][1648981] Avg episode reward: [(0, '845.870')] [2024-06-15 19:49:26,020][1651669] Updated weights for policy 0, policy_version 696736 (0.0031) [2024-06-15 19:49:26,608][1651669] Updated weights for policy 0, policy_version 696765 (0.0031) [2024-06-15 19:49:30,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 44782.7, 300 sec: 47763.5). Total num frames: 1426980864. Throughput: 0: 12174.2. Samples: 356834304. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:30,767][1648981] Avg episode reward: [(0, '864.680')] [2024-06-15 19:49:33,686][1651669] Updated weights for policy 0, policy_version 696868 (0.0013) [2024-06-15 19:49:35,441][1651669] Updated weights for policy 0, policy_version 696915 (0.0017) [2024-06-15 19:49:35,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 49152.0, 300 sec: 48209.8). Total num frames: 1427308544. Throughput: 0: 12128.7. Samples: 356898304. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:35,767][1648981] Avg episode reward: [(0, '859.820')] [2024-06-15 19:49:37,163][1651669] Updated weights for policy 0, policy_version 696997 (0.0093) [2024-06-15 19:49:40,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1427505152. Throughput: 0: 12197.0. Samples: 356932608. 
Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:40,767][1648981] Avg episode reward: [(0, '879.100')] [2024-06-15 19:49:42,240][1651669] Updated weights for policy 0, policy_version 697029 (0.0068) [2024-06-15 19:49:44,209][1651669] Updated weights for policy 0, policy_version 697091 (0.0052) [2024-06-15 19:49:45,436][1651669] Updated weights for policy 0, policy_version 697152 (0.0011) [2024-06-15 19:49:45,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1427767296. Throughput: 0: 12029.8. Samples: 357006336. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:45,767][1648981] Avg episode reward: [(0, '889.820')] [2024-06-15 19:49:47,608][1651669] Updated weights for policy 0, policy_version 697232 (0.0127) [2024-06-15 19:49:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 48207.9). Total num frames: 1428029440. Throughput: 0: 12231.1. Samples: 357074944. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:50,767][1648981] Avg episode reward: [(0, '907.680')] [2024-06-15 19:49:53,181][1651669] Updated weights for policy 0, policy_version 697296 (0.0012) [2024-06-15 19:49:54,108][1651669] Updated weights for policy 0, policy_version 697342 (0.0089) [2024-06-15 19:49:55,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1428226048. Throughput: 0: 12128.7. Samples: 357118464. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:49:55,767][1648981] Avg episode reward: [(0, '875.530')] [2024-06-15 19:49:56,023][1651669] Updated weights for policy 0, policy_version 697396 (0.0011) [2024-06-15 19:49:56,641][1651669] Updated weights for policy 0, policy_version 697424 (0.0015) [2024-06-15 19:49:57,609][1651274] Signal inference workers to stop experience collection... 
(36550 times) [2024-06-15 19:49:57,684][1651669] InferenceWorker_p0-w0: stopping experience collection (36550 times) [2024-06-15 19:49:57,829][1651274] Signal inference workers to resume experience collection... (36550 times) [2024-06-15 19:49:57,830][1651669] InferenceWorker_p0-w0: resuming experience collection (36550 times) [2024-06-15 19:49:58,030][1651669] Updated weights for policy 0, policy_version 697479 (0.0100) [2024-06-15 19:49:59,036][1651669] Updated weights for policy 0, policy_version 697536 (0.0081) [2024-06-15 19:50:00,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 51347.5, 300 sec: 48318.9). Total num frames: 1428553728. Throughput: 0: 12242.4. Samples: 357187072. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:50:00,768][1648981] Avg episode reward: [(0, '885.000')] [2024-06-15 19:50:04,537][1651669] Updated weights for policy 0, policy_version 697598 (0.0090) [2024-06-15 19:50:05,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45875.1, 300 sec: 47652.5). Total num frames: 1428684800. Throughput: 0: 12049.1. Samples: 357266432. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:50:05,767][1648981] Avg episode reward: [(0, '880.640')] [2024-06-15 19:50:07,039][1651669] Updated weights for policy 0, policy_version 697659 (0.0013) [2024-06-15 19:50:08,578][1651669] Updated weights for policy 0, policy_version 697715 (0.0011) [2024-06-15 19:50:09,617][1651669] Updated weights for policy 0, policy_version 697774 (0.0017) [2024-06-15 19:50:10,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 52428.9, 300 sec: 48430.0). Total num frames: 1429078016. Throughput: 0: 12037.7. Samples: 357293056. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:50:10,767][1648981] Avg episode reward: [(0, '867.970')] [2024-06-15 19:50:15,071][1651669] Updated weights for policy 0, policy_version 697824 (0.0019) [2024-06-15 19:50:15,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 46421.2, 300 sec: 47652.5). 
Total num frames: 1429176320. Throughput: 0: 12128.7. Samples: 357380096. Policy #0 lag: (min: 15.0, avg: 77.7, max: 271.0) [2024-06-15 19:50:15,767][1648981] Avg episode reward: [(0, '841.790')] [2024-06-15 19:50:16,248][1651669] Updated weights for policy 0, policy_version 697859 (0.0013) [2024-06-15 19:50:17,637][1651669] Updated weights for policy 0, policy_version 697911 (0.0031) [2024-06-15 19:50:18,697][1651669] Updated weights for policy 0, policy_version 697954 (0.0011) [2024-06-15 19:50:20,184][1651669] Updated weights for policy 0, policy_version 698016 (0.0018) [2024-06-15 19:50:20,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 51882.7, 300 sec: 48318.9). Total num frames: 1429569536. Throughput: 0: 12083.2. Samples: 357442048. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:20,767][1648981] Avg episode reward: [(0, '878.490')] [2024-06-15 19:50:25,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1429635072. Throughput: 0: 12265.3. Samples: 357484544. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:25,767][1648981] Avg episode reward: [(0, '861.150')] [2024-06-15 19:50:25,947][1651669] Updated weights for policy 0, policy_version 698080 (0.0013) [2024-06-15 19:50:27,973][1651669] Updated weights for policy 0, policy_version 698148 (0.0011) [2024-06-15 19:50:28,608][1651669] Updated weights for policy 0, policy_version 698176 (0.0011) [2024-06-15 19:50:30,160][1651669] Updated weights for policy 0, policy_version 698240 (0.0011) [2024-06-15 19:50:30,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 50790.4, 300 sec: 48429.9). Total num frames: 1430028288. Throughput: 0: 12128.7. Samples: 357552128. 
Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:30,767][1648981] Avg episode reward: [(0, '854.940')] [2024-06-15 19:50:31,358][1651669] Updated weights for policy 0, policy_version 698300 (0.0015) [2024-06-15 19:50:35,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 1430126592. Throughput: 0: 12379.0. Samples: 357632000. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:35,767][1648981] Avg episode reward: [(0, '862.770')] [2024-06-15 19:50:37,520][1651669] Updated weights for policy 0, policy_version 698352 (0.0015) [2024-06-15 19:50:37,944][1651274] Signal inference workers to stop experience collection... (36600 times) [2024-06-15 19:50:38,027][1651669] InferenceWorker_p0-w0: stopping experience collection (36600 times) [2024-06-15 19:50:38,203][1651274] Signal inference workers to resume experience collection... (36600 times) [2024-06-15 19:50:38,204][1651669] InferenceWorker_p0-w0: resuming experience collection (36600 times) [2024-06-15 19:50:39,467][1651669] Updated weights for policy 0, policy_version 698428 (0.0020) [2024-06-15 19:50:40,785][1648981] Fps is (10 sec: 39248.7, 60 sec: 48590.7, 300 sec: 48094.4). Total num frames: 1430421504. Throughput: 0: 12044.0. Samples: 357660672. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:40,786][1648981] Avg episode reward: [(0, '883.770')] [2024-06-15 19:50:42,072][1651669] Updated weights for policy 0, policy_version 698499 (0.0013) [2024-06-15 19:50:42,950][1651669] Updated weights for policy 0, policy_version 698556 (0.0011) [2024-06-15 19:50:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 47877.5). Total num frames: 1430650880. Throughput: 0: 12026.4. Samples: 357728256. 
Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:45,767][1648981] Avg episode reward: [(0, '856.590')] [2024-06-15 19:50:49,553][1651669] Updated weights for policy 0, policy_version 698624 (0.0012) [2024-06-15 19:50:50,766][1648981] Fps is (10 sec: 42678.5, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1430847488. Throughput: 0: 11798.8. Samples: 357797376. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:50,767][1648981] Avg episode reward: [(0, '888.800')] [2024-06-15 19:50:51,114][1651669] Updated weights for policy 0, policy_version 698680 (0.0013) [2024-06-15 19:50:52,405][1651669] Updated weights for policy 0, policy_version 698722 (0.0011) [2024-06-15 19:50:53,509][1651669] Updated weights for policy 0, policy_version 698772 (0.0029) [2024-06-15 19:50:55,777][1648981] Fps is (10 sec: 52370.9, 60 sec: 49142.9, 300 sec: 48206.1). Total num frames: 1431175168. Throughput: 0: 11921.0. Samples: 357829632. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:50:55,780][1648981] Avg episode reward: [(0, '893.880')] [2024-06-15 19:50:55,792][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000698816_1431175168.pth... [2024-06-15 19:50:55,848][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000693216_1419706368.pth [2024-06-15 19:50:55,853][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000698816_1431175168.pth [2024-06-15 19:50:59,174][1651669] Updated weights for policy 0, policy_version 698848 (0.0014) [2024-06-15 19:51:00,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 46967.5, 300 sec: 47763.5). Total num frames: 1431371776. Throughput: 0: 11867.0. Samples: 357914112. 
Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:00,767][1648981] Avg episode reward: [(0, '896.180')] [2024-06-15 19:51:01,419][1651669] Updated weights for policy 0, policy_version 698939 (0.0029) [2024-06-15 19:51:04,140][1651669] Updated weights for policy 0, policy_version 699008 (0.0012) [2024-06-15 19:51:05,293][1651669] Updated weights for policy 0, policy_version 699066 (0.0012) [2024-06-15 19:51:05,767][1648981] Fps is (10 sec: 52485.7, 60 sec: 50244.1, 300 sec: 48429.9). Total num frames: 1431699456. Throughput: 0: 11719.0. Samples: 357969408. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:05,767][1648981] Avg episode reward: [(0, '906.220')] [2024-06-15 19:51:10,266][1651669] Updated weights for policy 0, policy_version 699111 (0.0017) [2024-06-15 19:51:10,766][1648981] Fps is (10 sec: 42599.4, 60 sec: 45329.0, 300 sec: 47544.6). Total num frames: 1431797760. Throughput: 0: 11867.0. Samples: 358018560. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:10,767][1648981] Avg episode reward: [(0, '863.270')] [2024-06-15 19:51:11,314][1651669] Updated weights for policy 0, policy_version 699152 (0.0013) [2024-06-15 19:51:14,380][1651669] Updated weights for policy 0, policy_version 699234 (0.0014) [2024-06-15 19:51:15,078][1651274] Signal inference workers to stop experience collection... (36650 times) [2024-06-15 19:51:15,161][1651669] InferenceWorker_p0-w0: stopping experience collection (36650 times) [2024-06-15 19:51:15,323][1651274] Signal inference workers to resume experience collection... (36650 times) [2024-06-15 19:51:15,324][1651669] InferenceWorker_p0-w0: resuming experience collection (36650 times) [2024-06-15 19:51:15,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 49698.2, 300 sec: 48208.9). Total num frames: 1432158208. Throughput: 0: 11798.8. Samples: 358083072. 
Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:15,767][1648981] Avg episode reward: [(0, '927.570')] [2024-06-15 19:51:16,216][1651669] Updated weights for policy 0, policy_version 699328 (0.0151) [2024-06-15 19:51:20,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 47430.3). Total num frames: 1432256512. Throughput: 0: 11776.0. Samples: 358161920. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:20,767][1648981] Avg episode reward: [(0, '979.600')] [2024-06-15 19:51:21,375][1651669] Updated weights for policy 0, policy_version 699385 (0.0023) [2024-06-15 19:51:22,671][1651669] Updated weights for policy 0, policy_version 699424 (0.0012) [2024-06-15 19:51:24,394][1651669] Updated weights for policy 0, policy_version 699461 (0.0016) [2024-06-15 19:51:25,779][1648981] Fps is (10 sec: 42543.1, 60 sec: 49141.2, 300 sec: 47983.6). Total num frames: 1432584192. Throughput: 0: 11891.3. Samples: 358195712. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:25,780][1648981] Avg episode reward: [(0, '966.120')] [2024-06-15 19:51:26,067][1651669] Updated weights for policy 0, policy_version 699523 (0.0011) [2024-06-15 19:51:27,271][1651669] Updated weights for policy 0, policy_version 699579 (0.0011) [2024-06-15 19:51:30,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 45329.3, 300 sec: 47541.4). Total num frames: 1432748032. Throughput: 0: 11923.9. Samples: 358264832. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:30,767][1648981] Avg episode reward: [(0, '968.190')] [2024-06-15 19:51:31,904][1651669] Updated weights for policy 0, policy_version 699641 (0.0012) [2024-06-15 19:51:32,774][1651669] Updated weights for policy 0, policy_version 699680 (0.0044) [2024-06-15 19:51:34,654][1651669] Updated weights for policy 0, policy_version 699715 (0.0012) [2024-06-15 19:51:35,766][1648981] Fps is (10 sec: 52497.5, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1433108480. 
Throughput: 0: 12174.2. Samples: 358345216. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:35,767][1648981] Avg episode reward: [(0, '979.480')] [2024-06-15 19:51:35,952][1651669] Updated weights for policy 0, policy_version 699778 (0.0013) [2024-06-15 19:51:36,946][1651669] Updated weights for policy 0, policy_version 699831 (0.0099) [2024-06-15 19:51:40,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 47528.4, 300 sec: 47541.4). Total num frames: 1433272320. Throughput: 0: 12234.1. Samples: 358380032. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:40,767][1648981] Avg episode reward: [(0, '979.680')] [2024-06-15 19:51:42,158][1651669] Updated weights for policy 0, policy_version 699872 (0.0011) [2024-06-15 19:51:43,754][1651669] Updated weights for policy 0, policy_version 699937 (0.0011) [2024-06-15 19:51:45,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 1433567232. Throughput: 0: 12151.5. Samples: 358460928. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:45,767][1648981] Avg episode reward: [(0, '902.500')] [2024-06-15 19:51:45,831][1651669] Updated weights for policy 0, policy_version 700000 (0.0015) [2024-06-15 19:51:47,307][1651669] Updated weights for policy 0, policy_version 700067 (0.0053) [2024-06-15 19:51:50,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 49151.8, 300 sec: 47985.7). Total num frames: 1433796608. Throughput: 0: 12447.3. Samples: 358529536. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:50,768][1648981] Avg episode reward: [(0, '893.670')] [2024-06-15 19:51:54,056][1651669] Updated weights for policy 0, policy_version 700161 (0.0028) [2024-06-15 19:51:54,663][1651274] Signal inference workers to stop experience collection... 
(36700 times) [2024-06-15 19:51:54,730][1651669] InferenceWorker_p0-w0: stopping experience collection (36700 times) [2024-06-15 19:51:54,930][1651274] Signal inference workers to resume experience collection... (36700 times) [2024-06-15 19:51:54,931][1651669] InferenceWorker_p0-w0: resuming experience collection (36700 times) [2024-06-15 19:51:55,148][1651669] Updated weights for policy 0, policy_version 700221 (0.0012) [2024-06-15 19:51:55,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48068.6, 300 sec: 47985.7). Total num frames: 1434058752. Throughput: 0: 12288.0. Samples: 358571520. Policy #0 lag: (min: 52.0, avg: 147.6, max: 308.0) [2024-06-15 19:51:55,767][1648981] Avg episode reward: [(0, '895.900')] [2024-06-15 19:51:57,459][1651669] Updated weights for policy 0, policy_version 700289 (0.0012) [2024-06-15 19:51:58,746][1651669] Updated weights for policy 0, policy_version 700352 (0.0015) [2024-06-15 19:52:00,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 49152.2, 300 sec: 47989.0). Total num frames: 1434320896. Throughput: 0: 12253.9. Samples: 358634496. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:00,767][1648981] Avg episode reward: [(0, '915.260')] [2024-06-15 19:52:05,412][1651669] Updated weights for policy 0, policy_version 700435 (0.0012) [2024-06-15 19:52:05,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46967.7, 300 sec: 47763.5). Total num frames: 1434517504. Throughput: 0: 12151.5. Samples: 358708736. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:05,767][1648981] Avg episode reward: [(0, '922.660')] [2024-06-15 19:52:07,428][1651669] Updated weights for policy 0, policy_version 700516 (0.0010) [2024-06-15 19:52:08,064][1651669] Updated weights for policy 0, policy_version 700544 (0.0012) [2024-06-15 19:52:09,468][1651669] Updated weights for policy 0, policy_version 700594 (0.0013) [2024-06-15 19:52:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50790.4, 300 sec: 48096.8). 
Total num frames: 1434845184. Throughput: 0: 12189.1. Samples: 358744064. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:10,767][1648981] Avg episode reward: [(0, '923.150')] [2024-06-15 19:52:15,201][1651669] Updated weights for policy 0, policy_version 700625 (0.0011) [2024-06-15 19:52:15,770][1648981] Fps is (10 sec: 39306.8, 60 sec: 45872.4, 300 sec: 47429.7). Total num frames: 1434910720. Throughput: 0: 12434.9. Samples: 358824448. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:15,771][1648981] Avg episode reward: [(0, '898.510')] [2024-06-15 19:52:16,418][1651669] Updated weights for policy 0, policy_version 700679 (0.0013) [2024-06-15 19:52:17,610][1651669] Updated weights for policy 0, policy_version 700736 (0.0012) [2024-06-15 19:52:20,165][1651669] Updated weights for policy 0, policy_version 700837 (0.0107) [2024-06-15 19:52:20,771][1648981] Fps is (10 sec: 52403.5, 60 sec: 51878.5, 300 sec: 48207.0). Total num frames: 1435369472. Throughput: 0: 11854.4. Samples: 358878720. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:20,772][1648981] Avg episode reward: [(0, '919.520')] [2024-06-15 19:52:25,772][1648981] Fps is (10 sec: 45864.2, 60 sec: 46426.7, 300 sec: 47540.4). Total num frames: 1435369472. Throughput: 0: 12001.9. Samples: 358920192. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:25,773][1648981] Avg episode reward: [(0, '894.210')] [2024-06-15 19:52:26,272][1651669] Updated weights for policy 0, policy_version 700880 (0.0013) [2024-06-15 19:52:27,841][1651669] Updated weights for policy 0, policy_version 700944 (0.0103) [2024-06-15 19:52:29,471][1651669] Updated weights for policy 0, policy_version 701010 (0.0022) [2024-06-15 19:52:30,458][1651669] Updated weights for policy 0, policy_version 701056 (0.0011) [2024-06-15 19:52:30,766][1648981] Fps is (10 sec: 39340.9, 60 sec: 50244.3, 300 sec: 47881.1). Total num frames: 1435762688. Throughput: 0: 11753.3. 
Samples: 358989824. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:30,767][1648981] Avg episode reward: [(0, '893.620')] [2024-06-15 19:52:31,269][1651274] Signal inference workers to stop experience collection... (36750 times) [2024-06-15 19:52:31,317][1651669] InferenceWorker_p0-w0: stopping experience collection (36750 times) [2024-06-15 19:52:31,468][1651274] Signal inference workers to resume experience collection... (36750 times) [2024-06-15 19:52:31,469][1651669] InferenceWorker_p0-w0: resuming experience collection (36750 times) [2024-06-15 19:52:31,644][1651669] Updated weights for policy 0, policy_version 701119 (0.0020) [2024-06-15 19:52:35,766][1648981] Fps is (10 sec: 52461.1, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1435893760. Throughput: 0: 12026.4. Samples: 359070720. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:35,767][1648981] Avg episode reward: [(0, '885.990')] [2024-06-15 19:52:37,552][1651669] Updated weights for policy 0, policy_version 701168 (0.0019) [2024-06-15 19:52:39,160][1651669] Updated weights for policy 0, policy_version 701245 (0.0013) [2024-06-15 19:52:40,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 1436221440. Throughput: 0: 11855.6. Samples: 359105024. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:40,767][1648981] Avg episode reward: [(0, '890.690')] [2024-06-15 19:52:41,777][1651669] Updated weights for policy 0, policy_version 701314 (0.0049) [2024-06-15 19:52:42,843][1651669] Updated weights for policy 0, policy_version 701372 (0.0010) [2024-06-15 19:52:45,767][1648981] Fps is (10 sec: 52425.5, 60 sec: 47513.1, 300 sec: 47985.6). Total num frames: 1436418048. Throughput: 0: 12037.5. Samples: 359176192. 
Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:45,768][1648981] Avg episode reward: [(0, '860.240')] [2024-06-15 19:52:48,662][1651669] Updated weights for policy 0, policy_version 701431 (0.0017) [2024-06-15 19:52:50,271][1651669] Updated weights for policy 0, policy_version 701502 (0.0011) [2024-06-15 19:52:50,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 48060.0, 300 sec: 47763.5). Total num frames: 1436680192. Throughput: 0: 11889.8. Samples: 359243776. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:50,767][1648981] Avg episode reward: [(0, '846.110')] [2024-06-15 19:52:51,680][1651669] Updated weights for policy 0, policy_version 701548 (0.0016) [2024-06-15 19:52:53,021][1651669] Updated weights for policy 0, policy_version 701603 (0.0017) [2024-06-15 19:52:55,766][1648981] Fps is (10 sec: 52432.2, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1436942336. Throughput: 0: 11776.0. Samples: 359273984. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:52:55,767][1648981] Avg episode reward: [(0, '818.370')] [2024-06-15 19:52:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000701632_1436942336.pth... [2024-06-15 19:52:55,819][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000696000_1425408000.pth [2024-06-15 19:52:59,031][1651669] Updated weights for policy 0, policy_version 701664 (0.0010) [2024-06-15 19:53:00,720][1651669] Updated weights for policy 0, policy_version 701728 (0.0014) [2024-06-15 19:53:00,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1437138944. Throughput: 0: 11811.1. Samples: 359355904. 
Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:00,767][1648981] Avg episode reward: [(0, '817.040')] [2024-06-15 19:53:01,305][1651669] Updated weights for policy 0, policy_version 701760 (0.0010) [2024-06-15 19:53:03,112][1651669] Updated weights for policy 0, policy_version 701824 (0.0012) [2024-06-15 19:53:04,120][1651669] Updated weights for policy 0, policy_version 701872 (0.0012) [2024-06-15 19:53:05,777][1648981] Fps is (10 sec: 52370.7, 60 sec: 49142.9, 300 sec: 47983.9). Total num frames: 1437466624. Throughput: 0: 12104.3. Samples: 359423488. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:05,778][1648981] Avg episode reward: [(0, '772.230')] [2024-06-15 19:53:10,162][1651669] Updated weights for policy 0, policy_version 701910 (0.0010) [2024-06-15 19:53:10,775][1648981] Fps is (10 sec: 39287.4, 60 sec: 44776.4, 300 sec: 47651.0). Total num frames: 1437532160. Throughput: 0: 12139.4. Samples: 359466496. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:10,776][1648981] Avg episode reward: [(0, '828.130')] [2024-06-15 19:53:11,904][1651669] Updated weights for policy 0, policy_version 701971 (0.0011) [2024-06-15 19:53:12,692][1651274] Signal inference workers to stop experience collection... (36800 times) [2024-06-15 19:53:12,750][1651669] InferenceWorker_p0-w0: stopping experience collection (36800 times) [2024-06-15 19:53:12,918][1651274] Signal inference workers to resume experience collection... (36800 times) [2024-06-15 19:53:12,919][1651669] InferenceWorker_p0-w0: resuming experience collection (36800 times) [2024-06-15 19:53:13,467][1651669] Updated weights for policy 0, policy_version 702034 (0.0017) [2024-06-15 19:53:14,877][1651669] Updated weights for policy 0, policy_version 702097 (0.0011) [2024-06-15 19:53:15,766][1648981] Fps is (10 sec: 52487.3, 60 sec: 51339.8, 300 sec: 47985.7). Total num frames: 1437990912. Throughput: 0: 11935.3. Samples: 359526912. 
Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:15,767][1648981] Avg episode reward: [(0, '889.370')] [2024-06-15 19:53:20,778][1648981] Fps is (10 sec: 45863.9, 60 sec: 43686.0, 300 sec: 47983.9). Total num frames: 1437990912. Throughput: 0: 11841.3. Samples: 359603712. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:20,778][1648981] Avg episode reward: [(0, '898.840')] [2024-06-15 19:53:21,203][1651669] Updated weights for policy 0, policy_version 702160 (0.0011) [2024-06-15 19:53:23,283][1651669] Updated weights for policy 0, policy_version 702240 (0.0137) [2024-06-15 19:53:25,531][1651669] Updated weights for policy 0, policy_version 702327 (0.0012) [2024-06-15 19:53:25,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 50249.4, 300 sec: 47763.5). Total num frames: 1438384128. Throughput: 0: 11730.5. Samples: 359632896. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:25,767][1648981] Avg episode reward: [(0, '895.560')] [2024-06-15 19:53:26,477][1651669] Updated weights for policy 0, policy_version 702364 (0.0116) [2024-06-15 19:53:30,767][1648981] Fps is (10 sec: 52486.6, 60 sec: 45874.9, 300 sec: 47985.6). Total num frames: 1438515200. Throughput: 0: 11571.3. Samples: 359696896. Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:30,768][1648981] Avg episode reward: [(0, '877.280')] [2024-06-15 19:53:33,284][1651669] Updated weights for policy 0, policy_version 702417 (0.0013) [2024-06-15 19:53:34,266][1651669] Updated weights for policy 0, policy_version 702456 (0.0010) [2024-06-15 19:53:35,378][1651669] Updated weights for policy 0, policy_version 702496 (0.0042) [2024-06-15 19:53:35,766][1648981] Fps is (10 sec: 32768.2, 60 sec: 46967.5, 300 sec: 47652.5). Total num frames: 1438711808. Throughput: 0: 11685.0. Samples: 359769600. 
Policy #0 lag: (min: 75.0, avg: 204.5, max: 348.0) [2024-06-15 19:53:35,767][1648981] Avg episode reward: [(0, '867.040')] [2024-06-15 19:53:37,622][1651669] Updated weights for policy 0, policy_version 702579 (0.0131) [2024-06-15 19:53:38,920][1651669] Updated weights for policy 0, policy_version 702646 (0.0014) [2024-06-15 19:53:40,768][1648981] Fps is (10 sec: 52420.1, 60 sec: 46966.0, 300 sec: 47985.4). Total num frames: 1439039488. Throughput: 0: 11570.7. Samples: 359794688. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:53:40,769][1648981] Avg episode reward: [(0, '851.100')] [2024-06-15 19:53:45,523][1651669] Updated weights for policy 0, policy_version 702704 (0.0012) [2024-06-15 19:53:45,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45329.6, 300 sec: 47652.5). Total num frames: 1439137792. Throughput: 0: 11571.2. Samples: 359876608. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:53:45,767][1648981] Avg episode reward: [(0, '845.470')] [2024-06-15 19:53:46,121][1651669] Updated weights for policy 0, policy_version 702723 (0.0011) [2024-06-15 19:53:47,955][1651669] Updated weights for policy 0, policy_version 702786 (0.0013) [2024-06-15 19:53:49,712][1651669] Updated weights for policy 0, policy_version 702857 (0.0014) [2024-06-15 19:53:49,911][1651274] Signal inference workers to stop experience collection... (36850 times) [2024-06-15 19:53:49,944][1651669] InferenceWorker_p0-w0: stopping experience collection (36850 times) [2024-06-15 19:53:50,142][1651274] Signal inference workers to resume experience collection... (36850 times) [2024-06-15 19:53:50,143][1651669] InferenceWorker_p0-w0: resuming experience collection (36850 times) [2024-06-15 19:53:50,771][1648981] Fps is (10 sec: 52417.5, 60 sec: 48056.4, 300 sec: 47985.0). Total num frames: 1439563776. Throughput: 0: 11288.5. Samples: 359931392. 
Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:53:50,771][1648981] Avg episode reward: [(0, '850.030')] [2024-06-15 19:53:55,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 43690.7, 300 sec: 47765.6). Total num frames: 1439563776. Throughput: 0: 11163.8. Samples: 359968768. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:53:55,767][1648981] Avg episode reward: [(0, '926.580')] [2024-06-15 19:53:57,027][1651669] Updated weights for policy 0, policy_version 702947 (0.0012) [2024-06-15 19:53:59,288][1651669] Updated weights for policy 0, policy_version 703027 (0.0114) [2024-06-15 19:54:00,766][1648981] Fps is (10 sec: 36059.6, 60 sec: 46421.4, 300 sec: 47430.3). Total num frames: 1439924224. Throughput: 0: 11377.8. Samples: 360038912. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:00,767][1648981] Avg episode reward: [(0, '924.470')] [2024-06-15 19:54:00,872][1651669] Updated weights for policy 0, policy_version 703092 (0.0011) [2024-06-15 19:54:02,236][1651669] Updated weights for policy 0, policy_version 703165 (0.0013) [2024-06-15 19:54:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 43698.7, 300 sec: 47985.7). Total num frames: 1440088064. Throughput: 0: 11380.6. Samples: 360115712. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:05,767][1648981] Avg episode reward: [(0, '927.540')] [2024-06-15 19:54:08,329][1651669] Updated weights for policy 0, policy_version 703218 (0.0019) [2024-06-15 19:54:09,629][1651669] Updated weights for policy 0, policy_version 703269 (0.0011) [2024-06-15 19:54:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47520.5, 300 sec: 47430.3). Total num frames: 1440382976. Throughput: 0: 11741.9. Samples: 360161280. 
Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:10,767][1648981] Avg episode reward: [(0, '912.900')] [2024-06-15 19:54:11,273][1651669] Updated weights for policy 0, policy_version 703332 (0.0011) [2024-06-15 19:54:12,983][1651669] Updated weights for policy 0, policy_version 703415 (0.0013) [2024-06-15 19:54:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 47985.7). Total num frames: 1440612352. Throughput: 0: 11696.4. Samples: 360223232. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:15,767][1648981] Avg episode reward: [(0, '931.760')] [2024-06-15 19:54:19,722][1651669] Updated weights for policy 0, policy_version 703472 (0.0011) [2024-06-15 19:54:20,767][1648981] Fps is (10 sec: 39321.0, 60 sec: 46429.9, 300 sec: 47097.0). Total num frames: 1440776192. Throughput: 0: 11662.2. Samples: 360294400. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:20,768][1648981] Avg episode reward: [(0, '911.690')] [2024-06-15 19:54:21,720][1651669] Updated weights for policy 0, policy_version 703552 (0.0011) [2024-06-15 19:54:23,599][1651669] Updated weights for policy 0, policy_version 703616 (0.0021) [2024-06-15 19:54:24,687][1651669] Updated weights for policy 0, policy_version 703673 (0.0139) [2024-06-15 19:54:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1441136640. Throughput: 0: 11651.3. Samples: 360318976. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:25,767][1648981] Avg episode reward: [(0, '887.260')] [2024-06-15 19:54:30,774][1648981] Fps is (10 sec: 39291.7, 60 sec: 44231.3, 300 sec: 46984.7). Total num frames: 1441169408. Throughput: 0: 11614.7. Samples: 360399360. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:30,775][1648981] Avg episode reward: [(0, '908.180')] [2024-06-15 19:54:31,535][1651274] Signal inference workers to stop experience collection... 
(36900 times) [2024-06-15 19:54:31,648][1651669] InferenceWorker_p0-w0: stopping experience collection (36900 times) [2024-06-15 19:54:31,655][1651669] Updated weights for policy 0, policy_version 703737 (0.0125) [2024-06-15 19:54:31,761][1651274] Signal inference workers to resume experience collection... (36900 times) [2024-06-15 19:54:31,762][1651669] InferenceWorker_p0-w0: resuming experience collection (36900 times) [2024-06-15 19:54:33,882][1651669] Updated weights for policy 0, policy_version 703815 (0.0012) [2024-06-15 19:54:35,643][1651669] Updated weights for policy 0, policy_version 703890 (0.0011) [2024-06-15 19:54:35,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 1441562624. Throughput: 0: 11560.9. Samples: 360451584. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:35,767][1648981] Avg episode reward: [(0, '928.770')] [2024-06-15 19:54:36,443][1651669] Updated weights for policy 0, policy_version 703936 (0.0063) [2024-06-15 19:54:40,766][1648981] Fps is (10 sec: 49190.3, 60 sec: 43692.1, 300 sec: 47097.1). Total num frames: 1441660928. Throughput: 0: 11628.1. Samples: 360492032. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:40,767][1648981] Avg episode reward: [(0, '916.250')] [2024-06-15 19:54:43,539][1651669] Updated weights for policy 0, policy_version 704003 (0.0013) [2024-06-15 19:54:45,767][1648981] Fps is (10 sec: 42597.3, 60 sec: 47513.4, 300 sec: 47319.2). Total num frames: 1441988608. Throughput: 0: 11650.8. Samples: 360563200. 
Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:45,767][1648981] Avg episode reward: [(0, '919.480')] [2024-06-15 19:54:46,095][1651669] Updated weights for policy 0, policy_version 704112 (0.0011) [2024-06-15 19:54:47,346][1651669] Updated weights for policy 0, policy_version 704164 (0.0011) [2024-06-15 19:54:47,934][1651669] Updated weights for policy 0, policy_version 704192 (0.0012) [2024-06-15 19:54:50,770][1648981] Fps is (10 sec: 52408.8, 60 sec: 43690.9, 300 sec: 47318.6). Total num frames: 1442185216. Throughput: 0: 11467.8. Samples: 360631808. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:50,771][1648981] Avg episode reward: [(0, '918.880')] [2024-06-15 19:54:54,507][1651669] Updated weights for policy 0, policy_version 704256 (0.0254) [2024-06-15 19:54:55,724][1651669] Updated weights for policy 0, policy_version 704326 (0.0019) [2024-06-15 19:54:55,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 48059.5, 300 sec: 47097.0). Total num frames: 1442447360. Throughput: 0: 11366.3. Samples: 360672768. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:54:55,768][1648981] Avg episode reward: [(0, '919.130')] [2024-06-15 19:54:56,111][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000704352_1442512896.pth... [2024-06-15 19:54:56,236][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000698816_1431175168.pth [2024-06-15 19:54:57,353][1651669] Updated weights for policy 0, policy_version 704401 (0.0012) [2024-06-15 19:55:00,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1442709504. Throughput: 0: 11423.3. Samples: 360737280. 
Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:55:00,767][1648981] Avg episode reward: [(0, '870.660')] [2024-06-15 19:55:03,669][1651669] Updated weights for policy 0, policy_version 704451 (0.0011) [2024-06-15 19:55:04,904][1651669] Updated weights for policy 0, policy_version 704512 (0.0012) [2024-06-15 19:55:05,766][1648981] Fps is (10 sec: 42599.8, 60 sec: 46421.3, 300 sec: 46763.8). Total num frames: 1442873344. Throughput: 0: 11594.0. Samples: 360816128. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:55:05,767][1648981] Avg episode reward: [(0, '846.160')] [2024-06-15 19:55:06,725][1651274] Signal inference workers to stop experience collection... (36950 times) [2024-06-15 19:55:06,803][1651669] InferenceWorker_p0-w0: stopping experience collection (36950 times) [2024-06-15 19:55:06,805][1651669] Updated weights for policy 0, policy_version 704581 (0.0013) [2024-06-15 19:55:06,897][1651274] Signal inference workers to resume experience collection... (36950 times) [2024-06-15 19:55:06,897][1651669] InferenceWorker_p0-w0: resuming experience collection (36950 times) [2024-06-15 19:55:07,914][1651669] Updated weights for policy 0, policy_version 704640 (0.0012) [2024-06-15 19:55:09,374][1651669] Updated weights for policy 0, policy_version 704704 (0.0013) [2024-06-15 19:55:10,800][1648981] Fps is (10 sec: 52253.6, 60 sec: 47487.1, 300 sec: 47647.0). Total num frames: 1443233792. Throughput: 0: 11710.4. Samples: 360846336. Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:55:10,801][1648981] Avg episode reward: [(0, '826.930')] [2024-06-15 19:55:14,974][1651669] Updated weights for policy 0, policy_version 704755 (0.0017) [2024-06-15 19:55:15,766][1648981] Fps is (10 sec: 55705.7, 60 sec: 46967.5, 300 sec: 46986.0). Total num frames: 1443430400. Throughput: 0: 11823.6. Samples: 360931328. 
Policy #0 lag: (min: 157.0, avg: 243.7, max: 426.0) [2024-06-15 19:55:15,767][1648981] Avg episode reward: [(0, '837.020')] [2024-06-15 19:55:16,229][1651669] Updated weights for policy 0, policy_version 704823 (0.0011) [2024-06-15 19:55:17,632][1651669] Updated weights for policy 0, policy_version 704866 (0.0026) [2024-06-15 19:55:18,875][1651669] Updated weights for policy 0, policy_version 704931 (0.0012) [2024-06-15 19:55:20,767][1648981] Fps is (10 sec: 52603.2, 60 sec: 49698.0, 300 sec: 47874.5). Total num frames: 1443758080. Throughput: 0: 12356.1. Samples: 361007616. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:20,767][1648981] Avg episode reward: [(0, '894.890')] [2024-06-15 19:55:23,309][1651669] Updated weights for policy 0, policy_version 704964 (0.0011) [2024-06-15 19:55:25,073][1651669] Updated weights for policy 0, policy_version 705027 (0.0012) [2024-06-15 19:55:25,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 46967.4, 300 sec: 47208.1). Total num frames: 1443954688. Throughput: 0: 12561.0. Samples: 361057280. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:25,767][1648981] Avg episode reward: [(0, '923.810')] [2024-06-15 19:55:27,480][1651669] Updated weights for policy 0, policy_version 705121 (0.0011) [2024-06-15 19:55:29,208][1651669] Updated weights for policy 0, policy_version 705206 (0.0148) [2024-06-15 19:55:30,776][1648981] Fps is (10 sec: 52379.5, 60 sec: 51880.9, 300 sec: 47984.1). Total num frames: 1444282368. Throughput: 0: 12251.3. Samples: 361114624. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:30,777][1648981] Avg episode reward: [(0, '924.920')] [2024-06-15 19:55:34,612][1651669] Updated weights for policy 0, policy_version 705264 (0.0012) [2024-06-15 19:55:35,766][1648981] Fps is (10 sec: 49152.9, 60 sec: 48059.7, 300 sec: 47544.4). Total num frames: 1444446208. Throughput: 0: 12698.7. Samples: 361203200. 
Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:35,767][1648981] Avg episode reward: [(0, '909.190')] [2024-06-15 19:55:36,026][1651669] Updated weights for policy 0, policy_version 705315 (0.0012) [2024-06-15 19:55:37,464][1651669] Updated weights for policy 0, policy_version 705376 (0.0011) [2024-06-15 19:55:38,378][1651669] Updated weights for policy 0, policy_version 705414 (0.0012) [2024-06-15 19:55:40,766][1648981] Fps is (10 sec: 52480.2, 60 sec: 52428.8, 300 sec: 47985.7). Total num frames: 1444806656. Throughput: 0: 12492.9. Samples: 361234944. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:40,767][1648981] Avg episode reward: [(0, '870.480')] [2024-06-15 19:55:44,344][1651274] Signal inference workers to stop experience collection... (37000 times) [2024-06-15 19:55:44,434][1651669] InferenceWorker_p0-w0: stopping experience collection (37000 times) [2024-06-15 19:55:44,584][1651274] Signal inference workers to resume experience collection... (37000 times) [2024-06-15 19:55:44,585][1651669] InferenceWorker_p0-w0: resuming experience collection (37000 times) [2024-06-15 19:55:44,691][1651669] Updated weights for policy 0, policy_version 705504 (0.0260) [2024-06-15 19:55:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49698.3, 300 sec: 47874.6). Total num frames: 1444970496. Throughput: 0: 12822.8. Samples: 361314304. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:45,767][1648981] Avg episode reward: [(0, '851.270')] [2024-06-15 19:55:46,463][1651669] Updated weights for policy 0, policy_version 705588 (0.0013) [2024-06-15 19:55:49,060][1651669] Updated weights for policy 0, policy_version 705648 (0.0012) [2024-06-15 19:55:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 52432.2, 300 sec: 47987.5). Total num frames: 1445330944. Throughput: 0: 12333.5. Samples: 361371136. 
Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:50,767][1648981] Avg episode reward: [(0, '901.340')] [2024-06-15 19:55:55,389][1651669] Updated weights for policy 0, policy_version 705744 (0.0023) [2024-06-15 19:55:55,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48606.1, 300 sec: 47430.3). Total num frames: 1445363712. Throughput: 0: 12604.6. Samples: 361413120. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:55:55,767][1648981] Avg episode reward: [(0, '910.570')] [2024-06-15 19:55:57,509][1651669] Updated weights for policy 0, policy_version 705812 (0.0011) [2024-06-15 19:55:58,780][1651669] Updated weights for policy 0, policy_version 705858 (0.0014) [2024-06-15 19:56:00,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1445724160. Throughput: 0: 12310.7. Samples: 361485312. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:00,767][1648981] Avg episode reward: [(0, '906.680')] [2024-06-15 19:56:01,147][1651669] Updated weights for policy 0, policy_version 705952 (0.0014) [2024-06-15 19:56:05,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49698.2, 300 sec: 47652.5). Total num frames: 1445855232. Throughput: 0: 12345.0. Samples: 361563136. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:05,767][1648981] Avg episode reward: [(0, '931.130')] [2024-06-15 19:56:06,671][1651669] Updated weights for policy 0, policy_version 706008 (0.0012) [2024-06-15 19:56:08,083][1651669] Updated weights for policy 0, policy_version 706064 (0.0087) [2024-06-15 19:56:10,222][1651669] Updated weights for policy 0, policy_version 706128 (0.0147) [2024-06-15 19:56:10,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49179.5, 300 sec: 47541.4). Total num frames: 1446182912. Throughput: 0: 11992.2. Samples: 361596928. 
Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:10,767][1648981] Avg episode reward: [(0, '901.710')] [2024-06-15 19:56:11,407][1651669] Updated weights for policy 0, policy_version 706179 (0.0024) [2024-06-15 19:56:15,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 1446379520. Throughput: 0: 12245.2. Samples: 361665536. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:15,767][1648981] Avg episode reward: [(0, '881.660')] [2024-06-15 19:56:17,467][1651669] Updated weights for policy 0, policy_version 706262 (0.0059) [2024-06-15 19:56:19,847][1651669] Updated weights for policy 0, policy_version 706342 (0.0018) [2024-06-15 19:56:20,452][1651669] Updated weights for policy 0, policy_version 706368 (0.0012) [2024-06-15 19:56:20,774][1648981] Fps is (10 sec: 45838.7, 60 sec: 48053.7, 300 sec: 47653.3). Total num frames: 1446641664. Throughput: 0: 11899.0. Samples: 361738752. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:20,775][1648981] Avg episode reward: [(0, '886.230')] [2024-06-15 19:56:21,467][1651274] Signal inference workers to stop experience collection... (37050 times) [2024-06-15 19:56:21,577][1651669] InferenceWorker_p0-w0: stopping experience collection (37050 times) [2024-06-15 19:56:21,703][1651274] Signal inference workers to resume experience collection... (37050 times) [2024-06-15 19:56:21,703][1651669] InferenceWorker_p0-w0: resuming experience collection (37050 times) [2024-06-15 19:56:21,706][1651669] Updated weights for policy 0, policy_version 706416 (0.0010) [2024-06-15 19:56:23,019][1651669] Updated weights for policy 0, policy_version 706480 (0.0013) [2024-06-15 19:56:25,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 1446903808. Throughput: 0: 11889.8. Samples: 361769984. 
Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:25,767][1648981] Avg episode reward: [(0, '920.450')] [2024-06-15 19:56:29,148][1651669] Updated weights for policy 0, policy_version 706544 (0.0122) [2024-06-15 19:56:30,499][1651669] Updated weights for policy 0, policy_version 706594 (0.0069) [2024-06-15 19:56:30,767][1648981] Fps is (10 sec: 49189.8, 60 sec: 47521.2, 300 sec: 47541.3). Total num frames: 1447133184. Throughput: 0: 11866.9. Samples: 361848320. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:30,767][1648981] Avg episode reward: [(0, '959.210')] [2024-06-15 19:56:31,931][1651669] Updated weights for policy 0, policy_version 706640 (0.0014) [2024-06-15 19:56:33,019][1651669] Updated weights for policy 0, policy_version 706686 (0.0012) [2024-06-15 19:56:34,526][1651669] Updated weights for policy 0, policy_version 706747 (0.0011) [2024-06-15 19:56:35,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 49697.9, 300 sec: 47985.7). Total num frames: 1447428096. Throughput: 0: 12083.1. Samples: 361914880. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:35,767][1648981] Avg episode reward: [(0, '976.320')] [2024-06-15 19:56:40,606][1651669] Updated weights for policy 0, policy_version 706802 (0.0138) [2024-06-15 19:56:40,766][1648981] Fps is (10 sec: 39322.7, 60 sec: 45329.1, 300 sec: 47319.2). Total num frames: 1447526400. Throughput: 0: 12071.8. Samples: 361956352. 
Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:40,767][1648981] Avg episode reward: [(0, '982.220')] [2024-06-15 19:56:42,299][1651669] Updated weights for policy 0, policy_version 706869 (0.0011) [2024-06-15 19:56:43,192][1651669] Updated weights for policy 0, policy_version 706898 (0.0018) [2024-06-15 19:56:44,881][1651669] Updated weights for policy 0, policy_version 706961 (0.0012) [2024-06-15 19:56:45,618][1651669] Updated weights for policy 0, policy_version 707008 (0.0011) [2024-06-15 19:56:45,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1447952384. Throughput: 0: 11901.2. Samples: 362020864. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:45,767][1648981] Avg episode reward: [(0, '991.840')] [2024-06-15 19:56:50,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 44236.6, 300 sec: 47208.1). Total num frames: 1447985152. Throughput: 0: 11935.2. Samples: 362100224. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:50,768][1648981] Avg episode reward: [(0, '1023.950')] [2024-06-15 19:56:51,054][1651274] Saving new best policy, reward=1023.950! [2024-06-15 19:56:52,183][1651669] Updated weights for policy 0, policy_version 707077 (0.0011) [2024-06-15 19:56:53,683][1651669] Updated weights for policy 0, policy_version 707136 (0.0013) [2024-06-15 19:56:54,992][1651669] Updated weights for policy 0, policy_version 707197 (0.0011) [2024-06-15 19:56:55,778][1648981] Fps is (10 sec: 42547.6, 60 sec: 50234.3, 300 sec: 47650.5). Total num frames: 1448378368. Throughput: 0: 11693.2. Samples: 362123264. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:56:55,779][1648981] Avg episode reward: [(0, '980.780')] [2024-06-15 19:56:56,265][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000707248_1448443904.pth... 
[2024-06-15 19:56:56,266][1651669] Updated weights for policy 0, policy_version 707248 (0.0022) [2024-06-15 19:56:56,296][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000701632_1436942336.pth [2024-06-15 19:57:00,766][1648981] Fps is (10 sec: 49153.3, 60 sec: 45875.2, 300 sec: 47319.2). Total num frames: 1448476672. Throughput: 0: 12014.9. Samples: 362206208. Policy #0 lag: (min: 111.0, avg: 240.0, max: 382.0) [2024-06-15 19:57:00,767][1648981] Avg episode reward: [(0, '968.150')] [2024-06-15 19:57:01,949][1651669] Updated weights for policy 0, policy_version 707282 (0.0015) [2024-06-15 19:57:03,122][1651669] Updated weights for policy 0, policy_version 707334 (0.0013) [2024-06-15 19:57:03,870][1651274] Signal inference workers to stop experience collection... (37100 times) [2024-06-15 19:57:03,915][1651669] InferenceWorker_p0-w0: stopping experience collection (37100 times) [2024-06-15 19:57:04,168][1651274] Signal inference workers to resume experience collection... (37100 times) [2024-06-15 19:57:04,174][1651669] InferenceWorker_p0-w0: resuming experience collection (37100 times) [2024-06-15 19:57:04,437][1651669] Updated weights for policy 0, policy_version 707386 (0.0013) [2024-06-15 19:57:05,766][1648981] Fps is (10 sec: 42649.0, 60 sec: 49151.9, 300 sec: 47319.2). Total num frames: 1448804352. Throughput: 0: 11891.9. Samples: 362273792. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:05,767][1648981] Avg episode reward: [(0, '955.150')] [2024-06-15 19:57:06,021][1651669] Updated weights for policy 0, policy_version 707446 (0.0013) [2024-06-15 19:57:07,241][1651669] Updated weights for policy 0, policy_version 707504 (0.0011) [2024-06-15 19:57:10,793][1648981] Fps is (10 sec: 52289.7, 60 sec: 46946.7, 300 sec: 47759.8). Total num frames: 1449000960. Throughput: 0: 12007.9. Samples: 362310656. 
Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:10,794][1648981] Avg episode reward: [(0, '900.550')] [2024-06-15 19:57:13,241][1651669] Updated weights for policy 0, policy_version 707573 (0.0014) [2024-06-15 19:57:14,270][1651669] Updated weights for policy 0, policy_version 707617 (0.0035) [2024-06-15 19:57:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 47097.8). Total num frames: 1449263104. Throughput: 0: 11901.2. Samples: 362383872. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:15,767][1648981] Avg episode reward: [(0, '900.550')] [2024-06-15 19:57:16,225][1651669] Updated weights for policy 0, policy_version 707668 (0.0042) [2024-06-15 19:57:17,187][1651669] Updated weights for policy 0, policy_version 707728 (0.0013) [2024-06-15 19:57:18,183][1651669] Updated weights for policy 0, policy_version 707776 (0.0011) [2024-06-15 19:57:20,766][1648981] Fps is (10 sec: 52568.4, 60 sec: 48066.1, 300 sec: 47986.7). Total num frames: 1449525248. Throughput: 0: 12333.6. Samples: 362469888. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:20,767][1648981] Avg episode reward: [(0, '896.530')] [2024-06-15 19:57:22,769][1651669] Updated weights for policy 0, policy_version 707830 (0.0016) [2024-06-15 19:57:23,834][1651669] Updated weights for policy 0, policy_version 707856 (0.0011) [2024-06-15 19:57:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.8, 300 sec: 47541.3). Total num frames: 1449787392. Throughput: 0: 12208.3. Samples: 362505728. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:25,767][1648981] Avg episode reward: [(0, '884.980')] [2024-06-15 19:57:25,902][1651669] Updated weights for policy 0, policy_version 707910 (0.0037) [2024-06-15 19:57:27,868][1651669] Updated weights for policy 0, policy_version 707987 (0.0013) [2024-06-15 19:57:30,770][1648981] Fps is (10 sec: 52409.1, 60 sec: 48603.0, 300 sec: 47985.1). Total num frames: 1450049536. 
Throughput: 0: 12298.3. Samples: 362574336. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:30,771][1648981] Avg episode reward: [(0, '870.990')] [2024-06-15 19:57:32,066][1651669] Updated weights for policy 0, policy_version 708033 (0.0025) [2024-06-15 19:57:33,602][1651669] Updated weights for policy 0, policy_version 708097 (0.0012) [2024-06-15 19:57:35,016][1651669] Updated weights for policy 0, policy_version 708159 (0.0011) [2024-06-15 19:57:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.9, 300 sec: 47763.5). Total num frames: 1450311680. Throughput: 0: 12276.7. Samples: 362652672. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:35,767][1648981] Avg episode reward: [(0, '857.370')] [2024-06-15 19:57:37,194][1651669] Updated weights for policy 0, policy_version 708208 (0.0013) [2024-06-15 19:57:38,379][1651669] Updated weights for policy 0, policy_version 708257 (0.0011) [2024-06-15 19:57:40,767][1648981] Fps is (10 sec: 52448.2, 60 sec: 50790.3, 300 sec: 47985.8). Total num frames: 1450573824. Throughput: 0: 12598.5. Samples: 362690048. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:40,767][1648981] Avg episode reward: [(0, '864.900')] [2024-06-15 19:57:41,899][1651669] Updated weights for policy 0, policy_version 708290 (0.0012) [2024-06-15 19:57:42,337][1651274] Signal inference workers to stop experience collection... (37150 times) [2024-06-15 19:57:42,391][1651669] InferenceWorker_p0-w0: stopping experience collection (37150 times) [2024-06-15 19:57:42,596][1651274] Signal inference workers to resume experience collection... 
(37150 times) [2024-06-15 19:57:42,597][1651669] InferenceWorker_p0-w0: resuming experience collection (37150 times) [2024-06-15 19:57:43,521][1651669] Updated weights for policy 0, policy_version 708352 (0.0010) [2024-06-15 19:57:45,010][1651669] Updated weights for policy 0, policy_version 708415 (0.0011) [2024-06-15 19:57:45,767][1648981] Fps is (10 sec: 52424.1, 60 sec: 48059.0, 300 sec: 47985.5). Total num frames: 1450835968. Throughput: 0: 12538.0. Samples: 362770432. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:45,768][1648981] Avg episode reward: [(0, '886.870')] [2024-06-15 19:57:48,223][1651669] Updated weights for policy 0, policy_version 708475 (0.0107) [2024-06-15 19:57:49,105][1651669] Updated weights for policy 0, policy_version 708516 (0.0011) [2024-06-15 19:57:50,769][1648981] Fps is (10 sec: 52413.0, 60 sec: 51880.2, 300 sec: 47985.2). Total num frames: 1451098112. Throughput: 0: 12753.6. Samples: 362847744. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:50,770][1648981] Avg episode reward: [(0, '881.620')] [2024-06-15 19:57:53,308][1651669] Updated weights for policy 0, policy_version 708580 (0.0011) [2024-06-15 19:57:54,902][1651669] Updated weights for policy 0, policy_version 708640 (0.0014) [2024-06-15 19:57:55,766][1648981] Fps is (10 sec: 52433.3, 60 sec: 49708.0, 300 sec: 48207.8). Total num frames: 1451360256. Throughput: 0: 12944.2. Samples: 362892800. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:57:55,767][1648981] Avg episode reward: [(0, '846.010')] [2024-06-15 19:57:57,653][1651669] Updated weights for policy 0, policy_version 708707 (0.0039) [2024-06-15 19:57:59,081][1651669] Updated weights for policy 0, policy_version 708768 (0.0014) [2024-06-15 19:58:00,783][1648981] Fps is (10 sec: 52358.2, 60 sec: 52414.3, 300 sec: 47984.8). Total num frames: 1451622400. Throughput: 0: 12533.7. Samples: 362948096. 
Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:00,783][1648981] Avg episode reward: [(0, '822.280')] [2024-06-15 19:58:03,977][1651669] Updated weights for policy 0, policy_version 708821 (0.0020) [2024-06-15 19:58:05,666][1651669] Updated weights for policy 0, policy_version 708896 (0.0113) [2024-06-15 19:58:05,771][1648981] Fps is (10 sec: 45856.5, 60 sec: 50240.9, 300 sec: 48430.8). Total num frames: 1451819008. Throughput: 0: 12446.2. Samples: 363030016. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:05,771][1648981] Avg episode reward: [(0, '802.150')] [2024-06-15 19:58:06,407][1651669] Updated weights for policy 0, policy_version 708927 (0.0012) [2024-06-15 19:58:09,422][1651669] Updated weights for policy 0, policy_version 708977 (0.0104) [2024-06-15 19:58:10,766][1648981] Fps is (10 sec: 49233.6, 60 sec: 51905.6, 300 sec: 47874.6). Total num frames: 1452113920. Throughput: 0: 12492.8. Samples: 363067904. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:10,767][1648981] Avg episode reward: [(0, '773.270')] [2024-06-15 19:58:10,922][1651669] Updated weights for policy 0, policy_version 709048 (0.0104) [2024-06-15 19:58:15,354][1651669] Updated weights for policy 0, policy_version 709104 (0.0013) [2024-06-15 19:58:15,771][1648981] Fps is (10 sec: 45875.4, 60 sec: 50240.9, 300 sec: 48431.2). Total num frames: 1452277760. Throughput: 0: 12674.8. Samples: 363144704. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:15,771][1648981] Avg episode reward: [(0, '751.280')] [2024-06-15 19:58:20,001][1651669] Updated weights for policy 0, policy_version 709216 (0.0013) [2024-06-15 19:58:20,116][1651274] Signal inference workers to stop experience collection... (37200 times) [2024-06-15 19:58:20,188][1651669] InferenceWorker_p0-w0: stopping experience collection (37200 times) [2024-06-15 19:58:20,423][1651274] Signal inference workers to resume experience collection... 
(37200 times) [2024-06-15 19:58:20,424][1651669] InferenceWorker_p0-w0: resuming experience collection (37200 times) [2024-06-15 19:58:20,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 1452507136. Throughput: 0: 12310.8. Samples: 363206656. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:20,767][1648981] Avg episode reward: [(0, '722.800')] [2024-06-15 19:58:22,009][1651669] Updated weights for policy 0, policy_version 709296 (0.0178) [2024-06-15 19:58:25,767][1648981] Fps is (10 sec: 39337.2, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1452670976. Throughput: 0: 12162.8. Samples: 363237376. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:25,767][1648981] Avg episode reward: [(0, '731.780')] [2024-06-15 19:58:25,825][1651669] Updated weights for policy 0, policy_version 709328 (0.0011) [2024-06-15 19:58:27,501][1651669] Updated weights for policy 0, policy_version 709394 (0.0015) [2024-06-15 19:58:28,221][1651669] Updated weights for policy 0, policy_version 709439 (0.0010) [2024-06-15 19:58:30,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48062.7, 300 sec: 48207.8). Total num frames: 1452933120. Throughput: 0: 12128.9. Samples: 363316224. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:30,767][1648981] Avg episode reward: [(0, '739.290')] [2024-06-15 19:58:32,227][1651669] Updated weights for policy 0, policy_version 709505 (0.0012) [2024-06-15 19:58:33,783][1651669] Updated weights for policy 0, policy_version 709563 (0.0012) [2024-06-15 19:58:35,767][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.6, 300 sec: 47986.0). Total num frames: 1453195264. Throughput: 0: 11981.6. Samples: 363386880. 
Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:35,767][1648981] Avg episode reward: [(0, '765.550')] [2024-06-15 19:58:36,936][1651669] Updated weights for policy 0, policy_version 709618 (0.0012) [2024-06-15 19:58:40,774][1648981] Fps is (10 sec: 52387.8, 60 sec: 48053.5, 300 sec: 48539.8). Total num frames: 1453457408. Throughput: 0: 11603.3. Samples: 363415040. Policy #0 lag: (min: 3.0, avg: 66.0, max: 259.0) [2024-06-15 19:58:40,775][1648981] Avg episode reward: [(0, '778.880')] [2024-06-15 19:58:42,398][1651669] Updated weights for policy 0, policy_version 709699 (0.0044) [2024-06-15 19:58:44,822][1651669] Updated weights for policy 0, policy_version 709792 (0.0110) [2024-06-15 19:58:45,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 48060.5, 300 sec: 47986.4). Total num frames: 1453719552. Throughput: 0: 11939.7. Samples: 363485184. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:58:45,767][1651669] Updated weights for policy 0, policy_version 709824 (0.0013) [2024-06-15 19:58:45,767][1648981] Avg episode reward: [(0, '757.970')] [2024-06-15 19:58:48,335][1651669] Updated weights for policy 0, policy_version 709890 (0.0084) [2024-06-15 19:58:50,766][1648981] Fps is (10 sec: 52469.9, 60 sec: 48062.2, 300 sec: 48874.3). Total num frames: 1453981696. Throughput: 0: 11742.9. Samples: 363558400. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:58:50,767][1648981] Avg episode reward: [(0, '791.050')] [2024-06-15 19:58:54,384][1651669] Updated weights for policy 0, policy_version 709968 (0.0045) [2024-06-15 19:58:55,774][1648981] Fps is (10 sec: 39290.8, 60 sec: 45869.3, 300 sec: 48095.5). Total num frames: 1454112768. Throughput: 0: 11830.8. Samples: 363600384. 
Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:58:55,775][1648981] Avg episode reward: [(0, '784.040')] [2024-06-15 19:58:55,931][1651669] Updated weights for policy 0, policy_version 710020 (0.0012) [2024-06-15 19:58:56,150][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000710032_1454145536.pth... [2024-06-15 19:58:56,316][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000704352_1442512896.pth [2024-06-15 19:58:57,384][1651669] Updated weights for policy 0, policy_version 710077 (0.0010) [2024-06-15 19:58:58,434][1651274] Signal inference workers to stop experience collection... (37250 times) [2024-06-15 19:58:58,533][1651669] InferenceWorker_p0-w0: stopping experience collection (37250 times) [2024-06-15 19:58:58,718][1651274] Signal inference workers to resume experience collection... (37250 times) [2024-06-15 19:58:58,719][1651669] InferenceWorker_p0-w0: resuming experience collection (37250 times) [2024-06-15 19:58:58,721][1651669] Updated weights for policy 0, policy_version 710128 (0.0011) [2024-06-15 19:59:00,076][1651669] Updated weights for policy 0, policy_version 710202 (0.0012) [2024-06-15 19:59:00,770][1648981] Fps is (10 sec: 52409.2, 60 sec: 48070.0, 300 sec: 48873.7). Total num frames: 1454505984. Throughput: 0: 11480.3. Samples: 363661312. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:00,771][1648981] Avg episode reward: [(0, '822.490')] [2024-06-15 19:59:05,766][1648981] Fps is (10 sec: 42631.8, 60 sec: 45332.2, 300 sec: 47985.7). Total num frames: 1454538752. Throughput: 0: 11889.8. Samples: 363741696. 
Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:05,767][1648981] Avg episode reward: [(0, '778.560')] [2024-06-15 19:59:06,089][1651669] Updated weights for policy 0, policy_version 710242 (0.0013) [2024-06-15 19:59:07,874][1651669] Updated weights for policy 0, policy_version 710307 (0.0012) [2024-06-15 19:59:09,388][1651669] Updated weights for policy 0, policy_version 710353 (0.0037) [2024-06-15 19:59:10,767][1648981] Fps is (10 sec: 42611.6, 60 sec: 46966.9, 300 sec: 48541.0). Total num frames: 1454931968. Throughput: 0: 11912.4. Samples: 363773440. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:10,767][1648981] Avg episode reward: [(0, '801.170')] [2024-06-15 19:59:10,818][1651669] Updated weights for policy 0, policy_version 710417 (0.0011) [2024-06-15 19:59:11,508][1651669] Updated weights for policy 0, policy_version 710464 (0.0011) [2024-06-15 19:59:15,767][1648981] Fps is (10 sec: 49149.9, 60 sec: 45878.0, 300 sec: 48318.9). Total num frames: 1455030272. Throughput: 0: 11980.7. Samples: 363855360. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:15,769][1648981] Avg episode reward: [(0, '840.940')] [2024-06-15 19:59:16,995][1651669] Updated weights for policy 0, policy_version 710523 (0.0013) [2024-06-15 19:59:18,681][1651669] Updated weights for policy 0, policy_version 710576 (0.0012) [2024-06-15 19:59:19,900][1651669] Updated weights for policy 0, policy_version 710624 (0.0010) [2024-06-15 19:59:20,766][1648981] Fps is (10 sec: 49155.5, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 1455423488. Throughput: 0: 11753.3. Samples: 363915776. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:20,767][1648981] Avg episode reward: [(0, '878.250')] [2024-06-15 19:59:21,600][1651669] Updated weights for policy 0, policy_version 710693 (0.0011) [2024-06-15 19:59:25,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 48059.8, 300 sec: 48764.5). Total num frames: 1455554560. 
Throughput: 0: 12051.2. Samples: 363957248. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:25,767][1648981] Avg episode reward: [(0, '849.530')] [2024-06-15 19:59:27,325][1651669] Updated weights for policy 0, policy_version 710740 (0.0011) [2024-06-15 19:59:28,602][1651669] Updated weights for policy 0, policy_version 710788 (0.0013) [2024-06-15 19:59:30,770][1648981] Fps is (10 sec: 45858.4, 60 sec: 49149.1, 300 sec: 48540.5). Total num frames: 1455882240. Throughput: 0: 12048.1. Samples: 364027392. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:30,771][1648981] Avg episode reward: [(0, '845.690')] [2024-06-15 19:59:30,835][1651669] Updated weights for policy 0, policy_version 710882 (0.0014) [2024-06-15 19:59:32,303][1651669] Updated weights for policy 0, policy_version 710945 (0.0009) [2024-06-15 19:59:35,802][1648981] Fps is (10 sec: 52240.9, 60 sec: 48031.0, 300 sec: 48868.3). Total num frames: 1456078848. Throughput: 0: 12130.4. Samples: 364104704. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:35,803][1648981] Avg episode reward: [(0, '895.010')] [2024-06-15 19:59:37,769][1651669] Updated weights for policy 0, policy_version 710992 (0.0011) [2024-06-15 19:59:39,572][1651274] Signal inference workers to stop experience collection... (37300 times) [2024-06-15 19:59:39,609][1651669] InferenceWorker_p0-w0: stopping experience collection (37300 times) [2024-06-15 19:59:39,845][1651274] Signal inference workers to resume experience collection... (37300 times) [2024-06-15 19:59:39,846][1651669] InferenceWorker_p0-w0: resuming experience collection (37300 times) [2024-06-15 19:59:40,230][1651669] Updated weights for policy 0, policy_version 711072 (0.0111) [2024-06-15 19:59:40,766][1648981] Fps is (10 sec: 42613.8, 60 sec: 47519.8, 300 sec: 48541.1). Total num frames: 1456308224. Throughput: 0: 12051.2. Samples: 364142592. 
Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:40,767][1648981] Avg episode reward: [(0, '887.400')] [2024-06-15 19:59:41,819][1651669] Updated weights for policy 0, policy_version 711136 (0.0014) [2024-06-15 19:59:43,573][1651669] Updated weights for policy 0, policy_version 711201 (0.0014) [2024-06-15 19:59:45,766][1648981] Fps is (10 sec: 52618.3, 60 sec: 48059.7, 300 sec: 48874.9). Total num frames: 1456603136. Throughput: 0: 11970.4. Samples: 364199936. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:45,767][1648981] Avg episode reward: [(0, '911.000')] [2024-06-15 19:59:49,000][1651669] Updated weights for policy 0, policy_version 711236 (0.0014) [2024-06-15 19:59:50,419][1651669] Updated weights for policy 0, policy_version 711286 (0.0012) [2024-06-15 19:59:50,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1456734208. Throughput: 0: 12151.5. Samples: 364288512. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:50,767][1648981] Avg episode reward: [(0, '935.490')] [2024-06-15 19:59:52,334][1651669] Updated weights for policy 0, policy_version 711376 (0.0012) [2024-06-15 19:59:54,647][1651669] Updated weights for policy 0, policy_version 711456 (0.0014) [2024-06-15 19:59:55,407][1651669] Updated weights for policy 0, policy_version 711487 (0.0012) [2024-06-15 19:59:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50250.8, 300 sec: 48874.3). Total num frames: 1457127424. Throughput: 0: 12026.5. Samples: 364314624. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 19:59:55,767][1648981] Avg episode reward: [(0, '936.370')] [2024-06-15 20:00:00,767][1648981] Fps is (10 sec: 45872.4, 60 sec: 44785.3, 300 sec: 48541.0). Total num frames: 1457192960. Throughput: 0: 12049.0. Samples: 364397568. 
Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 20:00:00,770][1648981] Avg episode reward: [(0, '905.680')] [2024-06-15 20:00:02,072][1651669] Updated weights for policy 0, policy_version 711569 (0.0016) [2024-06-15 20:00:03,661][1651669] Updated weights for policy 0, policy_version 711634 (0.0015) [2024-06-15 20:00:05,429][1651669] Updated weights for policy 0, policy_version 711702 (0.0011) [2024-06-15 20:00:05,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 50790.4, 300 sec: 48657.7). Total num frames: 1457586176. Throughput: 0: 11832.9. Samples: 364448256. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 20:00:05,767][1648981] Avg episode reward: [(0, '897.310')] [2024-06-15 20:00:10,766][1648981] Fps is (10 sec: 45877.8, 60 sec: 45329.6, 300 sec: 48207.8). Total num frames: 1457651712. Throughput: 0: 11844.3. Samples: 364490240. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 20:00:10,767][1648981] Avg episode reward: [(0, '872.730')] [2024-06-15 20:00:11,937][1651669] Updated weights for policy 0, policy_version 711766 (0.0011) [2024-06-15 20:00:13,957][1651669] Updated weights for policy 0, policy_version 711840 (0.0182) [2024-06-15 20:00:15,706][1651274] Signal inference workers to stop experience collection... (37350 times) [2024-06-15 20:00:15,766][1648981] Fps is (10 sec: 36044.8, 60 sec: 48606.2, 300 sec: 48096.8). Total num frames: 1457946624. Throughput: 0: 11924.9. Samples: 364563968. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 20:00:15,767][1648981] Avg episode reward: [(0, '922.360')] [2024-06-15 20:00:15,769][1651669] Updated weights for policy 0, policy_version 711889 (0.0012) [2024-06-15 20:00:15,794][1651669] InferenceWorker_p0-w0: stopping experience collection (37350 times) [2024-06-15 20:00:16,008][1651274] Signal inference workers to resume experience collection... 
(37350 times) [2024-06-15 20:00:16,009][1651669] InferenceWorker_p0-w0: resuming experience collection (37350 times) [2024-06-15 20:00:17,398][1651669] Updated weights for policy 0, policy_version 711955 (0.0031) [2024-06-15 20:00:20,782][1648981] Fps is (10 sec: 52346.0, 60 sec: 45863.1, 300 sec: 48205.3). Total num frames: 1458176000. Throughput: 0: 11792.7. Samples: 364635136. Policy #0 lag: (min: 15.0, avg: 100.8, max: 271.0) [2024-06-15 20:00:20,783][1648981] Avg episode reward: [(0, '922.370')] [2024-06-15 20:00:22,608][1651669] Updated weights for policy 0, policy_version 712002 (0.0014) [2024-06-15 20:00:24,489][1651669] Updated weights for policy 0, policy_version 712081 (0.0012) [2024-06-15 20:00:25,767][1648981] Fps is (10 sec: 49147.4, 60 sec: 48059.0, 300 sec: 47987.1). Total num frames: 1458438144. Throughput: 0: 11900.9. Samples: 364678144. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:00:25,768][1648981] Avg episode reward: [(0, '914.700')] [2024-06-15 20:00:26,052][1651669] Updated weights for policy 0, policy_version 712150 (0.0012) [2024-06-15 20:00:27,565][1651669] Updated weights for policy 0, policy_version 712208 (0.0013) [2024-06-15 20:00:30,766][1648981] Fps is (10 sec: 52512.0, 60 sec: 46970.3, 300 sec: 48318.9). Total num frames: 1458700288. Throughput: 0: 11844.3. Samples: 364732928. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:00:30,767][1648981] Avg episode reward: [(0, '892.530')] [2024-06-15 20:00:33,731][1651669] Updated weights for policy 0, policy_version 712272 (0.0123) [2024-06-15 20:00:35,258][1651669] Updated weights for policy 0, policy_version 712340 (0.0011) [2024-06-15 20:00:35,766][1648981] Fps is (10 sec: 45879.6, 60 sec: 46995.7, 300 sec: 47763.5). Total num frames: 1458896896. Throughput: 0: 11764.6. Samples: 364817920. 
Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:00:35,767][1648981] Avg episode reward: [(0, '907.450')] [2024-06-15 20:00:37,767][1651669] Updated weights for policy 0, policy_version 712433 (0.0012) [2024-06-15 20:00:39,013][1651669] Updated weights for policy 0, policy_version 712483 (0.0011) [2024-06-15 20:00:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1459224576. Throughput: 0: 11673.6. Samples: 364839936. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:00:40,767][1648981] Avg episode reward: [(0, '891.520')] [2024-06-15 20:00:45,421][1651669] Updated weights for policy 0, policy_version 712560 (0.0016) [2024-06-15 20:00:45,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 45329.0, 300 sec: 47430.3). Total num frames: 1459322880. Throughput: 0: 11833.0. Samples: 364930048. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:00:45,767][1648981] Avg episode reward: [(0, '875.900')] [2024-06-15 20:00:46,929][1651669] Updated weights for policy 0, policy_version 712624 (0.0011) [2024-06-15 20:00:48,020][1651669] Updated weights for policy 0, policy_version 712672 (0.0011) [2024-06-15 20:00:49,649][1651669] Updated weights for policy 0, policy_version 712736 (0.0015) [2024-06-15 20:00:49,769][1651274] Signal inference workers to stop experience collection... (37400 times) [2024-06-15 20:00:49,851][1651669] InferenceWorker_p0-w0: stopping experience collection (37400 times) [2024-06-15 20:00:49,962][1651274] Signal inference workers to resume experience collection... (37400 times) [2024-06-15 20:00:49,963][1651669] InferenceWorker_p0-w0: resuming experience collection (37400 times) [2024-06-15 20:00:50,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 1459748864. Throughput: 0: 11878.4. Samples: 364982784. 
Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:00:50,767][1648981] Avg episode reward: [(0, '912.590')] [2024-06-15 20:00:55,035][1651669] Updated weights for policy 0, policy_version 712769 (0.0025) [2024-06-15 20:00:55,767][1648981] Fps is (10 sec: 45874.8, 60 sec: 44236.7, 300 sec: 47652.4). Total num frames: 1459781632. Throughput: 0: 12060.4. Samples: 365032960. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:00:55,767][1648981] Avg episode reward: [(0, '903.130')] [2024-06-15 20:00:56,364][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000712816_1459847168.pth... [2024-06-15 20:00:56,528][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000707248_1448443904.pth [2024-06-15 20:00:57,140][1651669] Updated weights for policy 0, policy_version 712848 (0.0012) [2024-06-15 20:01:00,028][1651669] Updated weights for policy 0, policy_version 712960 (0.0014) [2024-06-15 20:01:00,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 49698.6, 300 sec: 48541.1). Total num frames: 1460174848. Throughput: 0: 11764.6. Samples: 365093376. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:00,767][1648981] Avg episode reward: [(0, '866.600')] [2024-06-15 20:01:05,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 44782.9, 300 sec: 47763.5). Total num frames: 1460273152. Throughput: 0: 11780.1. Samples: 365165056. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:05,767][1648981] Avg episode reward: [(0, '893.140')] [2024-06-15 20:01:06,837][1651669] Updated weights for policy 0, policy_version 713027 (0.0012) [2024-06-15 20:01:08,320][1651669] Updated weights for policy 0, policy_version 713088 (0.0013) [2024-06-15 20:01:10,130][1651669] Updated weights for policy 0, policy_version 713155 (0.0012) [2024-06-15 20:01:10,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 49151.9, 300 sec: 48207.8). Total num frames: 1460600832. Throughput: 0: 11651.0. 
Samples: 365202432. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:10,767][1648981] Avg episode reward: [(0, '865.710')] [2024-06-15 20:01:11,408][1651669] Updated weights for policy 0, policy_version 713216 (0.0141) [2024-06-15 20:01:13,027][1651669] Updated weights for policy 0, policy_version 713272 (0.0020) [2024-06-15 20:01:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 47987.0). Total num frames: 1460797440. Throughput: 0: 11821.5. Samples: 365264896. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:15,767][1648981] Avg episode reward: [(0, '840.600')] [2024-06-15 20:01:18,448][1651669] Updated weights for policy 0, policy_version 713315 (0.0013) [2024-06-15 20:01:19,671][1651669] Updated weights for policy 0, policy_version 713363 (0.0011) [2024-06-15 20:01:20,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 48072.3, 300 sec: 47985.7). Total num frames: 1461059584. Throughput: 0: 11730.4. Samples: 365345792. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:20,767][1648981] Avg episode reward: [(0, '853.560')] [2024-06-15 20:01:21,250][1651669] Updated weights for policy 0, policy_version 713426 (0.0013) [2024-06-15 20:01:22,769][1651669] Updated weights for policy 0, policy_version 713491 (0.0013) [2024-06-15 20:01:25,767][1648981] Fps is (10 sec: 52423.7, 60 sec: 48059.7, 300 sec: 48096.7). Total num frames: 1461321728. Throughput: 0: 11753.0. Samples: 365368832. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:25,768][1648981] Avg episode reward: [(0, '850.950')] [2024-06-15 20:01:28,519][1651669] Updated weights for policy 0, policy_version 713537 (0.0012) [2024-06-15 20:01:30,375][1651669] Updated weights for policy 0, policy_version 713616 (0.0102) [2024-06-15 20:01:30,474][1651274] Signal inference workers to stop experience collection... 
(37450 times) [2024-06-15 20:01:30,549][1651669] InferenceWorker_p0-w0: stopping experience collection (37450 times) [2024-06-15 20:01:30,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 46421.4, 300 sec: 47652.5). Total num frames: 1461485568. Throughput: 0: 11696.4. Samples: 365456384. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:30,767][1648981] Avg episode reward: [(0, '888.070')] [2024-06-15 20:01:30,781][1651274] Signal inference workers to resume experience collection... (37450 times) [2024-06-15 20:01:30,781][1651669] InferenceWorker_p0-w0: resuming experience collection (37450 times) [2024-06-15 20:01:31,923][1651669] Updated weights for policy 0, policy_version 713667 (0.0012) [2024-06-15 20:01:34,153][1651669] Updated weights for policy 0, policy_version 713763 (0.0012) [2024-06-15 20:01:34,793][1651669] Updated weights for policy 0, policy_version 713792 (0.0021) [2024-06-15 20:01:35,766][1648981] Fps is (10 sec: 52433.5, 60 sec: 49151.9, 300 sec: 48541.1). Total num frames: 1461846016. Throughput: 0: 11787.4. Samples: 365513216. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:35,767][1648981] Avg episode reward: [(0, '881.890')] [2024-06-15 20:01:40,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 44782.9, 300 sec: 47319.2). Total num frames: 1461911552. Throughput: 0: 11730.5. Samples: 365560832. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:40,768][1648981] Avg episode reward: [(0, '847.120')] [2024-06-15 20:01:40,774][1651669] Updated weights for policy 0, policy_version 713840 (0.0013) [2024-06-15 20:01:43,191][1651669] Updated weights for policy 0, policy_version 713921 (0.0250) [2024-06-15 20:01:44,536][1651669] Updated weights for policy 0, policy_version 713984 (0.0011) [2024-06-15 20:01:45,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50790.5, 300 sec: 48763.3). Total num frames: 1462370304. Throughput: 0: 11685.0. Samples: 365619200. 
Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:45,767][1648981] Avg episode reward: [(0, '859.110')] [2024-06-15 20:01:50,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 43690.7, 300 sec: 47432.2). Total num frames: 1462370304. Throughput: 0: 12083.2. Samples: 365708800. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:50,767][1648981] Avg episode reward: [(0, '964.120')] [2024-06-15 20:01:50,828][1651669] Updated weights for policy 0, policy_version 714064 (0.0012) [2024-06-15 20:01:52,295][1651669] Updated weights for policy 0, policy_version 714114 (0.0011) [2024-06-15 20:01:53,545][1651669] Updated weights for policy 0, policy_version 714175 (0.0012) [2024-06-15 20:01:55,254][1651669] Updated weights for policy 0, policy_version 714230 (0.0011) [2024-06-15 20:01:55,778][1648981] Fps is (10 sec: 39274.8, 60 sec: 49688.5, 300 sec: 48428.0). Total num frames: 1462763520. Throughput: 0: 11863.9. Samples: 365736448. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:01:55,779][1648981] Avg episode reward: [(0, '955.280')] [2024-06-15 20:01:56,588][1651669] Updated weights for policy 0, policy_version 714293 (0.0014) [2024-06-15 20:02:00,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 45329.0, 300 sec: 47763.5). Total num frames: 1462894592. Throughput: 0: 12014.9. Samples: 365805568. Policy #0 lag: (min: 59.0, avg: 121.5, max: 315.0) [2024-06-15 20:02:00,767][1648981] Avg episode reward: [(0, '915.930')] [2024-06-15 20:02:01,847][1651669] Updated weights for policy 0, policy_version 714336 (0.0014) [2024-06-15 20:02:03,641][1651669] Updated weights for policy 0, policy_version 714403 (0.0014) [2024-06-15 20:02:04,949][1651669] Updated weights for policy 0, policy_version 714433 (0.0012) [2024-06-15 20:02:05,766][1648981] Fps is (10 sec: 45929.8, 60 sec: 49152.0, 300 sec: 48212.2). Total num frames: 1463222272. Throughput: 0: 11867.1. Samples: 365879808. 
Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:05,767][1648981] Avg episode reward: [(0, '904.070')] [2024-06-15 20:02:06,983][1651274] Signal inference workers to stop experience collection... (37500 times) [2024-06-15 20:02:07,067][1651669] InferenceWorker_p0-w0: stopping experience collection (37500 times) [2024-06-15 20:02:07,085][1651669] Updated weights for policy 0, policy_version 714517 (0.0022) [2024-06-15 20:02:07,260][1651274] Signal inference workers to resume experience collection... (37500 times) [2024-06-15 20:02:07,260][1651669] InferenceWorker_p0-w0: resuming experience collection (37500 times) [2024-06-15 20:02:07,869][1651669] Updated weights for policy 0, policy_version 714558 (0.0013) [2024-06-15 20:02:10,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 46967.6, 300 sec: 47985.7). Total num frames: 1463418880. Throughput: 0: 12003.8. Samples: 365908992. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:10,767][1648981] Avg episode reward: [(0, '911.610')] [2024-06-15 20:02:13,876][1651669] Updated weights for policy 0, policy_version 714624 (0.0094) [2024-06-15 20:02:15,175][1651669] Updated weights for policy 0, policy_version 714678 (0.0012) [2024-06-15 20:02:15,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1463681024. Throughput: 0: 11810.1. Samples: 365987840. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:15,767][1648981] Avg episode reward: [(0, '916.670')] [2024-06-15 20:02:16,090][1651669] Updated weights for policy 0, policy_version 714708 (0.0012) [2024-06-15 20:02:17,676][1651669] Updated weights for policy 0, policy_version 714770 (0.0011) [2024-06-15 20:02:20,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1463943168. Throughput: 0: 12094.5. Samples: 366057472. 
Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:20,768][1648981] Avg episode reward: [(0, '938.880')] [2024-06-15 20:02:23,590][1651669] Updated weights for policy 0, policy_version 714832 (0.0021) [2024-06-15 20:02:24,552][1651669] Updated weights for policy 0, policy_version 714873 (0.0019) [2024-06-15 20:02:25,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46968.2, 300 sec: 47764.1). Total num frames: 1464139776. Throughput: 0: 12015.0. Samples: 366101504. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:25,767][1648981] Avg episode reward: [(0, '994.460')] [2024-06-15 20:02:26,249][1651669] Updated weights for policy 0, policy_version 714928 (0.0012) [2024-06-15 20:02:28,419][1651669] Updated weights for policy 0, policy_version 715008 (0.0012) [2024-06-15 20:02:29,964][1651669] Updated weights for policy 0, policy_version 715072 (0.0120) [2024-06-15 20:02:30,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 49698.0, 300 sec: 47985.7). Total num frames: 1464467456. Throughput: 0: 11901.1. Samples: 366154752. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:30,767][1648981] Avg episode reward: [(0, '1034.350')] [2024-06-15 20:02:30,768][1651274] Saving new best policy, reward=1034.350! [2024-06-15 20:02:35,767][1648981] Fps is (10 sec: 36043.3, 60 sec: 44236.5, 300 sec: 47208.1). Total num frames: 1464500224. Throughput: 0: 11764.5. Samples: 366238208. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:35,767][1648981] Avg episode reward: [(0, '1006.060')] [2024-06-15 20:02:36,695][1651669] Updated weights for policy 0, policy_version 715133 (0.0012) [2024-06-15 20:02:38,469][1651669] Updated weights for policy 0, policy_version 715189 (0.0012) [2024-06-15 20:02:40,557][1651669] Updated weights for policy 0, policy_version 715280 (0.0018) [2024-06-15 20:02:40,768][1648981] Fps is (10 sec: 42589.9, 60 sec: 49696.5, 300 sec: 47652.3). Total num frames: 1464893440. Throughput: 0: 11687.5. 
Samples: 366262272. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:40,769][1648981] Avg episode reward: [(0, '986.160')] [2024-06-15 20:02:45,788][1648981] Fps is (10 sec: 49050.0, 60 sec: 43675.2, 300 sec: 47094.2). Total num frames: 1464991744. Throughput: 0: 11702.3. Samples: 366332416. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:45,788][1648981] Avg episode reward: [(0, '1006.040')] [2024-06-15 20:02:47,554][1651669] Updated weights for policy 0, policy_version 715350 (0.0014) [2024-06-15 20:02:48,648][1651669] Updated weights for policy 0, policy_version 715397 (0.0013) [2024-06-15 20:02:49,496][1651274] Signal inference workers to stop experience collection... (37550 times) [2024-06-15 20:02:49,561][1651669] InferenceWorker_p0-w0: stopping experience collection (37550 times) [2024-06-15 20:02:49,866][1651274] Signal inference workers to resume experience collection... (37550 times) [2024-06-15 20:02:49,867][1651669] InferenceWorker_p0-w0: resuming experience collection (37550 times) [2024-06-15 20:02:50,208][1651669] Updated weights for policy 0, policy_version 715456 (0.0012) [2024-06-15 20:02:50,766][1648981] Fps is (10 sec: 39329.3, 60 sec: 48605.8, 300 sec: 47208.1). Total num frames: 1465286656. Throughput: 0: 11537.0. Samples: 366398976. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:50,767][1648981] Avg episode reward: [(0, '1063.400')] [2024-06-15 20:02:51,208][1651274] Saving new best policy, reward=1063.400! [2024-06-15 20:02:51,921][1651669] Updated weights for policy 0, policy_version 715525 (0.0116) [2024-06-15 20:02:55,766][1648981] Fps is (10 sec: 52540.1, 60 sec: 45884.3, 300 sec: 47099.7). Total num frames: 1465516032. Throughput: 0: 11537.1. Samples: 366428160. 
Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:02:55,767][1648981] Avg episode reward: [(0, '1058.640')] [2024-06-15 20:02:55,785][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000715584_1465516032.pth... [2024-06-15 20:02:55,868][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000710032_1454145536.pth [2024-06-15 20:02:58,346][1651669] Updated weights for policy 0, policy_version 715586 (0.0014) [2024-06-15 20:02:59,825][1651669] Updated weights for policy 0, policy_version 715649 (0.0069) [2024-06-15 20:03:00,769][1648981] Fps is (10 sec: 42587.6, 60 sec: 46965.5, 300 sec: 47097.3). Total num frames: 1465712640. Throughput: 0: 11684.3. Samples: 366513664. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:00,770][1648981] Avg episode reward: [(0, '1056.640')] [2024-06-15 20:03:01,647][1651669] Updated weights for policy 0, policy_version 715715 (0.0012) [2024-06-15 20:03:03,367][1651669] Updated weights for policy 0, policy_version 715782 (0.0020) [2024-06-15 20:03:05,778][1648981] Fps is (10 sec: 52367.4, 60 sec: 46958.3, 300 sec: 47206.3). Total num frames: 1466040320. Throughput: 0: 11363.5. Samples: 366568960. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:05,779][1648981] Avg episode reward: [(0, '1053.210')] [2024-06-15 20:03:09,779][1651669] Updated weights for policy 0, policy_version 715856 (0.0012) [2024-06-15 20:03:10,767][1648981] Fps is (10 sec: 42609.0, 60 sec: 45328.9, 300 sec: 46986.6). Total num frames: 1466138624. Throughput: 0: 11457.4. Samples: 366617088. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:10,767][1648981] Avg episode reward: [(0, '1087.450')] [2024-06-15 20:03:11,575][1651274] Saving new best policy, reward=1087.450! 
[2024-06-15 20:03:12,302][1651669] Updated weights for policy 0, policy_version 715938 (0.0013) [2024-06-15 20:03:13,743][1651669] Updated weights for policy 0, policy_version 716000 (0.0011) [2024-06-15 20:03:14,982][1651669] Updated weights for policy 0, policy_version 716053 (0.0010) [2024-06-15 20:03:15,766][1648981] Fps is (10 sec: 52490.6, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1466564608. Throughput: 0: 11480.2. Samples: 366671360. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:15,767][1648981] Avg episode reward: [(0, '1050.760')] [2024-06-15 20:03:20,621][1651669] Updated weights for policy 0, policy_version 716097 (0.0013) [2024-06-15 20:03:20,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 43690.8, 300 sec: 47097.1). Total num frames: 1466564608. Throughput: 0: 11628.2. Samples: 366761472. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:20,767][1648981] Avg episode reward: [(0, '1060.860')] [2024-06-15 20:03:22,278][1651669] Updated weights for policy 0, policy_version 716159 (0.0169) [2024-06-15 20:03:23,442][1651669] Updated weights for policy 0, policy_version 716208 (0.0011) [2024-06-15 20:03:25,413][1651274] Signal inference workers to stop experience collection... (37600 times) [2024-06-15 20:03:25,461][1651669] InferenceWorker_p0-w0: stopping experience collection (37600 times) [2024-06-15 20:03:25,483][1651669] Updated weights for policy 0, policy_version 716276 (0.0011) [2024-06-15 20:03:25,653][1651274] Signal inference workers to resume experience collection... (37600 times) [2024-06-15 20:03:25,653][1651669] InferenceWorker_p0-w0: resuming experience collection (37600 times) [2024-06-15 20:03:25,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 1466957824. Throughput: 0: 11731.0. Samples: 366790144. 
Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:25,767][1648981] Avg episode reward: [(0, '1047.830')] [2024-06-15 20:03:26,664][1651669] Updated weights for policy 0, policy_version 716341 (0.0012) [2024-06-15 20:03:30,774][1648981] Fps is (10 sec: 52389.6, 60 sec: 43685.2, 300 sec: 47095.9). Total num frames: 1467088896. Throughput: 0: 11768.2. Samples: 366861824. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:30,774][1648981] Avg episode reward: [(0, '1010.680')] [2024-06-15 20:03:31,752][1651669] Updated weights for policy 0, policy_version 716387 (0.0104) [2024-06-15 20:03:33,029][1651669] Updated weights for policy 0, policy_version 716432 (0.0011) [2024-06-15 20:03:35,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49152.4, 300 sec: 47431.6). Total num frames: 1467449344. Throughput: 0: 11650.9. Samples: 366923264. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:35,767][1648981] Avg episode reward: [(0, '1018.280')] [2024-06-15 20:03:35,950][1651669] Updated weights for policy 0, policy_version 716544 (0.0136) [2024-06-15 20:03:37,348][1651669] Updated weights for policy 0, policy_version 716592 (0.0046) [2024-06-15 20:03:40,766][1648981] Fps is (10 sec: 52467.9, 60 sec: 45330.6, 300 sec: 47097.0). Total num frames: 1467613184. Throughput: 0: 11639.5. Samples: 366951936. Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:40,767][1648981] Avg episode reward: [(0, '1000.730')] [2024-06-15 20:03:43,567][1651669] Updated weights for policy 0, policy_version 716626 (0.0014) [2024-06-15 20:03:44,640][1651669] Updated weights for policy 0, policy_version 716669 (0.0012) [2024-06-15 20:03:45,769][1648981] Fps is (10 sec: 36034.4, 60 sec: 46981.8, 300 sec: 46874.5). Total num frames: 1467809792. Throughput: 0: 11480.1. Samples: 367030272. 
Policy #0 lag: (min: 2.0, avg: 126.6, max: 258.0) [2024-06-15 20:03:45,770][1648981] Avg episode reward: [(0, '1007.790')] [2024-06-15 20:03:46,360][1651669] Updated weights for policy 0, policy_version 716723 (0.0012) [2024-06-15 20:03:48,383][1651669] Updated weights for policy 0, policy_version 716816 (0.0043) [2024-06-15 20:03:49,353][1651669] Updated weights for policy 0, policy_version 716861 (0.0017) [2024-06-15 20:03:50,792][1648981] Fps is (10 sec: 52297.0, 60 sec: 47493.7, 300 sec: 47538.6). Total num frames: 1468137472. Throughput: 0: 11567.7. Samples: 367089664. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:03:50,792][1648981] Avg episode reward: [(0, '1017.200')] [2024-06-15 20:03:55,766][1648981] Fps is (10 sec: 39332.8, 60 sec: 44782.9, 300 sec: 46431.2). Total num frames: 1468203008. Throughput: 0: 11434.7. Samples: 367131648. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:03:55,767][1648981] Avg episode reward: [(0, '1008.240')] [2024-06-15 20:03:56,295][1651669] Updated weights for policy 0, policy_version 716920 (0.0183) [2024-06-15 20:03:58,175][1651669] Updated weights for policy 0, policy_version 716992 (0.0016) [2024-06-15 20:03:59,976][1651669] Updated weights for policy 0, policy_version 717072 (0.0010) [2024-06-15 20:04:00,766][1648981] Fps is (10 sec: 49276.2, 60 sec: 48608.0, 300 sec: 47763.5). Total num frames: 1468628992. Throughput: 0: 11582.6. Samples: 367192576. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:00,767][1648981] Avg episode reward: [(0, '997.630')] [2024-06-15 20:04:00,875][1651669] Updated weights for policy 0, policy_version 717118 (0.0012) [2024-06-15 20:04:05,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 43699.2, 300 sec: 46541.8). Total num frames: 1468661760. Throughput: 0: 11320.9. Samples: 367270912. 
Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:05,767][1648981] Avg episode reward: [(0, '1063.280')] [2024-06-15 20:04:06,709][1651274] Signal inference workers to stop experience collection... (37650 times) [2024-06-15 20:04:06,748][1651669] InferenceWorker_p0-w0: stopping experience collection (37650 times) [2024-06-15 20:04:06,749][1651669] Updated weights for policy 0, policy_version 717154 (0.0013) [2024-06-15 20:04:07,091][1651274] Signal inference workers to resume experience collection... (37650 times) [2024-06-15 20:04:07,092][1651669] InferenceWorker_p0-w0: resuming experience collection (37650 times) [2024-06-15 20:04:07,759][1651669] Updated weights for policy 0, policy_version 717186 (0.0014) [2024-06-15 20:04:09,771][1651669] Updated weights for policy 0, policy_version 717264 (0.0011) [2024-06-15 20:04:10,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 48059.8, 300 sec: 47430.4). Total num frames: 1469022208. Throughput: 0: 11514.3. Samples: 367308288. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:10,767][1648981] Avg episode reward: [(0, '1036.740')] [2024-06-15 20:04:12,188][1651669] Updated weights for policy 0, policy_version 717360 (0.0011) [2024-06-15 20:04:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 43690.6, 300 sec: 46652.7). Total num frames: 1469186048. Throughput: 0: 11186.2. Samples: 367365120. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:15,767][1648981] Avg episode reward: [(0, '1029.200')] [2024-06-15 20:04:17,120][1651669] Updated weights for policy 0, policy_version 717378 (0.0050) [2024-06-15 20:04:18,631][1651669] Updated weights for policy 0, policy_version 717441 (0.0011) [2024-06-15 20:04:20,411][1651669] Updated weights for policy 0, policy_version 717511 (0.0011) [2024-06-15 20:04:20,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 48605.6, 300 sec: 47208.1). Total num frames: 1469480960. Throughput: 0: 11525.6. Samples: 367441920. 
Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:20,768][1648981] Avg episode reward: [(0, '1002.330')] [2024-06-15 20:04:21,758][1651669] Updated weights for policy 0, policy_version 717567 (0.0012) [2024-06-15 20:04:23,232][1651669] Updated weights for policy 0, policy_version 717626 (0.0106) [2024-06-15 20:04:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 45875.2, 300 sec: 46875.5). Total num frames: 1469710336. Throughput: 0: 11571.2. Samples: 367472640. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:25,767][1648981] Avg episode reward: [(0, '988.710')] [2024-06-15 20:04:28,716][1651669] Updated weights for policy 0, policy_version 717680 (0.0122) [2024-06-15 20:04:30,205][1651669] Updated weights for policy 0, policy_version 717731 (0.0012) [2024-06-15 20:04:30,774][1648981] Fps is (10 sec: 49115.1, 60 sec: 48059.5, 300 sec: 47101.6). Total num frames: 1469972480. Throughput: 0: 11717.8. Samples: 367557632. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:30,775][1648981] Avg episode reward: [(0, '1013.270')] [2024-06-15 20:04:31,836][1651669] Updated weights for policy 0, policy_version 717808 (0.0092) [2024-06-15 20:04:32,982][1651669] Updated weights for policy 0, policy_version 717841 (0.0017) [2024-06-15 20:04:33,982][1651669] Updated weights for policy 0, policy_version 717887 (0.0016) [2024-06-15 20:04:35,767][1648981] Fps is (10 sec: 52427.1, 60 sec: 46421.0, 300 sec: 47208.1). Total num frames: 1470234624. Throughput: 0: 11919.1. Samples: 367625728. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:35,767][1648981] Avg episode reward: [(0, '1053.810')] [2024-06-15 20:04:40,031][1651669] Updated weights for policy 0, policy_version 717952 (0.0106) [2024-06-15 20:04:40,767][1648981] Fps is (10 sec: 39348.3, 60 sec: 45874.5, 300 sec: 46652.6). Total num frames: 1470365696. Throughput: 0: 11866.8. Samples: 367665664. 
Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:40,768][1648981] Avg episode reward: [(0, '999.420')] [2024-06-15 20:04:42,338][1651669] Updated weights for policy 0, policy_version 718016 (0.0012) [2024-06-15 20:04:43,695][1651274] Signal inference workers to stop experience collection... (37700 times) [2024-06-15 20:04:43,760][1651669] InferenceWorker_p0-w0: stopping experience collection (37700 times) [2024-06-15 20:04:43,769][1651669] Updated weights for policy 0, policy_version 718070 (0.0011) [2024-06-15 20:04:43,993][1651274] Signal inference workers to resume experience collection... (37700 times) [2024-06-15 20:04:43,994][1651669] InferenceWorker_p0-w0: resuming experience collection (37700 times) [2024-06-15 20:04:45,318][1651669] Updated weights for policy 0, policy_version 718137 (0.0021) [2024-06-15 20:04:45,766][1648981] Fps is (10 sec: 52431.0, 60 sec: 49154.4, 300 sec: 47541.4). Total num frames: 1470758912. Throughput: 0: 11798.8. Samples: 367723520. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:45,767][1648981] Avg episode reward: [(0, '1008.480')] [2024-06-15 20:04:50,766][1648981] Fps is (10 sec: 45880.0, 60 sec: 44801.8, 300 sec: 46430.6). Total num frames: 1470824448. Throughput: 0: 11707.8. Samples: 367797760. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:50,767][1648981] Avg episode reward: [(0, '1009.590')] [2024-06-15 20:04:50,967][1651669] Updated weights for policy 0, policy_version 718199 (0.0111) [2024-06-15 20:04:52,521][1651669] Updated weights for policy 0, policy_version 718225 (0.0016) [2024-06-15 20:04:53,461][1651669] Updated weights for policy 0, policy_version 718272 (0.0011) [2024-06-15 20:04:54,856][1651669] Updated weights for policy 0, policy_version 718330 (0.0038) [2024-06-15 20:04:55,770][1648981] Fps is (10 sec: 42584.7, 60 sec: 49695.5, 300 sec: 47429.9). Total num frames: 1471184896. Throughput: 0: 11695.5. Samples: 367834624. 
Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:04:55,770][1648981] Avg episode reward: [(0, '983.710')] [2024-06-15 20:04:56,101][1651669] Updated weights for policy 0, policy_version 718369 (0.0011) [2024-06-15 20:04:56,315][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000718384_1471250432.pth... [2024-06-15 20:04:56,394][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000712816_1459847168.pth [2024-06-15 20:04:56,780][1651669] Updated weights for policy 0, policy_version 718400 (0.0009) [2024-06-15 20:05:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45329.1, 300 sec: 46652.8). Total num frames: 1471348736. Throughput: 0: 12197.0. Samples: 367913984. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:05:00,767][1648981] Avg episode reward: [(0, '1010.620')] [2024-06-15 20:05:01,046][1651669] Updated weights for policy 0, policy_version 718460 (0.0012) [2024-06-15 20:05:03,373][1651669] Updated weights for policy 0, policy_version 718498 (0.0012) [2024-06-15 20:05:04,213][1651669] Updated weights for policy 0, policy_version 718532 (0.0010) [2024-06-15 20:05:05,386][1651669] Updated weights for policy 0, policy_version 718588 (0.0012) [2024-06-15 20:05:05,777][1648981] Fps is (10 sec: 49117.4, 60 sec: 50235.7, 300 sec: 47539.7). Total num frames: 1471676416. Throughput: 0: 11944.0. Samples: 367979520. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:05:05,777][1648981] Avg episode reward: [(0, '970.310')] [2024-06-15 20:05:07,380][1651669] Updated weights for policy 0, policy_version 718655 (0.0011) [2024-06-15 20:05:10,766][1648981] Fps is (10 sec: 52428.0, 60 sec: 47513.6, 300 sec: 47208.1). Total num frames: 1471873024. Throughput: 0: 12083.2. Samples: 368016384. 
Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:05:10,767][1648981] Avg episode reward: [(0, '928.220')] [2024-06-15 20:05:11,134][1651669] Updated weights for policy 0, policy_version 718707 (0.0012) [2024-06-15 20:05:13,539][1651669] Updated weights for policy 0, policy_version 718752 (0.0013) [2024-06-15 20:05:14,952][1651669] Updated weights for policy 0, policy_version 718785 (0.0072) [2024-06-15 20:05:15,766][1648981] Fps is (10 sec: 45922.2, 60 sec: 49152.0, 300 sec: 47321.8). Total num frames: 1472135168. Throughput: 0: 11937.4. Samples: 368094720. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:05:15,767][1648981] Avg episode reward: [(0, '941.190')] [2024-06-15 20:05:16,475][1651669] Updated weights for policy 0, policy_version 718844 (0.0009) [2024-06-15 20:05:18,054][1651669] Updated weights for policy 0, policy_version 718898 (0.0013) [2024-06-15 20:05:20,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 47513.8, 300 sec: 47097.2). Total num frames: 1472331776. Throughput: 0: 12094.7. Samples: 368169984. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:05:20,767][1648981] Avg episode reward: [(0, '936.940')] [2024-06-15 20:05:21,518][1651669] Updated weights for policy 0, policy_version 718951 (0.0093) [2024-06-15 20:05:24,445][1651669] Updated weights for policy 0, policy_version 719009 (0.0012) [2024-06-15 20:05:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1472593920. Throughput: 0: 12083.5. Samples: 368209408. Policy #0 lag: (min: 66.0, avg: 140.5, max: 287.0) [2024-06-15 20:05:25,767][1648981] Avg episode reward: [(0, '921.960')] [2024-06-15 20:05:26,073][1651274] Signal inference workers to stop experience collection... (37750 times) [2024-06-15 20:05:26,098][1651669] InferenceWorker_p0-w0: stopping experience collection (37750 times) [2024-06-15 20:05:26,385][1651274] Signal inference workers to resume experience collection... 
(37750 times) [2024-06-15 20:05:26,387][1651669] InferenceWorker_p0-w0: resuming experience collection (37750 times) [2024-06-15 20:05:26,570][1651669] Updated weights for policy 0, policy_version 719074 (0.0012) [2024-06-15 20:05:28,173][1651669] Updated weights for policy 0, policy_version 719136 (0.0011) [2024-06-15 20:05:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48065.9, 300 sec: 47319.2). Total num frames: 1472856064. Throughput: 0: 12208.3. Samples: 368272896. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:05:30,767][1648981] Avg episode reward: [(0, '944.770')] [2024-06-15 20:05:32,133][1651669] Updated weights for policy 0, policy_version 719184 (0.0014) [2024-06-15 20:05:33,203][1651669] Updated weights for policy 0, policy_version 719226 (0.0009) [2024-06-15 20:05:34,713][1651669] Updated weights for policy 0, policy_version 719264 (0.0043) [2024-06-15 20:05:35,490][1651669] Updated weights for policy 0, policy_version 719296 (0.0010) [2024-06-15 20:05:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48060.0, 300 sec: 47097.1). Total num frames: 1473118208. Throughput: 0: 12253.8. Samples: 368349184. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:05:35,767][1648981] Avg episode reward: [(0, '973.990')] [2024-06-15 20:05:38,006][1651669] Updated weights for policy 0, policy_version 719355 (0.0101) [2024-06-15 20:05:39,670][1651669] Updated weights for policy 0, policy_version 719408 (0.0018) [2024-06-15 20:05:40,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 50244.9, 300 sec: 47652.4). Total num frames: 1473380352. Throughput: 0: 12231.9. Samples: 368385024. 
Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:05:40,767][1648981] Avg episode reward: [(0, '931.410')] [2024-06-15 20:05:42,586][1651669] Updated weights for policy 0, policy_version 719463 (0.0134) [2024-06-15 20:05:44,682][1651669] Updated weights for policy 0, policy_version 719509 (0.0013) [2024-06-15 20:05:45,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1473642496. Throughput: 0: 12185.6. Samples: 368462336. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:05:45,767][1648981] Avg episode reward: [(0, '889.590')] [2024-06-15 20:05:47,955][1651669] Updated weights for policy 0, policy_version 719572 (0.0011) [2024-06-15 20:05:49,356][1651669] Updated weights for policy 0, policy_version 719640 (0.0014) [2024-06-15 20:05:50,000][1651669] Updated weights for policy 0, policy_version 719680 (0.0013) [2024-06-15 20:05:50,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 51336.4, 300 sec: 47874.6). Total num frames: 1473904640. Throughput: 0: 12450.1. Samples: 368539648. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:05:50,767][1648981] Avg episode reward: [(0, '854.010')] [2024-06-15 20:05:53,591][1651669] Updated weights for policy 0, policy_version 719736 (0.0215) [2024-06-15 20:05:54,719][1651669] Updated weights for policy 0, policy_version 719776 (0.0012) [2024-06-15 20:05:55,775][1648981] Fps is (10 sec: 52382.7, 60 sec: 49693.6, 300 sec: 47428.9). Total num frames: 1474166784. Throughput: 0: 12353.9. Samples: 368572416. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:05:55,776][1648981] Avg episode reward: [(0, '795.800')] [2024-06-15 20:05:58,511][1651669] Updated weights for policy 0, policy_version 719842 (0.0014) [2024-06-15 20:06:00,682][1651669] Updated weights for policy 0, policy_version 719920 (0.0011) [2024-06-15 20:06:00,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 50790.1, 300 sec: 47874.6). Total num frames: 1474396160. 
Throughput: 0: 12526.9. Samples: 368658432. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:00,767][1648981] Avg episode reward: [(0, '781.080')] [2024-06-15 20:06:04,237][1651669] Updated weights for policy 0, policy_version 719986 (0.0154) [2024-06-15 20:06:05,741][1651669] Updated weights for policy 0, policy_version 720048 (0.0014) [2024-06-15 20:06:05,766][1648981] Fps is (10 sec: 49195.3, 60 sec: 49706.7, 300 sec: 47652.5). Total num frames: 1474658304. Throughput: 0: 12333.5. Samples: 368724992. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:05,767][1648981] Avg episode reward: [(0, '791.320')] [2024-06-15 20:06:09,014][1651274] Signal inference workers to stop experience collection... (37800 times) [2024-06-15 20:06:09,071][1651669] InferenceWorker_p0-w0: stopping experience collection (37800 times) [2024-06-15 20:06:09,076][1651669] Updated weights for policy 0, policy_version 720098 (0.0132) [2024-06-15 20:06:09,378][1651274] Signal inference workers to resume experience collection... (37800 times) [2024-06-15 20:06:09,380][1651669] InferenceWorker_p0-w0: resuming experience collection (37800 times) [2024-06-15 20:06:10,767][1648981] Fps is (10 sec: 42598.9, 60 sec: 49152.0, 300 sec: 47541.3). Total num frames: 1474822144. Throughput: 0: 12333.5. Samples: 368764416. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:10,767][1648981] Avg episode reward: [(0, '792.980')] [2024-06-15 20:06:12,040][1651669] Updated weights for policy 0, policy_version 720187 (0.0014) [2024-06-15 20:06:15,166][1651669] Updated weights for policy 0, policy_version 720250 (0.0025) [2024-06-15 20:06:15,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 49698.1, 300 sec: 47652.5). Total num frames: 1475117056. Throughput: 0: 12447.3. Samples: 368833024. 
Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:15,767][1648981] Avg episode reward: [(0, '776.550')] [2024-06-15 20:06:16,506][1651669] Updated weights for policy 0, policy_version 720312 (0.0010) [2024-06-15 20:06:20,010][1651669] Updated weights for policy 0, policy_version 720352 (0.0014) [2024-06-15 20:06:20,767][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.1, 300 sec: 47430.4). Total num frames: 1475313664. Throughput: 0: 12322.1. Samples: 368903680. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:20,767][1648981] Avg episode reward: [(0, '780.020')] [2024-06-15 20:06:23,493][1651669] Updated weights for policy 0, policy_version 720446 (0.0014) [2024-06-15 20:06:25,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 49152.0, 300 sec: 47652.4). Total num frames: 1475543040. Throughput: 0: 12276.7. Samples: 368937472. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:25,767][1648981] Avg episode reward: [(0, '745.830')] [2024-06-15 20:06:25,930][1651669] Updated weights for policy 0, policy_version 720483 (0.0013) [2024-06-15 20:06:27,043][1651669] Updated weights for policy 0, policy_version 720530 (0.0030) [2024-06-15 20:06:30,778][1648981] Fps is (10 sec: 42548.0, 60 sec: 48050.2, 300 sec: 47095.2). Total num frames: 1475739648. Throughput: 0: 12136.8. Samples: 369008640. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:30,779][1648981] Avg episode reward: [(0, '778.780')] [2024-06-15 20:06:31,287][1651669] Updated weights for policy 0, policy_version 720608 (0.0203) [2024-06-15 20:06:35,232][1651669] Updated weights for policy 0, policy_version 720694 (0.0022) [2024-06-15 20:06:35,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1476001792. Throughput: 0: 12049.1. Samples: 369081856. 
Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:35,767][1648981] Avg episode reward: [(0, '796.740')] [2024-06-15 20:06:36,470][1651669] Updated weights for policy 0, policy_version 720738 (0.0014) [2024-06-15 20:06:38,444][1651669] Updated weights for policy 0, policy_version 720824 (0.0012) [2024-06-15 20:06:40,766][1648981] Fps is (10 sec: 52491.5, 60 sec: 48059.9, 300 sec: 47097.1). Total num frames: 1476263936. Throughput: 0: 12017.3. Samples: 369113088. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:40,767][1648981] Avg episode reward: [(0, '804.700')] [2024-06-15 20:06:41,755][1651669] Updated weights for policy 0, policy_version 720880 (0.0013) [2024-06-15 20:06:44,623][1651669] Updated weights for policy 0, policy_version 720912 (0.0031) [2024-06-15 20:06:45,540][1651669] Updated weights for policy 0, policy_version 720955 (0.0013) [2024-06-15 20:06:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1476526080. Throughput: 0: 11958.1. Samples: 369196544. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:45,767][1648981] Avg episode reward: [(0, '827.020')] [2024-06-15 20:06:47,199][1651669] Updated weights for policy 0, policy_version 721008 (0.0013) [2024-06-15 20:06:48,508][1651669] Updated weights for policy 0, policy_version 721056 (0.0010) [2024-06-15 20:06:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 47543.3). Total num frames: 1476788224. Throughput: 0: 12162.8. Samples: 369272320. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:50,767][1648981] Avg episode reward: [(0, '838.830')] [2024-06-15 20:06:51,969][1651274] Signal inference workers to stop experience collection... 
(37850 times) [2024-06-15 20:06:52,064][1651669] InferenceWorker_p0-w0: stopping experience collection (37850 times) [2024-06-15 20:06:52,066][1651669] Updated weights for policy 0, policy_version 721111 (0.0011) [2024-06-15 20:06:52,210][1651274] Signal inference workers to resume experience collection... (37850 times) [2024-06-15 20:06:52,211][1651669] InferenceWorker_p0-w0: resuming experience collection (37850 times) [2024-06-15 20:06:55,081][1651669] Updated weights for policy 0, policy_version 721184 (0.0016) [2024-06-15 20:06:55,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 47520.5, 300 sec: 47874.6). Total num frames: 1477017600. Throughput: 0: 12094.6. Samples: 369308672. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:06:55,767][1648981] Avg episode reward: [(0, '821.390')] [2024-06-15 20:06:55,858][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000721216_1477050368.pth... [2024-06-15 20:06:55,900][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000715584_1465516032.pth [2024-06-15 20:06:57,243][1651669] Updated weights for policy 0, policy_version 721251 (0.0016) [2024-06-15 20:06:57,927][1651669] Updated weights for policy 0, policy_version 721280 (0.0013) [2024-06-15 20:07:00,133][1651669] Updated weights for policy 0, policy_version 721341 (0.0012) [2024-06-15 20:07:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48606.1, 300 sec: 47763.5). Total num frames: 1477312512. Throughput: 0: 12140.1. Samples: 369379328. Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:07:00,767][1648981] Avg episode reward: [(0, '817.530')] [2024-06-15 20:07:03,817][1651669] Updated weights for policy 0, policy_version 721400 (0.0113) [2024-06-15 20:07:05,767][1648981] Fps is (10 sec: 49149.8, 60 sec: 47513.1, 300 sec: 47763.4). Total num frames: 1477509120. Throughput: 0: 12253.8. Samples: 369455104. 
Policy #0 lag: (min: 25.0, avg: 169.8, max: 281.0) [2024-06-15 20:07:05,767][1648981] Avg episode reward: [(0, '843.820')] [2024-06-15 20:07:06,041][1651669] Updated weights for policy 0, policy_version 721456 (0.0011) [2024-06-15 20:07:08,467][1651669] Updated weights for policy 0, policy_version 721505 (0.0019) [2024-06-15 20:07:09,101][1651669] Updated weights for policy 0, policy_version 721536 (0.0012) [2024-06-15 20:07:10,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 1477804032. Throughput: 0: 12128.7. Samples: 369483264. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:10,767][1648981] Avg episode reward: [(0, '882.610')] [2024-06-15 20:07:10,771][1651669] Updated weights for policy 0, policy_version 721597 (0.0037) [2024-06-15 20:07:14,469][1651669] Updated weights for policy 0, policy_version 721637 (0.0012) [2024-06-15 20:07:15,766][1648981] Fps is (10 sec: 45877.2, 60 sec: 47513.5, 300 sec: 47541.4). Total num frames: 1477967872. Throughput: 0: 12427.8. Samples: 369567744. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:15,767][1648981] Avg episode reward: [(0, '866.540')] [2024-06-15 20:07:15,982][1651669] Updated weights for policy 0, policy_version 721665 (0.0012) [2024-06-15 20:07:17,005][1651669] Updated weights for policy 0, policy_version 721720 (0.0011) [2024-06-15 20:07:17,865][1651669] Updated weights for policy 0, policy_version 721746 (0.0012) [2024-06-15 20:07:19,427][1651669] Updated weights for policy 0, policy_version 721795 (0.0014) [2024-06-15 20:07:20,682][1651669] Updated weights for policy 0, policy_version 721852 (0.0137) [2024-06-15 20:07:20,766][1648981] Fps is (10 sec: 55706.4, 60 sec: 50790.6, 300 sec: 48207.9). Total num frames: 1478361088. Throughput: 0: 12276.6. Samples: 369634304. 
Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:20,767][1648981] Avg episode reward: [(0, '914.520')] [2024-06-15 20:07:25,381][1651669] Updated weights for policy 0, policy_version 721904 (0.0018) [2024-06-15 20:07:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1478492160. Throughput: 0: 12572.4. Samples: 369678848. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:25,767][1648981] Avg episode reward: [(0, '974.160')] [2024-06-15 20:07:27,390][1651669] Updated weights for policy 0, policy_version 721968 (0.0022) [2024-06-15 20:07:28,439][1651669] Updated weights for policy 0, policy_version 722001 (0.0012) [2024-06-15 20:07:29,250][1651669] Updated weights for policy 0, policy_version 722044 (0.0012) [2024-06-15 20:07:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 51346.8, 300 sec: 48541.1). Total num frames: 1478819840. Throughput: 0: 12367.6. Samples: 369753088. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:30,767][1648981] Avg episode reward: [(0, '1005.990')] [2024-06-15 20:07:34,323][1651669] Updated weights for policy 0, policy_version 722114 (0.0012) [2024-06-15 20:07:35,040][1651274] Signal inference workers to stop experience collection... (37900 times) [2024-06-15 20:07:35,083][1651669] InferenceWorker_p0-w0: stopping experience collection (37900 times) [2024-06-15 20:07:35,293][1651274] Signal inference workers to resume experience collection... (37900 times) [2024-06-15 20:07:35,294][1651669] InferenceWorker_p0-w0: resuming experience collection (37900 times) [2024-06-15 20:07:35,297][1651669] Updated weights for policy 0, policy_version 722160 (0.0012) [2024-06-15 20:07:35,783][1648981] Fps is (10 sec: 52340.4, 60 sec: 50230.1, 300 sec: 47872.2). Total num frames: 1479016448. Throughput: 0: 12419.9. Samples: 369831424. 
Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:35,784][1648981] Avg episode reward: [(0, '1013.050')] [2024-06-15 20:07:37,636][1651669] Updated weights for policy 0, policy_version 722209 (0.0011) [2024-06-15 20:07:39,100][1651669] Updated weights for policy 0, policy_version 722272 (0.0012) [2024-06-15 20:07:40,396][1651669] Updated weights for policy 0, policy_version 722323 (0.0118) [2024-06-15 20:07:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 51336.5, 300 sec: 48655.6). Total num frames: 1479344128. Throughput: 0: 12526.9. Samples: 369872384. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:40,767][1648981] Avg episode reward: [(0, '1016.710')] [2024-06-15 20:07:45,425][1651669] Updated weights for policy 0, policy_version 722387 (0.0012) [2024-06-15 20:07:45,766][1648981] Fps is (10 sec: 45952.6, 60 sec: 49151.9, 300 sec: 48096.8). Total num frames: 1479475200. Throughput: 0: 12549.7. Samples: 369944064. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:45,767][1648981] Avg episode reward: [(0, '1015.540')] [2024-06-15 20:07:48,500][1651669] Updated weights for policy 0, policy_version 722464 (0.0012) [2024-06-15 20:07:49,704][1651669] Updated weights for policy 0, policy_version 722512 (0.0012) [2024-06-15 20:07:50,786][1648981] Fps is (10 sec: 42514.5, 60 sec: 49681.8, 300 sec: 48315.7). Total num frames: 1479770112. Throughput: 0: 12260.0. Samples: 370007040. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:50,787][1648981] Avg episode reward: [(0, '1004.800')] [2024-06-15 20:07:50,850][1651669] Updated weights for policy 0, policy_version 722551 (0.0012) [2024-06-15 20:07:52,202][1651669] Updated weights for policy 0, policy_version 722608 (0.0012) [2024-06-15 20:07:55,790][1648981] Fps is (10 sec: 45767.5, 60 sec: 48586.8, 300 sec: 48204.4). Total num frames: 1479933952. Throughput: 0: 12452.2. Samples: 370043904. 
Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:07:55,790][1648981] Avg episode reward: [(0, '1043.390')] [2024-06-15 20:07:56,566][1651669] Updated weights for policy 0, policy_version 722644 (0.0063) [2024-06-15 20:07:57,476][1651669] Updated weights for policy 0, policy_version 722679 (0.0011) [2024-06-15 20:07:59,317][1651669] Updated weights for policy 0, policy_version 722740 (0.0017) [2024-06-15 20:08:00,537][1651669] Updated weights for policy 0, policy_version 722786 (0.0027) [2024-06-15 20:08:00,766][1648981] Fps is (10 sec: 52533.2, 60 sec: 49698.2, 300 sec: 48320.9). Total num frames: 1480294400. Throughput: 0: 12379.1. Samples: 370124800. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:00,767][1648981] Avg episode reward: [(0, '1079.290')] [2024-06-15 20:08:01,705][1651669] Updated weights for policy 0, policy_version 722832 (0.0010) [2024-06-15 20:08:05,766][1648981] Fps is (10 sec: 52553.0, 60 sec: 49152.5, 300 sec: 48541.1). Total num frames: 1480458240. Throughput: 0: 12561.1. Samples: 370199552. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:05,767][1648981] Avg episode reward: [(0, '1060.550')] [2024-06-15 20:08:06,248][1651669] Updated weights for policy 0, policy_version 722887 (0.0017) [2024-06-15 20:08:07,512][1651669] Updated weights for policy 0, policy_version 722944 (0.0024) [2024-06-15 20:08:09,937][1651669] Updated weights for policy 0, policy_version 722992 (0.0018) [2024-06-15 20:08:10,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1480720384. Throughput: 0: 12367.7. Samples: 370235392. 
Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:10,767][1648981] Avg episode reward: [(0, '1042.990')] [2024-06-15 20:08:11,336][1651669] Updated weights for policy 0, policy_version 723041 (0.0014) [2024-06-15 20:08:12,139][1651669] Updated weights for policy 0, policy_version 723074 (0.0030) [2024-06-15 20:08:12,526][1651274] Signal inference workers to stop experience collection... (37950 times) [2024-06-15 20:08:12,573][1651669] InferenceWorker_p0-w0: stopping experience collection (37950 times) [2024-06-15 20:08:12,818][1651274] Signal inference workers to resume experience collection... (37950 times) [2024-06-15 20:08:12,818][1651669] InferenceWorker_p0-w0: resuming experience collection (37950 times) [2024-06-15 20:08:13,383][1651669] Updated weights for policy 0, policy_version 723125 (0.0013) [2024-06-15 20:08:15,790][1648981] Fps is (10 sec: 52302.9, 60 sec: 50224.2, 300 sec: 48870.3). Total num frames: 1480982528. Throughput: 0: 12281.4. Samples: 370306048. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:15,791][1648981] Avg episode reward: [(0, '1057.550')] [2024-06-15 20:08:17,695][1651669] Updated weights for policy 0, policy_version 723184 (0.0012) [2024-06-15 20:08:20,767][1648981] Fps is (10 sec: 45872.1, 60 sec: 46966.9, 300 sec: 48207.7). Total num frames: 1481179136. Throughput: 0: 12281.1. Samples: 370383872. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:20,768][1648981] Avg episode reward: [(0, '1063.030')] [2024-06-15 20:08:20,877][1651669] Updated weights for policy 0, policy_version 723239 (0.0020) [2024-06-15 20:08:22,164][1651669] Updated weights for policy 0, policy_version 723281 (0.0015) [2024-06-15 20:08:23,904][1651669] Updated weights for policy 0, policy_version 723350 (0.0019) [2024-06-15 20:08:25,766][1648981] Fps is (10 sec: 52555.0, 60 sec: 50244.3, 300 sec: 48875.5). Total num frames: 1481506816. Throughput: 0: 11958.1. Samples: 370410496. 
Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:25,767][1648981] Avg episode reward: [(0, '1049.130')] [2024-06-15 20:08:28,476][1651669] Updated weights for policy 0, policy_version 723410 (0.0013) [2024-06-15 20:08:30,766][1648981] Fps is (10 sec: 45878.4, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 1481637888. Throughput: 0: 11992.2. Samples: 370483712. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:30,767][1648981] Avg episode reward: [(0, '1021.800')] [2024-06-15 20:08:31,931][1651669] Updated weights for policy 0, policy_version 723472 (0.0036) [2024-06-15 20:08:33,378][1651669] Updated weights for policy 0, policy_version 723524 (0.0010) [2024-06-15 20:08:35,131][1651669] Updated weights for policy 0, policy_version 723589 (0.0013) [2024-06-15 20:08:35,770][1648981] Fps is (10 sec: 42582.4, 60 sec: 48616.5, 300 sec: 48540.5). Total num frames: 1481932800. Throughput: 0: 12030.6. Samples: 370548224. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:35,771][1648981] Avg episode reward: [(0, '992.710')] [2024-06-15 20:08:36,390][1651669] Updated weights for policy 0, policy_version 723639 (0.0010) [2024-06-15 20:08:39,393][1651669] Updated weights for policy 0, policy_version 723670 (0.0012) [2024-06-15 20:08:40,794][1648981] Fps is (10 sec: 52283.3, 60 sec: 46945.7, 300 sec: 48648.0). Total num frames: 1482162176. Throughput: 0: 12184.5. Samples: 370592256. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:40,795][1648981] Avg episode reward: [(0, '990.420')] [2024-06-15 20:08:43,250][1651669] Updated weights for policy 0, policy_version 723728 (0.0146) [2024-06-15 20:08:44,856][1651669] Updated weights for policy 0, policy_version 723792 (0.0013) [2024-06-15 20:08:45,766][1648981] Fps is (10 sec: 45892.2, 60 sec: 48605.9, 300 sec: 48323.0). Total num frames: 1482391552. Throughput: 0: 11923.9. Samples: 370661376. 
Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:45,767][1648981] Avg episode reward: [(0, '969.410')] [2024-06-15 20:08:46,287][1651669] Updated weights for policy 0, policy_version 723842 (0.0013) [2024-06-15 20:08:47,692][1651669] Updated weights for policy 0, policy_version 723904 (0.0010) [2024-06-15 20:08:50,770][1648981] Fps is (10 sec: 45988.1, 60 sec: 47526.7, 300 sec: 48873.8). Total num frames: 1482620928. Throughput: 0: 11854.8. Samples: 370733056. Policy #0 lag: (min: 25.0, avg: 162.5, max: 281.0) [2024-06-15 20:08:50,770][1648981] Avg episode reward: [(0, '958.560')] [2024-06-15 20:08:50,996][1651669] Updated weights for policy 0, policy_version 723957 (0.0010) [2024-06-15 20:08:53,766][1651669] Updated weights for policy 0, policy_version 724000 (0.0150) [2024-06-15 20:08:55,256][1651669] Updated weights for policy 0, policy_version 724050 (0.0011) [2024-06-15 20:08:55,610][1651274] Signal inference workers to stop experience collection... (38000 times) [2024-06-15 20:08:55,660][1651669] InferenceWorker_p0-w0: stopping experience collection (38000 times) [2024-06-15 20:08:55,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49171.4, 300 sec: 48318.9). Total num frames: 1482883072. Throughput: 0: 11946.7. Samples: 370772992. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:08:55,767][1648981] Avg episode reward: [(0, '963.730')] [2024-06-15 20:08:55,839][1651274] Signal inference workers to resume experience collection... (38000 times) [2024-06-15 20:08:55,850][1651669] InferenceWorker_p0-w0: resuming experience collection (38000 times) [2024-06-15 20:08:56,087][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000724096_1482948608.pth... 
[2024-06-15 20:08:56,152][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000718384_1471250432.pth [2024-06-15 20:08:57,437][1651669] Updated weights for policy 0, policy_version 724112 (0.0011) [2024-06-15 20:09:00,430][1651669] Updated weights for policy 0, policy_version 724162 (0.0012) [2024-06-15 20:09:00,766][1648981] Fps is (10 sec: 49168.0, 60 sec: 46967.4, 300 sec: 48985.4). Total num frames: 1483112448. Throughput: 0: 11930.3. Samples: 370842624. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:00,767][1648981] Avg episode reward: [(0, '928.570')] [2024-06-15 20:09:01,589][1651669] Updated weights for policy 0, policy_version 724218 (0.0093) [2024-06-15 20:09:04,228][1651669] Updated weights for policy 0, policy_version 724279 (0.0013) [2024-06-15 20:09:05,767][1648981] Fps is (10 sec: 49150.3, 60 sec: 48605.6, 300 sec: 48652.1). Total num frames: 1483374592. Throughput: 0: 11867.1. Samples: 370917888. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:05,767][1648981] Avg episode reward: [(0, '903.690')] [2024-06-15 20:09:06,489][1651669] Updated weights for policy 0, policy_version 724350 (0.0012) [2024-06-15 20:09:08,803][1651669] Updated weights for policy 0, policy_version 724409 (0.0018) [2024-06-15 20:09:10,796][1648981] Fps is (10 sec: 49004.4, 60 sec: 48035.6, 300 sec: 48869.3). Total num frames: 1483603968. Throughput: 0: 12097.9. Samples: 370955264. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:10,797][1648981] Avg episode reward: [(0, '892.660')] [2024-06-15 20:09:12,423][1651669] Updated weights for policy 0, policy_version 724475 (0.0015) [2024-06-15 20:09:14,783][1651669] Updated weights for policy 0, policy_version 724538 (0.0061) [2024-06-15 20:09:15,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 48625.3, 300 sec: 48874.4). Total num frames: 1483898880. Throughput: 0: 12231.1. Samples: 371034112. 
Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:15,767][1648981] Avg episode reward: [(0, '873.370')] [2024-06-15 20:09:16,658][1651669] Updated weights for policy 0, policy_version 724601 (0.0052) [2024-06-15 20:09:19,080][1651669] Updated weights for policy 0, policy_version 724656 (0.0022) [2024-06-15 20:09:20,794][1648981] Fps is (10 sec: 52440.8, 60 sec: 49129.8, 300 sec: 48869.7). Total num frames: 1484128256. Throughput: 0: 12292.8. Samples: 371101696. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:20,795][1648981] Avg episode reward: [(0, '871.570')] [2024-06-15 20:09:22,664][1651669] Updated weights for policy 0, policy_version 724720 (0.0012) [2024-06-15 20:09:25,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 46421.3, 300 sec: 48542.4). Total num frames: 1484292096. Throughput: 0: 12159.0. Samples: 371139072. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:25,767][1648981] Avg episode reward: [(0, '888.270')] [2024-06-15 20:09:25,796][1651669] Updated weights for policy 0, policy_version 724768 (0.0018) [2024-06-15 20:09:26,942][1651669] Updated weights for policy 0, policy_version 724816 (0.0136) [2024-06-15 20:09:29,341][1651669] Updated weights for policy 0, policy_version 724880 (0.0012) [2024-06-15 20:09:30,776][1648981] Fps is (10 sec: 52526.6, 60 sec: 50236.5, 300 sec: 48872.8). Total num frames: 1484652544. Throughput: 0: 12228.6. Samples: 371211776. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:30,776][1648981] Avg episode reward: [(0, '857.320')] [2024-06-15 20:09:33,165][1651669] Updated weights for policy 0, policy_version 724984 (0.0146) [2024-06-15 20:09:35,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47516.6, 300 sec: 48874.5). Total num frames: 1484783616. Throughput: 0: 12368.5. Samples: 371289600. 
Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:35,767][1648981] Avg episode reward: [(0, '887.410')] [2024-06-15 20:09:36,914][1651669] Updated weights for policy 0, policy_version 725024 (0.0010) [2024-06-15 20:09:38,694][1651274] Signal inference workers to stop experience collection... (38050 times) [2024-06-15 20:09:38,705][1651669] Updated weights for policy 0, policy_version 725089 (0.0015) [2024-06-15 20:09:38,721][1651669] InferenceWorker_p0-w0: stopping experience collection (38050 times) [2024-06-15 20:09:38,861][1651274] Signal inference workers to resume experience collection... (38050 times) [2024-06-15 20:09:38,862][1651669] InferenceWorker_p0-w0: resuming experience collection (38050 times) [2024-06-15 20:09:40,320][1651669] Updated weights for policy 0, policy_version 725153 (0.0014) [2024-06-15 20:09:40,771][1648981] Fps is (10 sec: 49174.2, 60 sec: 49717.3, 300 sec: 48762.4). Total num frames: 1485144064. Throughput: 0: 12229.8. Samples: 371323392. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:40,772][1648981] Avg episode reward: [(0, '891.580')] [2024-06-15 20:09:40,979][1651669] Updated weights for policy 0, policy_version 725183 (0.0012) [2024-06-15 20:09:43,842][1651669] Updated weights for policy 0, policy_version 725248 (0.0108) [2024-06-15 20:09:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 49096.5). Total num frames: 1485307904. Throughput: 0: 12162.9. Samples: 371389952. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:45,767][1648981] Avg episode reward: [(0, '920.530')] [2024-06-15 20:09:48,570][1651669] Updated weights for policy 0, policy_version 725297 (0.0115) [2024-06-15 20:09:49,801][1651669] Updated weights for policy 0, policy_version 725360 (0.0012) [2024-06-15 20:09:50,767][1648981] Fps is (10 sec: 45896.1, 60 sec: 49700.7, 300 sec: 48874.8). Total num frames: 1485602816. Throughput: 0: 12253.9. Samples: 371469312. 
Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:50,767][1648981] Avg episode reward: [(0, '931.260')] [2024-06-15 20:09:51,295][1651669] Updated weights for policy 0, policy_version 725417 (0.0015) [2024-06-15 20:09:53,788][1651669] Updated weights for policy 0, policy_version 725456 (0.0011) [2024-06-15 20:09:55,766][1648981] Fps is (10 sec: 52428.0, 60 sec: 49151.9, 300 sec: 49096.4). Total num frames: 1485832192. Throughput: 0: 12182.3. Samples: 371503104. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:09:55,767][1648981] Avg episode reward: [(0, '910.250')] [2024-06-15 20:09:57,975][1651669] Updated weights for policy 0, policy_version 725521 (0.0024) [2024-06-15 20:09:59,452][1651669] Updated weights for policy 0, policy_version 725587 (0.0012) [2024-06-15 20:10:00,767][1648981] Fps is (10 sec: 49151.6, 60 sec: 49697.9, 300 sec: 48876.0). Total num frames: 1486094336. Throughput: 0: 12083.1. Samples: 371577856. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:00,767][1648981] Avg episode reward: [(0, '944.890')] [2024-06-15 20:10:01,154][1651669] Updated weights for policy 0, policy_version 725633 (0.0012) [2024-06-15 20:10:02,561][1651669] Updated weights for policy 0, policy_version 725692 (0.0011) [2024-06-15 20:10:05,651][1651669] Updated weights for policy 0, policy_version 725752 (0.0016) [2024-06-15 20:10:05,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 49698.4, 300 sec: 49096.5). Total num frames: 1486356480. Throughput: 0: 12136.2. Samples: 371647488. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:05,767][1648981] Avg episode reward: [(0, '918.140')] [2024-06-15 20:10:09,570][1651669] Updated weights for policy 0, policy_version 725804 (0.0011) [2024-06-15 20:10:10,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 49176.7, 300 sec: 48874.3). Total num frames: 1486553088. Throughput: 0: 12310.8. Samples: 371693056. 
Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:10,767][1648981] Avg episode reward: [(0, '923.250')] [2024-06-15 20:10:11,259][1651669] Updated weights for policy 0, policy_version 725881 (0.0123) [2024-06-15 20:10:12,576][1651669] Updated weights for policy 0, policy_version 725924 (0.0028) [2024-06-15 20:10:13,084][1651669] Updated weights for policy 0, policy_version 725951 (0.0012) [2024-06-15 20:10:15,782][1648981] Fps is (10 sec: 39259.7, 60 sec: 47501.1, 300 sec: 48871.7). Total num frames: 1486749696. Throughput: 0: 12206.6. Samples: 371761152. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:15,783][1648981] Avg episode reward: [(0, '941.610')] [2024-06-15 20:10:16,486][1651669] Updated weights for policy 0, policy_version 725988 (0.0014) [2024-06-15 20:10:19,260][1651669] Updated weights for policy 0, policy_version 726032 (0.0013) [2024-06-15 20:10:20,494][1651274] Signal inference workers to stop experience collection... (38100 times) [2024-06-15 20:10:20,579][1651669] InferenceWorker_p0-w0: stopping experience collection (38100 times) [2024-06-15 20:10:20,727][1651274] Signal inference workers to resume experience collection... (38100 times) [2024-06-15 20:10:20,728][1651669] InferenceWorker_p0-w0: resuming experience collection (38100 times) [2024-06-15 20:10:20,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48628.4, 300 sec: 48985.4). Total num frames: 1487044608. Throughput: 0: 12105.9. Samples: 371834368. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:20,767][1648981] Avg episode reward: [(0, '944.230')] [2024-06-15 20:10:20,865][1651669] Updated weights for policy 0, policy_version 726098 (0.0090) [2024-06-15 20:10:22,242][1651669] Updated weights for policy 0, policy_version 726160 (0.0011) [2024-06-15 20:10:25,766][1648981] Fps is (10 sec: 52511.3, 60 sec: 49698.1, 300 sec: 48874.3). Total num frames: 1487273984. Throughput: 0: 12095.8. Samples: 371867648. 
Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:25,767][1648981] Avg episode reward: [(0, '938.850')] [2024-06-15 20:10:26,507][1651669] Updated weights for policy 0, policy_version 726224 (0.0021) [2024-06-15 20:10:30,075][1651669] Updated weights for policy 0, policy_version 726276 (0.0013) [2024-06-15 20:10:30,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46974.7, 300 sec: 48652.2). Total num frames: 1487470592. Throughput: 0: 12435.9. Samples: 371949568. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:30,767][1648981] Avg episode reward: [(0, '972.280')] [2024-06-15 20:10:32,272][1651669] Updated weights for policy 0, policy_version 726368 (0.0167) [2024-06-15 20:10:32,961][1651669] Updated weights for policy 0, policy_version 726399 (0.0108) [2024-06-15 20:10:34,132][1651669] Updated weights for policy 0, policy_version 726448 (0.0013) [2024-06-15 20:10:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 48874.4). Total num frames: 1487798272. Throughput: 0: 12003.6. Samples: 372009472. Policy #0 lag: (min: 4.0, avg: 91.0, max: 260.0) [2024-06-15 20:10:35,767][1648981] Avg episode reward: [(0, '985.930')] [2024-06-15 20:10:37,623][1651669] Updated weights for policy 0, policy_version 726480 (0.0016) [2024-06-15 20:10:38,727][1651669] Updated weights for policy 0, policy_version 726520 (0.0012) [2024-06-15 20:10:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46424.9, 300 sec: 48430.0). Total num frames: 1487929344. Throughput: 0: 12094.6. Samples: 372047360. 
Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:10:40,767][1648981] Avg episode reward: [(0, '955.760')] [2024-06-15 20:10:42,125][1651669] Updated weights for policy 0, policy_version 726560 (0.0010) [2024-06-15 20:10:43,693][1651669] Updated weights for policy 0, policy_version 726624 (0.0050) [2024-06-15 20:10:45,179][1651669] Updated weights for policy 0, policy_version 726678 (0.0010) [2024-06-15 20:10:45,778][1648981] Fps is (10 sec: 49096.5, 60 sec: 49688.8, 300 sec: 48761.4). Total num frames: 1488289792. Throughput: 0: 12046.1. Samples: 372120064. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:10:45,778][1648981] Avg episode reward: [(0, '950.810')] [2024-06-15 20:10:46,054][1651669] Updated weights for policy 0, policy_version 726720 (0.0012) [2024-06-15 20:10:49,439][1651669] Updated weights for policy 0, policy_version 726776 (0.0014) [2024-06-15 20:10:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 47513.7, 300 sec: 48431.4). Total num frames: 1488453632. Throughput: 0: 12185.6. Samples: 372195840. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:10:50,767][1648981] Avg episode reward: [(0, '1005.250')] [2024-06-15 20:10:52,531][1651669] Updated weights for policy 0, policy_version 726816 (0.0013) [2024-06-15 20:10:53,713][1651669] Updated weights for policy 0, policy_version 726880 (0.0014) [2024-06-15 20:10:55,126][1651669] Updated weights for policy 0, policy_version 726930 (0.0040) [2024-06-15 20:10:55,767][1648981] Fps is (10 sec: 49205.8, 60 sec: 49151.8, 300 sec: 48763.2). Total num frames: 1488781312. Throughput: 0: 11935.2. Samples: 372230144. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:10:55,767][1648981] Avg episode reward: [(0, '1032.710')] [2024-06-15 20:10:56,217][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000726976_1488846848.pth... 
[2024-06-15 20:10:56,266][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000721216_1477050368.pth [2024-06-15 20:10:56,272][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000726976_1488846848.pth [2024-06-15 20:10:59,541][1651669] Updated weights for policy 0, policy_version 726992 (0.0012) [2024-06-15 20:11:00,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48060.0, 300 sec: 48541.1). Total num frames: 1488977920. Throughput: 0: 12098.8. Samples: 372305408. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:00,767][1648981] Avg episode reward: [(0, '1046.440')] [2024-06-15 20:11:02,140][1651274] Signal inference workers to stop experience collection... (38150 times) [2024-06-15 20:11:02,181][1651669] InferenceWorker_p0-w0: stopping experience collection (38150 times) [2024-06-15 20:11:02,452][1651274] Signal inference workers to resume experience collection... (38150 times) [2024-06-15 20:11:02,452][1651669] InferenceWorker_p0-w0: resuming experience collection (38150 times) [2024-06-15 20:11:02,890][1651669] Updated weights for policy 0, policy_version 727072 (0.0014) [2024-06-15 20:11:04,226][1651669] Updated weights for policy 0, policy_version 727120 (0.0015) [2024-06-15 20:11:05,681][1651669] Updated weights for policy 0, policy_version 727184 (0.0018) [2024-06-15 20:11:05,766][1648981] Fps is (10 sec: 49153.8, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 1489272832. Throughput: 0: 12071.8. Samples: 372377600. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:05,767][1648981] Avg episode reward: [(0, '1018.190')] [2024-06-15 20:11:10,488][1651669] Updated weights for policy 0, policy_version 727238 (0.0015) [2024-06-15 20:11:10,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 47513.6, 300 sec: 48430.0). Total num frames: 1489403904. Throughput: 0: 12060.5. Samples: 372410368. 
Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:10,767][1648981] Avg episode reward: [(0, '981.470')] [2024-06-15 20:11:13,710][1651669] Updated weights for policy 0, policy_version 727312 (0.0012) [2024-06-15 20:11:15,050][1651669] Updated weights for policy 0, policy_version 727360 (0.0011) [2024-06-15 20:11:15,767][1648981] Fps is (10 sec: 39320.7, 60 sec: 48618.5, 300 sec: 48652.1). Total num frames: 1489666048. Throughput: 0: 11855.6. Samples: 372483072. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:15,767][1648981] Avg episode reward: [(0, '976.270')] [2024-06-15 20:11:16,423][1651669] Updated weights for policy 0, policy_version 727418 (0.0092) [2024-06-15 20:11:18,077][1651669] Updated weights for policy 0, policy_version 727483 (0.0116) [2024-06-15 20:11:20,814][1648981] Fps is (10 sec: 48918.4, 60 sec: 47475.8, 300 sec: 48644.3). Total num frames: 1489895424. Throughput: 0: 12059.0. Samples: 372552704. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:20,815][1648981] Avg episode reward: [(0, '983.150')] [2024-06-15 20:11:22,302][1651669] Updated weights for policy 0, policy_version 727521 (0.0019) [2024-06-15 20:11:24,989][1651669] Updated weights for policy 0, policy_version 727572 (0.0012) [2024-06-15 20:11:25,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 48059.7, 300 sec: 48876.3). Total num frames: 1490157568. Throughput: 0: 12140.1. Samples: 372593664. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:25,767][1648981] Avg episode reward: [(0, '1036.150')] [2024-06-15 20:11:26,279][1651669] Updated weights for policy 0, policy_version 727637 (0.0011) [2024-06-15 20:11:27,106][1651669] Updated weights for policy 0, policy_version 727678 (0.0019) [2024-06-15 20:11:28,677][1651669] Updated weights for policy 0, policy_version 727728 (0.0011) [2024-06-15 20:11:30,766][1648981] Fps is (10 sec: 52680.5, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1490419712. 
Throughput: 0: 12165.9. Samples: 372667392. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:30,767][1648981] Avg episode reward: [(0, '1017.390')] [2024-06-15 20:11:32,535][1651669] Updated weights for policy 0, policy_version 727776 (0.0013) [2024-06-15 20:11:35,646][1651669] Updated weights for policy 0, policy_version 727842 (0.0056) [2024-06-15 20:11:35,769][1648981] Fps is (10 sec: 45863.5, 60 sec: 46965.4, 300 sec: 48651.7). Total num frames: 1490616320. Throughput: 0: 12093.9. Samples: 372740096. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:35,770][1648981] Avg episode reward: [(0, '1047.710')] [2024-06-15 20:11:37,172][1651669] Updated weights for policy 0, policy_version 727909 (0.0013) [2024-06-15 20:11:39,273][1651669] Updated weights for policy 0, policy_version 727968 (0.0013) [2024-06-15 20:11:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1490944000. Throughput: 0: 12060.5. Samples: 372772864. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:40,767][1648981] Avg episode reward: [(0, '1046.500')] [2024-06-15 20:11:43,210][1651274] Signal inference workers to stop experience collection... (38200 times) [2024-06-15 20:11:43,241][1651669] InferenceWorker_p0-w0: stopping experience collection (38200 times) [2024-06-15 20:11:43,473][1651274] Signal inference workers to resume experience collection... (38200 times) [2024-06-15 20:11:43,474][1651669] InferenceWorker_p0-w0: resuming experience collection (38200 times) [2024-06-15 20:11:43,726][1651669] Updated weights for policy 0, policy_version 728022 (0.0037) [2024-06-15 20:11:45,669][1651669] Updated weights for policy 0, policy_version 728065 (0.0012) [2024-06-15 20:11:45,790][1648981] Fps is (10 sec: 45778.2, 60 sec: 46411.6, 300 sec: 48426.1). Total num frames: 1491075072. Throughput: 0: 11974.4. Samples: 372844544. 
Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:45,791][1648981] Avg episode reward: [(0, '1042.130')] [2024-06-15 20:11:46,895][1651669] Updated weights for policy 0, policy_version 728116 (0.0015) [2024-06-15 20:11:48,217][1651669] Updated weights for policy 0, policy_version 728162 (0.0011) [2024-06-15 20:11:50,067][1651669] Updated weights for policy 0, policy_version 728224 (0.0015) [2024-06-15 20:11:50,794][1648981] Fps is (10 sec: 52283.7, 60 sec: 50221.0, 300 sec: 48980.8). Total num frames: 1491468288. Throughput: 0: 11871.1. Samples: 372912128. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:50,795][1648981] Avg episode reward: [(0, '1068.330')] [2024-06-15 20:11:54,877][1651669] Updated weights for policy 0, policy_version 728291 (0.0011) [2024-06-15 20:11:55,767][1648981] Fps is (10 sec: 52553.2, 60 sec: 46967.6, 300 sec: 48430.0). Total num frames: 1491599360. Throughput: 0: 12208.3. Samples: 372959744. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:11:55,767][1648981] Avg episode reward: [(0, '1028.980')] [2024-06-15 20:11:56,366][1651669] Updated weights for policy 0, policy_version 728340 (0.0011) [2024-06-15 20:11:58,021][1651669] Updated weights for policy 0, policy_version 728386 (0.0011) [2024-06-15 20:11:59,279][1651669] Updated weights for policy 0, policy_version 728446 (0.0011) [2024-06-15 20:12:00,770][1648981] Fps is (10 sec: 45988.3, 60 sec: 49149.3, 300 sec: 48873.9). Total num frames: 1491927040. Throughput: 0: 12105.1. Samples: 373027840. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:12:00,770][1648981] Avg episode reward: [(0, '1072.780')] [2024-06-15 20:12:00,947][1651669] Updated weights for policy 0, policy_version 728502 (0.0012) [2024-06-15 20:12:05,472][1651669] Updated weights for policy 0, policy_version 728549 (0.0011) [2024-06-15 20:12:05,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 46967.4, 300 sec: 48430.0). Total num frames: 1492090880. 
Throughput: 0: 12392.2. Samples: 373109760. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:12:05,767][1648981] Avg episode reward: [(0, '1020.710')] [2024-06-15 20:12:06,706][1651669] Updated weights for policy 0, policy_version 728593 (0.0012) [2024-06-15 20:12:07,579][1651669] Updated weights for policy 0, policy_version 728640 (0.0014) [2024-06-15 20:12:09,556][1651669] Updated weights for policy 0, policy_version 728700 (0.0013) [2024-06-15 20:12:10,766][1648981] Fps is (10 sec: 49167.8, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1492418560. Throughput: 0: 12288.0. Samples: 373146624. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:12:10,767][1648981] Avg episode reward: [(0, '989.940')] [2024-06-15 20:12:11,562][1651669] Updated weights for policy 0, policy_version 728761 (0.0021) [2024-06-15 20:12:15,780][1648981] Fps is (10 sec: 45814.0, 60 sec: 48049.2, 300 sec: 48094.6). Total num frames: 1492549632. Throughput: 0: 12216.1. Samples: 373217280. Policy #0 lag: (min: 5.0, avg: 112.0, max: 261.0) [2024-06-15 20:12:15,780][1648981] Avg episode reward: [(0, '1011.780')] [2024-06-15 20:12:16,185][1651669] Updated weights for policy 0, policy_version 728803 (0.0019) [2024-06-15 20:12:17,358][1651669] Updated weights for policy 0, policy_version 728848 (0.0037) [2024-06-15 20:12:19,849][1651669] Updated weights for policy 0, policy_version 728899 (0.0018) [2024-06-15 20:12:20,768][1648981] Fps is (10 sec: 42590.8, 60 sec: 49189.7, 300 sec: 48651.9). Total num frames: 1492844544. Throughput: 0: 12106.2. Samples: 373284864. 
Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:20,769][1648981] Avg episode reward: [(0, '999.890')] [2024-06-15 20:12:21,528][1651669] Updated weights for policy 0, policy_version 728976 (0.0119) [2024-06-15 20:12:22,682][1651669] Updated weights for policy 0, policy_version 729018 (0.0034) [2024-06-15 20:12:25,766][1648981] Fps is (10 sec: 49217.9, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 1493041152. Throughput: 0: 12094.6. Samples: 373317120. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:25,767][1648981] Avg episode reward: [(0, '991.350')] [2024-06-15 20:12:27,411][1651274] Signal inference workers to stop experience collection... (38250 times) [2024-06-15 20:12:27,529][1651669] InferenceWorker_p0-w0: stopping experience collection (38250 times) [2024-06-15 20:12:27,531][1651669] Updated weights for policy 0, policy_version 729066 (0.0012) [2024-06-15 20:12:27,632][1651274] Signal inference workers to resume experience collection... (38250 times) [2024-06-15 20:12:27,633][1651669] InferenceWorker_p0-w0: resuming experience collection (38250 times) [2024-06-15 20:12:29,109][1651669] Updated weights for policy 0, policy_version 729106 (0.0013) [2024-06-15 20:12:30,122][1651669] Updated weights for policy 0, policy_version 729150 (0.0014) [2024-06-15 20:12:30,766][1648981] Fps is (10 sec: 45883.4, 60 sec: 48059.7, 300 sec: 48432.8). Total num frames: 1493303296. Throughput: 0: 12146.5. Samples: 373390848. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:30,767][1648981] Avg episode reward: [(0, '983.630')] [2024-06-15 20:12:31,895][1651669] Updated weights for policy 0, policy_version 729210 (0.0055) [2024-06-15 20:12:33,785][1651669] Updated weights for policy 0, policy_version 729275 (0.0013) [2024-06-15 20:12:35,770][1648981] Fps is (10 sec: 52408.5, 60 sec: 49151.0, 300 sec: 48207.2). Total num frames: 1493565440. Throughput: 0: 12192.1. Samples: 373460480. 
Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:35,771][1648981] Avg episode reward: [(0, '1050.050')] [2024-06-15 20:12:39,066][1651669] Updated weights for policy 0, policy_version 729340 (0.0014) [2024-06-15 20:12:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1493762048. Throughput: 0: 11958.1. Samples: 373497856. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:40,767][1648981] Avg episode reward: [(0, '1070.440')] [2024-06-15 20:12:40,995][1651669] Updated weights for policy 0, policy_version 729379 (0.0011) [2024-06-15 20:12:42,920][1651669] Updated weights for policy 0, policy_version 729427 (0.0012) [2024-06-15 20:12:44,475][1651669] Updated weights for policy 0, policy_version 729488 (0.0014) [2024-06-15 20:12:45,766][1648981] Fps is (10 sec: 52449.4, 60 sec: 50264.2, 300 sec: 48544.3). Total num frames: 1494089728. Throughput: 0: 11867.9. Samples: 373561856. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:45,767][1648981] Avg episode reward: [(0, '1049.350')] [2024-06-15 20:12:49,862][1651669] Updated weights for policy 0, policy_version 729568 (0.0012) [2024-06-15 20:12:50,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45896.4, 300 sec: 48433.9). Total num frames: 1494220800. Throughput: 0: 11537.1. Samples: 373628928. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:50,767][1648981] Avg episode reward: [(0, '1029.340')] [2024-06-15 20:12:52,151][1651669] Updated weights for policy 0, policy_version 729632 (0.0012) [2024-06-15 20:12:54,964][1651669] Updated weights for policy 0, policy_version 729698 (0.0035) [2024-06-15 20:12:55,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48059.8, 300 sec: 48096.7). Total num frames: 1494482944. Throughput: 0: 11571.2. Samples: 373667328. 
Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:12:55,767][1648981] Avg episode reward: [(0, '1029.340')] [2024-06-15 20:12:56,448][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000729760_1494548480.pth... [2024-06-15 20:12:56,616][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000724096_1482948608.pth [2024-06-15 20:12:57,108][1651669] Updated weights for policy 0, policy_version 729779 (0.0011) [2024-06-15 20:13:00,732][1651669] Updated weights for policy 0, policy_version 729793 (0.0010) [2024-06-15 20:13:00,776][1648981] Fps is (10 sec: 39283.6, 60 sec: 44778.1, 300 sec: 47984.1). Total num frames: 1494614016. Throughput: 0: 11560.8. Samples: 373737472. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:00,777][1648981] Avg episode reward: [(0, '1034.840')] [2024-06-15 20:13:02,039][1651669] Updated weights for policy 0, policy_version 729848 (0.0013) [2024-06-15 20:13:03,401][1651669] Updated weights for policy 0, policy_version 729888 (0.0012) [2024-06-15 20:13:05,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 1494908928. Throughput: 0: 11719.6. Samples: 373812224. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:05,767][1648981] Avg episode reward: [(0, '1003.170')] [2024-06-15 20:13:06,352][1651669] Updated weights for policy 0, policy_version 729968 (0.0137) [2024-06-15 20:13:07,363][1651669] Updated weights for policy 0, policy_version 730016 (0.0018) [2024-06-15 20:13:10,766][1648981] Fps is (10 sec: 52479.4, 60 sec: 45329.0, 300 sec: 47989.6). Total num frames: 1495138304. Throughput: 0: 11707.7. Samples: 373843968. 
Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:10,767][1648981] Avg episode reward: [(0, '1002.660')] [2024-06-15 20:13:11,620][1651669] Updated weights for policy 0, policy_version 730064 (0.0011) [2024-06-15 20:13:11,755][1651274] Signal inference workers to stop experience collection... (38300 times) [2024-06-15 20:13:11,847][1651669] InferenceWorker_p0-w0: stopping experience collection (38300 times) [2024-06-15 20:13:11,972][1651274] Signal inference workers to resume experience collection... (38300 times) [2024-06-15 20:13:11,973][1651669] InferenceWorker_p0-w0: resuming experience collection (38300 times) [2024-06-15 20:13:13,336][1651669] Updated weights for policy 0, policy_version 730128 (0.0012) [2024-06-15 20:13:14,481][1651669] Updated weights for policy 0, policy_version 730176 (0.0034) [2024-06-15 20:13:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48070.5, 300 sec: 48319.0). Total num frames: 1495433216. Throughput: 0: 11719.1. Samples: 373918208. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:15,767][1648981] Avg episode reward: [(0, '992.390')] [2024-06-15 20:13:16,619][1651669] Updated weights for policy 0, policy_version 730228 (0.0107) [2024-06-15 20:13:18,039][1651669] Updated weights for policy 0, policy_version 730295 (0.0015) [2024-06-15 20:13:20,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 46968.7, 300 sec: 47985.7). Total num frames: 1495662592. Throughput: 0: 11936.3. Samples: 373997568. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:20,767][1648981] Avg episode reward: [(0, '1036.370')] [2024-06-15 20:13:22,741][1651669] Updated weights for policy 0, policy_version 730337 (0.0013) [2024-06-15 20:13:23,809][1651669] Updated weights for policy 0, policy_version 730390 (0.0011) [2024-06-15 20:13:25,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 48059.6, 300 sec: 48429.9). Total num frames: 1495924736. Throughput: 0: 11958.0. Samples: 374035968. 
Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:25,767][1648981] Avg episode reward: [(0, '1078.250')] [2024-06-15 20:13:25,983][1651669] Updated weights for policy 0, policy_version 730448 (0.0011) [2024-06-15 20:13:27,169][1651669] Updated weights for policy 0, policy_version 730498 (0.0012) [2024-06-15 20:13:30,770][1648981] Fps is (10 sec: 52407.8, 60 sec: 48056.4, 300 sec: 48318.8). Total num frames: 1496186880. Throughput: 0: 12161.7. Samples: 374109184. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:30,771][1648981] Avg episode reward: [(0, '1041.050')] [2024-06-15 20:13:32,750][1651669] Updated weights for policy 0, policy_version 730565 (0.0013) [2024-06-15 20:13:34,631][1651669] Updated weights for policy 0, policy_version 730643 (0.0014) [2024-06-15 20:13:35,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 48062.9, 300 sec: 48434.6). Total num frames: 1496449024. Throughput: 0: 12231.1. Samples: 374179328. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:35,767][1648981] Avg episode reward: [(0, '1061.690')] [2024-06-15 20:13:36,765][1651669] Updated weights for policy 0, policy_version 730690 (0.0012) [2024-06-15 20:13:38,435][1651669] Updated weights for policy 0, policy_version 730753 (0.0010) [2024-06-15 20:13:40,067][1651669] Updated weights for policy 0, policy_version 730816 (0.0010) [2024-06-15 20:13:40,767][1648981] Fps is (10 sec: 52449.5, 60 sec: 49151.8, 300 sec: 48541.0). Total num frames: 1496711168. Throughput: 0: 12208.3. Samples: 374216704. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:40,767][1648981] Avg episode reward: [(0, '1062.020')] [2024-06-15 20:13:45,778][1648981] Fps is (10 sec: 39274.9, 60 sec: 45866.1, 300 sec: 48206.4). Total num frames: 1496842240. Throughput: 0: 12310.2. Samples: 374291456. 
Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:45,779][1648981] Avg episode reward: [(0, '1083.330')] [2024-06-15 20:13:46,237][1651669] Updated weights for policy 0, policy_version 730896 (0.0020) [2024-06-15 20:13:47,206][1651669] Updated weights for policy 0, policy_version 730939 (0.0018) [2024-06-15 20:13:49,270][1651669] Updated weights for policy 0, policy_version 730999 (0.0014) [2024-06-15 20:13:49,339][1651274] Signal inference workers to stop experience collection... (38350 times) [2024-06-15 20:13:49,397][1651669] InferenceWorker_p0-w0: stopping experience collection (38350 times) [2024-06-15 20:13:49,529][1651274] Signal inference workers to resume experience collection... (38350 times) [2024-06-15 20:13:49,531][1651669] InferenceWorker_p0-w0: resuming experience collection (38350 times) [2024-06-15 20:13:50,568][1651669] Updated weights for policy 0, policy_version 731042 (0.0012) [2024-06-15 20:13:50,774][1648981] Fps is (10 sec: 45840.5, 60 sec: 49145.7, 300 sec: 48428.7). Total num frames: 1497169920. Throughput: 0: 11921.8. Samples: 374348800. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:50,775][1648981] Avg episode reward: [(0, '1102.170')] [2024-06-15 20:13:51,046][1651274] Saving new best policy, reward=1102.170! [2024-06-15 20:13:55,445][1651669] Updated weights for policy 0, policy_version 731088 (0.0016) [2024-06-15 20:13:55,766][1648981] Fps is (10 sec: 42648.6, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1497268224. Throughput: 0: 12128.7. Samples: 374389760. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:13:55,767][1648981] Avg episode reward: [(0, '1105.370')] [2024-06-15 20:13:56,239][1651274] Saving new best policy, reward=1105.370! 
[2024-06-15 20:13:57,630][1651669] Updated weights for policy 0, policy_version 731168 (0.0099) [2024-06-15 20:14:00,724][1651669] Updated weights for policy 0, policy_version 731232 (0.0013) [2024-06-15 20:14:00,766][1648981] Fps is (10 sec: 39352.2, 60 sec: 49160.0, 300 sec: 48096.8). Total num frames: 1497563136. Throughput: 0: 12014.9. Samples: 374458880. Policy #0 lag: (min: 1.0, avg: 125.1, max: 257.0) [2024-06-15 20:14:00,767][1648981] Avg episode reward: [(0, '1131.490')] [2024-06-15 20:14:01,141][1651274] Saving new best policy, reward=1131.490! [2024-06-15 20:14:02,358][1651669] Updated weights for policy 0, policy_version 731296 (0.0013) [2024-06-15 20:14:05,801][1648981] Fps is (10 sec: 48980.5, 60 sec: 47485.8, 300 sec: 47984.9). Total num frames: 1497759744. Throughput: 0: 11789.6. Samples: 374528512. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:05,802][1648981] Avg episode reward: [(0, '1115.340')] [2024-06-15 20:14:06,747][1651669] Updated weights for policy 0, policy_version 731347 (0.0012) [2024-06-15 20:14:08,759][1651669] Updated weights for policy 0, policy_version 731424 (0.0014) [2024-06-15 20:14:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1498021888. Throughput: 0: 11696.4. Samples: 374562304. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:10,767][1648981] Avg episode reward: [(0, '1110.590')] [2024-06-15 20:14:11,221][1651669] Updated weights for policy 0, policy_version 731460 (0.0011) [2024-06-15 20:14:12,581][1651669] Updated weights for policy 0, policy_version 731521 (0.0065) [2024-06-15 20:14:13,855][1651669] Updated weights for policy 0, policy_version 731579 (0.0012) [2024-06-15 20:14:15,766][1648981] Fps is (10 sec: 52613.3, 60 sec: 47513.5, 300 sec: 47990.2). Total num frames: 1498284032. Throughput: 0: 11651.9. Samples: 374633472. 
Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:15,767][1648981] Avg episode reward: [(0, '1117.780')] [2024-06-15 20:14:18,188][1651669] Updated weights for policy 0, policy_version 731639 (0.0012) [2024-06-15 20:14:19,869][1651669] Updated weights for policy 0, policy_version 731700 (0.0013) [2024-06-15 20:14:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1498546176. Throughput: 0: 11662.2. Samples: 374704128. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:20,767][1648981] Avg episode reward: [(0, '1121.720')] [2024-06-15 20:14:23,305][1651669] Updated weights for policy 0, policy_version 731776 (0.0012) [2024-06-15 20:14:24,378][1651669] Updated weights for policy 0, policy_version 731832 (0.0012) [2024-06-15 20:14:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.9, 300 sec: 47987.2). Total num frames: 1498808320. Throughput: 0: 11696.4. Samples: 374743040. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:25,767][1648981] Avg episode reward: [(0, '1092.810')] [2024-06-15 20:14:28,761][1651669] Updated weights for policy 0, policy_version 731888 (0.0016) [2024-06-15 20:14:29,296][1651274] Signal inference workers to stop experience collection... (38400 times) [2024-06-15 20:14:29,376][1651669] InferenceWorker_p0-w0: stopping experience collection (38400 times) [2024-06-15 20:14:29,586][1651274] Signal inference workers to resume experience collection... (38400 times) [2024-06-15 20:14:29,587][1651669] InferenceWorker_p0-w0: resuming experience collection (38400 times) [2024-06-15 20:14:30,404][1651669] Updated weights for policy 0, policy_version 731952 (0.0012) [2024-06-15 20:14:30,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 47516.9, 300 sec: 48318.9). Total num frames: 1499037696. Throughput: 0: 11745.0. Samples: 374819840. 
Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:30,767][1648981] Avg episode reward: [(0, '1100.880')] [2024-06-15 20:14:33,588][1651669] Updated weights for policy 0, policy_version 732000 (0.0011) [2024-06-15 20:14:35,149][1651669] Updated weights for policy 0, policy_version 732066 (0.0051) [2024-06-15 20:14:35,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 48097.5). Total num frames: 1499332608. Throughput: 0: 12005.6. Samples: 374888960. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:35,767][1648981] Avg episode reward: [(0, '1095.490')] [2024-06-15 20:14:40,067][1651669] Updated weights for policy 0, policy_version 732130 (0.0013) [2024-06-15 20:14:40,769][1648981] Fps is (10 sec: 42586.8, 60 sec: 45873.3, 300 sec: 47985.2). Total num frames: 1499463680. Throughput: 0: 11957.4. Samples: 374927872. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:40,770][1648981] Avg episode reward: [(0, '1074.380')] [2024-06-15 20:14:41,526][1651669] Updated weights for policy 0, policy_version 732195 (0.0012) [2024-06-15 20:14:44,221][1651669] Updated weights for policy 0, policy_version 732242 (0.0013) [2024-06-15 20:14:45,631][1651669] Updated weights for policy 0, policy_version 732304 (0.0014) [2024-06-15 20:14:45,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 48615.5, 300 sec: 47985.7). Total num frames: 1499758592. Throughput: 0: 11855.6. Samples: 374992384. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:45,767][1648981] Avg episode reward: [(0, '1035.600')] [2024-06-15 20:14:46,600][1651669] Updated weights for policy 0, policy_version 732349 (0.0055) [2024-06-15 20:14:50,766][1648981] Fps is (10 sec: 39332.2, 60 sec: 44788.7, 300 sec: 47541.4). Total num frames: 1499856896. Throughput: 0: 11990.2. Samples: 375067648. 
Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:50,767][1648981] Avg episode reward: [(0, '1025.240')] [2024-06-15 20:14:52,773][1651669] Updated weights for policy 0, policy_version 732432 (0.0094) [2024-06-15 20:14:54,629][1651669] Updated weights for policy 0, policy_version 732485 (0.0013) [2024-06-15 20:14:55,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 49151.9, 300 sec: 47874.6). Total num frames: 1500217344. Throughput: 0: 11946.6. Samples: 375099904. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:14:55,767][1648981] Avg episode reward: [(0, '1003.210')] [2024-06-15 20:14:56,395][1651669] Updated weights for policy 0, policy_version 732560 (0.0011) [2024-06-15 20:14:56,399][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000732560_1500282880.pth... [2024-06-15 20:14:56,590][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000726976_1488846848.pth [2024-06-15 20:15:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.4, 300 sec: 47541.4). Total num frames: 1500381184. Throughput: 0: 11844.3. Samples: 375166464. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:00,767][1648981] Avg episode reward: [(0, '968.770')] [2024-06-15 20:15:02,272][1651669] Updated weights for policy 0, policy_version 732626 (0.0010) [2024-06-15 20:15:04,348][1651669] Updated weights for policy 0, policy_version 732705 (0.0021) [2024-06-15 20:15:05,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 48634.3, 300 sec: 47874.6). Total num frames: 1500676096. Throughput: 0: 11923.9. Samples: 375240704. 
Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:05,767][1648981] Avg episode reward: [(0, '969.510')] [2024-06-15 20:15:05,839][1651669] Updated weights for policy 0, policy_version 732754 (0.0012) [2024-06-15 20:15:06,553][1651669] Updated weights for policy 0, policy_version 732794 (0.0015) [2024-06-15 20:15:07,823][1651274] Signal inference workers to stop experience collection... (38450 times) [2024-06-15 20:15:07,854][1651669] InferenceWorker_p0-w0: stopping experience collection (38450 times) [2024-06-15 20:15:08,044][1651274] Signal inference workers to resume experience collection... (38450 times) [2024-06-15 20:15:08,044][1651669] InferenceWorker_p0-w0: resuming experience collection (38450 times) [2024-06-15 20:15:08,046][1651669] Updated weights for policy 0, policy_version 732848 (0.0012) [2024-06-15 20:15:10,767][1648981] Fps is (10 sec: 52424.7, 60 sec: 48059.1, 300 sec: 47988.1). Total num frames: 1500905472. Throughput: 0: 11855.4. Samples: 375276544. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:10,768][1648981] Avg episode reward: [(0, '927.350')] [2024-06-15 20:15:12,516][1651669] Updated weights for policy 0, policy_version 732896 (0.0012) [2024-06-15 20:15:13,684][1651669] Updated weights for policy 0, policy_version 732944 (0.0012) [2024-06-15 20:15:14,897][1651669] Updated weights for policy 0, policy_version 732987 (0.0012) [2024-06-15 20:15:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1501200384. Throughput: 0: 11901.1. Samples: 375355392. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:15,767][1648981] Avg episode reward: [(0, '916.400')] [2024-06-15 20:15:16,197][1651669] Updated weights for policy 0, policy_version 733027 (0.0012) [2024-06-15 20:15:18,663][1651669] Updated weights for policy 0, policy_version 733089 (0.0028) [2024-06-15 20:15:20,767][1648981] Fps is (10 sec: 52432.6, 60 sec: 48059.7, 300 sec: 47985.7). 
Total num frames: 1501429760. Throughput: 0: 11935.3. Samples: 375426048. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:20,767][1648981] Avg episode reward: [(0, '896.940')] [2024-06-15 20:15:22,858][1651669] Updated weights for policy 0, policy_version 733136 (0.0013) [2024-06-15 20:15:25,017][1651669] Updated weights for policy 0, policy_version 733201 (0.0013) [2024-06-15 20:15:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 1501659136. Throughput: 0: 11958.8. Samples: 375465984. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:25,767][1648981] Avg episode reward: [(0, '896.960')] [2024-06-15 20:15:26,301][1651669] Updated weights for policy 0, policy_version 733252 (0.0014) [2024-06-15 20:15:27,754][1651669] Updated weights for policy 0, policy_version 733312 (0.0032) [2024-06-15 20:15:30,419][1651669] Updated weights for policy 0, policy_version 733376 (0.0019) [2024-06-15 20:15:30,770][1648981] Fps is (10 sec: 52409.4, 60 sec: 48602.8, 300 sec: 47985.1). Total num frames: 1501954048. Throughput: 0: 12082.2. Samples: 375536128. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:30,771][1648981] Avg episode reward: [(0, '862.070')] [2024-06-15 20:15:34,766][1651669] Updated weights for policy 0, policy_version 733435 (0.0012) [2024-06-15 20:15:35,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 1502117888. Throughput: 0: 12083.2. Samples: 375611392. 
Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:35,767][1648981] Avg episode reward: [(0, '844.570')] [2024-06-15 20:15:36,598][1651669] Updated weights for policy 0, policy_version 733493 (0.0012) [2024-06-15 20:15:37,916][1651669] Updated weights for policy 0, policy_version 733537 (0.0011) [2024-06-15 20:15:40,779][1651669] Updated weights for policy 0, policy_version 733600 (0.0119) [2024-06-15 20:15:40,782][1648981] Fps is (10 sec: 45819.9, 60 sec: 49141.2, 300 sec: 47873.9). Total num frames: 1502412800. Throughput: 0: 12022.1. Samples: 375641088. Policy #0 lag: (min: 95.0, avg: 193.8, max: 295.0) [2024-06-15 20:15:40,783][1648981] Avg episode reward: [(0, '813.780')] [2024-06-15 20:15:43,781][1651669] Updated weights for policy 0, policy_version 733634 (0.0013) [2024-06-15 20:15:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1502609408. Throughput: 0: 12390.4. Samples: 375724032. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:15:45,767][1648981] Avg episode reward: [(0, '827.380')] [2024-06-15 20:15:46,005][1651669] Updated weights for policy 0, policy_version 733712 (0.0011) [2024-06-15 20:15:48,139][1651669] Updated weights for policy 0, policy_version 733792 (0.0012) [2024-06-15 20:15:48,883][1651669] Updated weights for policy 0, policy_version 733824 (0.0011) [2024-06-15 20:15:50,766][1648981] Fps is (10 sec: 45948.2, 60 sec: 50244.3, 300 sec: 47763.6). Total num frames: 1502871552. Throughput: 0: 12401.8. Samples: 375798784. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:15:50,767][1648981] Avg episode reward: [(0, '827.340')] [2024-06-15 20:15:51,899][1651274] Signal inference workers to stop experience collection... (38500 times) [2024-06-15 20:15:51,942][1651669] InferenceWorker_p0-w0: stopping experience collection (38500 times) [2024-06-15 20:15:52,185][1651274] Signal inference workers to resume experience collection... 
(38500 times) [2024-06-15 20:15:52,187][1651669] InferenceWorker_p0-w0: resuming experience collection (38500 times) [2024-06-15 20:15:52,784][1651669] Updated weights for policy 0, policy_version 733880 (0.0013) [2024-06-15 20:15:55,403][1651669] Updated weights for policy 0, policy_version 733939 (0.0011) [2024-06-15 20:15:55,770][1648981] Fps is (10 sec: 52409.1, 60 sec: 48603.0, 300 sec: 47985.1). Total num frames: 1503133696. Throughput: 0: 12423.7. Samples: 375835648. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:15:55,771][1648981] Avg episode reward: [(0, '828.370')] [2024-06-15 20:15:56,909][1651669] Updated weights for policy 0, policy_version 733972 (0.0011) [2024-06-15 20:15:58,843][1651669] Updated weights for policy 0, policy_version 734052 (0.0012) [2024-06-15 20:16:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.3, 300 sec: 47874.6). Total num frames: 1503395840. Throughput: 0: 12128.7. Samples: 375901184. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:00,767][1648981] Avg episode reward: [(0, '783.780')] [2024-06-15 20:16:03,496][1651669] Updated weights for policy 0, policy_version 734098 (0.0042) [2024-06-15 20:16:05,007][1651669] Updated weights for policy 0, policy_version 734176 (0.0098) [2024-06-15 20:16:05,766][1648981] Fps is (10 sec: 52449.0, 60 sec: 49698.2, 300 sec: 48318.9). Total num frames: 1503657984. Throughput: 0: 12367.7. Samples: 375982592. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:05,767][1648981] Avg episode reward: [(0, '777.730')] [2024-06-15 20:16:06,542][1651669] Updated weights for policy 0, policy_version 734224 (0.0012) [2024-06-15 20:16:07,906][1651669] Updated weights for policy 0, policy_version 734273 (0.0029) [2024-06-15 20:16:09,248][1651669] Updated weights for policy 0, policy_version 734333 (0.0012) [2024-06-15 20:16:10,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.8, 300 sec: 48318.9). Total num frames: 1503920128. 
Throughput: 0: 12219.7. Samples: 376015872. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:10,767][1648981] Avg episode reward: [(0, '782.450')] [2024-06-15 20:16:14,948][1651669] Updated weights for policy 0, policy_version 734384 (0.0016) [2024-06-15 20:16:15,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 47513.6, 300 sec: 47993.5). Total num frames: 1504051200. Throughput: 0: 12345.9. Samples: 376091648. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:15,767][1648981] Avg episode reward: [(0, '812.010')] [2024-06-15 20:16:15,961][1651669] Updated weights for policy 0, policy_version 734417 (0.0011) [2024-06-15 20:16:18,087][1651669] Updated weights for policy 0, policy_version 734480 (0.0013) [2024-06-15 20:16:19,986][1651669] Updated weights for policy 0, policy_version 734546 (0.0012) [2024-06-15 20:16:20,790][1648981] Fps is (10 sec: 49038.4, 60 sec: 49678.9, 300 sec: 48315.1). Total num frames: 1504411648. Throughput: 0: 12042.8. Samples: 376153600. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:20,790][1648981] Avg episode reward: [(0, '827.440')] [2024-06-15 20:16:25,319][1651669] Updated weights for policy 0, policy_version 734593 (0.0067) [2024-06-15 20:16:25,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 1504477184. Throughput: 0: 12303.7. Samples: 376194560. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:25,767][1648981] Avg episode reward: [(0, '842.520')] [2024-06-15 20:16:26,776][1651669] Updated weights for policy 0, policy_version 734657 (0.0119) [2024-06-15 20:16:27,670][1651669] Updated weights for policy 0, policy_version 734710 (0.0012) [2024-06-15 20:16:30,142][1651669] Updated weights for policy 0, policy_version 734777 (0.0017) [2024-06-15 20:16:30,766][1648981] Fps is (10 sec: 42697.6, 60 sec: 48062.7, 300 sec: 48208.3). Total num frames: 1504837632. Throughput: 0: 11946.7. Samples: 376261632. 
Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:30,767][1648981] Avg episode reward: [(0, '810.440')] [2024-06-15 20:16:30,945][1651274] Signal inference workers to stop experience collection... (38550 times) [2024-06-15 20:16:30,996][1651669] InferenceWorker_p0-w0: stopping experience collection (38550 times) [2024-06-15 20:16:31,242][1651274] Signal inference workers to resume experience collection... (38550 times) [2024-06-15 20:16:31,243][1651669] InferenceWorker_p0-w0: resuming experience collection (38550 times) [2024-06-15 20:16:31,406][1651669] Updated weights for policy 0, policy_version 734817 (0.0012) [2024-06-15 20:16:35,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1504968704. Throughput: 0: 12105.9. Samples: 376343552. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:35,767][1648981] Avg episode reward: [(0, '813.620')] [2024-06-15 20:16:36,508][1651669] Updated weights for policy 0, policy_version 734851 (0.0029) [2024-06-15 20:16:37,914][1651669] Updated weights for policy 0, policy_version 734918 (0.0012) [2024-06-15 20:16:39,333][1651669] Updated weights for policy 0, policy_version 734989 (0.0034) [2024-06-15 20:16:40,375][1651669] Updated weights for policy 0, policy_version 735035 (0.0014) [2024-06-15 20:16:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49164.9, 300 sec: 48433.9). Total num frames: 1505361920. Throughput: 0: 12015.9. Samples: 376376320. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:40,767][1648981] Avg episode reward: [(0, '835.980')] [2024-06-15 20:16:41,870][1651669] Updated weights for policy 0, policy_version 735088 (0.0014) [2024-06-15 20:16:45,772][1648981] Fps is (10 sec: 52401.8, 60 sec: 48055.6, 300 sec: 47545.0). Total num frames: 1505492992. Throughput: 0: 12275.2. Samples: 376453632. 
Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:45,772][1648981] Avg episode reward: [(0, '854.600')] [2024-06-15 20:16:47,194][1651669] Updated weights for policy 0, policy_version 735138 (0.0018) [2024-06-15 20:16:49,029][1651669] Updated weights for policy 0, policy_version 735217 (0.0157) [2024-06-15 20:16:50,146][1651669] Updated weights for policy 0, policy_version 735280 (0.0013) [2024-06-15 20:16:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1505886208. Throughput: 0: 12003.5. Samples: 376522752. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:50,767][1648981] Avg episode reward: [(0, '868.860')] [2024-06-15 20:16:52,284][1651669] Updated weights for policy 0, policy_version 735328 (0.0012) [2024-06-15 20:16:53,059][1651669] Updated weights for policy 0, policy_version 735360 (0.0012) [2024-06-15 20:16:55,766][1648981] Fps is (10 sec: 52455.2, 60 sec: 48062.6, 300 sec: 47764.0). Total num frames: 1506017280. Throughput: 0: 12094.6. Samples: 376560128. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:16:55,767][1648981] Avg episode reward: [(0, '863.390')] [2024-06-15 20:16:55,770][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000735360_1506017280.pth... [2024-06-15 20:16:55,805][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000729760_1494548480.pth [2024-06-15 20:16:58,259][1651669] Updated weights for policy 0, policy_version 735423 (0.0013) [2024-06-15 20:16:59,582][1651669] Updated weights for policy 0, policy_version 735474 (0.0013) [2024-06-15 20:17:00,770][1648981] Fps is (10 sec: 45860.1, 60 sec: 49149.4, 300 sec: 48318.4). Total num frames: 1506344960. Throughput: 0: 12150.6. Samples: 376638464. 
Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:17:00,770][1648981] Avg episode reward: [(0, '819.540')] [2024-06-15 20:17:01,154][1651669] Updated weights for policy 0, policy_version 735544 (0.0098) [2024-06-15 20:17:03,428][1651669] Updated weights for policy 0, policy_version 735591 (0.0013) [2024-06-15 20:17:05,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 48059.7, 300 sec: 47874.6). Total num frames: 1506541568. Throughput: 0: 12385.5. Samples: 376710656. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:17:05,767][1648981] Avg episode reward: [(0, '846.780')] [2024-06-15 20:17:08,184][1651669] Updated weights for policy 0, policy_version 735632 (0.0015) [2024-06-15 20:17:09,837][1651669] Updated weights for policy 0, policy_version 735696 (0.0017) [2024-06-15 20:17:10,626][1651274] Signal inference workers to stop experience collection... (38600 times) [2024-06-15 20:17:10,683][1651669] InferenceWorker_p0-w0: stopping experience collection (38600 times) [2024-06-15 20:17:10,767][1648981] Fps is (10 sec: 42611.7, 60 sec: 47513.6, 300 sec: 48210.0). Total num frames: 1506770944. Throughput: 0: 12424.5. Samples: 376753664. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:17:10,767][1648981] Avg episode reward: [(0, '861.830')] [2024-06-15 20:17:10,808][1651274] Signal inference workers to resume experience collection... (38600 times) [2024-06-15 20:17:10,808][1651669] InferenceWorker_p0-w0: resuming experience collection (38600 times) [2024-06-15 20:17:10,993][1651669] Updated weights for policy 0, policy_version 735747 (0.0114) [2024-06-15 20:17:12,090][1651669] Updated weights for policy 0, policy_version 735803 (0.0037) [2024-06-15 20:17:14,079][1651669] Updated weights for policy 0, policy_version 735861 (0.0105) [2024-06-15 20:17:15,770][1648981] Fps is (10 sec: 52410.3, 60 sec: 50241.4, 300 sec: 48207.6). Total num frames: 1507065856. Throughput: 0: 12366.7. Samples: 376818176. 
Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:17:15,770][1648981] Avg episode reward: [(0, '892.110')] [2024-06-15 20:17:19,211][1651669] Updated weights for policy 0, policy_version 735920 (0.0109) [2024-06-15 20:17:20,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48078.4, 300 sec: 48318.9). Total num frames: 1507295232. Throughput: 0: 12310.8. Samples: 376897536. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:17:20,767][1648981] Avg episode reward: [(0, '914.310')] [2024-06-15 20:17:20,782][1651669] Updated weights for policy 0, policy_version 735985 (0.0013) [2024-06-15 20:17:22,278][1651669] Updated weights for policy 0, policy_version 736060 (0.0129) [2024-06-15 20:17:25,023][1651669] Updated weights for policy 0, policy_version 736120 (0.0126) [2024-06-15 20:17:25,766][1648981] Fps is (10 sec: 52446.9, 60 sec: 51882.7, 300 sec: 48430.0). Total num frames: 1507590144. Throughput: 0: 12253.9. Samples: 376927744. Policy #0 lag: (min: 63.0, avg: 150.2, max: 319.0) [2024-06-15 20:17:25,767][1648981] Avg episode reward: [(0, '885.030')] [2024-06-15 20:17:30,400][1651669] Updated weights for policy 0, policy_version 736176 (0.0012) [2024-06-15 20:17:30,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.8, 300 sec: 47986.3). Total num frames: 1507721216. Throughput: 0: 12562.5. Samples: 377018880. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:17:30,767][1648981] Avg episode reward: [(0, '867.480')] [2024-06-15 20:17:32,178][1651669] Updated weights for policy 0, policy_version 736256 (0.0014) [2024-06-15 20:17:33,559][1651669] Updated weights for policy 0, policy_version 736320 (0.0032) [2024-06-15 20:17:35,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 51882.7, 300 sec: 48541.1). Total num frames: 1508081664. Throughput: 0: 12356.3. Samples: 377078784. 
Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:17:35,767][1648981] Avg episode reward: [(0, '869.840')] [2024-06-15 20:17:35,885][1651669] Updated weights for policy 0, policy_version 736379 (0.0012) [2024-06-15 20:17:40,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 46967.3, 300 sec: 47763.5). Total num frames: 1508179968. Throughput: 0: 12424.5. Samples: 377119232. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:17:40,768][1648981] Avg episode reward: [(0, '922.210')] [2024-06-15 20:17:41,217][1651669] Updated weights for policy 0, policy_version 736436 (0.0013) [2024-06-15 20:17:42,840][1651669] Updated weights for policy 0, policy_version 736501 (0.0014) [2024-06-15 20:17:44,566][1651669] Updated weights for policy 0, policy_version 736576 (0.0013) [2024-06-15 20:17:45,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 50248.6, 300 sec: 48430.0). Total num frames: 1508507648. Throughput: 0: 12004.4. Samples: 377178624. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:17:45,767][1648981] Avg episode reward: [(0, '903.610')] [2024-06-15 20:17:47,155][1651669] Updated weights for policy 0, policy_version 736632 (0.0024) [2024-06-15 20:17:50,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 45875.1, 300 sec: 47985.7). Total num frames: 1508638720. Throughput: 0: 12242.5. Samples: 377261568. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:17:50,767][1648981] Avg episode reward: [(0, '901.230')] [2024-06-15 20:17:51,259][1651274] Signal inference workers to stop experience collection... (38650 times) [2024-06-15 20:17:51,323][1651669] InferenceWorker_p0-w0: stopping experience collection (38650 times) [2024-06-15 20:17:51,513][1651274] Signal inference workers to resume experience collection... 
(38650 times) [2024-06-15 20:17:51,514][1651669] InferenceWorker_p0-w0: resuming experience collection (38650 times) [2024-06-15 20:17:51,676][1651669] Updated weights for policy 0, policy_version 736678 (0.0025) [2024-06-15 20:17:53,066][1651669] Updated weights for policy 0, policy_version 736736 (0.0013) [2024-06-15 20:17:54,314][1651669] Updated weights for policy 0, policy_version 736784 (0.0011) [2024-06-15 20:17:55,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 50244.3, 300 sec: 48875.9). Total num frames: 1509031936. Throughput: 0: 11912.5. Samples: 377289728. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:17:55,767][1648981] Avg episode reward: [(0, '913.510')] [2024-06-15 20:17:56,808][1651669] Updated weights for policy 0, policy_version 736834 (0.0019) [2024-06-15 20:17:58,156][1651669] Updated weights for policy 0, policy_version 736891 (0.0016) [2024-06-15 20:18:00,767][1648981] Fps is (10 sec: 52423.8, 60 sec: 46969.2, 300 sec: 48318.7). Total num frames: 1509163008. Throughput: 0: 12118.0. Samples: 377363456. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:00,768][1648981] Avg episode reward: [(0, '918.130')] [2024-06-15 20:18:03,272][1651669] Updated weights for policy 0, policy_version 736944 (0.0012) [2024-06-15 20:18:04,769][1651669] Updated weights for policy 0, policy_version 737008 (0.0013) [2024-06-15 20:18:05,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1509457920. Throughput: 0: 11958.0. Samples: 377435648. 
Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:05,767][1648981] Avg episode reward: [(0, '902.080')] [2024-06-15 20:18:06,598][1651669] Updated weights for policy 0, policy_version 737078 (0.0011) [2024-06-15 20:18:08,367][1651669] Updated weights for policy 0, policy_version 737122 (0.0011) [2024-06-15 20:18:08,967][1651669] Updated weights for policy 0, policy_version 737151 (0.0010) [2024-06-15 20:18:10,766][1648981] Fps is (10 sec: 52433.8, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1509687296. Throughput: 0: 12026.3. Samples: 377468928. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:10,767][1648981] Avg episode reward: [(0, '934.960')] [2024-06-15 20:18:13,569][1651669] Updated weights for policy 0, policy_version 737216 (0.0011) [2024-06-15 20:18:15,647][1651669] Updated weights for policy 0, policy_version 737281 (0.0014) [2024-06-15 20:18:15,775][1648981] Fps is (10 sec: 49111.9, 60 sec: 48056.0, 300 sec: 48428.7). Total num frames: 1509949440. Throughput: 0: 11751.1. Samples: 377547776. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:15,775][1648981] Avg episode reward: [(0, '937.640')] [2024-06-15 20:18:16,975][1651669] Updated weights for policy 0, policy_version 737340 (0.0043) [2024-06-15 20:18:19,754][1651669] Updated weights for policy 0, policy_version 737399 (0.0109) [2024-06-15 20:18:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 1510211584. Throughput: 0: 12026.3. Samples: 377619968. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:20,767][1648981] Avg episode reward: [(0, '959.380')] [2024-06-15 20:18:23,418][1651669] Updated weights for policy 0, policy_version 737472 (0.0104) [2024-06-15 20:18:25,084][1651669] Updated weights for policy 0, policy_version 737536 (0.0012) [2024-06-15 20:18:25,766][1648981] Fps is (10 sec: 52471.4, 60 sec: 48059.7, 300 sec: 48430.7). Total num frames: 1510473728. 
Throughput: 0: 12003.6. Samples: 377659392. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:25,767][1648981] Avg episode reward: [(0, '962.280')] [2024-06-15 20:18:27,893][1651669] Updated weights for policy 0, policy_version 737597 (0.0013) [2024-06-15 20:18:30,767][1648981] Fps is (10 sec: 45872.1, 60 sec: 49151.5, 300 sec: 48207.7). Total num frames: 1510670336. Throughput: 0: 12276.5. Samples: 377731072. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:30,768][1648981] Avg episode reward: [(0, '967.260')] [2024-06-15 20:18:31,050][1651669] Updated weights for policy 0, policy_version 737650 (0.0025) [2024-06-15 20:18:32,655][1651274] Signal inference workers to stop experience collection... (38700 times) [2024-06-15 20:18:32,701][1651669] InferenceWorker_p0-w0: stopping experience collection (38700 times) [2024-06-15 20:18:32,978][1651274] Signal inference workers to resume experience collection... (38700 times) [2024-06-15 20:18:32,978][1651669] InferenceWorker_p0-w0: resuming experience collection (38700 times) [2024-06-15 20:18:33,823][1651669] Updated weights for policy 0, policy_version 737712 (0.0098) [2024-06-15 20:18:35,770][1648981] Fps is (10 sec: 45858.0, 60 sec: 47510.6, 300 sec: 48207.3). Total num frames: 1510932480. Throughput: 0: 12059.4. Samples: 377804288. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:35,771][1648981] Avg episode reward: [(0, '946.320')] [2024-06-15 20:18:36,034][1651669] Updated weights for policy 0, policy_version 737787 (0.0019) [2024-06-15 20:18:38,984][1651669] Updated weights for policy 0, policy_version 737851 (0.0067) [2024-06-15 20:18:40,798][1648981] Fps is (10 sec: 45732.7, 60 sec: 49126.2, 300 sec: 48426.7). Total num frames: 1511129088. Throughput: 0: 12142.9. Samples: 377836544. 
Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:40,799][1648981] Avg episode reward: [(0, '1009.070')] [2024-06-15 20:18:42,406][1651669] Updated weights for policy 0, policy_version 737904 (0.0038) [2024-06-15 20:18:44,692][1651669] Updated weights for policy 0, policy_version 737973 (0.0014) [2024-06-15 20:18:45,766][1648981] Fps is (10 sec: 45892.4, 60 sec: 48059.7, 300 sec: 48209.1). Total num frames: 1511391232. Throughput: 0: 12049.3. Samples: 377905664. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:45,767][1648981] Avg episode reward: [(0, '1011.420')] [2024-06-15 20:18:47,136][1651669] Updated weights for policy 0, policy_version 738018 (0.0011) [2024-06-15 20:18:49,043][1651669] Updated weights for policy 0, policy_version 738064 (0.0013) [2024-06-15 20:18:50,767][1648981] Fps is (10 sec: 52595.3, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 1511653376. Throughput: 0: 12060.4. Samples: 377978368. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:50,767][1648981] Avg episode reward: [(0, '994.900')] [2024-06-15 20:18:53,031][1651669] Updated weights for policy 0, policy_version 738118 (0.0012) [2024-06-15 20:18:54,828][1651669] Updated weights for policy 0, policy_version 738192 (0.0013) [2024-06-15 20:18:55,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 47513.5, 300 sec: 48541.0). Total num frames: 1511882752. Throughput: 0: 12162.8. Samples: 378016256. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:18:55,767][1648981] Avg episode reward: [(0, '980.190')] [2024-06-15 20:18:55,801][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000738240_1511915520.pth... 
[2024-06-15 20:18:55,855][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000732560_1500282880.pth [2024-06-15 20:18:57,726][1651669] Updated weights for policy 0, policy_version 738259 (0.0021) [2024-06-15 20:18:58,614][1651669] Updated weights for policy 0, policy_version 738304 (0.0010) [2024-06-15 20:19:00,532][1651669] Updated weights for policy 0, policy_version 738352 (0.0012) [2024-06-15 20:19:00,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 49698.8, 300 sec: 48769.0). Total num frames: 1512144896. Throughput: 0: 12108.1. Samples: 378092544. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:19:00,767][1648981] Avg episode reward: [(0, '1010.480')] [2024-06-15 20:19:03,008][1651669] Updated weights for policy 0, policy_version 738384 (0.0013) [2024-06-15 20:19:04,802][1651669] Updated weights for policy 0, policy_version 738448 (0.0011) [2024-06-15 20:19:05,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 49152.0, 300 sec: 48763.2). Total num frames: 1512407040. Throughput: 0: 12060.4. Samples: 378162688. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:19:05,767][1648981] Avg episode reward: [(0, '977.300')] [2024-06-15 20:19:07,980][1651669] Updated weights for policy 0, policy_version 738512 (0.0013) [2024-06-15 20:19:10,767][1648981] Fps is (10 sec: 42598.6, 60 sec: 48059.6, 300 sec: 48430.0). Total num frames: 1512570880. Throughput: 0: 11923.9. Samples: 378195968. Policy #0 lag: (min: 15.0, avg: 59.5, max: 239.0) [2024-06-15 20:19:10,767][1648981] Avg episode reward: [(0, '937.370')] [2024-06-15 20:19:10,830][1651669] Updated weights for policy 0, policy_version 738562 (0.0016) [2024-06-15 20:19:13,773][1651669] Updated weights for policy 0, policy_version 738626 (0.0012) [2024-06-15 20:19:15,162][1651669] Updated weights for policy 0, policy_version 738688 (0.0012) [2024-06-15 20:19:15,731][1651274] Signal inference workers to stop experience collection... 
(38750 times) [2024-06-15 20:19:15,764][1651669] InferenceWorker_p0-w0: stopping experience collection (38750 times) [2024-06-15 20:19:15,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 48066.0, 300 sec: 48430.0). Total num frames: 1512833024. Throughput: 0: 12015.0. Samples: 378271744. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:15,767][1648981] Avg episode reward: [(0, '957.850')] [2024-06-15 20:19:15,976][1651274] Signal inference workers to resume experience collection... (38750 times) [2024-06-15 20:19:15,977][1651669] InferenceWorker_p0-w0: resuming experience collection (38750 times) [2024-06-15 20:19:16,994][1651669] Updated weights for policy 0, policy_version 738746 (0.0012) [2024-06-15 20:19:20,372][1651669] Updated weights for policy 0, policy_version 738803 (0.0014) [2024-06-15 20:19:20,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1513095168. Throughput: 0: 11890.8. Samples: 378339328. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:20,767][1648981] Avg episode reward: [(0, '953.180')] [2024-06-15 20:19:22,685][1651669] Updated weights for policy 0, policy_version 738849 (0.0012) [2024-06-15 20:19:25,533][1651669] Updated weights for policy 0, policy_version 738897 (0.0012) [2024-06-15 20:19:25,766][1648981] Fps is (10 sec: 45876.5, 60 sec: 46967.5, 300 sec: 48318.9). Total num frames: 1513291776. Throughput: 0: 12091.7. Samples: 378380288. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:25,767][1648981] Avg episode reward: [(0, '933.240')] [2024-06-15 20:19:27,044][1651669] Updated weights for policy 0, policy_version 738964 (0.0012) [2024-06-15 20:19:28,162][1651669] Updated weights for policy 0, policy_version 739006 (0.0012) [2024-06-15 20:19:30,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 47514.0, 300 sec: 48096.8). Total num frames: 1513521152. Throughput: 0: 12060.4. Samples: 378448384. 
Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:30,767][1648981] Avg episode reward: [(0, '918.190')] [2024-06-15 20:19:31,317][1651669] Updated weights for policy 0, policy_version 739067 (0.0030) [2024-06-15 20:19:33,587][1651669] Updated weights for policy 0, policy_version 739107 (0.0027) [2024-06-15 20:19:35,773][1648981] Fps is (10 sec: 45843.2, 60 sec: 46964.9, 300 sec: 48429.3). Total num frames: 1513750528. Throughput: 0: 12115.5. Samples: 378523648. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:35,774][1648981] Avg episode reward: [(0, '895.380')] [2024-06-15 20:19:37,349][1651669] Updated weights for policy 0, policy_version 739168 (0.0014) [2024-06-15 20:19:38,741][1651669] Updated weights for policy 0, policy_version 739232 (0.0017) [2024-06-15 20:19:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48085.1, 300 sec: 48318.9). Total num frames: 1514012672. Throughput: 0: 11980.8. Samples: 378555392. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:40,767][1648981] Avg episode reward: [(0, '913.050')] [2024-06-15 20:19:41,020][1651669] Updated weights for policy 0, policy_version 739282 (0.0015) [2024-06-15 20:19:41,841][1651669] Updated weights for policy 0, policy_version 739325 (0.0023) [2024-06-15 20:19:44,333][1651669] Updated weights for policy 0, policy_version 739378 (0.0013) [2024-06-15 20:19:45,766][1648981] Fps is (10 sec: 52465.6, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1514274816. Throughput: 0: 12026.4. Samples: 378633728. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:45,767][1648981] Avg episode reward: [(0, '866.840')] [2024-06-15 20:19:47,736][1651669] Updated weights for policy 0, policy_version 739424 (0.0013) [2024-06-15 20:19:49,631][1651669] Updated weights for policy 0, policy_version 739494 (0.0016) [2024-06-15 20:19:50,767][1648981] Fps is (10 sec: 52424.6, 60 sec: 48059.1, 300 sec: 48541.0). Total num frames: 1514536960. 
Throughput: 0: 12003.3. Samples: 378702848. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:50,768][1648981] Avg episode reward: [(0, '874.680')] [2024-06-15 20:19:50,951][1651669] Updated weights for policy 0, policy_version 739530 (0.0012) [2024-06-15 20:19:52,026][1651669] Updated weights for policy 0, policy_version 739577 (0.0022) [2024-06-15 20:19:55,706][1651669] Updated weights for policy 0, policy_version 739648 (0.0011) [2024-06-15 20:19:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48606.0, 300 sec: 48874.3). Total num frames: 1514799104. Throughput: 0: 12117.4. Samples: 378741248. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:19:55,767][1648981] Avg episode reward: [(0, '919.240')] [2024-06-15 20:19:58,587][1651274] Signal inference workers to stop experience collection... (38800 times) [2024-06-15 20:19:58,717][1651669] InferenceWorker_p0-w0: stopping experience collection (38800 times) [2024-06-15 20:19:58,962][1651274] Signal inference workers to resume experience collection... (38800 times) [2024-06-15 20:19:58,963][1651669] InferenceWorker_p0-w0: resuming experience collection (38800 times) [2024-06-15 20:19:59,209][1651669] Updated weights for policy 0, policy_version 739704 (0.0129) [2024-06-15 20:20:00,766][1648981] Fps is (10 sec: 49156.1, 60 sec: 48059.8, 300 sec: 48652.1). Total num frames: 1515028480. Throughput: 0: 11980.9. Samples: 378810880. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:00,767][1648981] Avg episode reward: [(0, '939.270')] [2024-06-15 20:20:00,877][1651669] Updated weights for policy 0, policy_version 739766 (0.0020) [2024-06-15 20:20:03,159][1651669] Updated weights for policy 0, policy_version 739814 (0.0014) [2024-06-15 20:20:05,574][1651669] Updated weights for policy 0, policy_version 739856 (0.0011) [2024-06-15 20:20:05,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46967.4, 300 sec: 48541.2). Total num frames: 1515225088. Throughput: 0: 12219.7. 
Samples: 378889216. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:05,767][1648981] Avg episode reward: [(0, '933.890')] [2024-06-15 20:20:09,573][1651669] Updated weights for policy 0, policy_version 739937 (0.0011) [2024-06-15 20:20:10,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1515454464. Throughput: 0: 12117.3. Samples: 378925568. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:10,767][1648981] Avg episode reward: [(0, '957.640')] [2024-06-15 20:20:11,620][1651669] Updated weights for policy 0, policy_version 740016 (0.0011) [2024-06-15 20:20:14,004][1651669] Updated weights for policy 0, policy_version 740065 (0.0020) [2024-06-15 20:20:15,786][1648981] Fps is (10 sec: 49056.2, 60 sec: 48044.3, 300 sec: 48426.8). Total num frames: 1515716608. Throughput: 0: 11998.3. Samples: 378988544. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:15,787][1648981] Avg episode reward: [(0, '915.520')] [2024-06-15 20:20:17,567][1651669] Updated weights for policy 0, policy_version 740144 (0.0024) [2024-06-15 20:20:19,312][1651669] Updated weights for policy 0, policy_version 740192 (0.0011) [2024-06-15 20:20:20,024][1651669] Updated weights for policy 0, policy_version 740224 (0.0011) [2024-06-15 20:20:20,766][1648981] Fps is (10 sec: 55705.7, 60 sec: 48605.9, 300 sec: 48652.1). Total num frames: 1516011520. Throughput: 0: 12221.6. Samples: 379073536. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:20,767][1648981] Avg episode reward: [(0, '936.800')] [2024-06-15 20:20:21,560][1651669] Updated weights for policy 0, policy_version 740287 (0.0015) [2024-06-15 20:20:24,522][1651669] Updated weights for policy 0, policy_version 740340 (0.0015) [2024-06-15 20:20:25,766][1648981] Fps is (10 sec: 52531.7, 60 sec: 49152.0, 300 sec: 48430.6). Total num frames: 1516240896. Throughput: 0: 12242.5. Samples: 379106304. 
Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:25,767][1648981] Avg episode reward: [(0, '953.160')] [2024-06-15 20:20:28,491][1651669] Updated weights for policy 0, policy_version 740404 (0.0014) [2024-06-15 20:20:29,553][1651669] Updated weights for policy 0, policy_version 740438 (0.0015) [2024-06-15 20:20:30,511][1651669] Updated weights for policy 0, policy_version 740480 (0.0010) [2024-06-15 20:20:30,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.2, 300 sec: 48763.2). Total num frames: 1516503040. Throughput: 0: 12208.4. Samples: 379183104. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:30,767][1648981] Avg episode reward: [(0, '983.700')] [2024-06-15 20:20:31,857][1651669] Updated weights for policy 0, policy_version 740538 (0.0011) [2024-06-15 20:20:34,690][1651669] Updated weights for policy 0, policy_version 740592 (0.0013) [2024-06-15 20:20:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50250.1, 300 sec: 48654.8). Total num frames: 1516765184. Throughput: 0: 12299.6. Samples: 379256320. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:35,767][1648981] Avg episode reward: [(0, '978.170')] [2024-06-15 20:20:38,672][1651669] Updated weights for policy 0, policy_version 740644 (0.0021) [2024-06-15 20:20:39,992][1651669] Updated weights for policy 0, policy_version 740704 (0.0015) [2024-06-15 20:20:40,135][1651274] Signal inference workers to stop experience collection... (38850 times) [2024-06-15 20:20:40,235][1651669] InferenceWorker_p0-w0: stopping experience collection (38850 times) [2024-06-15 20:20:40,424][1651274] Signal inference workers to resume experience collection... (38850 times) [2024-06-15 20:20:40,424][1651669] InferenceWorker_p0-w0: resuming experience collection (38850 times) [2024-06-15 20:20:40,704][1651669] Updated weights for policy 0, policy_version 740729 (0.0010) [2024-06-15 20:20:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.2, 300 sec: 48763.2). 
Total num frames: 1516994560. Throughput: 0: 12367.7. Samples: 379297792. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:40,767][1648981] Avg episode reward: [(0, '985.030')] [2024-06-15 20:20:42,721][1651669] Updated weights for policy 0, policy_version 740797 (0.0013) [2024-06-15 20:20:45,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1517223936. Throughput: 0: 12413.2. Samples: 379369472. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:45,767][1648981] Avg episode reward: [(0, '961.460')] [2024-06-15 20:20:48,665][1651669] Updated weights for policy 0, policy_version 740880 (0.0035) [2024-06-15 20:20:50,498][1651669] Updated weights for policy 0, policy_version 740944 (0.0012) [2024-06-15 20:20:50,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48606.6, 300 sec: 48541.7). Total num frames: 1517453312. Throughput: 0: 12185.6. Samples: 379437568. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:50,767][1648981] Avg episode reward: [(0, '957.530')] [2024-06-15 20:20:53,377][1651669] Updated weights for policy 0, policy_version 741008 (0.0013) [2024-06-15 20:20:54,397][1651669] Updated weights for policy 0, policy_version 741052 (0.0013) [2024-06-15 20:20:55,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1517715456. Throughput: 0: 12219.7. Samples: 379475456. Policy #0 lag: (min: 8.0, avg: 104.6, max: 264.0) [2024-06-15 20:20:55,767][1648981] Avg episode reward: [(0, '937.350')] [2024-06-15 20:20:56,414][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000741104_1517780992.pth... 
[2024-06-15 20:20:56,497][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000735360_1506017280.pth [2024-06-15 20:20:56,575][1651669] Updated weights for policy 0, policy_version 741105 (0.0012) [2024-06-15 20:20:59,950][1651669] Updated weights for policy 0, policy_version 741152 (0.0013) [2024-06-15 20:21:00,768][1648981] Fps is (10 sec: 49143.4, 60 sec: 48604.5, 300 sec: 48429.7). Total num frames: 1517944832. Throughput: 0: 12497.7. Samples: 379550720. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:00,769][1648981] Avg episode reward: [(0, '967.470')] [2024-06-15 20:21:01,107][1651669] Updated weights for policy 0, policy_version 741187 (0.0011) [2024-06-15 20:21:02,228][1651669] Updated weights for policy 0, policy_version 741245 (0.0014) [2024-06-15 20:21:04,907][1651669] Updated weights for policy 0, policy_version 741300 (0.0010) [2024-06-15 20:21:05,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49698.2, 300 sec: 48430.0). Total num frames: 1518206976. Throughput: 0: 12253.9. Samples: 379624960. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:05,767][1648981] Avg episode reward: [(0, '939.280')] [2024-06-15 20:21:06,165][1651669] Updated weights for policy 0, policy_version 741332 (0.0013) [2024-06-15 20:21:06,926][1651669] Updated weights for policy 0, policy_version 741371 (0.0101) [2024-06-15 20:21:10,656][1651669] Updated weights for policy 0, policy_version 741424 (0.0012) [2024-06-15 20:21:10,770][1648981] Fps is (10 sec: 49142.0, 60 sec: 49695.0, 300 sec: 48762.6). Total num frames: 1518436352. Throughput: 0: 12378.0. Samples: 379663360. 
Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:10,771][1648981] Avg episode reward: [(0, '921.550')] [2024-06-15 20:21:11,950][1651669] Updated weights for policy 0, policy_version 741474 (0.0030) [2024-06-15 20:21:12,609][1651669] Updated weights for policy 0, policy_version 741504 (0.0011) [2024-06-15 20:21:15,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49714.4, 300 sec: 48433.8). Total num frames: 1518698496. Throughput: 0: 12424.5. Samples: 379742208. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:15,767][1648981] Avg episode reward: [(0, '965.060')] [2024-06-15 20:21:16,400][1651669] Updated weights for policy 0, policy_version 741585 (0.0010) [2024-06-15 20:21:20,778][1648981] Fps is (10 sec: 42563.8, 60 sec: 47504.1, 300 sec: 48761.3). Total num frames: 1518862336. Throughput: 0: 12341.6. Samples: 379811840. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:20,779][1648981] Avg episode reward: [(0, '964.300')] [2024-06-15 20:21:21,004][1651669] Updated weights for policy 0, policy_version 741634 (0.0011) [2024-06-15 20:21:22,875][1651274] Signal inference workers to stop experience collection... (38900 times) [2024-06-15 20:21:22,967][1651669] InferenceWorker_p0-w0: stopping experience collection (38900 times) [2024-06-15 20:21:22,969][1651669] Updated weights for policy 0, policy_version 741718 (0.0012) [2024-06-15 20:21:23,085][1651274] Signal inference workers to resume experience collection... (38900 times) [2024-06-15 20:21:23,089][1651669] InferenceWorker_p0-w0: resuming experience collection (38900 times) [2024-06-15 20:21:25,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1519124480. Throughput: 0: 12037.7. Samples: 379839488. 
Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:25,767][1648981] Avg episode reward: [(0, '947.020')] [2024-06-15 20:21:26,349][1651669] Updated weights for policy 0, policy_version 741792 (0.0013) [2024-06-15 20:21:27,405][1651669] Updated weights for policy 0, policy_version 741829 (0.0012) [2024-06-15 20:21:28,493][1651669] Updated weights for policy 0, policy_version 741882 (0.0038) [2024-06-15 20:21:30,766][1648981] Fps is (10 sec: 52492.0, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1519386624. Throughput: 0: 12083.2. Samples: 379913216. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:30,767][1648981] Avg episode reward: [(0, '905.450')] [2024-06-15 20:21:33,627][1651669] Updated weights for policy 0, policy_version 741942 (0.0020) [2024-06-15 20:21:34,932][1651669] Updated weights for policy 0, policy_version 742006 (0.0020) [2024-06-15 20:21:35,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1519648768. Throughput: 0: 12071.8. Samples: 379980800. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:35,767][1648981] Avg episode reward: [(0, '909.170')] [2024-06-15 20:21:37,399][1651669] Updated weights for policy 0, policy_version 742050 (0.0032) [2024-06-15 20:21:38,817][1651669] Updated weights for policy 0, policy_version 742112 (0.0012) [2024-06-15 20:21:40,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 48605.8, 300 sec: 48875.2). Total num frames: 1519910912. Throughput: 0: 12094.6. Samples: 380019712. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:40,767][1648981] Avg episode reward: [(0, '879.660')] [2024-06-15 20:21:43,179][1651669] Updated weights for policy 0, policy_version 742160 (0.0012) [2024-06-15 20:21:44,108][1651669] Updated weights for policy 0, policy_version 742208 (0.0011) [2024-06-15 20:21:45,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49151.8, 300 sec: 48430.0). Total num frames: 1520173056. 
Throughput: 0: 12106.4. Samples: 380095488. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:45,767][1648981] Avg episode reward: [(0, '863.580')] [2024-06-15 20:21:48,014][1651669] Updated weights for policy 0, policy_version 742288 (0.0014) [2024-06-15 20:21:50,277][1651669] Updated weights for policy 0, policy_version 742384 (0.0014) [2024-06-15 20:21:50,717][1651669] Updated weights for policy 0, policy_version 742400 (0.0011) [2024-06-15 20:21:50,771][1648981] Fps is (10 sec: 52405.0, 60 sec: 49694.4, 300 sec: 48873.6). Total num frames: 1520435200. Throughput: 0: 11786.2. Samples: 380155392. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:50,772][1648981] Avg episode reward: [(0, '910.460')] [2024-06-15 20:21:55,123][1651669] Updated weights for policy 0, policy_version 742454 (0.0013) [2024-06-15 20:21:55,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 47513.6, 300 sec: 48208.4). Total num frames: 1520566272. Throughput: 0: 11981.8. Samples: 380202496. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:21:55,767][1648981] Avg episode reward: [(0, '969.120')] [2024-06-15 20:21:56,879][1651669] Updated weights for policy 0, policy_version 742501 (0.0012) [2024-06-15 20:21:58,288][1651669] Updated weights for policy 0, policy_version 742549 (0.0012) [2024-06-15 20:21:59,707][1651669] Updated weights for policy 0, policy_version 742608 (0.0127) [2024-06-15 20:22:00,777][1648981] Fps is (10 sec: 49124.3, 60 sec: 49691.1, 300 sec: 48761.5). Total num frames: 1520926720. Throughput: 0: 11830.2. Samples: 380274688. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:00,777][1648981] Avg episode reward: [(0, '988.210')] [2024-06-15 20:22:00,880][1651669] Updated weights for policy 0, policy_version 742654 (0.0011) [2024-06-15 20:22:05,087][1651274] Signal inference workers to stop experience collection... 
(38950 times) [2024-06-15 20:22:05,153][1651669] InferenceWorker_p0-w0: stopping experience collection (38950 times) [2024-06-15 20:22:05,241][1651274] Signal inference workers to resume experience collection... (38950 times) [2024-06-15 20:22:05,242][1651669] InferenceWorker_p0-w0: resuming experience collection (38950 times) [2024-06-15 20:22:05,445][1651669] Updated weights for policy 0, policy_version 742714 (0.0011) [2024-06-15 20:22:05,767][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 1521090560. Throughput: 0: 12131.9. Samples: 380357632. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:05,767][1648981] Avg episode reward: [(0, '987.070')] [2024-06-15 20:22:06,980][1651669] Updated weights for policy 0, policy_version 742768 (0.0011) [2024-06-15 20:22:08,410][1651669] Updated weights for policy 0, policy_version 742816 (0.0013) [2024-06-15 20:22:10,287][1651669] Updated weights for policy 0, policy_version 742888 (0.0015) [2024-06-15 20:22:10,786][1648981] Fps is (10 sec: 55651.2, 60 sec: 50776.7, 300 sec: 48871.6). Total num frames: 1521483776. Throughput: 0: 12225.6. Samples: 380389888. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:10,787][1648981] Avg episode reward: [(0, '985.230')] [2024-06-15 20:22:15,018][1651669] Updated weights for policy 0, policy_version 742928 (0.0023) [2024-06-15 20:22:15,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 48059.5, 300 sec: 48429.9). Total num frames: 1521582080. Throughput: 0: 12344.8. Samples: 380468736. 
Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:15,767][1648981] Avg episode reward: [(0, '971.520')] [2024-06-15 20:22:15,940][1651669] Updated weights for policy 0, policy_version 742976 (0.0014) [2024-06-15 20:22:17,660][1651669] Updated weights for policy 0, policy_version 743030 (0.0011) [2024-06-15 20:22:19,218][1651669] Updated weights for policy 0, policy_version 743088 (0.0017) [2024-06-15 20:22:20,735][1651669] Updated weights for policy 0, policy_version 743164 (0.0013) [2024-06-15 20:22:20,766][1648981] Fps is (10 sec: 49250.6, 60 sec: 51893.0, 300 sec: 48763.2). Total num frames: 1521975296. Throughput: 0: 12435.9. Samples: 380540416. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:20,767][1648981] Avg episode reward: [(0, '1011.370')] [2024-06-15 20:22:25,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 49151.9, 300 sec: 48652.1). Total num frames: 1522073600. Throughput: 0: 12618.0. Samples: 380587520. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:25,767][1648981] Avg episode reward: [(0, '988.310')] [2024-06-15 20:22:25,834][1651669] Updated weights for policy 0, policy_version 743216 (0.0016) [2024-06-15 20:22:27,465][1651669] Updated weights for policy 0, policy_version 743265 (0.0010) [2024-06-15 20:22:29,127][1651669] Updated weights for policy 0, policy_version 743330 (0.0009) [2024-06-15 20:22:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 50790.3, 300 sec: 48652.1). Total num frames: 1522434048. Throughput: 0: 12390.4. Samples: 380653056. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:30,767][1648981] Avg episode reward: [(0, '1005.830')] [2024-06-15 20:22:30,837][1651669] Updated weights for policy 0, policy_version 743377 (0.0013) [2024-06-15 20:22:35,341][1651669] Updated weights for policy 0, policy_version 743440 (0.0010) [2024-06-15 20:22:35,778][1648981] Fps is (10 sec: 52367.2, 60 sec: 49142.3, 300 sec: 48872.4). Total num frames: 1522597888. 
Throughput: 0: 12980.0. Samples: 380739584. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:35,779][1648981] Avg episode reward: [(0, '1036.190')] [2024-06-15 20:22:37,807][1651669] Updated weights for policy 0, policy_version 743490 (0.0013) [2024-06-15 20:22:39,368][1651669] Updated weights for policy 0, policy_version 743557 (0.0022) [2024-06-15 20:22:40,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1522925568. Throughput: 0: 12618.0. Samples: 380770304. Policy #0 lag: (min: 9.0, avg: 99.8, max: 265.0) [2024-06-15 20:22:40,767][1648981] Avg episode reward: [(0, '1029.430')] [2024-06-15 20:22:41,290][1651669] Updated weights for policy 0, policy_version 743622 (0.0016) [2024-06-15 20:22:42,415][1651669] Updated weights for policy 0, policy_version 743680 (0.0011) [2024-06-15 20:22:45,362][1651274] Signal inference workers to stop experience collection... (39000 times) [2024-06-15 20:22:45,487][1651669] InferenceWorker_p0-w0: stopping experience collection (39000 times) [2024-06-15 20:22:45,703][1651274] Signal inference workers to resume experience collection... (39000 times) [2024-06-15 20:22:45,704][1651669] InferenceWorker_p0-w0: resuming experience collection (39000 times) [2024-06-15 20:22:45,766][1648981] Fps is (10 sec: 52490.5, 60 sec: 49152.1, 300 sec: 49096.5). Total num frames: 1523122176. Throughput: 0: 12814.3. Samples: 380851200. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:22:45,767][1648981] Avg episode reward: [(0, '1052.960')] [2024-06-15 20:22:46,150][1651669] Updated weights for policy 0, policy_version 743728 (0.0013) [2024-06-15 20:22:48,772][1651669] Updated weights for policy 0, policy_version 743768 (0.0011) [2024-06-15 20:22:50,786][1648981] Fps is (10 sec: 45784.8, 60 sec: 49139.6, 300 sec: 48648.9). Total num frames: 1523384320. Throughput: 0: 12430.5. Samples: 380917248. 
Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:22:50,787][1648981] Avg episode reward: [(0, '1026.100')] [2024-06-15 20:22:50,890][1651669] Updated weights for policy 0, policy_version 743856 (0.0012) [2024-06-15 20:22:52,426][1651669] Updated weights for policy 0, policy_version 743890 (0.0029) [2024-06-15 20:22:53,067][1651669] Updated weights for policy 0, policy_version 743928 (0.0031) [2024-06-15 20:22:55,709][1651669] Updated weights for policy 0, policy_version 743954 (0.0018) [2024-06-15 20:22:55,772][1648981] Fps is (10 sec: 49122.0, 60 sec: 50785.3, 300 sec: 48984.5). Total num frames: 1523613696. Throughput: 0: 12542.2. Samples: 380954112. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:22:55,773][1648981] Avg episode reward: [(0, '1049.130')] [2024-06-15 20:22:56,332][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000743984_1523679232.pth... [2024-06-15 20:22:56,373][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000738240_1511915520.pth [2024-06-15 20:22:58,981][1651669] Updated weights for policy 0, policy_version 744006 (0.0012) [2024-06-15 20:23:00,155][1651669] Updated weights for policy 0, policy_version 744064 (0.0018) [2024-06-15 20:23:00,766][1648981] Fps is (10 sec: 49249.3, 60 sec: 49160.4, 300 sec: 48874.3). Total num frames: 1523875840. Throughput: 0: 12640.8. Samples: 381037568. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:00,767][1648981] Avg episode reward: [(0, '1040.340')] [2024-06-15 20:23:01,700][1651669] Updated weights for policy 0, policy_version 744118 (0.0012) [2024-06-15 20:23:03,458][1651669] Updated weights for policy 0, policy_version 744162 (0.0011) [2024-06-15 20:23:05,766][1648981] Fps is (10 sec: 49182.1, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1524105216. Throughput: 0: 12572.4. Samples: 381106176. 
Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:05,767][1648981] Avg episode reward: [(0, '1039.970')] [2024-06-15 20:23:07,156][1651669] Updated weights for policy 0, policy_version 744244 (0.0012) [2024-06-15 20:23:10,131][1651669] Updated weights for policy 0, policy_version 744279 (0.0011) [2024-06-15 20:23:10,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 47529.4, 300 sec: 48764.5). Total num frames: 1524334592. Throughput: 0: 12379.0. Samples: 381144576. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:10,767][1648981] Avg episode reward: [(0, '1036.240')] [2024-06-15 20:23:11,637][1651669] Updated weights for policy 0, policy_version 744352 (0.0013) [2024-06-15 20:23:14,033][1651669] Updated weights for policy 0, policy_version 744416 (0.0039) [2024-06-15 20:23:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50790.6, 300 sec: 48874.3). Total num frames: 1524629504. Throughput: 0: 12367.7. Samples: 381209600. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:15,767][1648981] Avg episode reward: [(0, '1034.320')] [2024-06-15 20:23:17,615][1651669] Updated weights for policy 0, policy_version 744480 (0.0011) [2024-06-15 20:23:20,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46967.4, 300 sec: 48541.1). Total num frames: 1524793344. Throughput: 0: 12222.9. Samples: 381289472. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:20,767][1648981] Avg episode reward: [(0, '1055.300')] [2024-06-15 20:23:20,940][1651669] Updated weights for policy 0, policy_version 744544 (0.0143) [2024-06-15 20:23:22,379][1651669] Updated weights for policy 0, policy_version 744599 (0.0120) [2024-06-15 20:23:24,719][1651669] Updated weights for policy 0, policy_version 744645 (0.0012) [2024-06-15 20:23:25,377][1651274] Signal inference workers to stop experience collection... 
(39050 times) [2024-06-15 20:23:25,440][1651669] InferenceWorker_p0-w0: stopping experience collection (39050 times) [2024-06-15 20:23:25,670][1651274] Signal inference workers to resume experience collection... (39050 times) [2024-06-15 20:23:25,671][1651669] InferenceWorker_p0-w0: resuming experience collection (39050 times) [2024-06-15 20:23:25,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50790.4, 300 sec: 48985.5). Total num frames: 1525121024. Throughput: 0: 12219.7. Samples: 381320192. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:25,767][1648981] Avg episode reward: [(0, '1050.950')] [2024-06-15 20:23:25,999][1651669] Updated weights for policy 0, policy_version 744704 (0.0088) [2024-06-15 20:23:28,801][1651669] Updated weights for policy 0, policy_version 744752 (0.0015) [2024-06-15 20:23:30,198][1651669] Updated weights for policy 0, policy_version 744771 (0.0025) [2024-06-15 20:23:30,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 48763.8). Total num frames: 1525317632. Throughput: 0: 12162.9. Samples: 381398528. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:30,767][1648981] Avg episode reward: [(0, '1044.980')] [2024-06-15 20:23:31,355][1651669] Updated weights for policy 0, policy_version 744832 (0.0032) [2024-06-15 20:23:33,688][1651669] Updated weights for policy 0, policy_version 744892 (0.0014) [2024-06-15 20:23:35,767][1648981] Fps is (10 sec: 42595.3, 60 sec: 49161.1, 300 sec: 48879.4). Total num frames: 1525547008. Throughput: 0: 12316.0. Samples: 381471232. 
Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:35,768][1648981] Avg episode reward: [(0, '1000.110')] [2024-06-15 20:23:36,418][1651669] Updated weights for policy 0, policy_version 744930 (0.0031) [2024-06-15 20:23:36,939][1651669] Updated weights for policy 0, policy_version 744958 (0.0010) [2024-06-15 20:23:38,305][1651669] Updated weights for policy 0, policy_version 744992 (0.0011) [2024-06-15 20:23:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48605.8, 300 sec: 48985.4). Total num frames: 1525841920. Throughput: 0: 12414.8. Samples: 381512704. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:40,767][1648981] Avg episode reward: [(0, '957.330')] [2024-06-15 20:23:40,847][1651669] Updated weights for policy 0, policy_version 745043 (0.0014) [2024-06-15 20:23:41,776][1651669] Updated weights for policy 0, policy_version 745084 (0.0011) [2024-06-15 20:23:43,688][1651669] Updated weights for policy 0, policy_version 745129 (0.0011) [2024-06-15 20:23:45,766][1648981] Fps is (10 sec: 52432.6, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1526071296. Throughput: 0: 12197.0. Samples: 381586432. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:45,767][1648981] Avg episode reward: [(0, '955.070')] [2024-06-15 20:23:46,070][1651669] Updated weights for policy 0, policy_version 745155 (0.0012) [2024-06-15 20:23:47,562][1651669] Updated weights for policy 0, policy_version 745215 (0.0010) [2024-06-15 20:23:49,606][1651669] Updated weights for policy 0, policy_version 745272 (0.0012) [2024-06-15 20:23:50,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 49168.0, 300 sec: 48985.4). Total num frames: 1526333440. Throughput: 0: 12231.0. Samples: 381656576. 
Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:50,767][1648981] Avg episode reward: [(0, '977.400')] [2024-06-15 20:23:52,056][1651669] Updated weights for policy 0, policy_version 745314 (0.0016) [2024-06-15 20:23:54,210][1651669] Updated weights for policy 0, policy_version 745361 (0.0011) [2024-06-15 20:23:54,923][1651669] Updated weights for policy 0, policy_version 745408 (0.0011) [2024-06-15 20:23:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49703.2, 300 sec: 48985.4). Total num frames: 1526595584. Throughput: 0: 12310.8. Samples: 381698560. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:23:55,767][1648981] Avg episode reward: [(0, '976.340')] [2024-06-15 20:23:57,431][1651669] Updated weights for policy 0, policy_version 745456 (0.0013) [2024-06-15 20:23:57,962][1651669] Updated weights for policy 0, policy_version 745472 (0.0014) [2024-06-15 20:23:59,663][1651669] Updated weights for policy 0, policy_version 745529 (0.0012) [2024-06-15 20:24:00,775][1648981] Fps is (10 sec: 52383.9, 60 sec: 49690.8, 300 sec: 48983.9). Total num frames: 1526857728. Throughput: 0: 12479.0. Samples: 381771264. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:24:00,776][1648981] Avg episode reward: [(0, '976.310')] [2024-06-15 20:24:02,221][1651669] Updated weights for policy 0, policy_version 745595 (0.0013) [2024-06-15 20:24:05,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49698.2, 300 sec: 49207.6). Total num frames: 1527087104. Throughput: 0: 12413.2. Samples: 381848064. 
Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:24:05,767][1648981] Avg episode reward: [(0, '953.010')] [2024-06-15 20:24:05,805][1651669] Updated weights for policy 0, policy_version 745651 (0.0013) [2024-06-15 20:24:07,073][1651669] Updated weights for policy 0, policy_version 745680 (0.0020) [2024-06-15 20:24:08,373][1651669] Updated weights for policy 0, policy_version 745728 (0.0012) [2024-06-15 20:24:09,555][1651274] Signal inference workers to stop experience collection... (39100 times) [2024-06-15 20:24:09,595][1651669] InferenceWorker_p0-w0: stopping experience collection (39100 times) [2024-06-15 20:24:09,772][1651274] Signal inference workers to resume experience collection... (39100 times) [2024-06-15 20:24:09,772][1651669] InferenceWorker_p0-w0: resuming experience collection (39100 times) [2024-06-15 20:24:10,327][1651669] Updated weights for policy 0, policy_version 745787 (0.0042) [2024-06-15 20:24:10,767][1648981] Fps is (10 sec: 52474.5, 60 sec: 50790.4, 300 sec: 49318.6). Total num frames: 1527382016. Throughput: 0: 12470.0. Samples: 381881344. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:24:10,767][1648981] Avg episode reward: [(0, '938.340')] [2024-06-15 20:24:12,192][1651669] Updated weights for policy 0, policy_version 745825 (0.0010) [2024-06-15 20:24:15,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1527513088. Throughput: 0: 12367.6. Samples: 381955072. 
Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:24:15,767][1648981] Avg episode reward: [(0, '940.350')] [2024-06-15 20:24:16,902][1651669] Updated weights for policy 0, policy_version 745904 (0.0012) [2024-06-15 20:24:17,442][1651669] Updated weights for policy 0, policy_version 745920 (0.0010) [2024-06-15 20:24:19,398][1651669] Updated weights for policy 0, policy_version 745977 (0.0077) [2024-06-15 20:24:20,235][1651669] Updated weights for policy 0, policy_version 746004 (0.0012) [2024-06-15 20:24:20,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 51336.6, 300 sec: 49429.7). Total num frames: 1527873536. Throughput: 0: 12345.1. Samples: 382026752. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:24:20,767][1648981] Avg episode reward: [(0, '928.840')] [2024-06-15 20:24:22,018][1651669] Updated weights for policy 0, policy_version 746065 (0.0015) [2024-06-15 20:24:23,135][1651669] Updated weights for policy 0, policy_version 746110 (0.0013) [2024-06-15 20:24:25,779][1648981] Fps is (10 sec: 52364.0, 60 sec: 48595.8, 300 sec: 49205.5). Total num frames: 1528037376. Throughput: 0: 12170.9. Samples: 382060544. Policy #0 lag: (min: 8.0, avg: 102.4, max: 264.0) [2024-06-15 20:24:25,779][1648981] Avg episode reward: [(0, '923.780')] [2024-06-15 20:24:28,540][1651669] Updated weights for policy 0, policy_version 746165 (0.0012) [2024-06-15 20:24:29,737][1651669] Updated weights for policy 0, policy_version 746208 (0.0018) [2024-06-15 20:24:30,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 49698.1, 300 sec: 49319.8). Total num frames: 1528299520. Throughput: 0: 12356.2. Samples: 382142464. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:24:30,768][1648981] Avg episode reward: [(0, '949.920')] [2024-06-15 20:24:31,565][1651669] Updated weights for policy 0, policy_version 746288 (0.0013) [2024-06-15 20:24:33,125][1651669] Updated weights for policy 0, policy_version 746325 (0.0013) [2024-06-15 20:24:35,766][1648981] Fps is (10 sec: 52493.2, 60 sec: 50244.8, 300 sec: 49318.6). Total num frames: 1528561664. Throughput: 0: 12219.8. Samples: 382206464. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:24:35,767][1648981] Avg episode reward: [(0, '965.220')] [2024-06-15 20:24:38,535][1651669] Updated weights for policy 0, policy_version 746386 (0.0012) [2024-06-15 20:24:40,437][1651669] Updated weights for policy 0, policy_version 746433 (0.0011) [2024-06-15 20:24:40,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 48059.8, 300 sec: 48985.4). Total num frames: 1528725504. Throughput: 0: 12162.9. Samples: 382245888. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:24:40,767][1648981] Avg episode reward: [(0, '912.250')] [2024-06-15 20:24:42,691][1651669] Updated weights for policy 0, policy_version 746519 (0.0105) [2024-06-15 20:24:44,398][1651669] Updated weights for policy 0, policy_version 746581 (0.0012) [2024-06-15 20:24:45,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50244.3, 300 sec: 49318.8). Total num frames: 1529085952. Throughput: 0: 11949.0. Samples: 382308864. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:24:45,767][1648981] Avg episode reward: [(0, '919.830')] [2024-06-15 20:24:49,084][1651669] Updated weights for policy 0, policy_version 746627 (0.0011) [2024-06-15 20:24:50,267][1651669] Updated weights for policy 0, policy_version 746678 (0.0013) [2024-06-15 20:24:50,790][1648981] Fps is (10 sec: 49035.0, 60 sec: 48040.9, 300 sec: 48870.4). Total num frames: 1529217024. Throughput: 0: 12008.6. Samples: 382388736. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:24:50,791][1648981] Avg episode reward: [(0, '950.520')] [2024-06-15 20:24:52,170][1651669] Updated weights for policy 0, policy_version 746720 (0.0024) [2024-06-15 20:24:52,999][1651274] Signal inference workers to stop experience collection... (39150 times) [2024-06-15 20:24:53,051][1651669] InferenceWorker_p0-w0: stopping experience collection (39150 times) [2024-06-15 20:24:53,271][1651274] Signal inference workers to resume experience collection... (39150 times) [2024-06-15 20:24:53,273][1651669] InferenceWorker_p0-w0: resuming experience collection (39150 times) [2024-06-15 20:24:53,811][1651669] Updated weights for policy 0, policy_version 746785 (0.0012) [2024-06-15 20:24:54,708][1651669] Updated weights for policy 0, policy_version 746819 (0.0012) [2024-06-15 20:24:55,767][1648981] Fps is (10 sec: 52426.9, 60 sec: 50244.0, 300 sec: 49429.6). Total num frames: 1529610240. Throughput: 0: 12083.1. Samples: 382425088. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:24:55,767][1648981] Avg episode reward: [(0, '945.160')] [2024-06-15 20:24:55,799][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000746880_1529610240.pth... [2024-06-15 20:24:55,846][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000741104_1517780992.pth [2024-06-15 20:24:59,953][1651669] Updated weights for policy 0, policy_version 746881 (0.0012) [2024-06-15 20:25:00,766][1648981] Fps is (10 sec: 42700.0, 60 sec: 46428.2, 300 sec: 48874.3). Total num frames: 1529643008. Throughput: 0: 12197.0. Samples: 382503936. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:00,767][1648981] Avg episode reward: [(0, '956.990')] [2024-06-15 20:25:01,333][1651669] Updated weights for policy 0, policy_version 746938 (0.0011) [2024-06-15 20:25:03,020][1651669] Updated weights for policy 0, policy_version 746992 (0.0013) [2024-06-15 20:25:05,741][1651669] Updated weights for policy 0, policy_version 747074 (0.0094) [2024-06-15 20:25:05,766][1648981] Fps is (10 sec: 39322.5, 60 sec: 48605.8, 300 sec: 49318.6). Total num frames: 1530003456. Throughput: 0: 12026.3. Samples: 382567936. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:05,767][1648981] Avg episode reward: [(0, '958.720')] [2024-06-15 20:25:06,744][1651669] Updated weights for policy 0, policy_version 747132 (0.0012) [2024-06-15 20:25:10,818][1648981] Fps is (10 sec: 48897.9, 60 sec: 45835.6, 300 sec: 48868.9). Total num frames: 1530134528. Throughput: 0: 12209.0. Samples: 382610432. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:10,819][1648981] Avg episode reward: [(0, '964.960')] [2024-06-15 20:25:11,866][1651669] Updated weights for policy 0, policy_version 747189 (0.0027) [2024-06-15 20:25:13,733][1651669] Updated weights for policy 0, policy_version 747248 (0.0020) [2024-06-15 20:25:15,417][1651669] Updated weights for policy 0, policy_version 747316 (0.0011) [2024-06-15 20:25:15,785][1648981] Fps is (10 sec: 52343.1, 60 sec: 50230.5, 300 sec: 49204.8). Total num frames: 1530527744. Throughput: 0: 11851.3. Samples: 382675968. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:15,790][1648981] Avg episode reward: [(0, '964.430')] [2024-06-15 20:25:17,183][1651669] Updated weights for policy 0, policy_version 747360 (0.0010) [2024-06-15 20:25:20,766][1648981] Fps is (10 sec: 52702.6, 60 sec: 46421.3, 300 sec: 48874.3). Total num frames: 1530658816. Throughput: 0: 12128.7. Samples: 382752256. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:20,767][1648981] Avg episode reward: [(0, '977.750')] [2024-06-15 20:25:22,273][1651669] Updated weights for policy 0, policy_version 747408 (0.0014) [2024-06-15 20:25:24,091][1651669] Updated weights for policy 0, policy_version 747472 (0.0012) [2024-06-15 20:25:25,514][1651669] Updated weights for policy 0, policy_version 747524 (0.0011) [2024-06-15 20:25:25,766][1648981] Fps is (10 sec: 42668.7, 60 sec: 48615.9, 300 sec: 48985.4). Total num frames: 1530953728. Throughput: 0: 11992.2. Samples: 382785536. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:25,767][1648981] Avg episode reward: [(0, '961.600')] [2024-06-15 20:25:26,296][1651669] Updated weights for policy 0, policy_version 747578 (0.0010) [2024-06-15 20:25:28,617][1651669] Updated weights for policy 0, policy_version 747638 (0.0015) [2024-06-15 20:25:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1531183104. Throughput: 0: 12174.2. Samples: 382856704. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:30,767][1648981] Avg episode reward: [(0, '937.430')] [2024-06-15 20:25:34,084][1651669] Updated weights for policy 0, policy_version 747696 (0.0012) [2024-06-15 20:25:34,294][1651274] Signal inference workers to stop experience collection... (39200 times) [2024-06-15 20:25:34,339][1651669] InferenceWorker_p0-w0: stopping experience collection (39200 times) [2024-06-15 20:25:34,611][1651274] Signal inference workers to resume experience collection... (39200 times) [2024-06-15 20:25:34,612][1651669] InferenceWorker_p0-w0: resuming experience collection (39200 times) [2024-06-15 20:25:35,774][1648981] Fps is (10 sec: 42565.2, 60 sec: 46961.4, 300 sec: 48761.9). Total num frames: 1531379712. Throughput: 0: 12064.7. Samples: 382931456. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:35,775][1648981] Avg episode reward: [(0, '944.630')] [2024-06-15 20:25:36,078][1651669] Updated weights for policy 0, policy_version 747761 (0.0011) [2024-06-15 20:25:37,448][1651669] Updated weights for policy 0, policy_version 747832 (0.0030) [2024-06-15 20:25:39,041][1651669] Updated weights for policy 0, policy_version 747874 (0.0012) [2024-06-15 20:25:40,768][1648981] Fps is (10 sec: 52426.7, 60 sec: 49697.8, 300 sec: 49096.4). Total num frames: 1531707392. Throughput: 0: 12037.7. Samples: 382966784. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:40,771][1648981] Avg episode reward: [(0, '994.480')] [2024-06-15 20:25:44,175][1651669] Updated weights for policy 0, policy_version 747905 (0.0013) [2024-06-15 20:25:45,390][1651669] Updated weights for policy 0, policy_version 747954 (0.0010) [2024-06-15 20:25:45,766][1648981] Fps is (10 sec: 45910.5, 60 sec: 45875.1, 300 sec: 48763.2). Total num frames: 1531838464. Throughput: 0: 11992.1. Samples: 383043584. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:45,767][1648981] Avg episode reward: [(0, '1013.660')] [2024-06-15 20:25:47,642][1651669] Updated weights for policy 0, policy_version 748044 (0.0014) [2024-06-15 20:25:49,880][1651669] Updated weights for policy 0, policy_version 748113 (0.0142) [2024-06-15 20:25:50,766][1648981] Fps is (10 sec: 49153.9, 60 sec: 49717.8, 300 sec: 49096.5). Total num frames: 1532198912. Throughput: 0: 11935.3. Samples: 383105024. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:50,767][1648981] Avg episode reward: [(0, '993.480')] [2024-06-15 20:25:50,775][1651669] Updated weights for policy 0, policy_version 748157 (0.0011) [2024-06-15 20:25:55,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 44237.0, 300 sec: 48541.4). Total num frames: 1532264448. Throughput: 0: 11994.6. Samples: 383149568. 
Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:25:55,767][1648981] Avg episode reward: [(0, '977.630')] [2024-06-15 20:25:56,588][1651669] Updated weights for policy 0, policy_version 748224 (0.0036) [2024-06-15 20:25:58,138][1651669] Updated weights for policy 0, policy_version 748284 (0.0012) [2024-06-15 20:25:59,666][1651669] Updated weights for policy 0, policy_version 748344 (0.0012) [2024-06-15 20:26:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1532657664. Throughput: 0: 11860.0. Samples: 383209472. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:26:00,767][1648981] Avg episode reward: [(0, '1006.220')] [2024-06-15 20:26:01,562][1651669] Updated weights for policy 0, policy_version 748414 (0.0013) [2024-06-15 20:26:05,773][1648981] Fps is (10 sec: 49127.5, 60 sec: 45871.5, 300 sec: 48540.9). Total num frames: 1532755968. Throughput: 0: 11877.1. Samples: 383286784. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:26:05,777][1648981] Avg episode reward: [(0, '1002.060')] [2024-06-15 20:26:07,329][1651669] Updated weights for policy 0, policy_version 748474 (0.0011) [2024-06-15 20:26:08,611][1651669] Updated weights for policy 0, policy_version 748537 (0.0012) [2024-06-15 20:26:10,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 49194.6, 300 sec: 48763.2). Total num frames: 1533083648. Throughput: 0: 11901.2. Samples: 383321088. Policy #0 lag: (min: 63.0, avg: 144.5, max: 319.0) [2024-06-15 20:26:10,767][1648981] Avg episode reward: [(0, '947.030')] [2024-06-15 20:26:11,276][1651669] Updated weights for policy 0, policy_version 748594 (0.0011) [2024-06-15 20:26:12,071][1651274] Signal inference workers to stop experience collection... (39250 times) [2024-06-15 20:26:12,109][1651669] InferenceWorker_p0-w0: stopping experience collection (39250 times) [2024-06-15 20:26:12,296][1651274] Signal inference workers to resume experience collection... 
(39250 times) [2024-06-15 20:26:12,301][1651669] InferenceWorker_p0-w0: resuming experience collection (39250 times) [2024-06-15 20:26:12,777][1651669] Updated weights for policy 0, policy_version 748661 (0.0014) [2024-06-15 20:26:15,766][1648981] Fps is (10 sec: 52455.3, 60 sec: 45887.9, 300 sec: 48876.3). Total num frames: 1533280256. Throughput: 0: 11935.3. Samples: 383393792. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:15,767][1648981] Avg episode reward: [(0, '935.420')] [2024-06-15 20:26:17,088][1651669] Updated weights for policy 0, policy_version 748704 (0.0016) [2024-06-15 20:26:19,246][1651669] Updated weights for policy 0, policy_version 748772 (0.0012) [2024-06-15 20:26:20,309][1651669] Updated weights for policy 0, policy_version 748802 (0.0013) [2024-06-15 20:26:20,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48605.9, 300 sec: 48985.4). Total num frames: 1533575168. Throughput: 0: 11982.9. Samples: 383470592. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:20,767][1648981] Avg episode reward: [(0, '908.790')] [2024-06-15 20:26:22,192][1651669] Updated weights for policy 0, policy_version 748880 (0.0013) [2024-06-15 20:26:23,115][1651669] Updated weights for policy 0, policy_version 748928 (0.0013) [2024-06-15 20:26:25,766][1648981] Fps is (10 sec: 52428.0, 60 sec: 47513.5, 300 sec: 48874.3). Total num frames: 1533804544. Throughput: 0: 11787.5. Samples: 383497216. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:25,767][1648981] Avg episode reward: [(0, '937.910')] [2024-06-15 20:26:28,968][1651669] Updated weights for policy 0, policy_version 748982 (0.0012) [2024-06-15 20:26:30,408][1651669] Updated weights for policy 0, policy_version 749044 (0.0012) [2024-06-15 20:26:30,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 48059.5, 300 sec: 48874.3). Total num frames: 1534066688. Throughput: 0: 11889.8. Samples: 383578624. 
Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:30,768][1648981] Avg episode reward: [(0, '959.780')] [2024-06-15 20:26:31,853][1651669] Updated weights for policy 0, policy_version 749105 (0.0139) [2024-06-15 20:26:33,475][1651669] Updated weights for policy 0, policy_version 749152 (0.0011) [2024-06-15 20:26:35,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 49158.4, 300 sec: 48874.3). Total num frames: 1534328832. Throughput: 0: 12037.7. Samples: 383646720. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:35,767][1648981] Avg episode reward: [(0, '934.290')] [2024-06-15 20:26:38,514][1651669] Updated weights for policy 0, policy_version 749200 (0.0012) [2024-06-15 20:26:39,638][1651669] Updated weights for policy 0, policy_version 749247 (0.0021) [2024-06-15 20:26:40,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 46421.6, 300 sec: 48541.1). Total num frames: 1534492672. Throughput: 0: 11969.4. Samples: 383688192. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:40,767][1648981] Avg episode reward: [(0, '931.600')] [2024-06-15 20:26:41,495][1651669] Updated weights for policy 0, policy_version 749308 (0.0012) [2024-06-15 20:26:43,180][1651669] Updated weights for policy 0, policy_version 749368 (0.0015) [2024-06-15 20:26:45,130][1651669] Updated weights for policy 0, policy_version 749429 (0.0159) [2024-06-15 20:26:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.4, 300 sec: 48875.1). Total num frames: 1534853120. Throughput: 0: 12026.3. Samples: 383750656. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:45,767][1648981] Avg episode reward: [(0, '923.120')] [2024-06-15 20:26:49,757][1651669] Updated weights for policy 0, policy_version 749475 (0.0011) [2024-06-15 20:26:50,772][1648981] Fps is (10 sec: 49127.1, 60 sec: 46417.3, 300 sec: 48873.5). Total num frames: 1534984192. Throughput: 0: 11946.6. Samples: 383824384. 
Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:50,772][1648981] Avg episode reward: [(0, '889.350')] [2024-06-15 20:26:52,200][1651669] Updated weights for policy 0, policy_version 749536 (0.0012) [2024-06-15 20:26:53,325][1651669] Updated weights for policy 0, policy_version 749584 (0.0021) [2024-06-15 20:26:54,178][1651669] Updated weights for policy 0, policy_version 749632 (0.0011) [2024-06-15 20:26:55,538][1651274] Signal inference workers to stop experience collection... (39300 times) [2024-06-15 20:26:55,566][1651669] InferenceWorker_p0-w0: stopping experience collection (39300 times) [2024-06-15 20:26:55,760][1651274] Signal inference workers to resume experience collection... (39300 times) [2024-06-15 20:26:55,761][1651669] InferenceWorker_p0-w0: resuming experience collection (39300 times) [2024-06-15 20:26:55,767][1648981] Fps is (10 sec: 45873.4, 60 sec: 50790.1, 300 sec: 48764.9). Total num frames: 1535311872. Throughput: 0: 11969.3. Samples: 383859712. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:26:55,767][1648981] Avg episode reward: [(0, '899.230')] [2024-06-15 20:26:56,150][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000749680_1535344640.pth... [2024-06-15 20:26:56,190][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000743984_1523679232.pth [2024-06-15 20:26:56,480][1651669] Updated weights for policy 0, policy_version 749690 (0.0010) [2024-06-15 20:27:00,671][1651669] Updated weights for policy 0, policy_version 749744 (0.0099) [2024-06-15 20:27:00,766][1648981] Fps is (10 sec: 49177.4, 60 sec: 46967.5, 300 sec: 48763.2). Total num frames: 1535475712. Throughput: 0: 12174.2. Samples: 383941632. 
Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:00,767][1648981] Avg episode reward: [(0, '911.690')] [2024-06-15 20:27:02,661][1651669] Updated weights for policy 0, policy_version 749766 (0.0011) [2024-06-15 20:27:03,652][1651669] Updated weights for policy 0, policy_version 749811 (0.0016) [2024-06-15 20:27:05,116][1651669] Updated weights for policy 0, policy_version 749884 (0.0028) [2024-06-15 20:27:05,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 50248.4, 300 sec: 48433.3). Total num frames: 1535770624. Throughput: 0: 11935.3. Samples: 384007680. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:05,767][1648981] Avg episode reward: [(0, '942.510')] [2024-06-15 20:27:06,807][1651669] Updated weights for policy 0, policy_version 749942 (0.0082) [2024-06-15 20:27:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47513.5, 300 sec: 48652.2). Total num frames: 1535934464. Throughput: 0: 12265.3. Samples: 384049152. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:10,767][1648981] Avg episode reward: [(0, '879.680')] [2024-06-15 20:27:11,182][1651669] Updated weights for policy 0, policy_version 749987 (0.0143) [2024-06-15 20:27:14,221][1651669] Updated weights for policy 0, policy_version 750051 (0.0012) [2024-06-15 20:27:15,648][1651669] Updated weights for policy 0, policy_version 750114 (0.0015) [2024-06-15 20:27:15,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 49151.8, 300 sec: 48318.9). Total num frames: 1536229376. Throughput: 0: 12037.7. Samples: 384120320. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:15,767][1648981] Avg episode reward: [(0, '883.170')] [2024-06-15 20:27:17,571][1651669] Updated weights for policy 0, policy_version 750176 (0.0011) [2024-06-15 20:27:20,767][1648981] Fps is (10 sec: 49150.6, 60 sec: 47513.4, 300 sec: 48652.1). Total num frames: 1536425984. Throughput: 0: 12162.8. Samples: 384194048. 
Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:20,767][1648981] Avg episode reward: [(0, '903.470')] [2024-06-15 20:27:21,241][1651669] Updated weights for policy 0, policy_version 750224 (0.0020) [2024-06-15 20:27:25,156][1651669] Updated weights for policy 0, policy_version 750306 (0.0012) [2024-06-15 20:27:25,787][1648981] Fps is (10 sec: 45779.4, 60 sec: 48042.9, 300 sec: 48315.5). Total num frames: 1536688128. Throughput: 0: 12020.7. Samples: 384229376. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:25,788][1648981] Avg episode reward: [(0, '881.140')] [2024-06-15 20:27:26,499][1651669] Updated weights for policy 0, policy_version 750371 (0.0020) [2024-06-15 20:27:28,261][1651669] Updated weights for policy 0, policy_version 750434 (0.0012) [2024-06-15 20:27:30,766][1648981] Fps is (10 sec: 52430.6, 60 sec: 48059.9, 300 sec: 48654.1). Total num frames: 1536950272. Throughput: 0: 12026.3. Samples: 384291840. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:30,767][1648981] Avg episode reward: [(0, '870.980')] [2024-06-15 20:27:32,771][1651669] Updated weights for policy 0, policy_version 750482 (0.0014) [2024-06-15 20:27:33,796][1651669] Updated weights for policy 0, policy_version 750522 (0.0009) [2024-06-15 20:27:35,768][1648981] Fps is (10 sec: 45971.8, 60 sec: 46967.4, 300 sec: 48207.8). Total num frames: 1537146880. Throughput: 0: 12232.5. Samples: 384374784. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:35,769][1648981] Avg episode reward: [(0, '882.590')] [2024-06-15 20:27:36,750][1651669] Updated weights for policy 0, policy_version 750594 (0.0012) [2024-06-15 20:27:37,054][1651274] Signal inference workers to stop experience collection... (39350 times) [2024-06-15 20:27:37,100][1651669] InferenceWorker_p0-w0: stopping experience collection (39350 times) [2024-06-15 20:27:37,236][1651274] Signal inference workers to resume experience collection... 
(39350 times) [2024-06-15 20:27:37,238][1651669] InferenceWorker_p0-w0: resuming experience collection (39350 times) [2024-06-15 20:27:38,942][1651669] Updated weights for policy 0, policy_version 750672 (0.0013) [2024-06-15 20:27:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49698.2, 300 sec: 48652.2). Total num frames: 1537474560. Throughput: 0: 12060.5. Samples: 384402432. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:40,767][1648981] Avg episode reward: [(0, '914.220')] [2024-06-15 20:27:43,731][1651669] Updated weights for policy 0, policy_version 750723 (0.0039) [2024-06-15 20:27:45,081][1651669] Updated weights for policy 0, policy_version 750781 (0.0114) [2024-06-15 20:27:45,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 45875.1, 300 sec: 48211.1). Total num frames: 1537605632. Throughput: 0: 11787.4. Samples: 384472064. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:45,767][1648981] Avg episode reward: [(0, '909.650')] [2024-06-15 20:27:48,011][1651669] Updated weights for policy 0, policy_version 750834 (0.0011) [2024-06-15 20:27:49,492][1651669] Updated weights for policy 0, policy_version 750912 (0.0011) [2024-06-15 20:27:50,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48610.0, 300 sec: 48431.0). Total num frames: 1537900544. Throughput: 0: 11798.8. Samples: 384538624. Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:50,767][1648981] Avg episode reward: [(0, '929.950')] [2024-06-15 20:27:55,222][1651669] Updated weights for policy 0, policy_version 750983 (0.0039) [2024-06-15 20:27:55,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 45329.3, 300 sec: 47985.7). Total num frames: 1538031616. Throughput: 0: 11741.9. Samples: 384577536. 
Policy #0 lag: (min: 85.0, avg: 187.2, max: 277.0) [2024-06-15 20:27:55,767][1648981] Avg episode reward: [(0, '961.360')] [2024-06-15 20:27:56,726][1651669] Updated weights for policy 0, policy_version 751040 (0.0017) [2024-06-15 20:28:00,165][1651669] Updated weights for policy 0, policy_version 751120 (0.0013) [2024-06-15 20:28:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 47513.5, 300 sec: 48207.8). Total num frames: 1538326528. Throughput: 0: 11741.9. Samples: 384648704. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:00,767][1648981] Avg episode reward: [(0, '968.690')] [2024-06-15 20:28:01,761][1651669] Updated weights for policy 0, policy_version 751184 (0.0088) [2024-06-15 20:28:05,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 48096.8). Total num frames: 1538523136. Throughput: 0: 11605.4. Samples: 384716288. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:05,767][1648981] Avg episode reward: [(0, '975.570')] [2024-06-15 20:28:06,038][1651669] Updated weights for policy 0, policy_version 751249 (0.0010) [2024-06-15 20:28:06,932][1651669] Updated weights for policy 0, policy_version 751292 (0.0030) [2024-06-15 20:28:10,767][1648981] Fps is (10 sec: 39320.3, 60 sec: 46421.0, 300 sec: 47763.5). Total num frames: 1538719744. Throughput: 0: 11656.2. Samples: 384753664. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:10,767][1648981] Avg episode reward: [(0, '1010.110')] [2024-06-15 20:28:11,064][1651669] Updated weights for policy 0, policy_version 751348 (0.0131) [2024-06-15 20:28:12,935][1651669] Updated weights for policy 0, policy_version 751425 (0.0094) [2024-06-15 20:28:13,947][1651669] Updated weights for policy 0, policy_version 751481 (0.0012) [2024-06-15 20:28:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.6, 300 sec: 48318.9). Total num frames: 1539047424. Throughput: 0: 11764.6. Samples: 384821248. 
Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:15,767][1648981] Avg episode reward: [(0, '987.390')] [2024-06-15 20:28:17,106][1651669] Updated weights for policy 0, policy_version 751524 (0.0018) [2024-06-15 20:28:20,787][1648981] Fps is (10 sec: 45785.0, 60 sec: 45860.1, 300 sec: 47649.2). Total num frames: 1539178496. Throughput: 0: 11827.6. Samples: 384907264. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:20,787][1648981] Avg episode reward: [(0, '978.870')] [2024-06-15 20:28:20,955][1651274] Signal inference workers to stop experience collection... (39400 times) [2024-06-15 20:28:21,009][1651669] Updated weights for policy 0, policy_version 751569 (0.0080) [2024-06-15 20:28:21,041][1651669] InferenceWorker_p0-w0: stopping experience collection (39400 times) [2024-06-15 20:28:21,301][1651274] Signal inference workers to resume experience collection... (39400 times) [2024-06-15 20:28:21,301][1651669] InferenceWorker_p0-w0: resuming experience collection (39400 times) [2024-06-15 20:28:22,852][1651669] Updated weights for policy 0, policy_version 751648 (0.0012) [2024-06-15 20:28:24,173][1651669] Updated weights for policy 0, policy_version 751712 (0.0090) [2024-06-15 20:28:25,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48076.7, 300 sec: 48318.9). Total num frames: 1539571712. Throughput: 0: 11810.2. Samples: 384933888. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:25,767][1648981] Avg episode reward: [(0, '934.370')] [2024-06-15 20:28:28,332][1651669] Updated weights for policy 0, policy_version 751792 (0.0013) [2024-06-15 20:28:30,798][1648981] Fps is (10 sec: 52367.4, 60 sec: 45850.9, 300 sec: 47980.6). Total num frames: 1539702784. Throughput: 0: 11824.5. Samples: 385004544. 
Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:30,799][1648981] Avg episode reward: [(0, '946.320')] [2024-06-15 20:28:31,926][1651669] Updated weights for policy 0, policy_version 751840 (0.0013) [2024-06-15 20:28:34,152][1651669] Updated weights for policy 0, policy_version 751920 (0.0012) [2024-06-15 20:28:35,452][1651669] Updated weights for policy 0, policy_version 751989 (0.0013) [2024-06-15 20:28:35,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49152.1, 300 sec: 48318.9). Total num frames: 1540096000. Throughput: 0: 11867.0. Samples: 385072640. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:35,767][1648981] Avg episode reward: [(0, '925.830')] [2024-06-15 20:28:37,462][1651669] Updated weights for policy 0, policy_version 752020 (0.0010) [2024-06-15 20:28:40,767][1648981] Fps is (10 sec: 52592.8, 60 sec: 45874.7, 300 sec: 47985.6). Total num frames: 1540227072. Throughput: 0: 11992.0. Samples: 385117184. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:40,768][1648981] Avg episode reward: [(0, '947.230')] [2024-06-15 20:28:41,402][1651669] Updated weights for policy 0, policy_version 752080 (0.0014) [2024-06-15 20:28:42,487][1651669] Updated weights for policy 0, policy_version 752129 (0.0011) [2024-06-15 20:28:44,220][1651669] Updated weights for policy 0, policy_version 752193 (0.0018) [2024-06-15 20:28:45,557][1651669] Updated weights for policy 0, policy_version 752252 (0.0035) [2024-06-15 20:28:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1540620288. Throughput: 0: 12128.7. Samples: 385194496. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:45,767][1648981] Avg episode reward: [(0, '939.710')] [2024-06-15 20:28:49,059][1651669] Updated weights for policy 0, policy_version 752311 (0.0012) [2024-06-15 20:28:50,766][1648981] Fps is (10 sec: 52432.0, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1540751360. 
Throughput: 0: 12322.1. Samples: 385270784. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:50,767][1648981] Avg episode reward: [(0, '949.100')] [2024-06-15 20:28:51,920][1651669] Updated weights for policy 0, policy_version 752337 (0.0010) [2024-06-15 20:28:52,643][1651669] Updated weights for policy 0, policy_version 752377 (0.0012) [2024-06-15 20:28:54,607][1651669] Updated weights for policy 0, policy_version 752438 (0.0013) [2024-06-15 20:28:55,802][1648981] Fps is (10 sec: 45711.1, 60 sec: 50760.1, 300 sec: 48203.4). Total num frames: 1541079040. Throughput: 0: 12392.0. Samples: 385311744. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:28:55,803][1648981] Avg episode reward: [(0, '939.480')] [2024-06-15 20:28:56,121][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000752496_1541111808.pth... [2024-06-15 20:28:56,193][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000746880_1529610240.pth [2024-06-15 20:28:56,388][1651669] Updated weights for policy 0, policy_version 752506 (0.0045) [2024-06-15 20:28:58,023][1651274] Signal inference workers to stop experience collection... (39450 times) [2024-06-15 20:28:58,079][1651669] InferenceWorker_p0-w0: stopping experience collection (39450 times) [2024-06-15 20:28:58,373][1651274] Signal inference workers to resume experience collection... (39450 times) [2024-06-15 20:28:58,374][1651669] InferenceWorker_p0-w0: resuming experience collection (39450 times) [2024-06-15 20:28:59,126][1651669] Updated weights for policy 0, policy_version 752568 (0.0014) [2024-06-15 20:29:00,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 48096.7). Total num frames: 1541275648. Throughput: 0: 12379.0. Samples: 385378304. 
Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:00,767][1648981] Avg episode reward: [(0, '942.230')] [2024-06-15 20:29:02,846][1651669] Updated weights for policy 0, policy_version 752628 (0.0012) [2024-06-15 20:29:04,654][1651669] Updated weights for policy 0, policy_version 752672 (0.0012) [2024-06-15 20:29:05,766][1648981] Fps is (10 sec: 46040.3, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 1541537792. Throughput: 0: 12350.4. Samples: 385462784. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:05,767][1648981] Avg episode reward: [(0, '963.650')] [2024-06-15 20:29:06,332][1651669] Updated weights for policy 0, policy_version 752736 (0.0012) [2024-06-15 20:29:09,264][1651669] Updated weights for policy 0, policy_version 752816 (0.0015) [2024-06-15 20:29:10,767][1648981] Fps is (10 sec: 52426.9, 60 sec: 51336.5, 300 sec: 48429.9). Total num frames: 1541799936. Throughput: 0: 12492.6. Samples: 385496064. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:10,768][1648981] Avg episode reward: [(0, '963.650')] [2024-06-15 20:29:13,228][1651669] Updated weights for policy 0, policy_version 752890 (0.0014) [2024-06-15 20:29:15,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 49148.9, 300 sec: 47874.0). Total num frames: 1541996544. Throughput: 0: 12523.3. Samples: 385567744. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:15,771][1648981] Avg episode reward: [(0, '930.360')] [2024-06-15 20:29:15,861][1651669] Updated weights for policy 0, policy_version 752931 (0.0016) [2024-06-15 20:29:17,512][1651669] Updated weights for policy 0, policy_version 752976 (0.0011) [2024-06-15 20:29:19,177][1651669] Updated weights for policy 0, policy_version 753041 (0.0020) [2024-06-15 20:29:20,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 52446.3, 300 sec: 48432.0). Total num frames: 1542324224. Throughput: 0: 12435.9. Samples: 385632256. 
Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:20,767][1648981] Avg episode reward: [(0, '936.980')] [2024-06-15 20:29:22,991][1651669] Updated weights for policy 0, policy_version 753089 (0.0011) [2024-06-15 20:29:24,488][1651669] Updated weights for policy 0, policy_version 753151 (0.0013) [2024-06-15 20:29:25,766][1648981] Fps is (10 sec: 45892.8, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1542455296. Throughput: 0: 12424.7. Samples: 385676288. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:25,767][1648981] Avg episode reward: [(0, '865.290')] [2024-06-15 20:29:27,659][1651669] Updated weights for policy 0, policy_version 753216 (0.0027) [2024-06-15 20:29:28,959][1651669] Updated weights for policy 0, policy_version 753264 (0.0013) [2024-06-15 20:29:30,261][1651669] Updated weights for policy 0, policy_version 753312 (0.0048) [2024-06-15 20:29:30,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 51910.2, 300 sec: 48318.9). Total num frames: 1542815744. Throughput: 0: 12253.9. Samples: 385745920. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:30,767][1648981] Avg episode reward: [(0, '846.860')] [2024-06-15 20:29:35,277][1651669] Updated weights for policy 0, policy_version 753392 (0.0033) [2024-06-15 20:29:35,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 48059.6, 300 sec: 48318.9). Total num frames: 1542979584. Throughput: 0: 12105.9. Samples: 385815552. Policy #0 lag: (min: 36.0, avg: 123.4, max: 292.0) [2024-06-15 20:29:35,767][1648981] Avg episode reward: [(0, '822.800')] [2024-06-15 20:29:38,206][1651669] Updated weights for policy 0, policy_version 753442 (0.0012) [2024-06-15 20:29:39,904][1651669] Updated weights for policy 0, policy_version 753506 (0.0036) [2024-06-15 20:29:40,211][1651274] Signal inference workers to stop experience collection... 
(39500 times) [2024-06-15 20:29:40,264][1651669] InferenceWorker_p0-w0: stopping experience collection (39500 times) [2024-06-15 20:29:40,522][1651274] Signal inference workers to resume experience collection... (39500 times) [2024-06-15 20:29:40,523][1651669] InferenceWorker_p0-w0: resuming experience collection (39500 times) [2024-06-15 20:29:40,794][1648981] Fps is (10 sec: 42480.1, 60 sec: 50221.5, 300 sec: 47981.2). Total num frames: 1543241728. Throughput: 0: 12153.6. Samples: 385858560. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:29:40,795][1648981] Avg episode reward: [(0, '845.810')] [2024-06-15 20:29:41,460][1651669] Updated weights for policy 0, policy_version 753571 (0.0012) [2024-06-15 20:29:44,831][1651669] Updated weights for policy 0, policy_version 753603 (0.0012) [2024-06-15 20:29:45,774][1648981] Fps is (10 sec: 49113.2, 60 sec: 47507.2, 300 sec: 48321.5). Total num frames: 1543471104. Throughput: 0: 12228.9. Samples: 385928704. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:29:45,775][1648981] Avg episode reward: [(0, '868.710')] [2024-06-15 20:29:46,047][1651669] Updated weights for policy 0, policy_version 753664 (0.0040) [2024-06-15 20:29:49,854][1651669] Updated weights for policy 0, policy_version 753744 (0.0015) [2024-06-15 20:29:50,766][1648981] Fps is (10 sec: 49289.2, 60 sec: 49698.2, 300 sec: 47874.7). Total num frames: 1543733248. Throughput: 0: 11969.4. Samples: 386001408. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:29:50,767][1648981] Avg episode reward: [(0, '835.060')] [2024-06-15 20:29:51,281][1651669] Updated weights for policy 0, policy_version 753795 (0.0011) [2024-06-15 20:29:52,527][1651669] Updated weights for policy 0, policy_version 753852 (0.0013) [2024-06-15 20:29:55,766][1648981] Fps is (10 sec: 45912.3, 60 sec: 47542.0, 300 sec: 48430.0). Total num frames: 1543929856. Throughput: 0: 12015.0. Samples: 386036736. 
Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:29:55,767][1648981] Avg episode reward: [(0, '813.800')] [2024-06-15 20:29:56,417][1651669] Updated weights for policy 0, policy_version 753914 (0.0014) [2024-06-15 20:29:59,900][1651669] Updated weights for policy 0, policy_version 753968 (0.0013) [2024-06-15 20:30:00,670][1651669] Updated weights for policy 0, policy_version 754000 (0.0048) [2024-06-15 20:30:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 1544192000. Throughput: 0: 12175.3. Samples: 386115584. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:00,767][1648981] Avg episode reward: [(0, '826.750')] [2024-06-15 20:30:02,622][1651669] Updated weights for policy 0, policy_version 754067 (0.0014) [2024-06-15 20:30:03,519][1651669] Updated weights for policy 0, policy_version 754112 (0.0012) [2024-06-15 20:30:05,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.8, 300 sec: 48438.5). Total num frames: 1544421376. Throughput: 0: 12208.4. Samples: 386181632. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:05,767][1648981] Avg episode reward: [(0, '821.310')] [2024-06-15 20:30:07,373][1651669] Updated weights for policy 0, policy_version 754175 (0.0025) [2024-06-15 20:30:10,770][1648981] Fps is (10 sec: 42582.2, 60 sec: 46964.8, 300 sec: 47765.6). Total num frames: 1544617984. Throughput: 0: 12002.5. Samples: 386216448. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:10,771][1648981] Avg episode reward: [(0, '824.630')] [2024-06-15 20:30:11,116][1651669] Updated weights for policy 0, policy_version 754230 (0.0019) [2024-06-15 20:30:12,495][1651669] Updated weights for policy 0, policy_version 754288 (0.0151) [2024-06-15 20:30:13,762][1651669] Updated weights for policy 0, policy_version 754352 (0.0012) [2024-06-15 20:30:15,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49155.0, 300 sec: 48430.0). Total num frames: 1544945664. 
Throughput: 0: 12014.9. Samples: 386286592. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:15,767][1648981] Avg episode reward: [(0, '851.060')] [2024-06-15 20:30:17,327][1651669] Updated weights for policy 0, policy_version 754400 (0.0013) [2024-06-15 20:30:20,622][1651669] Updated weights for policy 0, policy_version 754436 (0.0010) [2024-06-15 20:30:20,766][1648981] Fps is (10 sec: 45892.7, 60 sec: 45875.2, 300 sec: 47874.6). Total num frames: 1545076736. Throughput: 0: 12390.4. Samples: 386373120. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:20,767][1648981] Avg episode reward: [(0, '850.810')] [2024-06-15 20:30:21,228][1651274] Signal inference workers to stop experience collection... (39550 times) [2024-06-15 20:30:21,278][1651669] InferenceWorker_p0-w0: stopping experience collection (39550 times) [2024-06-15 20:30:21,422][1651274] Signal inference workers to resume experience collection... (39550 times) [2024-06-15 20:30:21,423][1651669] InferenceWorker_p0-w0: resuming experience collection (39550 times) [2024-06-15 20:30:21,674][1651669] Updated weights for policy 0, policy_version 754489 (0.0011) [2024-06-15 20:30:23,376][1651669] Updated weights for policy 0, policy_version 754550 (0.0225) [2024-06-15 20:30:24,607][1651669] Updated weights for policy 0, policy_version 754612 (0.0012) [2024-06-15 20:30:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 1545469952. Throughput: 0: 12079.3. Samples: 386401792. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:25,767][1648981] Avg episode reward: [(0, '843.100')] [2024-06-15 20:30:28,204][1651669] Updated weights for policy 0, policy_version 754685 (0.0012) [2024-06-15 20:30:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 48209.1). Total num frames: 1545601024. Throughput: 0: 12153.7. Samples: 386475520. 
Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:30,767][1648981] Avg episode reward: [(0, '844.850')] [2024-06-15 20:30:32,828][1651669] Updated weights for policy 0, policy_version 754760 (0.0094) [2024-06-15 20:30:34,347][1651669] Updated weights for policy 0, policy_version 754832 (0.0012) [2024-06-15 20:30:35,790][1648981] Fps is (10 sec: 52304.8, 60 sec: 50224.5, 300 sec: 48426.1). Total num frames: 1545994240. Throughput: 0: 12042.7. Samples: 386543616. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:35,791][1648981] Avg episode reward: [(0, '834.900')] [2024-06-15 20:30:37,265][1651669] Updated weights for policy 0, policy_version 754881 (0.0013) [2024-06-15 20:30:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48082.0, 300 sec: 48430.0). Total num frames: 1546125312. Throughput: 0: 12060.5. Samples: 386579456. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:40,767][1648981] Avg episode reward: [(0, '829.160')] [2024-06-15 20:30:42,727][1651669] Updated weights for policy 0, policy_version 754962 (0.0015) [2024-06-15 20:30:44,004][1651669] Updated weights for policy 0, policy_version 755024 (0.0011) [2024-06-15 20:30:45,767][1648981] Fps is (10 sec: 42699.5, 60 sec: 49158.5, 300 sec: 48207.8). Total num frames: 1546420224. Throughput: 0: 12117.3. Samples: 386660864. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:45,767][1648981] Avg episode reward: [(0, '823.500')] [2024-06-15 20:30:45,809][1651669] Updated weights for policy 0, policy_version 755089 (0.0013) [2024-06-15 20:30:46,806][1651669] Updated weights for policy 0, policy_version 755136 (0.0013) [2024-06-15 20:30:49,597][1651669] Updated weights for policy 0, policy_version 755193 (0.0013) [2024-06-15 20:30:50,783][1648981] Fps is (10 sec: 52339.0, 60 sec: 48592.0, 300 sec: 48760.4). Total num frames: 1546649600. Throughput: 0: 12146.8. Samples: 386728448. 
Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:50,784][1648981] Avg episode reward: [(0, '840.800')] [2024-06-15 20:30:54,257][1651669] Updated weights for policy 0, policy_version 755236 (0.0012) [2024-06-15 20:30:55,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1546878976. Throughput: 0: 12300.4. Samples: 386769920. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:30:55,767][1648981] Avg episode reward: [(0, '871.930')] [2024-06-15 20:30:55,899][1651669] Updated weights for policy 0, policy_version 755319 (0.0012) [2024-06-15 20:30:56,066][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000755328_1546911744.pth... [2024-06-15 20:30:56,146][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000749680_1535344640.pth [2024-06-15 20:30:56,151][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000755328_1546911744.pth [2024-06-15 20:30:57,260][1651669] Updated weights for policy 0, policy_version 755363 (0.0010) [2024-06-15 20:30:58,879][1651274] Signal inference workers to stop experience collection... (39600 times) [2024-06-15 20:30:58,929][1651669] InferenceWorker_p0-w0: stopping experience collection (39600 times) [2024-06-15 20:30:59,147][1651274] Signal inference workers to resume experience collection... (39600 times) [2024-06-15 20:30:59,149][1651669] InferenceWorker_p0-w0: resuming experience collection (39600 times) [2024-06-15 20:30:59,151][1651669] Updated weights for policy 0, policy_version 755408 (0.0013) [2024-06-15 20:31:00,226][1651669] Updated weights for policy 0, policy_version 755451 (0.0011) [2024-06-15 20:31:00,766][1648981] Fps is (10 sec: 52519.0, 60 sec: 49698.2, 300 sec: 48875.1). Total num frames: 1547173888. Throughput: 0: 12253.9. Samples: 386838016. 
Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:31:00,767][1648981] Avg episode reward: [(0, '907.030')] [2024-06-15 20:31:05,619][1651669] Updated weights for policy 0, policy_version 755521 (0.0011) [2024-06-15 20:31:05,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1547304960. Throughput: 0: 12083.2. Samples: 386916864. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:31:05,767][1648981] Avg episode reward: [(0, '896.300')] [2024-06-15 20:31:06,631][1651669] Updated weights for policy 0, policy_version 755575 (0.0011) [2024-06-15 20:31:08,168][1651669] Updated weights for policy 0, policy_version 755632 (0.0032) [2024-06-15 20:31:09,535][1651669] Updated weights for policy 0, policy_version 755665 (0.0011) [2024-06-15 20:31:10,686][1651669] Updated weights for policy 0, policy_version 755711 (0.0020) [2024-06-15 20:31:10,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 51339.6, 300 sec: 48874.2). Total num frames: 1547698176. Throughput: 0: 12242.4. Samples: 386952704. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:31:10,767][1648981] Avg episode reward: [(0, '895.640')] [2024-06-15 20:31:15,599][1651669] Updated weights for policy 0, policy_version 755772 (0.0013) [2024-06-15 20:31:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1547829248. Throughput: 0: 12356.3. Samples: 387031552. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:31:15,767][1648981] Avg episode reward: [(0, '846.470')] [2024-06-15 20:31:16,684][1651669] Updated weights for policy 0, policy_version 755824 (0.0091) [2024-06-15 20:31:17,886][1651669] Updated weights for policy 0, policy_version 755856 (0.0022) [2024-06-15 20:31:18,937][1651669] Updated weights for policy 0, policy_version 755899 (0.0010) [2024-06-15 20:31:20,766][1648981] Fps is (10 sec: 45876.4, 60 sec: 51336.5, 300 sec: 48652.2). Total num frames: 1548156928. 
Throughput: 0: 12351.4. Samples: 387099136. Policy #0 lag: (min: 77.0, avg: 167.7, max: 303.0) [2024-06-15 20:31:20,767][1648981] Avg episode reward: [(0, '859.090')] [2024-06-15 20:31:20,927][1651669] Updated weights for policy 0, policy_version 755939 (0.0011) [2024-06-15 20:31:25,786][1648981] Fps is (10 sec: 42514.0, 60 sec: 46406.0, 300 sec: 48093.6). Total num frames: 1548255232. Throughput: 0: 12362.2. Samples: 387136000. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:31:25,787][1648981] Avg episode reward: [(0, '833.420')] [2024-06-15 20:31:25,907][1651669] Updated weights for policy 0, policy_version 755986 (0.0013) [2024-06-15 20:31:27,188][1651669] Updated weights for policy 0, policy_version 756051 (0.0012) [2024-06-15 20:31:28,055][1651669] Updated weights for policy 0, policy_version 756093 (0.0015) [2024-06-15 20:31:29,549][1651669] Updated weights for policy 0, policy_version 756147 (0.0011) [2024-06-15 20:31:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1548615680. Throughput: 0: 12049.1. Samples: 387203072. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:31:30,767][1648981] Avg episode reward: [(0, '909.900')] [2024-06-15 20:31:31,226][1651669] Updated weights for policy 0, policy_version 756178 (0.0011) [2024-06-15 20:31:35,767][1648981] Fps is (10 sec: 49249.4, 60 sec: 45893.4, 300 sec: 48318.9). Total num frames: 1548746752. Throughput: 0: 12429.2. Samples: 387287552. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:31:35,767][1648981] Avg episode reward: [(0, '906.600')] [2024-06-15 20:31:36,548][1651669] Updated weights for policy 0, policy_version 756260 (0.0012) [2024-06-15 20:31:38,044][1651669] Updated weights for policy 0, policy_version 756336 (0.0111) [2024-06-15 20:31:39,474][1651274] Signal inference workers to stop experience collection... 
(39650 times) [2024-06-15 20:31:39,520][1651669] InferenceWorker_p0-w0: stopping experience collection (39650 times) [2024-06-15 20:31:39,793][1651274] Signal inference workers to resume experience collection... (39650 times) [2024-06-15 20:31:39,794][1651669] InferenceWorker_p0-w0: resuming experience collection (39650 times) [2024-06-15 20:31:39,797][1651669] Updated weights for policy 0, policy_version 756384 (0.0080) [2024-06-15 20:31:40,770][1648981] Fps is (10 sec: 52408.5, 60 sec: 50241.0, 300 sec: 48429.3). Total num frames: 1549139968. Throughput: 0: 12127.7. Samples: 387315712. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:31:40,771][1648981] Avg episode reward: [(0, '869.570')] [2024-06-15 20:31:42,910][1651669] Updated weights for policy 0, policy_version 756450 (0.0033) [2024-06-15 20:31:45,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 47513.7, 300 sec: 48430.8). Total num frames: 1549271040. Throughput: 0: 12242.5. Samples: 387388928. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:31:45,767][1648981] Avg episode reward: [(0, '856.590')] [2024-06-15 20:31:47,211][1651669] Updated weights for policy 0, policy_version 756496 (0.0032) [2024-06-15 20:31:48,565][1651669] Updated weights for policy 0, policy_version 756564 (0.0013) [2024-06-15 20:31:49,220][1651669] Updated weights for policy 0, policy_version 756606 (0.0015) [2024-06-15 20:31:50,766][1648981] Fps is (10 sec: 42614.7, 60 sec: 48619.7, 300 sec: 48319.0). Total num frames: 1549565952. Throughput: 0: 12094.6. Samples: 387461120. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:31:50,767][1648981] Avg episode reward: [(0, '869.600')] [2024-06-15 20:31:51,708][1651669] Updated weights for policy 0, policy_version 756671 (0.0012) [2024-06-15 20:31:54,399][1651669] Updated weights for policy 0, policy_version 756734 (0.0014) [2024-06-15 20:31:55,790][1648981] Fps is (10 sec: 52303.7, 60 sec: 48586.5, 300 sec: 48537.1). 
Total num frames: 1549795328. Throughput: 0: 12236.1. Samples: 387503616. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:31:55,791][1648981] Avg episode reward: [(0, '873.790')] [2024-06-15 20:31:58,208][1651669] Updated weights for policy 0, policy_version 756784 (0.0011) [2024-06-15 20:31:59,526][1651669] Updated weights for policy 0, policy_version 756834 (0.0013) [2024-06-15 20:32:00,772][1648981] Fps is (10 sec: 49124.4, 60 sec: 48055.2, 300 sec: 48429.1). Total num frames: 1550057472. Throughput: 0: 12070.3. Samples: 387574784. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:00,773][1648981] Avg episode reward: [(0, '834.640')] [2024-06-15 20:32:01,080][1651669] Updated weights for policy 0, policy_version 756883 (0.0013) [2024-06-15 20:32:02,219][1651669] Updated weights for policy 0, policy_version 756926 (0.0014) [2024-06-15 20:32:05,470][1651669] Updated weights for policy 0, policy_version 756985 (0.0013) [2024-06-15 20:32:05,767][1648981] Fps is (10 sec: 52552.2, 60 sec: 50243.9, 300 sec: 48763.2). Total num frames: 1550319616. Throughput: 0: 12151.4. Samples: 387645952. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:05,768][1648981] Avg episode reward: [(0, '798.530')] [2024-06-15 20:32:08,379][1651669] Updated weights for policy 0, policy_version 757024 (0.0014) [2024-06-15 20:32:09,745][1651669] Updated weights for policy 0, policy_version 757076 (0.0011) [2024-06-15 20:32:10,776][1648981] Fps is (10 sec: 52405.8, 60 sec: 48051.9, 300 sec: 48650.5). Total num frames: 1550581760. Throughput: 0: 12359.0. Samples: 387692032. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:10,777][1648981] Avg episode reward: [(0, '791.320')] [2024-06-15 20:32:11,258][1651669] Updated weights for policy 0, policy_version 757139 (0.0057) [2024-06-15 20:32:15,672][1651669] Updated weights for policy 0, policy_version 757200 (0.0013) [2024-06-15 20:32:15,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1550745600. Throughput: 0: 12356.3. Samples: 387759104. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:15,767][1648981] Avg episode reward: [(0, '786.690')] [2024-06-15 20:32:18,946][1651669] Updated weights for policy 0, policy_version 757249 (0.0012) [2024-06-15 20:32:20,741][1651274] Signal inference workers to stop experience collection... (39700 times) [2024-06-15 20:32:20,765][1651669] InferenceWorker_p0-w0: stopping experience collection (39700 times) [2024-06-15 20:32:20,778][1648981] Fps is (10 sec: 42591.0, 60 sec: 47504.3, 300 sec: 48542.6). Total num frames: 1551007744. Throughput: 0: 12057.3. Samples: 387830272. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:20,779][1648981] Avg episode reward: [(0, '707.130')] [2024-06-15 20:32:21,008][1651274] Signal inference workers to resume experience collection... (39700 times) [2024-06-15 20:32:21,010][1651669] InferenceWorker_p0-w0: resuming experience collection (39700 times) [2024-06-15 20:32:21,012][1651669] Updated weights for policy 0, policy_version 757344 (0.0011) [2024-06-15 20:32:22,911][1651669] Updated weights for policy 0, policy_version 757424 (0.0053) [2024-06-15 20:32:25,770][1648981] Fps is (10 sec: 49133.0, 60 sec: 49711.3, 300 sec: 48429.3). Total num frames: 1551237120. Throughput: 0: 12003.5. Samples: 387855872. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:25,771][1648981] Avg episode reward: [(0, '707.710')] [2024-06-15 20:32:26,873][1651669] Updated weights for policy 0, policy_version 757457 (0.0013) [2024-06-15 20:32:27,631][1651669] Updated weights for policy 0, policy_version 757501 (0.0013) [2024-06-15 20:32:30,766][1648981] Fps is (10 sec: 42648.6, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1551433728. Throughput: 0: 12208.3. Samples: 387938304. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:30,767][1648981] Avg episode reward: [(0, '687.400')] [2024-06-15 20:32:31,076][1651669] Updated weights for policy 0, policy_version 757554 (0.0058) [2024-06-15 20:32:32,803][1651669] Updated weights for policy 0, policy_version 757616 (0.0012) [2024-06-15 20:32:33,899][1651669] Updated weights for policy 0, policy_version 757668 (0.0011) [2024-06-15 20:32:34,455][1651669] Updated weights for policy 0, policy_version 757696 (0.0014) [2024-06-15 20:32:35,766][1648981] Fps is (10 sec: 52449.3, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1551761408. Throughput: 0: 12106.0. Samples: 388005888. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:35,767][1648981] Avg episode reward: [(0, '698.030')] [2024-06-15 20:32:38,173][1651669] Updated weights for policy 0, policy_version 757753 (0.0036) [2024-06-15 20:32:40,767][1648981] Fps is (10 sec: 45874.9, 60 sec: 45878.1, 300 sec: 48430.0). Total num frames: 1551892480. Throughput: 0: 11987.1. Samples: 388042752. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:40,767][1648981] Avg episode reward: [(0, '699.990')] [2024-06-15 20:32:42,166][1651669] Updated weights for policy 0, policy_version 757824 (0.0014) [2024-06-15 20:32:44,213][1651669] Updated weights for policy 0, policy_version 757890 (0.0011) [2024-06-15 20:32:45,378][1651669] Updated weights for policy 0, policy_version 757950 (0.0011) [2024-06-15 20:32:45,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 48763.2). Total num frames: 1552285696. Throughput: 0: 11914.1. Samples: 388110848. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:45,767][1648981] Avg episode reward: [(0, '664.380')] [2024-06-15 20:32:49,022][1651669] Updated weights for policy 0, policy_version 758013 (0.0013) [2024-06-15 20:32:50,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 47513.7, 300 sec: 48763.2). Total num frames: 1552416768. Throughput: 0: 12026.4. Samples: 388187136. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:50,767][1648981] Avg episode reward: [(0, '681.700')] [2024-06-15 20:32:52,463][1651669] Updated weights for policy 0, policy_version 758078 (0.0013) [2024-06-15 20:32:55,107][1651669] Updated weights for policy 0, policy_version 758129 (0.0013) [2024-06-15 20:32:55,767][1648981] Fps is (10 sec: 39320.7, 60 sec: 48078.7, 300 sec: 48652.1). Total num frames: 1552678912. Throughput: 0: 11824.1. Samples: 388224000. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:32:55,767][1648981] Avg episode reward: [(0, '665.360')] [2024-06-15 20:32:56,288][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000758176_1552744448.pth... 
[2024-06-15 20:32:56,500][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000752496_1541111808.pth [2024-06-15 20:32:57,017][1651669] Updated weights for policy 0, policy_version 758208 (0.0013) [2024-06-15 20:33:00,214][1651669] Updated weights for policy 0, policy_version 758258 (0.0010) [2024-06-15 20:33:00,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 48064.2, 300 sec: 48874.3). Total num frames: 1552941056. Throughput: 0: 11787.4. Samples: 388289536. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:33:00,767][1648981] Avg episode reward: [(0, '718.190')] [2024-06-15 20:33:02,550][1651669] Updated weights for policy 0, policy_version 758290 (0.0010) [2024-06-15 20:33:02,932][1651274] Signal inference workers to stop experience collection... (39750 times) [2024-06-15 20:33:03,021][1651669] InferenceWorker_p0-w0: stopping experience collection (39750 times) [2024-06-15 20:33:03,282][1651274] Signal inference workers to resume experience collection... (39750 times) [2024-06-15 20:33:03,283][1651669] InferenceWorker_p0-w0: resuming experience collection (39750 times) [2024-06-15 20:33:05,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 45875.5, 300 sec: 48652.2). Total num frames: 1553072128. Throughput: 0: 11847.4. Samples: 388363264. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 20:33:05,767][1648981] Avg episode reward: [(0, '677.240')] [2024-06-15 20:33:06,234][1651669] Updated weights for policy 0, policy_version 758353 (0.0133) [2024-06-15 20:33:07,964][1651669] Updated weights for policy 0, policy_version 758417 (0.0118) [2024-06-15 20:33:10,557][1651669] Updated weights for policy 0, policy_version 758480 (0.0012) [2024-06-15 20:33:10,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 46429.2, 300 sec: 48541.1). Total num frames: 1553367040. Throughput: 0: 11788.4. Samples: 388386304. 
Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:10,767][1648981] Avg episode reward: [(0, '634.970')] [2024-06-15 20:33:11,677][1651669] Updated weights for policy 0, policy_version 758525 (0.0011) [2024-06-15 20:33:14,532][1651669] Updated weights for policy 0, policy_version 758560 (0.0056) [2024-06-15 20:33:15,770][1648981] Fps is (10 sec: 52409.0, 60 sec: 47510.7, 300 sec: 48877.0). Total num frames: 1553596416. Throughput: 0: 11809.2. Samples: 388469760. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:15,771][1648981] Avg episode reward: [(0, '625.920')] [2024-06-15 20:33:16,950][1651669] Updated weights for policy 0, policy_version 758594 (0.0011) [2024-06-15 20:33:18,615][1651669] Updated weights for policy 0, policy_version 758660 (0.0013) [2024-06-15 20:33:19,865][1651669] Updated weights for policy 0, policy_version 758716 (0.0012) [2024-06-15 20:33:20,770][1648981] Fps is (10 sec: 52408.7, 60 sec: 48066.2, 300 sec: 48540.4). Total num frames: 1553891328. Throughput: 0: 11718.1. Samples: 388533248. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:20,771][1648981] Avg episode reward: [(0, '620.300')] [2024-06-15 20:33:21,317][1651669] Updated weights for policy 0, policy_version 758772 (0.0011) [2024-06-15 20:33:25,605][1651669] Updated weights for policy 0, policy_version 758818 (0.0011) [2024-06-15 20:33:25,767][1648981] Fps is (10 sec: 45891.8, 60 sec: 46970.4, 300 sec: 48657.4). Total num frames: 1554055168. Throughput: 0: 11685.0. Samples: 388568576. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:25,767][1648981] Avg episode reward: [(0, '618.070')] [2024-06-15 20:33:27,805][1651669] Updated weights for policy 0, policy_version 758865 (0.0013) [2024-06-15 20:33:30,125][1651669] Updated weights for policy 0, policy_version 758929 (0.0015) [2024-06-15 20:33:30,766][1648981] Fps is (10 sec: 45892.6, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1554350080. 
Throughput: 0: 11958.0. Samples: 388648960. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:30,767][1648981] Avg episode reward: [(0, '600.000')] [2024-06-15 20:33:31,488][1651669] Updated weights for policy 0, policy_version 758993 (0.0015) [2024-06-15 20:33:35,656][1651669] Updated weights for policy 0, policy_version 759056 (0.0012) [2024-06-15 20:33:35,774][1648981] Fps is (10 sec: 49114.2, 60 sec: 46415.3, 300 sec: 48539.9). Total num frames: 1554546688. Throughput: 0: 11842.2. Samples: 388720128. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:35,775][1648981] Avg episode reward: [(0, '630.540')] [2024-06-15 20:33:36,649][1651669] Updated weights for policy 0, policy_version 759099 (0.0015) [2024-06-15 20:33:38,842][1651669] Updated weights for policy 0, policy_version 759152 (0.0012) [2024-06-15 20:33:40,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48605.9, 300 sec: 48096.7). Total num frames: 1554808832. Throughput: 0: 11923.9. Samples: 388760576. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:40,767][1648981] Avg episode reward: [(0, '643.670')] [2024-06-15 20:33:41,111][1651669] Updated weights for policy 0, policy_version 759201 (0.0012) [2024-06-15 20:33:42,365][1651669] Updated weights for policy 0, policy_version 759254 (0.0011) [2024-06-15 20:33:43,129][1651669] Updated weights for policy 0, policy_version 759290 (0.0107) [2024-06-15 20:33:45,775][1651274] Signal inference workers to stop experience collection... (39800 times) [2024-06-15 20:33:45,782][1648981] Fps is (10 sec: 49112.6, 60 sec: 45863.0, 300 sec: 48427.4). Total num frames: 1555038208. Throughput: 0: 12044.9. Samples: 388831744. 
Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:45,783][1648981] Avg episode reward: [(0, '638.530')] [2024-06-15 20:33:45,807][1651669] InferenceWorker_p0-w0: stopping experience collection (39800 times) [2024-06-15 20:33:46,000][1651274] Signal inference workers to resume experience collection... (39800 times) [2024-06-15 20:33:46,001][1651669] InferenceWorker_p0-w0: resuming experience collection (39800 times) [2024-06-15 20:33:46,793][1651669] Updated weights for policy 0, policy_version 759344 (0.0170) [2024-06-15 20:33:49,963][1651669] Updated weights for policy 0, policy_version 759417 (0.0011) [2024-06-15 20:33:50,776][1648981] Fps is (10 sec: 49105.9, 60 sec: 48052.1, 300 sec: 48212.2). Total num frames: 1555300352. Throughput: 0: 12001.0. Samples: 388903424. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:50,777][1648981] Avg episode reward: [(0, '652.600')] [2024-06-15 20:33:51,988][1651669] Updated weights for policy 0, policy_version 759456 (0.0012) [2024-06-15 20:33:53,791][1651669] Updated weights for policy 0, policy_version 759536 (0.0104) [2024-06-15 20:33:55,766][1648981] Fps is (10 sec: 52512.0, 60 sec: 48059.9, 300 sec: 48430.0). Total num frames: 1555562496. Throughput: 0: 12208.3. Samples: 388935680. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:33:55,767][1648981] Avg episode reward: [(0, '654.980')] [2024-06-15 20:33:57,052][1651669] Updated weights for policy 0, policy_version 759610 (0.0016) [2024-06-15 20:34:00,468][1651669] Updated weights for policy 0, policy_version 759664 (0.0012) [2024-06-15 20:34:00,770][1648981] Fps is (10 sec: 52458.6, 60 sec: 48056.8, 300 sec: 48429.4). Total num frames: 1555824640. Throughput: 0: 12185.6. Samples: 389018112. 
Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:00,771][1648981] Avg episode reward: [(0, '690.580')] [2024-06-15 20:34:02,595][1651669] Updated weights for policy 0, policy_version 759712 (0.0019) [2024-06-15 20:34:04,313][1651669] Updated weights for policy 0, policy_version 759780 (0.0012) [2024-06-15 20:34:05,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 48430.1). Total num frames: 1556086784. Throughput: 0: 12300.4. Samples: 389086720. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:05,767][1648981] Avg episode reward: [(0, '717.020')] [2024-06-15 20:34:07,225][1651669] Updated weights for policy 0, policy_version 759840 (0.0012) [2024-06-15 20:34:10,767][1648981] Fps is (10 sec: 39335.9, 60 sec: 47513.4, 300 sec: 48208.4). Total num frames: 1556217856. Throughput: 0: 12367.6. Samples: 389125120. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:10,767][1648981] Avg episode reward: [(0, '741.010')] [2024-06-15 20:34:10,772][1651669] Updated weights for policy 0, policy_version 759874 (0.0012) [2024-06-15 20:34:11,858][1651669] Updated weights for policy 0, policy_version 759936 (0.0039) [2024-06-15 20:34:13,963][1651669] Updated weights for policy 0, policy_version 760016 (0.0012) [2024-06-15 20:34:15,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50247.4, 300 sec: 48430.0). Total num frames: 1556611072. Throughput: 0: 12276.6. Samples: 389201408. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:15,767][1648981] Avg episode reward: [(0, '744.660')] [2024-06-15 20:34:16,541][1651669] Updated weights for policy 0, policy_version 760080 (0.0015) [2024-06-15 20:34:17,442][1651669] Updated weights for policy 0, policy_version 760123 (0.0012) [2024-06-15 20:34:20,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 47516.6, 300 sec: 48430.0). Total num frames: 1556742144. Throughput: 0: 12608.8. Samples: 389287424. 
Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:20,767][1648981] Avg episode reward: [(0, '764.000')] [2024-06-15 20:34:21,753][1651669] Updated weights for policy 0, policy_version 760182 (0.0012) [2024-06-15 20:34:23,559][1651669] Updated weights for policy 0, policy_version 760228 (0.0011) [2024-06-15 20:34:24,526][1651274] Signal inference workers to stop experience collection... (39850 times) [2024-06-15 20:34:24,561][1651669] InferenceWorker_p0-w0: stopping experience collection (39850 times) [2024-06-15 20:34:24,775][1651274] Signal inference workers to resume experience collection... (39850 times) [2024-06-15 20:34:24,775][1651669] InferenceWorker_p0-w0: resuming experience collection (39850 times) [2024-06-15 20:34:24,944][1651669] Updated weights for policy 0, policy_version 760290 (0.0032) [2024-06-15 20:34:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 51336.5, 300 sec: 48541.0). Total num frames: 1557135360. Throughput: 0: 12504.2. Samples: 389323264. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:25,767][1648981] Avg episode reward: [(0, '740.190')] [2024-06-15 20:34:26,077][1651669] Updated weights for policy 0, policy_version 760323 (0.0010) [2024-06-15 20:34:27,359][1651669] Updated weights for policy 0, policy_version 760378 (0.0115) [2024-06-15 20:34:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 1557266432. Throughput: 0: 12508.6. Samples: 389394432. 
Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:30,767][1648981] Avg episode reward: [(0, '730.000')] [2024-06-15 20:34:31,960][1651669] Updated weights for policy 0, policy_version 760422 (0.0011) [2024-06-15 20:34:33,529][1651669] Updated weights for policy 0, policy_version 760467 (0.0022) [2024-06-15 20:34:34,956][1651669] Updated weights for policy 0, policy_version 760534 (0.0110) [2024-06-15 20:34:35,690][1651669] Updated weights for policy 0, policy_version 760574 (0.0011) [2024-06-15 20:34:35,767][1648981] Fps is (10 sec: 49150.8, 60 sec: 51342.9, 300 sec: 48767.8). Total num frames: 1557626880. Throughput: 0: 12620.5. Samples: 389471232. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:35,767][1648981] Avg episode reward: [(0, '742.580')] [2024-06-15 20:34:37,199][1651669] Updated weights for policy 0, policy_version 760632 (0.0012) [2024-06-15 20:34:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.2, 300 sec: 48542.4). Total num frames: 1557790720. Throughput: 0: 12709.0. Samples: 389507584. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:40,767][1648981] Avg episode reward: [(0, '742.580')] [2024-06-15 20:34:42,252][1651669] Updated weights for policy 0, policy_version 760662 (0.0094) [2024-06-15 20:34:43,124][1651669] Updated weights for policy 0, policy_version 760704 (0.0088) [2024-06-15 20:34:45,268][1651669] Updated weights for policy 0, policy_version 760768 (0.0013) [2024-06-15 20:34:45,766][1648981] Fps is (10 sec: 45876.7, 60 sec: 50803.8, 300 sec: 48652.1). Total num frames: 1558085632. Throughput: 0: 12619.0. Samples: 389585920. 
Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:45,767][1648981] Avg episode reward: [(0, '734.250')] [2024-06-15 20:34:46,723][1651669] Updated weights for policy 0, policy_version 760828 (0.0013) [2024-06-15 20:34:47,803][1651669] Updated weights for policy 0, policy_version 760865 (0.0016) [2024-06-15 20:34:50,767][1648981] Fps is (10 sec: 52425.6, 60 sec: 50251.6, 300 sec: 48763.1). Total num frames: 1558315008. Throughput: 0: 12822.6. Samples: 389663744. Policy #0 lag: (min: 5.0, avg: 138.1, max: 261.0) [2024-06-15 20:34:50,768][1648981] Avg episode reward: [(0, '793.730')] [2024-06-15 20:34:52,365][1651669] Updated weights for policy 0, policy_version 760912 (0.0011) [2024-06-15 20:34:54,265][1651669] Updated weights for policy 0, policy_version 760964 (0.0014) [2024-06-15 20:34:55,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 50790.1, 300 sec: 48874.2). Total num frames: 1558609920. Throughput: 0: 12777.2. Samples: 389700096. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:34:55,767][1648981] Avg episode reward: [(0, '782.310')] [2024-06-15 20:34:55,933][1651669] Updated weights for policy 0, policy_version 761042 (0.0114) [2024-06-15 20:34:56,066][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000761056_1558642688.pth... [2024-06-15 20:34:56,246][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000755328_1546911744.pth [2024-06-15 20:34:57,682][1651669] Updated weights for policy 0, policy_version 761104 (0.0014) [2024-06-15 20:35:00,767][1648981] Fps is (10 sec: 52431.3, 60 sec: 50247.3, 300 sec: 48874.3). Total num frames: 1558839296. Throughput: 0: 12435.9. Samples: 389761024. 
Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:00,767][1648981] Avg episode reward: [(0, '780.020')] [2024-06-15 20:35:03,740][1651669] Updated weights for policy 0, policy_version 761168 (0.0013) [2024-06-15 20:35:05,294][1651669] Updated weights for policy 0, policy_version 761217 (0.0012) [2024-06-15 20:35:05,766][1648981] Fps is (10 sec: 39323.0, 60 sec: 48605.8, 300 sec: 48763.9). Total num frames: 1559003136. Throughput: 0: 12322.1. Samples: 389841920. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:05,767][1648981] Avg episode reward: [(0, '836.450')] [2024-06-15 20:35:06,051][1651274] Signal inference workers to stop experience collection... (39900 times) [2024-06-15 20:35:06,099][1651669] InferenceWorker_p0-w0: stopping experience collection (39900 times) [2024-06-15 20:35:06,248][1651274] Signal inference workers to resume experience collection... (39900 times) [2024-06-15 20:35:06,249][1651669] InferenceWorker_p0-w0: resuming experience collection (39900 times) [2024-06-15 20:35:06,840][1651669] Updated weights for policy 0, policy_version 761287 (0.0020) [2024-06-15 20:35:07,835][1651669] Updated weights for policy 0, policy_version 761339 (0.0019) [2024-06-15 20:35:09,633][1651669] Updated weights for policy 0, policy_version 761392 (0.0012) [2024-06-15 20:35:10,767][1648981] Fps is (10 sec: 52425.4, 60 sec: 52428.2, 300 sec: 48874.2). Total num frames: 1559363584. Throughput: 0: 12253.7. Samples: 389874688. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:10,768][1648981] Avg episode reward: [(0, '865.650')] [2024-06-15 20:35:14,808][1651669] Updated weights for policy 0, policy_version 761456 (0.0015) [2024-06-15 20:35:15,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1559494656. Throughput: 0: 12515.6. Samples: 389957632. 
Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:15,767][1648981] Avg episode reward: [(0, '886.420')] [2024-06-15 20:35:16,378][1651669] Updated weights for policy 0, policy_version 761508 (0.0024) [2024-06-15 20:35:17,471][1651669] Updated weights for policy 0, policy_version 761558 (0.0012) [2024-06-15 20:35:18,888][1651669] Updated weights for policy 0, policy_version 761617 (0.0013) [2024-06-15 20:35:19,900][1651669] Updated weights for policy 0, policy_version 761664 (0.0096) [2024-06-15 20:35:20,766][1648981] Fps is (10 sec: 52433.2, 60 sec: 52428.8, 300 sec: 48874.3). Total num frames: 1559887872. Throughput: 0: 12356.4. Samples: 390027264. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:20,767][1648981] Avg episode reward: [(0, '885.250')] [2024-06-15 20:35:25,774][1648981] Fps is (10 sec: 49113.3, 60 sec: 47507.5, 300 sec: 48761.9). Total num frames: 1559986176. Throughput: 0: 12547.5. Samples: 390072320. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:25,775][1648981] Avg episode reward: [(0, '994.520')] [2024-06-15 20:35:25,775][1651669] Updated weights for policy 0, policy_version 761728 (0.0102) [2024-06-15 20:35:27,313][1651669] Updated weights for policy 0, policy_version 761795 (0.0013) [2024-06-15 20:35:28,544][1651669] Updated weights for policy 0, policy_version 761853 (0.0012) [2024-06-15 20:35:30,252][1651669] Updated weights for policy 0, policy_version 761904 (0.0011) [2024-06-15 20:35:30,768][1648981] Fps is (10 sec: 52420.5, 60 sec: 52427.5, 300 sec: 48878.0). Total num frames: 1560412160. Throughput: 0: 12253.4. Samples: 390137344. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:30,768][1648981] Avg episode reward: [(0, '1038.990')] [2024-06-15 20:35:35,766][1648981] Fps is (10 sec: 45911.8, 60 sec: 46967.8, 300 sec: 48541.1). Total num frames: 1560444928. Throughput: 0: 12356.5. Samples: 390219776. 
Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:35,767][1648981] Avg episode reward: [(0, '1017.960')] [2024-06-15 20:35:35,770][1651669] Updated weights for policy 0, policy_version 761940 (0.0015) [2024-06-15 20:35:37,296][1651669] Updated weights for policy 0, policy_version 762006 (0.0014) [2024-06-15 20:35:38,381][1651669] Updated weights for policy 0, policy_version 762051 (0.0013) [2024-06-15 20:35:39,600][1651669] Updated weights for policy 0, policy_version 762110 (0.0015) [2024-06-15 20:35:40,766][1648981] Fps is (10 sec: 45882.4, 60 sec: 51336.6, 300 sec: 48985.4). Total num frames: 1560870912. Throughput: 0: 12333.6. Samples: 390255104. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:40,767][1648981] Avg episode reward: [(0, '965.780')] [2024-06-15 20:35:41,315][1651669] Updated weights for policy 0, policy_version 762164 (0.0101) [2024-06-15 20:35:45,668][1651274] Signal inference workers to stop experience collection... (39950 times) [2024-06-15 20:35:45,710][1651669] InferenceWorker_p0-w0: stopping experience collection (39950 times) [2024-06-15 20:35:45,767][1648981] Fps is (10 sec: 49147.2, 60 sec: 47512.9, 300 sec: 48432.7). Total num frames: 1560936448. Throughput: 0: 12583.6. Samples: 390327296. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:45,768][1648981] Avg episode reward: [(0, '980.650')] [2024-06-15 20:35:45,989][1651274] Signal inference workers to resume experience collection... 
(39950 times) [2024-06-15 20:35:45,990][1651669] InferenceWorker_p0-w0: resuming experience collection (39950 times) [2024-06-15 20:35:46,594][1651669] Updated weights for policy 0, policy_version 762209 (0.0035) [2024-06-15 20:35:47,909][1651669] Updated weights for policy 0, policy_version 762259 (0.0012) [2024-06-15 20:35:48,891][1651669] Updated weights for policy 0, policy_version 762304 (0.0013) [2024-06-15 20:35:50,419][1651669] Updated weights for policy 0, policy_version 762361 (0.0013) [2024-06-15 20:35:50,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 50244.9, 300 sec: 48985.4). Total num frames: 1561329664. Throughput: 0: 12151.5. Samples: 390388736. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:50,767][1648981] Avg episode reward: [(0, '1016.550')] [2024-06-15 20:35:52,624][1651669] Updated weights for policy 0, policy_version 762416 (0.0013) [2024-06-15 20:35:55,767][1648981] Fps is (10 sec: 52432.9, 60 sec: 47513.8, 300 sec: 48430.0). Total num frames: 1561460736. Throughput: 0: 12219.9. Samples: 390424576. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:35:55,767][1648981] Avg episode reward: [(0, '1007.200')] [2024-06-15 20:35:57,133][1651669] Updated weights for policy 0, policy_version 762480 (0.0012) [2024-06-15 20:35:58,736][1651669] Updated weights for policy 0, policy_version 762528 (0.0013) [2024-06-15 20:36:00,540][1651669] Updated weights for policy 0, policy_version 762582 (0.0014) [2024-06-15 20:36:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49152.2, 300 sec: 49096.5). Total num frames: 1561788416. Throughput: 0: 12117.3. Samples: 390502912. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:00,767][1648981] Avg episode reward: [(0, '1035.680')] [2024-06-15 20:36:02,785][1651669] Updated weights for policy 0, policy_version 762641 (0.0020) [2024-06-15 20:36:05,782][1648981] Fps is (10 sec: 52346.7, 60 sec: 49685.1, 300 sec: 48427.4). Total num frames: 1561985024. 
Throughput: 0: 12306.4. Samples: 390581248. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:05,783][1648981] Avg episode reward: [(0, '1030.260')] [2024-06-15 20:36:06,034][1651669] Updated weights for policy 0, policy_version 762704 (0.0011) [2024-06-15 20:36:08,918][1651669] Updated weights for policy 0, policy_version 762784 (0.0014) [2024-06-15 20:36:10,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48060.4, 300 sec: 48874.3). Total num frames: 1562247168. Throughput: 0: 12108.1. Samples: 390617088. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:10,767][1648981] Avg episode reward: [(0, '1054.080')] [2024-06-15 20:36:11,342][1651669] Updated weights for policy 0, policy_version 762832 (0.0012) [2024-06-15 20:36:13,788][1651669] Updated weights for policy 0, policy_version 762899 (0.0029) [2024-06-15 20:36:15,766][1648981] Fps is (10 sec: 52511.8, 60 sec: 50244.3, 300 sec: 48652.2). Total num frames: 1562509312. Throughput: 0: 12140.5. Samples: 390683648. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:15,767][1648981] Avg episode reward: [(0, '1018.430')] [2024-06-15 20:36:16,720][1651669] Updated weights for policy 0, policy_version 762945 (0.0022) [2024-06-15 20:36:17,770][1651669] Updated weights for policy 0, policy_version 762999 (0.0020) [2024-06-15 20:36:20,023][1651669] Updated weights for policy 0, policy_version 763040 (0.0062) [2024-06-15 20:36:20,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 47513.6, 300 sec: 49099.8). Total num frames: 1562738688. Throughput: 0: 12037.7. Samples: 390761472. 
Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:20,767][1648981] Avg episode reward: [(0, '1017.020')] [2024-06-15 20:36:22,325][1651669] Updated weights for policy 0, policy_version 763104 (0.0013) [2024-06-15 20:36:23,310][1651669] Updated weights for policy 0, policy_version 763139 (0.0011) [2024-06-15 20:36:24,683][1651669] Updated weights for policy 0, policy_version 763193 (0.0016) [2024-06-15 20:36:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50797.1, 300 sec: 48874.3). Total num frames: 1563033600. Throughput: 0: 12151.5. Samples: 390801920. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:25,767][1648981] Avg episode reward: [(0, '1020.350')] [2024-06-15 20:36:28,188][1651274] Signal inference workers to stop experience collection... (40000 times) [2024-06-15 20:36:28,287][1651669] InferenceWorker_p0-w0: stopping experience collection (40000 times) [2024-06-15 20:36:28,382][1651274] Signal inference workers to resume experience collection... (40000 times) [2024-06-15 20:36:28,383][1651669] InferenceWorker_p0-w0: resuming experience collection (40000 times) [2024-06-15 20:36:28,586][1651669] Updated weights for policy 0, policy_version 763238 (0.0012) [2024-06-15 20:36:30,370][1651669] Updated weights for policy 0, policy_version 763283 (0.0012) [2024-06-15 20:36:30,768][1648981] Fps is (10 sec: 49142.5, 60 sec: 46967.2, 300 sec: 49096.2). Total num frames: 1563230208. Throughput: 0: 12174.0. Samples: 390875136. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:30,769][1648981] Avg episode reward: [(0, '1022.230')] [2024-06-15 20:36:31,091][1651669] Updated weights for policy 0, policy_version 763327 (0.0011) [2024-06-15 20:36:33,205][1651669] Updated weights for policy 0, policy_version 763381 (0.0132) [2024-06-15 20:36:34,602][1651669] Updated weights for policy 0, policy_version 763431 (0.0012) [2024-06-15 20:36:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 51882.6, 300 sec: 48875.0). 
Total num frames: 1563557888. Throughput: 0: 12356.3. Samples: 390944768. Policy #0 lag: (min: 27.0, avg: 123.0, max: 283.0) [2024-06-15 20:36:35,767][1648981] Avg episode reward: [(0, '998.630')] [2024-06-15 20:36:38,743][1651669] Updated weights for policy 0, policy_version 763461 (0.0011) [2024-06-15 20:36:39,740][1651669] Updated weights for policy 0, policy_version 763517 (0.0011) [2024-06-15 20:36:40,766][1648981] Fps is (10 sec: 45883.6, 60 sec: 46967.4, 300 sec: 48874.3). Total num frames: 1563688960. Throughput: 0: 12435.9. Samples: 390984192. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:36:40,767][1648981] Avg episode reward: [(0, '981.340')] [2024-06-15 20:36:42,036][1651669] Updated weights for policy 0, policy_version 763574 (0.0011) [2024-06-15 20:36:43,320][1651669] Updated weights for policy 0, policy_version 763600 (0.0010) [2024-06-15 20:36:44,784][1651669] Updated weights for policy 0, policy_version 763649 (0.0015) [2024-06-15 20:36:45,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 51337.3, 300 sec: 48985.4). Total num frames: 1564016640. Throughput: 0: 12276.6. Samples: 391055360. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:36:45,767][1648981] Avg episode reward: [(0, '964.860')] [2024-06-15 20:36:46,080][1651669] Updated weights for policy 0, policy_version 763702 (0.0011) [2024-06-15 20:36:50,402][1651669] Updated weights for policy 0, policy_version 763748 (0.0012) [2024-06-15 20:36:50,771][1648981] Fps is (10 sec: 49130.8, 60 sec: 47510.1, 300 sec: 48766.5). Total num frames: 1564180480. Throughput: 0: 12200.1. Samples: 391130112. 
Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:36:50,772][1648981] Avg episode reward: [(0, '949.970')] [2024-06-15 20:36:51,956][1651669] Updated weights for policy 0, policy_version 763778 (0.0065) [2024-06-15 20:36:53,022][1651669] Updated weights for policy 0, policy_version 763834 (0.0012) [2024-06-15 20:36:54,759][1651669] Updated weights for policy 0, policy_version 763872 (0.0011) [2024-06-15 20:36:55,768][1648981] Fps is (10 sec: 45870.0, 60 sec: 50243.4, 300 sec: 48875.1). Total num frames: 1564475392. Throughput: 0: 12196.7. Samples: 391165952. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:36:55,768][1648981] Avg episode reward: [(0, '951.570')] [2024-06-15 20:36:56,068][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000763920_1564508160.pth... [2024-06-15 20:36:56,069][1651669] Updated weights for policy 0, policy_version 763920 (0.0012) [2024-06-15 20:36:56,211][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000758176_1552744448.pth [2024-06-15 20:37:00,748][1651669] Updated weights for policy 0, policy_version 763973 (0.0016) [2024-06-15 20:37:00,766][1648981] Fps is (10 sec: 42617.3, 60 sec: 46967.5, 300 sec: 48430.1). Total num frames: 1564606464. Throughput: 0: 12288.0. Samples: 391236608. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:00,767][1648981] Avg episode reward: [(0, '883.170')] [2024-06-15 20:37:02,727][1651669] Updated weights for policy 0, policy_version 764034 (0.0012) [2024-06-15 20:37:04,011][1651669] Updated weights for policy 0, policy_version 764093 (0.0012) [2024-06-15 20:37:05,635][1651669] Updated weights for policy 0, policy_version 764145 (0.0013) [2024-06-15 20:37:05,774][1648981] Fps is (10 sec: 49119.6, 60 sec: 49704.8, 300 sec: 48763.6). Total num frames: 1564966912. Throughput: 0: 12160.7. Samples: 391308800. 
Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:05,775][1648981] Avg episode reward: [(0, '825.650')] [2024-06-15 20:37:06,715][1651274] Signal inference workers to stop experience collection... (40050 times) [2024-06-15 20:37:06,764][1651669] InferenceWorker_p0-w0: stopping experience collection (40050 times) [2024-06-15 20:37:06,999][1651274] Signal inference workers to resume experience collection... (40050 times) [2024-06-15 20:37:07,002][1651669] InferenceWorker_p0-w0: resuming experience collection (40050 times) [2024-06-15 20:37:07,005][1651669] Updated weights for policy 0, policy_version 764208 (0.0022) [2024-06-15 20:37:10,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 48059.5, 300 sec: 48763.2). Total num frames: 1565130752. Throughput: 0: 12014.8. Samples: 391342592. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:10,767][1648981] Avg episode reward: [(0, '852.060')] [2024-06-15 20:37:11,871][1651669] Updated weights for policy 0, policy_version 764245 (0.0012) [2024-06-15 20:37:14,187][1651669] Updated weights for policy 0, policy_version 764306 (0.0011) [2024-06-15 20:37:15,775][1651669] Updated weights for policy 0, policy_version 764354 (0.0014) [2024-06-15 20:37:15,783][1648981] Fps is (10 sec: 42560.5, 60 sec: 48046.4, 300 sec: 48762.4). Total num frames: 1565392896. Throughput: 0: 12067.9. Samples: 391418368. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:15,784][1648981] Avg episode reward: [(0, '864.480')] [2024-06-15 20:37:17,424][1651669] Updated weights for policy 0, policy_version 764432 (0.0012) [2024-06-15 20:37:20,766][1648981] Fps is (10 sec: 52430.4, 60 sec: 48605.8, 300 sec: 48875.0). Total num frames: 1565655040. Throughput: 0: 11889.8. Samples: 391479808. 
Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:20,767][1648981] Avg episode reward: [(0, '867.750')] [2024-06-15 20:37:23,472][1651669] Updated weights for policy 0, policy_version 764496 (0.0011) [2024-06-15 20:37:24,394][1651669] Updated weights for policy 0, policy_version 764543 (0.0012) [2024-06-15 20:37:25,766][1648981] Fps is (10 sec: 42669.5, 60 sec: 46421.3, 300 sec: 48763.2). Total num frames: 1565818880. Throughput: 0: 11912.5. Samples: 391520256. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:25,767][1648981] Avg episode reward: [(0, '854.980')] [2024-06-15 20:37:26,306][1651669] Updated weights for policy 0, policy_version 764579 (0.0011) [2024-06-15 20:37:28,228][1651669] Updated weights for policy 0, policy_version 764656 (0.0011) [2024-06-15 20:37:29,249][1651669] Updated weights for policy 0, policy_version 764704 (0.0012) [2024-06-15 20:37:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49153.5, 300 sec: 48874.3). Total num frames: 1566179328. Throughput: 0: 11776.0. Samples: 391585280. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:30,767][1648981] Avg episode reward: [(0, '816.520')] [2024-06-15 20:37:35,480][1651669] Updated weights for policy 0, policy_version 764768 (0.0012) [2024-06-15 20:37:35,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 48652.2). Total num frames: 1566244864. Throughput: 0: 11663.3. Samples: 391654912. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:35,767][1648981] Avg episode reward: [(0, '844.200')] [2024-06-15 20:37:37,975][1651669] Updated weights for policy 0, policy_version 764833 (0.0014) [2024-06-15 20:37:40,145][1651669] Updated weights for policy 0, policy_version 764915 (0.0011) [2024-06-15 20:37:40,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1566572544. Throughput: 0: 11628.4. Samples: 391689216. 
Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:40,767][1648981] Avg episode reward: [(0, '868.970')] [2024-06-15 20:37:40,818][1651669] Updated weights for policy 0, policy_version 764944 (0.0011) [2024-06-15 20:37:41,996][1651669] Updated weights for policy 0, policy_version 764992 (0.0014) [2024-06-15 20:37:45,770][1648981] Fps is (10 sec: 49133.8, 60 sec: 45326.3, 300 sec: 48540.5). Total num frames: 1566736384. Throughput: 0: 11615.7. Samples: 391759360. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:45,770][1648981] Avg episode reward: [(0, '837.480')] [2024-06-15 20:37:46,435][1651669] Updated weights for policy 0, policy_version 765042 (0.0013) [2024-06-15 20:37:49,798][1651669] Updated weights for policy 0, policy_version 765089 (0.0013) [2024-06-15 20:37:50,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 46424.7, 300 sec: 48430.0). Total num frames: 1566965760. Throughput: 0: 11539.0. Samples: 391827968. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:50,767][1648981] Avg episode reward: [(0, '844.240')] [2024-06-15 20:37:51,626][1651274] Signal inference workers to stop experience collection... (40100 times) [2024-06-15 20:37:51,697][1651669] InferenceWorker_p0-w0: stopping experience collection (40100 times) [2024-06-15 20:37:51,699][1651669] Updated weights for policy 0, policy_version 765155 (0.0024) [2024-06-15 20:37:51,942][1651274] Signal inference workers to resume experience collection... (40100 times) [2024-06-15 20:37:51,942][1651669] InferenceWorker_p0-w0: resuming experience collection (40100 times) [2024-06-15 20:37:53,903][1651669] Updated weights for policy 0, policy_version 765246 (0.0045) [2024-06-15 20:37:55,770][1648981] Fps is (10 sec: 49152.0, 60 sec: 45873.2, 300 sec: 48429.4). Total num frames: 1567227904. Throughput: 0: 11285.9. Samples: 391850496. 
Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:37:55,771][1648981] Avg episode reward: [(0, '815.240')] [2024-06-15 20:37:58,002][1651669] Updated weights for policy 0, policy_version 765303 (0.0099) [2024-06-15 20:38:00,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 45875.1, 300 sec: 48430.0). Total num frames: 1567358976. Throughput: 0: 11120.2. Samples: 391918592. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:38:00,767][1648981] Avg episode reward: [(0, '768.620')] [2024-06-15 20:38:03,961][1651669] Updated weights for policy 0, policy_version 765408 (0.0015) [2024-06-15 20:38:05,766][1648981] Fps is (10 sec: 39336.4, 60 sec: 44242.5, 300 sec: 48318.9). Total num frames: 1567621120. Throughput: 0: 11047.8. Samples: 391976960. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:38:05,767][1648981] Avg episode reward: [(0, '770.370')] [2024-06-15 20:38:06,221][1651669] Updated weights for policy 0, policy_version 765472 (0.0032) [2024-06-15 20:38:06,989][1651669] Updated weights for policy 0, policy_version 765504 (0.0046) [2024-06-15 20:38:10,211][1651669] Updated weights for policy 0, policy_version 765560 (0.0053) [2024-06-15 20:38:10,780][1648981] Fps is (10 sec: 52355.7, 60 sec: 45864.8, 300 sec: 48428.3). Total num frames: 1567883264. Throughput: 0: 10873.8. Samples: 392009728. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:38:10,781][1648981] Avg episode reward: [(0, '769.140')] [2024-06-15 20:38:15,089][1651669] Updated weights for policy 0, policy_version 765616 (0.0032) [2024-06-15 20:38:15,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 43702.8, 300 sec: 47875.2). Total num frames: 1568014336. Throughput: 0: 10956.8. Samples: 392078336. 
Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:38:15,767][1648981] Avg episode reward: [(0, '767.910')] [2024-06-15 20:38:16,609][1651669] Updated weights for policy 0, policy_version 765680 (0.0016) [2024-06-15 20:38:18,762][1651669] Updated weights for policy 0, policy_version 765717 (0.0012) [2024-06-15 20:38:20,766][1648981] Fps is (10 sec: 39376.4, 60 sec: 43690.6, 300 sec: 48207.8). Total num frames: 1568276480. Throughput: 0: 10729.2. Samples: 392137728. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:38:20,767][1648981] Avg episode reward: [(0, '722.910')] [2024-06-15 20:38:21,009][1651669] Updated weights for policy 0, policy_version 765763 (0.0017) [2024-06-15 20:38:25,774][1648981] Fps is (10 sec: 39290.7, 60 sec: 43138.9, 300 sec: 47651.2). Total num frames: 1568407552. Throughput: 0: 10556.7. Samples: 392164352. Policy #0 lag: (min: 31.0, avg: 124.3, max: 287.0) [2024-06-15 20:38:25,775][1648981] Avg episode reward: [(0, '720.820')] [2024-06-15 20:38:26,889][1651669] Updated weights for policy 0, policy_version 765825 (0.0012) [2024-06-15 20:38:28,403][1651669] Updated weights for policy 0, policy_version 765888 (0.0169) [2024-06-15 20:38:30,099][1651669] Updated weights for policy 0, policy_version 765951 (0.0015) [2024-06-15 20:38:30,766][1648981] Fps is (10 sec: 39321.8, 60 sec: 41506.2, 300 sec: 47875.9). Total num frames: 1568669696. Throughput: 0: 10297.7. Samples: 392222720. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:38:30,767][1648981] Avg episode reward: [(0, '713.160')] [2024-06-15 20:38:33,915][1651669] Updated weights for policy 0, policy_version 766009 (0.0174) [2024-06-15 20:38:35,759][1651669] Updated weights for policy 0, policy_version 766079 (0.0013) [2024-06-15 20:38:35,766][1648981] Fps is (10 sec: 52470.5, 60 sec: 44783.0, 300 sec: 47874.6). Total num frames: 1568931840. Throughput: 0: 10126.3. Samples: 392283648. 
Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:38:35,767][1648981] Avg episode reward: [(0, '724.800')] [2024-06-15 20:38:40,047][1651274] Signal inference workers to stop experience collection... (40150 times) [2024-06-15 20:38:40,084][1651669] InferenceWorker_p0-w0: stopping experience collection (40150 times) [2024-06-15 20:38:40,401][1651274] Signal inference workers to resume experience collection... (40150 times) [2024-06-15 20:38:40,402][1651669] InferenceWorker_p0-w0: resuming experience collection (40150 times) [2024-06-15 20:38:40,769][1648981] Fps is (10 sec: 36033.6, 60 sec: 40957.9, 300 sec: 47432.3). Total num frames: 1569030144. Throughput: 0: 10558.7. Samples: 392325632. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:38:40,772][1648981] Avg episode reward: [(0, '717.800')] [2024-06-15 20:38:41,620][1651669] Updated weights for policy 0, policy_version 766160 (0.0158) [2024-06-15 20:38:45,805][1648981] Fps is (10 sec: 26112.7, 60 sec: 40936.1, 300 sec: 47092.4). Total num frames: 1569193984. Throughput: 0: 10037.9. Samples: 392370688. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:38:45,806][1648981] Avg episode reward: [(0, '728.690')] [2024-06-15 20:38:46,564][1651669] Updated weights for policy 0, policy_version 766224 (0.0019) [2024-06-15 20:38:48,685][1651669] Updated weights for policy 0, policy_version 766293 (0.0012) [2024-06-15 20:38:49,478][1651669] Updated weights for policy 0, policy_version 766336 (0.0033) [2024-06-15 20:38:50,766][1648981] Fps is (10 sec: 42611.6, 60 sec: 41506.2, 300 sec: 47097.1). Total num frames: 1569456128. Throughput: 0: 10080.7. Samples: 392430592. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:38:50,767][1648981] Avg episode reward: [(0, '731.330')] [2024-06-15 20:38:55,766][1648981] Fps is (10 sec: 42764.4, 60 sec: 39870.2, 300 sec: 46764.4). Total num frames: 1569619968. Throughput: 0: 10152.1. Samples: 392466432. 
Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:38:55,767][1648981] Avg episode reward: [(0, '752.610')] [2024-06-15 20:38:56,341][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000766448_1569685504.pth... [2024-06-15 20:38:56,342][1651669] Updated weights for policy 0, policy_version 766448 (0.0167) [2024-06-15 20:38:56,403][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000761056_1558642688.pth [2024-06-15 20:39:00,418][1651669] Updated weights for policy 0, policy_version 766465 (0.0025) [2024-06-15 20:39:00,782][1648981] Fps is (10 sec: 29444.6, 60 sec: 39857.2, 300 sec: 46317.0). Total num frames: 1569751040. Throughput: 0: 9758.7. Samples: 392517632. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:00,783][1648981] Avg episode reward: [(0, '756.740')] [2024-06-15 20:39:02,909][1651669] Updated weights for policy 0, policy_version 766576 (0.0014) [2024-06-15 20:39:05,768][1648981] Fps is (10 sec: 36044.3, 60 sec: 39321.5, 300 sec: 46652.7). Total num frames: 1569980416. Throughput: 0: 9807.6. Samples: 392579072. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:05,770][1648981] Avg episode reward: [(0, '784.100')] [2024-06-15 20:39:07,599][1651669] Updated weights for policy 0, policy_version 766625 (0.0023) [2024-06-15 20:39:09,565][1651669] Updated weights for policy 0, policy_version 766691 (0.0090) [2024-06-15 20:39:10,767][1648981] Fps is (10 sec: 49228.9, 60 sec: 39330.6, 300 sec: 46208.4). Total num frames: 1570242560. Throughput: 0: 9843.4. Samples: 392607232. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:10,767][1648981] Avg episode reward: [(0, '785.990')] [2024-06-15 20:39:14,560][1651669] Updated weights for policy 0, policy_version 766785 (0.0085) [2024-06-15 20:39:15,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 40960.0, 300 sec: 46541.7). Total num frames: 1570471936. Throughput: 0: 10058.0. 
Samples: 392675328. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:15,774][1648981] Avg episode reward: [(0, '824.190')] [2024-06-15 20:39:15,960][1651669] Updated weights for policy 0, policy_version 766847 (0.0013) [2024-06-15 20:39:19,265][1651669] Updated weights for policy 0, policy_version 766900 (0.0012) [2024-06-15 20:39:20,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 40413.6, 300 sec: 45986.2). Total num frames: 1570701312. Throughput: 0: 10228.5. Samples: 392743936. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:20,767][1648981] Avg episode reward: [(0, '748.310')] [2024-06-15 20:39:21,185][1651669] Updated weights for policy 0, policy_version 766975 (0.0277) [2024-06-15 20:39:25,496][1651669] Updated weights for policy 0, policy_version 767035 (0.0014) [2024-06-15 20:39:25,721][1651274] Signal inference workers to stop experience collection... (40200 times) [2024-06-15 20:39:25,762][1651669] InferenceWorker_p0-w0: stopping experience collection (40200 times) [2024-06-15 20:39:25,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 41511.5, 300 sec: 46208.4). Total num frames: 1570897920. Throughput: 0: 10070.0. Samples: 392778752. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:25,767][1648981] Avg episode reward: [(0, '746.470')] [2024-06-15 20:39:25,910][1651274] Signal inference workers to resume experience collection... (40200 times) [2024-06-15 20:39:25,911][1651669] InferenceWorker_p0-w0: resuming experience collection (40200 times) [2024-06-15 20:39:26,999][1651669] Updated weights for policy 0, policy_version 767104 (0.0023) [2024-06-15 20:39:30,784][1648981] Fps is (10 sec: 45796.0, 60 sec: 41493.8, 300 sec: 45872.5). Total num frames: 1571160064. Throughput: 0: 10745.6. Samples: 392854016. 
Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:30,785][1648981] Avg episode reward: [(0, '779.500')] [2024-06-15 20:39:31,645][1651669] Updated weights for policy 0, policy_version 767200 (0.0013) [2024-06-15 20:39:35,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 39321.5, 300 sec: 45764.1). Total num frames: 1571291136. Throughput: 0: 10911.3. Samples: 392921600. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:35,767][1648981] Avg episode reward: [(0, '763.450')] [2024-06-15 20:39:36,065][1651669] Updated weights for policy 0, policy_version 767251 (0.0012) [2024-06-15 20:39:37,834][1651669] Updated weights for policy 0, policy_version 767328 (0.0014) [2024-06-15 20:39:40,772][1648981] Fps is (10 sec: 39370.3, 60 sec: 42050.6, 300 sec: 45652.2). Total num frames: 1571553280. Throughput: 0: 10659.7. Samples: 392946176. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:40,772][1648981] Avg episode reward: [(0, '814.390')] [2024-06-15 20:39:40,816][1651669] Updated weights for policy 0, policy_version 767362 (0.0018) [2024-06-15 20:39:42,874][1651669] Updated weights for policy 0, policy_version 767441 (0.0130) [2024-06-15 20:39:43,755][1651669] Updated weights for policy 0, policy_version 767488 (0.0015) [2024-06-15 20:39:45,768][1648981] Fps is (10 sec: 52428.0, 60 sec: 43718.8, 300 sec: 45764.2). Total num frames: 1571815424. Throughput: 0: 11267.9. Samples: 393024512. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:45,770][1648981] Avg episode reward: [(0, '807.890')] [2024-06-15 20:39:47,900][1651669] Updated weights for policy 0, policy_version 767555 (0.0033) [2024-06-15 20:39:49,290][1651669] Updated weights for policy 0, policy_version 767614 (0.0021) [2024-06-15 20:39:50,766][1648981] Fps is (10 sec: 52457.2, 60 sec: 43690.7, 300 sec: 45653.1). Total num frames: 1572077568. Throughput: 0: 11400.6. Samples: 393092096. 
Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:50,767][1648981] Avg episode reward: [(0, '805.870')] [2024-06-15 20:39:52,909][1651669] Updated weights for policy 0, policy_version 767669 (0.0012) [2024-06-15 20:39:54,493][1651669] Updated weights for policy 0, policy_version 767733 (0.0012) [2024-06-15 20:39:55,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 45328.7, 300 sec: 45764.1). Total num frames: 1572339712. Throughput: 0: 11650.8. Samples: 393131520. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:39:55,768][1648981] Avg episode reward: [(0, '831.960')] [2024-06-15 20:39:58,644][1651669] Updated weights for policy 0, policy_version 767792 (0.0158) [2024-06-15 20:40:00,669][1651669] Updated weights for policy 0, policy_version 767872 (0.0014) [2024-06-15 20:40:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47526.1, 300 sec: 46097.4). Total num frames: 1572601856. Throughput: 0: 11696.4. Samples: 393201664. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:40:00,767][1648981] Avg episode reward: [(0, '860.100')] [2024-06-15 20:40:04,359][1651669] Updated weights for policy 0, policy_version 767940 (0.0012) [2024-06-15 20:40:04,730][1651274] Signal inference workers to stop experience collection... (40250 times) [2024-06-15 20:40:04,794][1651669] InferenceWorker_p0-w0: stopping experience collection (40250 times) [2024-06-15 20:40:05,011][1651274] Signal inference workers to resume experience collection... (40250 times) [2024-06-15 20:40:05,012][1651669] InferenceWorker_p0-w0: resuming experience collection (40250 times) [2024-06-15 20:40:05,753][1651669] Updated weights for policy 0, policy_version 767991 (0.0010) [2024-06-15 20:40:05,766][1648981] Fps is (10 sec: 49154.7, 60 sec: 47513.7, 300 sec: 45653.2). Total num frames: 1572831232. Throughput: 0: 11594.1. Samples: 393265664. 
Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:40:05,767][1648981] Avg episode reward: [(0, '863.960')] [2024-06-15 20:40:09,812][1651669] Updated weights for policy 0, policy_version 768034 (0.0011) [2024-06-15 20:40:10,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 45875.3, 300 sec: 45764.1). Total num frames: 1572995072. Throughput: 0: 11741.9. Samples: 393307136. Policy #0 lag: (min: 15.0, avg: 93.4, max: 271.0) [2024-06-15 20:40:10,767][1648981] Avg episode reward: [(0, '864.060')] [2024-06-15 20:40:11,023][1651669] Updated weights for policy 0, policy_version 768085 (0.0011) [2024-06-15 20:40:14,226][1651669] Updated weights for policy 0, policy_version 768144 (0.0015) [2024-06-15 20:40:15,770][1648981] Fps is (10 sec: 45857.5, 60 sec: 46964.5, 300 sec: 45430.3). Total num frames: 1573289984. Throughput: 0: 11608.9. Samples: 393376256. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:15,771][1648981] Avg episode reward: [(0, '828.820')] [2024-06-15 20:40:15,777][1651669] Updated weights for policy 0, policy_version 768209 (0.0014) [2024-06-15 20:40:16,887][1651669] Updated weights for policy 0, policy_version 768256 (0.0013) [2024-06-15 20:40:20,799][1648981] Fps is (10 sec: 45724.4, 60 sec: 45850.3, 300 sec: 45649.2). Total num frames: 1573453824. Throughput: 0: 11699.1. Samples: 393448448. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:20,800][1648981] Avg episode reward: [(0, '824.720')] [2024-06-15 20:40:21,675][1651669] Updated weights for policy 0, policy_version 768336 (0.0013) [2024-06-15 20:40:22,966][1651669] Updated weights for policy 0, policy_version 768382 (0.0013) [2024-06-15 20:40:25,766][1648981] Fps is (10 sec: 39336.4, 60 sec: 46421.3, 300 sec: 44986.8). Total num frames: 1573683200. Throughput: 0: 11857.1. Samples: 393479680. 
Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:25,767][1648981] Avg episode reward: [(0, '790.740')] [2024-06-15 20:40:26,982][1651669] Updated weights for policy 0, policy_version 768464 (0.0013) [2024-06-15 20:40:27,959][1651669] Updated weights for policy 0, policy_version 768508 (0.0014) [2024-06-15 20:40:30,766][1648981] Fps is (10 sec: 49314.7, 60 sec: 46435.1, 300 sec: 45764.1). Total num frames: 1573945344. Throughput: 0: 11901.2. Samples: 393560064. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:30,767][1648981] Avg episode reward: [(0, '782.310')] [2024-06-15 20:40:31,384][1651669] Updated weights for policy 0, policy_version 768560 (0.0088) [2024-06-15 20:40:33,007][1651669] Updated weights for policy 0, policy_version 768629 (0.0011) [2024-06-15 20:40:35,767][1648981] Fps is (10 sec: 49150.1, 60 sec: 48059.4, 300 sec: 45097.6). Total num frames: 1574174720. Throughput: 0: 12196.9. Samples: 393640960. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:35,768][1648981] Avg episode reward: [(0, '763.730')] [2024-06-15 20:40:36,349][1651669] Updated weights for policy 0, policy_version 768672 (0.0163) [2024-06-15 20:40:37,621][1651669] Updated weights for policy 0, policy_version 768737 (0.0014) [2024-06-15 20:40:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48064.0, 300 sec: 45764.3). Total num frames: 1574436864. Throughput: 0: 11969.5. Samples: 393670144. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:40,767][1648981] Avg episode reward: [(0, '768.980')] [2024-06-15 20:40:41,518][1651669] Updated weights for policy 0, policy_version 768800 (0.0012) [2024-06-15 20:40:43,026][1651669] Updated weights for policy 0, policy_version 768864 (0.0012) [2024-06-15 20:40:43,124][1651274] Signal inference workers to stop experience collection... 
(40300 times) [2024-06-15 20:40:43,181][1651669] InferenceWorker_p0-w0: stopping experience collection (40300 times) [2024-06-15 20:40:43,478][1651274] Signal inference workers to resume experience collection... (40300 times) [2024-06-15 20:40:43,479][1651669] InferenceWorker_p0-w0: resuming experience collection (40300 times) [2024-06-15 20:40:45,766][1648981] Fps is (10 sec: 52431.3, 60 sec: 48059.9, 300 sec: 45319.8). Total num frames: 1574699008. Throughput: 0: 11980.8. Samples: 393740800. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:45,767][1648981] Avg episode reward: [(0, '745.640')] [2024-06-15 20:40:47,075][1651669] Updated weights for policy 0, policy_version 768913 (0.0014) [2024-06-15 20:40:48,296][1651669] Updated weights for policy 0, policy_version 768967 (0.0013) [2024-06-15 20:40:49,228][1651669] Updated weights for policy 0, policy_version 769023 (0.0152) [2024-06-15 20:40:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 45764.1). Total num frames: 1574961152. Throughput: 0: 12322.1. Samples: 393820160. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:50,767][1648981] Avg episode reward: [(0, '727.670')] [2024-06-15 20:40:52,602][1651669] Updated weights for policy 0, policy_version 769072 (0.0012) [2024-06-15 20:40:53,938][1651669] Updated weights for policy 0, policy_version 769126 (0.0013) [2024-06-15 20:40:55,771][1648981] Fps is (10 sec: 52407.1, 60 sec: 48056.8, 300 sec: 45541.3). Total num frames: 1575223296. Throughput: 0: 12184.5. Samples: 393855488. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:40:55,771][1648981] Avg episode reward: [(0, '737.830')] [2024-06-15 20:40:55,777][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000769152_1575223296.pth... 
[2024-06-15 20:40:55,836][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000763920_1564508160.pth [2024-06-15 20:40:58,095][1651669] Updated weights for policy 0, policy_version 769188 (0.0013) [2024-06-15 20:40:59,294][1651669] Updated weights for policy 0, policy_version 769248 (0.0015) [2024-06-15 20:41:00,000][1651669] Updated weights for policy 0, policy_version 769280 (0.0014) [2024-06-15 20:41:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 45766.6). Total num frames: 1575485440. Throughput: 0: 12277.7. Samples: 393928704. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:00,767][1648981] Avg episode reward: [(0, '735.410')] [2024-06-15 20:41:03,833][1651669] Updated weights for policy 0, policy_version 769346 (0.0013) [2024-06-15 20:41:05,774][1648981] Fps is (10 sec: 52410.6, 60 sec: 48599.7, 300 sec: 45763.0). Total num frames: 1575747584. Throughput: 0: 12181.1. Samples: 393996288. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:05,775][1648981] Avg episode reward: [(0, '709.970')] [2024-06-15 20:41:08,611][1651669] Updated weights for policy 0, policy_version 769424 (0.0012) [2024-06-15 20:41:10,203][1651669] Updated weights for policy 0, policy_version 769488 (0.0012) [2024-06-15 20:41:10,769][1648981] Fps is (10 sec: 45863.7, 60 sec: 49150.0, 300 sec: 45541.6). Total num frames: 1575944192. Throughput: 0: 12412.5. Samples: 394038272. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:10,769][1648981] Avg episode reward: [(0, '738.940')] [2024-06-15 20:41:11,100][1651669] Updated weights for policy 0, policy_version 769532 (0.0013) [2024-06-15 20:41:14,216][1651669] Updated weights for policy 0, policy_version 769575 (0.0012) [2024-06-15 20:41:15,647][1651669] Updated weights for policy 0, policy_version 769648 (0.0030) [2024-06-15 20:41:15,766][1648981] Fps is (10 sec: 49189.6, 60 sec: 49155.2, 300 sec: 45764.1). 
Total num frames: 1576239104. Throughput: 0: 12174.3. Samples: 394107904. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:15,767][1648981] Avg episode reward: [(0, '730.110')] [2024-06-15 20:41:18,909][1651669] Updated weights for policy 0, policy_version 769680 (0.0027) [2024-06-15 20:41:20,489][1651669] Updated weights for policy 0, policy_version 769744 (0.0030) [2024-06-15 20:41:20,766][1648981] Fps is (10 sec: 49164.7, 60 sec: 49725.6, 300 sec: 45430.9). Total num frames: 1576435712. Throughput: 0: 12003.7. Samples: 394181120. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:20,767][1648981] Avg episode reward: [(0, '734.650')] [2024-06-15 20:41:21,536][1651669] Updated weights for policy 0, policy_version 769792 (0.0021) [2024-06-15 20:41:24,330][1651274] Signal inference workers to stop experience collection... (40350 times) [2024-06-15 20:41:24,424][1651669] InferenceWorker_p0-w0: stopping experience collection (40350 times) [2024-06-15 20:41:24,688][1651274] Signal inference workers to resume experience collection... (40350 times) [2024-06-15 20:41:24,689][1651669] InferenceWorker_p0-w0: resuming experience collection (40350 times) [2024-06-15 20:41:25,694][1651669] Updated weights for policy 0, policy_version 769849 (0.0086) [2024-06-15 20:41:25,767][1648981] Fps is (10 sec: 39316.8, 60 sec: 49151.1, 300 sec: 45431.0). Total num frames: 1576632320. Throughput: 0: 12151.2. Samples: 394216960. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:25,768][1648981] Avg episode reward: [(0, '722.430')] [2024-06-15 20:41:27,267][1651669] Updated weights for policy 0, policy_version 769913 (0.0038) [2024-06-15 20:41:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.1, 300 sec: 45208.7). Total num frames: 1576894464. Throughput: 0: 12265.3. Samples: 394292736. 
Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:30,767][1648981] Avg episode reward: [(0, '745.150')] [2024-06-15 20:41:31,221][1651669] Updated weights for policy 0, policy_version 769984 (0.0012) [2024-06-15 20:41:32,603][1651669] Updated weights for policy 0, policy_version 770048 (0.0011) [2024-06-15 20:41:35,799][1648981] Fps is (10 sec: 42463.7, 60 sec: 48033.8, 300 sec: 45314.8). Total num frames: 1577058304. Throughput: 0: 11983.4. Samples: 394359808. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:35,800][1648981] Avg episode reward: [(0, '754.130')] [2024-06-15 20:41:37,336][1651669] Updated weights for policy 0, policy_version 770106 (0.0011) [2024-06-15 20:41:38,697][1651669] Updated weights for policy 0, policy_version 770160 (0.0020) [2024-06-15 20:41:40,681][1651669] Updated weights for policy 0, policy_version 770192 (0.0017) [2024-06-15 20:41:40,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48605.9, 300 sec: 45208.7). Total num frames: 1577353216. Throughput: 0: 11822.6. Samples: 394387456. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:40,767][1648981] Avg episode reward: [(0, '745.840')] [2024-06-15 20:41:42,690][1651669] Updated weights for policy 0, policy_version 770279 (0.0101) [2024-06-15 20:41:45,775][1648981] Fps is (10 sec: 52555.2, 60 sec: 48052.7, 300 sec: 45430.2). Total num frames: 1577582592. Throughput: 0: 11898.8. Samples: 394464256. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:45,776][1648981] Avg episode reward: [(0, '737.360')] [2024-06-15 20:41:47,725][1651669] Updated weights for policy 0, policy_version 770336 (0.0013) [2024-06-15 20:41:49,497][1651669] Updated weights for policy 0, policy_version 770404 (0.0015) [2024-06-15 20:41:50,770][1648981] Fps is (10 sec: 49132.8, 60 sec: 48056.6, 300 sec: 45319.4). Total num frames: 1577844736. Throughput: 0: 11947.6. Samples: 394533888. 
Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:50,771][1648981] Avg episode reward: [(0, '770.620')] [2024-06-15 20:41:52,449][1651669] Updated weights for policy 0, policy_version 770467 (0.0112) [2024-06-15 20:41:53,768][1651669] Updated weights for policy 0, policy_version 770538 (0.0012) [2024-06-15 20:41:54,275][1651669] Updated weights for policy 0, policy_version 770560 (0.0011) [2024-06-15 20:41:55,766][1648981] Fps is (10 sec: 52475.0, 60 sec: 48063.0, 300 sec: 45764.1). Total num frames: 1578106880. Throughput: 0: 11765.3. Samples: 394567680. Policy #0 lag: (min: 95.0, avg: 206.9, max: 381.0) [2024-06-15 20:41:55,767][1648981] Avg episode reward: [(0, '748.140')] [2024-06-15 20:41:59,449][1651669] Updated weights for policy 0, policy_version 770628 (0.0011) [2024-06-15 20:42:00,782][1648981] Fps is (10 sec: 52369.2, 60 sec: 48047.5, 300 sec: 45429.7). Total num frames: 1578369024. Throughput: 0: 12067.7. Samples: 394651136. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:00,782][1648981] Avg episode reward: [(0, '748.090')] [2024-06-15 20:42:01,671][1651669] Updated weights for policy 0, policy_version 770691 (0.0014) [2024-06-15 20:42:03,696][1651274] Signal inference workers to stop experience collection... (40400 times) [2024-06-15 20:42:03,784][1651669] InferenceWorker_p0-w0: stopping experience collection (40400 times) [2024-06-15 20:42:03,804][1651669] Updated weights for policy 0, policy_version 770759 (0.0084) [2024-06-15 20:42:03,920][1651274] Signal inference workers to resume experience collection... (40400 times) [2024-06-15 20:42:03,921][1651669] InferenceWorker_p0-w0: resuming experience collection (40400 times) [2024-06-15 20:42:04,875][1651669] Updated weights for policy 0, policy_version 770809 (0.0029) [2024-06-15 20:42:05,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 48065.6, 300 sec: 45764.1). Total num frames: 1578631168. Throughput: 0: 11889.7. Samples: 394716160. 
Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:05,767][1648981] Avg episode reward: [(0, '830.090')] [2024-06-15 20:42:08,841][1651669] Updated weights for policy 0, policy_version 770864 (0.0012) [2024-06-15 20:42:10,766][1648981] Fps is (10 sec: 42663.4, 60 sec: 47515.5, 300 sec: 45433.4). Total num frames: 1578795008. Throughput: 0: 12072.1. Samples: 394760192. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:10,767][1648981] Avg episode reward: [(0, '843.380')] [2024-06-15 20:42:11,169][1651669] Updated weights for policy 0, policy_version 770928 (0.0011) [2024-06-15 20:42:13,394][1651669] Updated weights for policy 0, policy_version 770992 (0.0012) [2024-06-15 20:42:15,745][1651669] Updated weights for policy 0, policy_version 771068 (0.0014) [2024-06-15 20:42:15,766][1648981] Fps is (10 sec: 49154.0, 60 sec: 48059.8, 300 sec: 45653.1). Total num frames: 1579122688. Throughput: 0: 11867.0. Samples: 394826752. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:15,767][1648981] Avg episode reward: [(0, '834.010')] [2024-06-15 20:42:19,946][1651669] Updated weights for policy 0, policy_version 771124 (0.0013) [2024-06-15 20:42:20,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 47513.6, 300 sec: 45653.1). Total num frames: 1579286528. Throughput: 0: 12183.1. Samples: 394907648. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:20,767][1648981] Avg episode reward: [(0, '832.530')] [2024-06-15 20:42:21,267][1651669] Updated weights for policy 0, policy_version 771156 (0.0013) [2024-06-15 20:42:23,112][1651669] Updated weights for policy 0, policy_version 771219 (0.0150) [2024-06-15 20:42:25,766][1648981] Fps is (10 sec: 42597.7, 60 sec: 48606.8, 300 sec: 45319.8). Total num frames: 1579548672. Throughput: 0: 12151.5. Samples: 394934272. 
Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:25,767][1648981] Avg episode reward: [(0, '821.600')] [2024-06-15 20:42:25,986][1651669] Updated weights for policy 0, policy_version 771282 (0.0013) [2024-06-15 20:42:29,295][1651669] Updated weights for policy 0, policy_version 771329 (0.0020) [2024-06-15 20:42:30,661][1651669] Updated weights for policy 0, policy_version 771392 (0.0104) [2024-06-15 20:42:30,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48605.7, 300 sec: 45986.3). Total num frames: 1579810816. Throughput: 0: 12381.4. Samples: 395021312. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:30,767][1648981] Avg episode reward: [(0, '816.610')] [2024-06-15 20:42:33,036][1651669] Updated weights for policy 0, policy_version 771443 (0.0013) [2024-06-15 20:42:34,617][1651669] Updated weights for policy 0, policy_version 771514 (0.0011) [2024-06-15 20:42:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50271.8, 300 sec: 45764.1). Total num frames: 1580072960. Throughput: 0: 12186.7. Samples: 395082240. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:35,767][1648981] Avg episode reward: [(0, '795.950')] [2024-06-15 20:42:37,478][1651669] Updated weights for policy 0, policy_version 771560 (0.0011) [2024-06-15 20:42:40,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 49152.0, 300 sec: 45986.9). Total num frames: 1580302336. Throughput: 0: 12344.9. Samples: 395123200. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:40,767][1648981] Avg episode reward: [(0, '818.090')] [2024-06-15 20:42:40,767][1651669] Updated weights for policy 0, policy_version 771632 (0.0080) [2024-06-15 20:42:42,906][1651669] Updated weights for policy 0, policy_version 771664 (0.0012) [2024-06-15 20:42:44,282][1651669] Updated weights for policy 0, policy_version 771715 (0.0012) [2024-06-15 20:42:44,991][1651274] Signal inference workers to stop experience collection... 
(40450 times) [2024-06-15 20:42:45,031][1651669] InferenceWorker_p0-w0: stopping experience collection (40450 times) [2024-06-15 20:42:45,279][1651274] Signal inference workers to resume experience collection... (40450 times) [2024-06-15 20:42:45,280][1651669] InferenceWorker_p0-w0: resuming experience collection (40450 times) [2024-06-15 20:42:45,514][1651669] Updated weights for policy 0, policy_version 771774 (0.0012) [2024-06-15 20:42:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50251.7, 300 sec: 46208.4). Total num frames: 1580597248. Throughput: 0: 12087.3. Samples: 395194880. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:45,767][1648981] Avg episode reward: [(0, '885.210')] [2024-06-15 20:42:49,102][1651669] Updated weights for policy 0, policy_version 771829 (0.0013) [2024-06-15 20:42:50,660][1651669] Updated weights for policy 0, policy_version 771859 (0.0027) [2024-06-15 20:42:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48609.1, 300 sec: 45875.8). Total num frames: 1580761088. Throughput: 0: 12276.7. Samples: 395268608. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:50,767][1648981] Avg episode reward: [(0, '902.380')] [2024-06-15 20:42:54,400][1651669] Updated weights for policy 0, policy_version 771936 (0.0012) [2024-06-15 20:42:55,632][1651669] Updated weights for policy 0, policy_version 772000 (0.0033) [2024-06-15 20:42:55,767][1648981] Fps is (10 sec: 45870.8, 60 sec: 49151.2, 300 sec: 46430.4). Total num frames: 1581056000. Throughput: 0: 12219.5. Samples: 395310080. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:42:55,768][1648981] Avg episode reward: [(0, '924.820')] [2024-06-15 20:42:56,267][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000772032_1581121536.pth... 
[2024-06-15 20:42:56,276][1651669] Updated weights for policy 0, policy_version 772032 (0.0015) [2024-06-15 20:42:56,337][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000766448_1569685504.pth [2024-06-15 20:43:00,606][1651669] Updated weights for policy 0, policy_version 772112 (0.0013) [2024-06-15 20:43:00,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48618.3, 300 sec: 46319.5). Total num frames: 1581285376. Throughput: 0: 12379.0. Samples: 395383808. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:00,767][1648981] Avg episode reward: [(0, '912.410')] [2024-06-15 20:43:04,310][1651669] Updated weights for policy 0, policy_version 772164 (0.0073) [2024-06-15 20:43:05,638][1651669] Updated weights for policy 0, policy_version 772228 (0.0013) [2024-06-15 20:43:05,769][1648981] Fps is (10 sec: 45867.6, 60 sec: 48057.9, 300 sec: 46210.2). Total num frames: 1581514752. Throughput: 0: 12241.8. Samples: 395458560. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:05,770][1648981] Avg episode reward: [(0, '952.980')] [2024-06-15 20:43:06,684][1651669] Updated weights for policy 0, policy_version 772288 (0.0022) [2024-06-15 20:43:10,031][1651669] Updated weights for policy 0, policy_version 772344 (0.0011) [2024-06-15 20:43:10,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.2, 300 sec: 46652.7). Total num frames: 1581776896. Throughput: 0: 12561.1. Samples: 395499520. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:10,767][1648981] Avg episode reward: [(0, '984.860')] [2024-06-15 20:43:11,448][1651669] Updated weights for policy 0, policy_version 772387 (0.0013) [2024-06-15 20:43:14,817][1651669] Updated weights for policy 0, policy_version 772418 (0.0014) [2024-06-15 20:43:15,766][1648981] Fps is (10 sec: 49165.0, 60 sec: 48059.7, 300 sec: 46541.7). Total num frames: 1582006272. Throughput: 0: 12276.7. Samples: 395573760. 
Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:15,767][1648981] Avg episode reward: [(0, '980.980')] [2024-06-15 20:43:16,510][1651669] Updated weights for policy 0, policy_version 772499 (0.0012) [2024-06-15 20:43:20,274][1651669] Updated weights for policy 0, policy_version 772582 (0.0017) [2024-06-15 20:43:20,691][1651669] Updated weights for policy 0, policy_version 772608 (0.0011) [2024-06-15 20:43:20,767][1648981] Fps is (10 sec: 52427.4, 60 sec: 50244.0, 300 sec: 47098.3). Total num frames: 1582301184. Throughput: 0: 12299.3. Samples: 395635712. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:20,767][1648981] Avg episode reward: [(0, '1003.600')] [2024-06-15 20:43:22,386][1651669] Updated weights for policy 0, policy_version 772660 (0.0012) [2024-06-15 20:43:25,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 48059.7, 300 sec: 46652.7). Total num frames: 1582432256. Throughput: 0: 12322.1. Samples: 395677696. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:25,767][1648981] Avg episode reward: [(0, '1039.540')] [2024-06-15 20:43:26,175][1651669] Updated weights for policy 0, policy_version 772692 (0.0013) [2024-06-15 20:43:26,920][1651274] Signal inference workers to stop experience collection... (40500 times) [2024-06-15 20:43:26,960][1651669] InferenceWorker_p0-w0: stopping experience collection (40500 times) [2024-06-15 20:43:27,098][1651274] Signal inference workers to resume experience collection... (40500 times) [2024-06-15 20:43:27,099][1651669] InferenceWorker_p0-w0: resuming experience collection (40500 times) [2024-06-15 20:43:27,729][1651669] Updated weights for policy 0, policy_version 772768 (0.0124) [2024-06-15 20:43:29,678][1651669] Updated weights for policy 0, policy_version 772803 (0.0011) [2024-06-15 20:43:30,654][1651669] Updated weights for policy 0, policy_version 772852 (0.0011) [2024-06-15 20:43:30,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 49698.3, 300 sec: 46986.0). 
Total num frames: 1582792704. Throughput: 0: 12390.4. Samples: 395752448. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:30,767][1648981] Avg episode reward: [(0, '1073.890')] [2024-06-15 20:43:32,887][1651669] Updated weights for policy 0, policy_version 772912 (0.0105) [2024-06-15 20:43:35,767][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.6, 300 sec: 47208.6). Total num frames: 1582956544. Throughput: 0: 12390.4. Samples: 395826176. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:35,767][1648981] Avg episode reward: [(0, '1074.740')] [2024-06-15 20:43:36,806][1651669] Updated weights for policy 0, policy_version 772929 (0.0011) [2024-06-15 20:43:38,555][1651669] Updated weights for policy 0, policy_version 773008 (0.0104) [2024-06-15 20:43:39,504][1651669] Updated weights for policy 0, policy_version 773049 (0.0012) [2024-06-15 20:43:40,782][1648981] Fps is (10 sec: 42530.6, 60 sec: 48593.0, 300 sec: 47545.1). Total num frames: 1583218688. Throughput: 0: 12147.4. Samples: 395856896. Policy #0 lag: (min: 15.0, avg: 98.5, max: 271.0) [2024-06-15 20:43:40,783][1648981] Avg episode reward: [(0, '1060.240')] [2024-06-15 20:43:42,207][1651669] Updated weights for policy 0, policy_version 773119 (0.0012) [2024-06-15 20:43:44,665][1651669] Updated weights for policy 0, policy_version 773172 (0.0100) [2024-06-15 20:43:45,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1583480832. Throughput: 0: 12060.5. Samples: 395926528. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:43:45,767][1648981] Avg episode reward: [(0, '1059.890')] [2024-06-15 20:43:48,607][1651669] Updated weights for policy 0, policy_version 773216 (0.0011) [2024-06-15 20:43:50,767][1648981] Fps is (10 sec: 49229.1, 60 sec: 49151.8, 300 sec: 47763.5). Total num frames: 1583710208. Throughput: 0: 11901.8. Samples: 395994112. 
Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:43:50,768][1648981] Avg episode reward: [(0, '1054.520')] [2024-06-15 20:43:50,839][1651669] Updated weights for policy 0, policy_version 773309 (0.0115) [2024-06-15 20:43:53,392][1651669] Updated weights for policy 0, policy_version 773366 (0.0020) [2024-06-15 20:43:55,772][1648981] Fps is (10 sec: 49122.8, 60 sec: 48601.8, 300 sec: 48209.5). Total num frames: 1583972352. Throughput: 0: 11740.3. Samples: 396027904. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:43:55,773][1648981] Avg episode reward: [(0, '1058.050')] [2024-06-15 20:43:55,796][1651669] Updated weights for policy 0, policy_version 773432 (0.0012) [2024-06-15 20:43:59,920][1651669] Updated weights for policy 0, policy_version 773477 (0.0030) [2024-06-15 20:44:00,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1584136192. Throughput: 0: 11958.0. Samples: 396111872. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:00,767][1648981] Avg episode reward: [(0, '1096.410')] [2024-06-15 20:44:01,328][1651669] Updated weights for policy 0, policy_version 773536 (0.0011) [2024-06-15 20:44:03,465][1651669] Updated weights for policy 0, policy_version 773584 (0.0012) [2024-06-15 20:44:04,433][1651669] Updated weights for policy 0, policy_version 773629 (0.0046) [2024-06-15 20:44:05,776][1648981] Fps is (10 sec: 42582.6, 60 sec: 48054.1, 300 sec: 47984.1). Total num frames: 1584398336. Throughput: 0: 11978.3. Samples: 396174848. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:05,777][1648981] Avg episode reward: [(0, '1154.520')] [2024-06-15 20:44:06,203][1651274] Saving new best policy, reward=1154.520! [2024-06-15 20:44:06,531][1651669] Updated weights for policy 0, policy_version 773680 (0.0011) [2024-06-15 20:44:09,790][1651274] Signal inference workers to stop experience collection... 
(40550 times) [2024-06-15 20:44:09,831][1651669] InferenceWorker_p0-w0: stopping experience collection (40550 times) [2024-06-15 20:44:10,074][1651274] Signal inference workers to resume experience collection... (40550 times) [2024-06-15 20:44:10,074][1651669] InferenceWorker_p0-w0: resuming experience collection (40550 times) [2024-06-15 20:44:10,076][1651669] Updated weights for policy 0, policy_version 773728 (0.0011) [2024-06-15 20:44:10,772][1648981] Fps is (10 sec: 49125.1, 60 sec: 47509.3, 300 sec: 47984.8). Total num frames: 1584627712. Throughput: 0: 12047.6. Samples: 396219904. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:10,772][1648981] Avg episode reward: [(0, '1148.830')] [2024-06-15 20:44:12,373][1651669] Updated weights for policy 0, policy_version 773814 (0.0110) [2024-06-15 20:44:15,428][1651669] Updated weights for policy 0, policy_version 773857 (0.0011) [2024-06-15 20:44:15,774][1648981] Fps is (10 sec: 49161.0, 60 sec: 48053.4, 300 sec: 48095.6). Total num frames: 1584889856. Throughput: 0: 11796.7. Samples: 396283392. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:15,775][1648981] Avg episode reward: [(0, '1137.720')] [2024-06-15 20:44:17,086][1651669] Updated weights for policy 0, policy_version 773907 (0.0020) [2024-06-15 20:44:20,766][1648981] Fps is (10 sec: 45900.3, 60 sec: 46421.5, 300 sec: 48096.8). Total num frames: 1585086464. Throughput: 0: 11867.0. Samples: 396360192. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:20,767][1648981] Avg episode reward: [(0, '1094.310')] [2024-06-15 20:44:21,012][1651669] Updated weights for policy 0, policy_version 773984 (0.0013) [2024-06-15 20:44:22,596][1651669] Updated weights for policy 0, policy_version 774037 (0.0120) [2024-06-15 20:44:25,767][1648981] Fps is (10 sec: 42631.1, 60 sec: 48059.7, 300 sec: 47988.6). Total num frames: 1585315840. Throughput: 0: 11723.2. Samples: 396384256. 
Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:25,767][1648981] Avg episode reward: [(0, '1100.980')] [2024-06-15 20:44:26,334][1651669] Updated weights for policy 0, policy_version 774081 (0.0015) [2024-06-15 20:44:27,623][1651669] Updated weights for policy 0, policy_version 774145 (0.0011) [2024-06-15 20:44:28,518][1651669] Updated weights for policy 0, policy_version 774199 (0.0011) [2024-06-15 20:44:30,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 48430.0). Total num frames: 1585577984. Throughput: 0: 11969.4. Samples: 396465152. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:30,767][1648981] Avg episode reward: [(0, '1057.840')] [2024-06-15 20:44:32,280][1651669] Updated weights for policy 0, policy_version 774244 (0.0012) [2024-06-15 20:44:33,545][1651669] Updated weights for policy 0, policy_version 774295 (0.0012) [2024-06-15 20:44:35,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 48059.8, 300 sec: 48430.9). Total num frames: 1585840128. Throughput: 0: 12015.0. Samples: 396534784. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:35,767][1648981] Avg episode reward: [(0, '1012.400')] [2024-06-15 20:44:37,144][1651669] Updated weights for policy 0, policy_version 774352 (0.0013) [2024-06-15 20:44:38,185][1651669] Updated weights for policy 0, policy_version 774400 (0.0012) [2024-06-15 20:44:40,782][1648981] Fps is (10 sec: 52345.4, 60 sec: 48059.7, 300 sec: 48427.4). Total num frames: 1586102272. Throughput: 0: 12035.0. Samples: 396569600. 
Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:40,783][1648981] Avg episode reward: [(0, '1009.940')] [2024-06-15 20:44:42,397][1651669] Updated weights for policy 0, policy_version 774465 (0.0012) [2024-06-15 20:44:43,794][1651669] Updated weights for policy 0, policy_version 774528 (0.0011) [2024-06-15 20:44:45,466][1651669] Updated weights for policy 0, policy_version 774589 (0.0013) [2024-06-15 20:44:45,776][1648981] Fps is (10 sec: 52375.9, 60 sec: 48051.7, 300 sec: 48428.3). Total num frames: 1586364416. Throughput: 0: 11807.5. Samples: 396643328. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:45,777][1648981] Avg episode reward: [(0, '996.920')] [2024-06-15 20:44:49,335][1651274] Signal inference workers to stop experience collection... (40600 times) [2024-06-15 20:44:49,358][1651669] Updated weights for policy 0, policy_version 774658 (0.0012) [2024-06-15 20:44:49,375][1651669] InferenceWorker_p0-w0: stopping experience collection (40600 times) [2024-06-15 20:44:49,546][1651274] Signal inference workers to resume experience collection... (40600 times) [2024-06-15 20:44:49,547][1651669] InferenceWorker_p0-w0: resuming experience collection (40600 times) [2024-06-15 20:44:50,570][1651669] Updated weights for policy 0, policy_version 774714 (0.0012) [2024-06-15 20:44:50,775][1648981] Fps is (10 sec: 52465.7, 60 sec: 48598.9, 300 sec: 48428.6). Total num frames: 1586626560. Throughput: 0: 11855.8. Samples: 396708352. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:50,776][1648981] Avg episode reward: [(0, '1034.440')] [2024-06-15 20:44:55,142][1651669] Updated weights for policy 0, policy_version 774788 (0.0011) [2024-06-15 20:44:55,767][1648981] Fps is (10 sec: 42640.1, 60 sec: 46971.9, 300 sec: 48096.7). Total num frames: 1586790400. Throughput: 0: 11959.4. Samples: 396758016. 
Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:44:55,767][1648981] Avg episode reward: [(0, '1073.110')] [2024-06-15 20:44:56,149][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000774832_1586855936.pth... [2024-06-15 20:44:56,200][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000769152_1575223296.pth [2024-06-15 20:44:56,548][1651669] Updated weights for policy 0, policy_version 774846 (0.0023) [2024-06-15 20:45:00,348][1651669] Updated weights for policy 0, policy_version 774912 (0.0109) [2024-06-15 20:45:00,775][1648981] Fps is (10 sec: 42601.6, 60 sec: 48599.3, 300 sec: 48206.5). Total num frames: 1587052544. Throughput: 0: 12026.2. Samples: 396824576. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:45:00,776][1648981] Avg episode reward: [(0, '1052.610')] [2024-06-15 20:45:01,558][1651669] Updated weights for policy 0, policy_version 774975 (0.0014) [2024-06-15 20:45:05,778][1648981] Fps is (10 sec: 42549.3, 60 sec: 46965.8, 300 sec: 48205.9). Total num frames: 1587216384. Throughput: 0: 11966.3. Samples: 396898816. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:45:05,779][1648981] Avg episode reward: [(0, '1034.450')] [2024-06-15 20:45:05,873][1651669] Updated weights for policy 0, policy_version 775024 (0.0011) [2024-06-15 20:45:07,567][1651669] Updated weights for policy 0, policy_version 775093 (0.0012) [2024-06-15 20:45:10,766][1648981] Fps is (10 sec: 42633.2, 60 sec: 47518.0, 300 sec: 48097.4). Total num frames: 1587478528. Throughput: 0: 12094.6. Samples: 396928512. 
Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:45:10,767][1648981] Avg episode reward: [(0, '997.000')] [2024-06-15 20:45:10,844][1651669] Updated weights for policy 0, policy_version 775152 (0.0012) [2024-06-15 20:45:12,208][1651669] Updated weights for policy 0, policy_version 775203 (0.0011) [2024-06-15 20:45:15,766][1648981] Fps is (10 sec: 45929.5, 60 sec: 46427.4, 300 sec: 48213.2). Total num frames: 1587675136. Throughput: 0: 11878.4. Samples: 396999680. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:45:15,767][1648981] Avg episode reward: [(0, '983.130')] [2024-06-15 20:45:16,116][1651669] Updated weights for policy 0, policy_version 775248 (0.0012) [2024-06-15 20:45:18,556][1651669] Updated weights for policy 0, policy_version 775344 (0.0012) [2024-06-15 20:45:20,767][1648981] Fps is (10 sec: 45873.4, 60 sec: 47513.3, 300 sec: 48318.9). Total num frames: 1587937280. Throughput: 0: 11798.7. Samples: 397065728. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:45:20,767][1648981] Avg episode reward: [(0, '1013.130')] [2024-06-15 20:45:22,348][1651669] Updated weights for policy 0, policy_version 775408 (0.0102) [2024-06-15 20:45:24,152][1651669] Updated weights for policy 0, policy_version 775482 (0.0012) [2024-06-15 20:45:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1588199424. Throughput: 0: 11677.7. Samples: 397094912. Policy #0 lag: (min: 15.0, avg: 148.1, max: 271.0) [2024-06-15 20:45:25,767][1648981] Avg episode reward: [(0, '940.110')] [2024-06-15 20:45:29,060][1651669] Updated weights for policy 0, policy_version 775541 (0.0075) [2024-06-15 20:45:29,987][1651274] Signal inference workers to stop experience collection... (40650 times) [2024-06-15 20:45:30,042][1651669] InferenceWorker_p0-w0: stopping experience collection (40650 times) [2024-06-15 20:45:30,331][1651274] Signal inference workers to resume experience collection... 
(40650 times) [2024-06-15 20:45:30,332][1651669] InferenceWorker_p0-w0: resuming experience collection (40650 times) [2024-06-15 20:45:30,776][1648981] Fps is (10 sec: 49104.0, 60 sec: 47505.6, 300 sec: 48317.3). Total num frames: 1588428800. Throughput: 0: 11639.5. Samples: 397167104. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:45:30,777][1648981] Avg episode reward: [(0, '950.110')] [2024-06-15 20:45:30,953][1651669] Updated weights for policy 0, policy_version 775611 (0.0012) [2024-06-15 20:45:34,717][1651669] Updated weights for policy 0, policy_version 775680 (0.0013) [2024-06-15 20:45:35,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 48207.9). Total num frames: 1588658176. Throughput: 0: 11584.9. Samples: 397229568. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:45:35,767][1648981] Avg episode reward: [(0, '957.030')] [2024-06-15 20:45:35,841][1651669] Updated weights for policy 0, policy_version 775729 (0.0011) [2024-06-15 20:45:39,273][1651669] Updated weights for policy 0, policy_version 775776 (0.0012) [2024-06-15 20:45:40,767][1648981] Fps is (10 sec: 45920.8, 60 sec: 46433.5, 300 sec: 48096.7). Total num frames: 1588887552. Throughput: 0: 11503.0. Samples: 397275648. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:45:40,767][1648981] Avg episode reward: [(0, '918.580')] [2024-06-15 20:45:40,853][1651669] Updated weights for policy 0, policy_version 775828 (0.0014) [2024-06-15 20:45:44,726][1651669] Updated weights for policy 0, policy_version 775892 (0.0015) [2024-06-15 20:45:45,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 45883.0, 300 sec: 47985.7). Total num frames: 1589116928. Throughput: 0: 11596.1. Samples: 397346304. 
Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:45:45,767][1648981] Avg episode reward: [(0, '947.690')] [2024-06-15 20:45:46,770][1651669] Updated weights for policy 0, policy_version 775984 (0.0013) [2024-06-15 20:45:50,314][1651669] Updated weights for policy 0, policy_version 776032 (0.0012) [2024-06-15 20:45:50,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 45335.8, 300 sec: 47875.3). Total num frames: 1589346304. Throughput: 0: 11528.7. Samples: 397417472. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:45:50,767][1648981] Avg episode reward: [(0, '936.090')] [2024-06-15 20:45:52,292][1651669] Updated weights for policy 0, policy_version 776097 (0.0013) [2024-06-15 20:45:55,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 45875.4, 300 sec: 47652.4). Total num frames: 1589542912. Throughput: 0: 11468.8. Samples: 397444608. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:45:55,767][1648981] Avg episode reward: [(0, '944.830')] [2024-06-15 20:45:55,821][1651669] Updated weights for policy 0, policy_version 776149 (0.0015) [2024-06-15 20:45:57,268][1651669] Updated weights for policy 0, policy_version 776208 (0.0015) [2024-06-15 20:45:58,395][1651669] Updated weights for policy 0, policy_version 776256 (0.0023) [2024-06-15 20:46:00,771][1648981] Fps is (10 sec: 42579.4, 60 sec: 45331.8, 300 sec: 47541.9). Total num frames: 1589772288. Throughput: 0: 11501.8. Samples: 397517312. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:00,771][1648981] Avg episode reward: [(0, '975.500')] [2024-06-15 20:46:02,076][1651669] Updated weights for policy 0, policy_version 776306 (0.0012) [2024-06-15 20:46:03,171][1651669] Updated weights for policy 0, policy_version 776358 (0.0016) [2024-06-15 20:46:05,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46976.7, 300 sec: 47763.9). Total num frames: 1590034432. Throughput: 0: 11707.8. Samples: 397592576. 
Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:05,767][1648981] Avg episode reward: [(0, '958.550')] [2024-06-15 20:46:06,699][1651669] Updated weights for policy 0, policy_version 776400 (0.0027) [2024-06-15 20:46:08,112][1651669] Updated weights for policy 0, policy_version 776452 (0.0043) [2024-06-15 20:46:08,777][1651274] Signal inference workers to stop experience collection... (40700 times) [2024-06-15 20:46:08,823][1651669] InferenceWorker_p0-w0: stopping experience collection (40700 times) [2024-06-15 20:46:09,047][1651274] Signal inference workers to resume experience collection... (40700 times) [2024-06-15 20:46:09,049][1651669] InferenceWorker_p0-w0: resuming experience collection (40700 times) [2024-06-15 20:46:10,766][1648981] Fps is (10 sec: 52452.0, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 1590296576. Throughput: 0: 11798.8. Samples: 397625856. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:10,767][1648981] Avg episode reward: [(0, '937.890')] [2024-06-15 20:46:11,931][1651669] Updated weights for policy 0, policy_version 776528 (0.0015) [2024-06-15 20:46:13,619][1651669] Updated weights for policy 0, policy_version 776593 (0.0012) [2024-06-15 20:46:15,768][1648981] Fps is (10 sec: 52419.1, 60 sec: 48058.2, 300 sec: 47874.3). Total num frames: 1590558720. Throughput: 0: 11687.1. Samples: 397692928. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:15,769][1648981] Avg episode reward: [(0, '935.960')] [2024-06-15 20:46:17,751][1651669] Updated weights for policy 0, policy_version 776643 (0.0012) [2024-06-15 20:46:19,505][1651669] Updated weights for policy 0, policy_version 776720 (0.0011) [2024-06-15 20:46:20,697][1651669] Updated weights for policy 0, policy_version 776767 (0.0011) [2024-06-15 20:46:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48060.0, 300 sec: 48096.9). Total num frames: 1590820864. Throughput: 0: 11787.4. Samples: 397760000. 
Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:20,767][1648981] Avg episode reward: [(0, '907.600')] [2024-06-15 20:46:24,290][1651669] Updated weights for policy 0, policy_version 776832 (0.0012) [2024-06-15 20:46:25,443][1651669] Updated weights for policy 0, policy_version 776889 (0.0017) [2024-06-15 20:46:25,766][1648981] Fps is (10 sec: 52438.6, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 1591083008. Throughput: 0: 11753.3. Samples: 397804544. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:25,767][1648981] Avg episode reward: [(0, '898.900')] [2024-06-15 20:46:29,830][1651669] Updated weights for policy 0, policy_version 776960 (0.0123) [2024-06-15 20:46:30,768][1648981] Fps is (10 sec: 45865.7, 60 sec: 47520.0, 300 sec: 48212.9). Total num frames: 1591279616. Throughput: 0: 11752.7. Samples: 397875200. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:30,769][1648981] Avg episode reward: [(0, '876.010')] [2024-06-15 20:46:31,257][1651669] Updated weights for policy 0, policy_version 777019 (0.0012) [2024-06-15 20:46:34,957][1651669] Updated weights for policy 0, policy_version 777072 (0.0020) [2024-06-15 20:46:35,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 1591508992. Throughput: 0: 11730.5. Samples: 397945344. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:35,767][1648981] Avg episode reward: [(0, '851.760')] [2024-06-15 20:46:36,653][1651669] Updated weights for policy 0, policy_version 777152 (0.0015) [2024-06-15 20:46:40,770][1648981] Fps is (10 sec: 39314.7, 60 sec: 46418.5, 300 sec: 47764.3). Total num frames: 1591672832. Throughput: 0: 11900.2. Samples: 397980160. 
Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:40,771][1648981] Avg episode reward: [(0, '855.980')] [2024-06-15 20:46:41,358][1651669] Updated weights for policy 0, policy_version 777206 (0.0012) [2024-06-15 20:46:42,927][1651669] Updated weights for policy 0, policy_version 777264 (0.0017) [2024-06-15 20:46:45,384][1651669] Updated weights for policy 0, policy_version 777297 (0.0011) [2024-06-15 20:46:45,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46967.4, 300 sec: 47764.2). Total num frames: 1591934976. Throughput: 0: 11856.8. Samples: 398050816. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:45,767][1648981] Avg episode reward: [(0, '852.880')] [2024-06-15 20:46:47,083][1651669] Updated weights for policy 0, policy_version 777366 (0.0078) [2024-06-15 20:46:50,766][1648981] Fps is (10 sec: 45892.8, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1592131584. Throughput: 0: 11764.6. Samples: 398121984. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:50,767][1648981] Avg episode reward: [(0, '854.720')] [2024-06-15 20:46:51,264][1651274] Signal inference workers to stop experience collection... (40750 times) [2024-06-15 20:46:51,301][1651669] Updated weights for policy 0, policy_version 777412 (0.0012) [2024-06-15 20:46:51,331][1651669] InferenceWorker_p0-w0: stopping experience collection (40750 times) [2024-06-15 20:46:51,467][1651274] Signal inference workers to resume experience collection... (40750 times) [2024-06-15 20:46:51,468][1651669] InferenceWorker_p0-w0: resuming experience collection (40750 times) [2024-06-15 20:46:52,479][1651669] Updated weights for policy 0, policy_version 777458 (0.0025) [2024-06-15 20:46:54,219][1651669] Updated weights for policy 0, policy_version 777533 (0.0012) [2024-06-15 20:46:55,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48059.8, 300 sec: 47654.9). Total num frames: 1592426496. Throughput: 0: 11810.1. Samples: 398157312. 
Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:46:55,767][1648981] Avg episode reward: [(0, '860.790')] [2024-06-15 20:46:56,037][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000777568_1592459264.pth... [2024-06-15 20:46:56,190][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000772032_1581121536.pth [2024-06-15 20:46:56,682][1651669] Updated weights for policy 0, policy_version 777589 (0.0011) [2024-06-15 20:46:57,486][1651669] Updated weights for policy 0, policy_version 777618 (0.0015) [2024-06-15 20:47:00,767][1648981] Fps is (10 sec: 52426.8, 60 sec: 48063.0, 300 sec: 47541.4). Total num frames: 1592655872. Throughput: 0: 11912.9. Samples: 398228992. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:47:00,767][1648981] Avg episode reward: [(0, '836.940')] [2024-06-15 20:47:01,900][1651669] Updated weights for policy 0, policy_version 777667 (0.0011) [2024-06-15 20:47:02,978][1651669] Updated weights for policy 0, policy_version 777721 (0.0011) [2024-06-15 20:47:04,691][1651669] Updated weights for policy 0, policy_version 777790 (0.0039) [2024-06-15 20:47:05,768][1648981] Fps is (10 sec: 49142.2, 60 sec: 48058.2, 300 sec: 47874.3). Total num frames: 1592918016. Throughput: 0: 12082.7. Samples: 398303744. Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:47:05,769][1648981] Avg episode reward: [(0, '855.800')] [2024-06-15 20:47:07,628][1651669] Updated weights for policy 0, policy_version 777852 (0.0017) [2024-06-15 20:47:09,124][1651669] Updated weights for policy 0, policy_version 777920 (0.0019) [2024-06-15 20:47:10,766][1648981] Fps is (10 sec: 52430.5, 60 sec: 48059.7, 300 sec: 47652.4). Total num frames: 1593180160. Throughput: 0: 11776.0. Samples: 398334464. 
Policy #0 lag: (min: 63.0, avg: 142.0, max: 316.0) [2024-06-15 20:47:10,767][1648981] Avg episode reward: [(0, '927.250')] [2024-06-15 20:47:14,133][1651669] Updated weights for policy 0, policy_version 777981 (0.0013) [2024-06-15 20:47:15,599][1651669] Updated weights for policy 0, policy_version 778032 (0.0011) [2024-06-15 20:47:15,766][1648981] Fps is (10 sec: 49161.5, 60 sec: 47515.0, 300 sec: 47874.6). Total num frames: 1593409536. Throughput: 0: 11958.6. Samples: 398413312. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:15,767][1648981] Avg episode reward: [(0, '903.380')] [2024-06-15 20:47:18,141][1651669] Updated weights for policy 0, policy_version 778082 (0.0019) [2024-06-15 20:47:19,903][1651669] Updated weights for policy 0, policy_version 778147 (0.0012) [2024-06-15 20:47:20,795][1648981] Fps is (10 sec: 52283.2, 60 sec: 48037.4, 300 sec: 47981.2). Total num frames: 1593704448. Throughput: 0: 11871.0. Samples: 398479872. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:20,797][1648981] Avg episode reward: [(0, '918.220')] [2024-06-15 20:47:24,614][1651669] Updated weights for policy 0, policy_version 778209 (0.0011) [2024-06-15 20:47:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46421.3, 300 sec: 47652.5). Total num frames: 1593868288. Throughput: 0: 12061.5. Samples: 398522880. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:25,767][1648981] Avg episode reward: [(0, '916.360')] [2024-06-15 20:47:26,471][1651669] Updated weights for policy 0, policy_version 778288 (0.0011) [2024-06-15 20:47:28,523][1651669] Updated weights for policy 0, policy_version 778322 (0.0010) [2024-06-15 20:47:29,426][1651669] Updated weights for policy 0, policy_version 778365 (0.0012) [2024-06-15 20:47:30,074][1651274] Signal inference workers to stop experience collection... 
(40800 times) [2024-06-15 20:47:30,175][1651669] InferenceWorker_p0-w0: stopping experience collection (40800 times) [2024-06-15 20:47:30,382][1651274] Signal inference workers to resume experience collection... (40800 times) [2024-06-15 20:47:30,383][1651669] InferenceWorker_p0-w0: resuming experience collection (40800 times) [2024-06-15 20:47:30,766][1648981] Fps is (10 sec: 46003.5, 60 sec: 48061.4, 300 sec: 47763.5). Total num frames: 1594163200. Throughput: 0: 11992.2. Samples: 398590464. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:30,767][1648981] Avg episode reward: [(0, '886.890')] [2024-06-15 20:47:35,767][1648981] Fps is (10 sec: 36044.6, 60 sec: 45329.0, 300 sec: 47208.1). Total num frames: 1594228736. Throughput: 0: 12094.5. Samples: 398666240. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:35,767][1648981] Avg episode reward: [(0, '888.250')] [2024-06-15 20:47:35,803][1651669] Updated weights for policy 0, policy_version 778448 (0.0012) [2024-06-15 20:47:37,614][1651669] Updated weights for policy 0, policy_version 778528 (0.0011) [2024-06-15 20:47:39,652][1651669] Updated weights for policy 0, policy_version 778595 (0.0014) [2024-06-15 20:47:40,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49155.1, 300 sec: 47541.4). Total num frames: 1594621952. Throughput: 0: 11958.0. Samples: 398695424. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:40,767][1648981] Avg episode reward: [(0, '923.400')] [2024-06-15 20:47:41,362][1651669] Updated weights for policy 0, policy_version 778643 (0.0012) [2024-06-15 20:47:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1594753024. Throughput: 0: 11912.6. Samples: 398765056. 
Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:45,767][1648981] Avg episode reward: [(0, '905.680')] [2024-06-15 20:47:46,822][1651669] Updated weights for policy 0, policy_version 778692 (0.0012) [2024-06-15 20:47:48,030][1651669] Updated weights for policy 0, policy_version 778755 (0.0012) [2024-06-15 20:47:49,312][1651669] Updated weights for policy 0, policy_version 778812 (0.0012) [2024-06-15 20:47:50,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49698.1, 300 sec: 47652.6). Total num frames: 1595113472. Throughput: 0: 11844.8. Samples: 398836736. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:50,767][1648981] Avg episode reward: [(0, '890.340')] [2024-06-15 20:47:50,919][1651669] Updated weights for policy 0, policy_version 778875 (0.0095) [2024-06-15 20:47:53,107][1651669] Updated weights for policy 0, policy_version 778944 (0.0017) [2024-06-15 20:47:55,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 47513.6, 300 sec: 47430.3). Total num frames: 1595277312. Throughput: 0: 11878.4. Samples: 398868992. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:47:55,767][1648981] Avg episode reward: [(0, '915.810')] [2024-06-15 20:47:59,321][1651669] Updated weights for policy 0, policy_version 778995 (0.0011) [2024-06-15 20:48:00,689][1651669] Updated weights for policy 0, policy_version 779065 (0.0011) [2024-06-15 20:48:00,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 47513.9, 300 sec: 47430.7). Total num frames: 1595506688. Throughput: 0: 11867.0. Samples: 398947328. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:00,767][1648981] Avg episode reward: [(0, '947.810')] [2024-06-15 20:48:02,213][1651669] Updated weights for policy 0, policy_version 779120 (0.0012) [2024-06-15 20:48:04,614][1651669] Updated weights for policy 0, policy_version 779192 (0.0011) [2024-06-15 20:48:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48061.3, 300 sec: 47541.4). Total num frames: 1595801600. 
Throughput: 0: 11817.5. Samples: 399011328. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:05,767][1648981] Avg episode reward: [(0, '928.670')] [2024-06-15 20:48:09,391][1651669] Updated weights for policy 0, policy_version 779226 (0.0015) [2024-06-15 20:48:10,542][1651669] Updated weights for policy 0, policy_version 779280 (0.0093) [2024-06-15 20:48:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46421.4, 300 sec: 47319.2). Total num frames: 1595965440. Throughput: 0: 11855.7. Samples: 399056384. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:10,767][1648981] Avg episode reward: [(0, '920.760')] [2024-06-15 20:48:11,456][1651274] Signal inference workers to stop experience collection... (40850 times) [2024-06-15 20:48:11,534][1651669] InferenceWorker_p0-w0: stopping experience collection (40850 times) [2024-06-15 20:48:11,536][1651669] Updated weights for policy 0, policy_version 779325 (0.0016) [2024-06-15 20:48:11,559][1651274] Signal inference workers to resume experience collection... (40850 times) [2024-06-15 20:48:11,560][1651669] InferenceWorker_p0-w0: resuming experience collection (40850 times) [2024-06-15 20:48:13,127][1651669] Updated weights for policy 0, policy_version 779387 (0.0118) [2024-06-15 20:48:15,624][1651669] Updated weights for policy 0, policy_version 779453 (0.0011) [2024-06-15 20:48:15,768][1648981] Fps is (10 sec: 52421.6, 60 sec: 48604.8, 300 sec: 47541.2). Total num frames: 1596325888. Throughput: 0: 11855.3. Samples: 399123968. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:15,770][1648981] Avg episode reward: [(0, '963.460')] [2024-06-15 20:48:20,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 44803.8, 300 sec: 47319.2). Total num frames: 1596391424. Throughput: 0: 11844.3. Samples: 399199232. 
Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:20,767][1648981] Avg episode reward: [(0, '970.430')] [2024-06-15 20:48:21,429][1651669] Updated weights for policy 0, policy_version 779510 (0.0013) [2024-06-15 20:48:23,254][1651669] Updated weights for policy 0, policy_version 779600 (0.0012) [2024-06-15 20:48:24,206][1651669] Updated weights for policy 0, policy_version 779648 (0.0086) [2024-06-15 20:48:25,770][1648981] Fps is (10 sec: 39311.8, 60 sec: 47510.6, 300 sec: 47207.5). Total num frames: 1596719104. Throughput: 0: 11706.7. Samples: 399222272. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:25,771][1648981] Avg episode reward: [(0, '984.330')] [2024-06-15 20:48:26,869][1651669] Updated weights for policy 0, policy_version 779711 (0.0138) [2024-06-15 20:48:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 1596850176. Throughput: 0: 11969.4. Samples: 399303680. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:30,767][1648981] Avg episode reward: [(0, '977.610')] [2024-06-15 20:48:31,820][1651669] Updated weights for policy 0, policy_version 779762 (0.0013) [2024-06-15 20:48:34,041][1651669] Updated weights for policy 0, policy_version 779856 (0.0112) [2024-06-15 20:48:35,766][1648981] Fps is (10 sec: 52449.0, 60 sec: 50244.4, 300 sec: 47543.9). Total num frames: 1597243392. Throughput: 0: 11787.4. Samples: 399367168. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:35,767][1648981] Avg episode reward: [(0, '961.790')] [2024-06-15 20:48:36,646][1651669] Updated weights for policy 0, policy_version 779905 (0.0012) [2024-06-15 20:48:38,018][1651669] Updated weights for policy 0, policy_version 779968 (0.0011) [2024-06-15 20:48:40,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 45875.1, 300 sec: 47097.0). Total num frames: 1597374464. Throughput: 0: 11946.6. Samples: 399406592. 
Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:40,767][1648981] Avg episode reward: [(0, '944.310')] [2024-06-15 20:48:44,066][1651669] Updated weights for policy 0, policy_version 780048 (0.0012) [2024-06-15 20:48:45,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48605.9, 300 sec: 47319.3). Total num frames: 1597669376. Throughput: 0: 11741.9. Samples: 399475712. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:45,767][1648981] Avg episode reward: [(0, '961.430')] [2024-06-15 20:48:45,794][1651669] Updated weights for policy 0, policy_version 780113 (0.0012) [2024-06-15 20:48:48,681][1651669] Updated weights for policy 0, policy_version 780192 (0.0102) [2024-06-15 20:48:50,773][1648981] Fps is (10 sec: 52396.9, 60 sec: 46416.5, 300 sec: 47208.1). Total num frames: 1597898752. Throughput: 0: 11694.7. Samples: 399537664. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:50,773][1648981] Avg episode reward: [(0, '918.470')] [2024-06-15 20:48:54,100][1651669] Updated weights for policy 0, policy_version 780240 (0.0011) [2024-06-15 20:48:54,264][1651274] Signal inference workers to stop experience collection... (40900 times) [2024-06-15 20:48:54,349][1651669] InferenceWorker_p0-w0: stopping experience collection (40900 times) [2024-06-15 20:48:54,549][1651274] Signal inference workers to resume experience collection... (40900 times) [2024-06-15 20:48:54,550][1651669] InferenceWorker_p0-w0: resuming experience collection (40900 times) [2024-06-15 20:48:55,769][1648981] Fps is (10 sec: 36033.5, 60 sec: 45872.8, 300 sec: 47096.6). Total num frames: 1598029824. Throughput: 0: 11695.5. Samples: 399582720. 
Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:48:55,770][1648981] Avg episode reward: [(0, '904.970')] [2024-06-15 20:48:56,014][1651669] Updated weights for policy 0, policy_version 780307 (0.0013) [2024-06-15 20:48:56,275][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000780320_1598095360.pth... [2024-06-15 20:48:56,369][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000774832_1586855936.pth [2024-06-15 20:48:57,571][1651669] Updated weights for policy 0, policy_version 780368 (0.0012) [2024-06-15 20:48:58,689][1651669] Updated weights for policy 0, policy_version 780416 (0.0010) [2024-06-15 20:49:00,766][1648981] Fps is (10 sec: 49182.8, 60 sec: 48059.8, 300 sec: 47431.8). Total num frames: 1598390272. Throughput: 0: 11514.7. Samples: 399642112. Policy #0 lag: (min: 35.0, avg: 142.9, max: 292.0) [2024-06-15 20:49:00,767][1648981] Avg episode reward: [(0, '886.550')] [2024-06-15 20:49:00,824][1651669] Updated weights for policy 0, policy_version 780474 (0.0014) [2024-06-15 20:49:05,766][1648981] Fps is (10 sec: 45889.7, 60 sec: 44783.0, 300 sec: 46986.9). Total num frames: 1598488576. Throughput: 0: 11741.9. Samples: 399727616. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:05,767][1648981] Avg episode reward: [(0, '880.370')] [2024-06-15 20:49:06,607][1651669] Updated weights for policy 0, policy_version 780545 (0.0010) [2024-06-15 20:49:08,509][1651669] Updated weights for policy 0, policy_version 780612 (0.0012) [2024-06-15 20:49:10,767][1648981] Fps is (10 sec: 42597.4, 60 sec: 47513.5, 300 sec: 47209.4). Total num frames: 1598816256. Throughput: 0: 11731.4. Samples: 399750144. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:10,768][1648981] Avg episode reward: [(0, '871.440')] [2024-06-15 20:49:11,250][1651669] Updated weights for policy 0, policy_version 780678 (0.0011) [2024-06-15 20:49:12,378][1651669] Updated weights for policy 0, policy_version 780736 (0.0011) [2024-06-15 20:49:15,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 43691.6, 300 sec: 46986.0). Total num frames: 1598947328. Throughput: 0: 11434.7. Samples: 399818240. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:15,767][1648981] Avg episode reward: [(0, '892.590')] [2024-06-15 20:49:17,318][1651669] Updated weights for policy 0, policy_version 780797 (0.0011) [2024-06-15 20:49:19,115][1651669] Updated weights for policy 0, policy_version 780865 (0.0011) [2024-06-15 20:49:20,611][1651669] Updated weights for policy 0, policy_version 780925 (0.0119) [2024-06-15 20:49:20,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1599340544. Throughput: 0: 11548.4. Samples: 399886848. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:20,767][1648981] Avg episode reward: [(0, '897.070')] [2024-06-15 20:49:23,348][1651669] Updated weights for policy 0, policy_version 780976 (0.0012) [2024-06-15 20:49:25,782][1648981] Fps is (10 sec: 52345.9, 60 sec: 45866.0, 300 sec: 47094.5). Total num frames: 1599471616. Throughput: 0: 11544.4. Samples: 399926272. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:25,783][1648981] Avg episode reward: [(0, '926.320')] [2024-06-15 20:49:27,775][1651669] Updated weights for policy 0, policy_version 781025 (0.0011) [2024-06-15 20:49:29,060][1651669] Updated weights for policy 0, policy_version 781088 (0.0011) [2024-06-15 20:49:30,684][1651274] Signal inference workers to stop experience collection... (40950 times) [2024-06-15 20:49:30,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 48605.7, 300 sec: 47208.1). 
Total num frames: 1599766528. Throughput: 0: 11650.8. Samples: 400000000. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:30,768][1648981] Avg episode reward: [(0, '926.660')] [2024-06-15 20:49:30,774][1651669] InferenceWorker_p0-w0: stopping experience collection (40950 times) [2024-06-15 20:49:30,943][1651274] Signal inference workers to resume experience collection... (40950 times) [2024-06-15 20:49:30,943][1651669] InferenceWorker_p0-w0: resuming experience collection (40950 times) [2024-06-15 20:49:31,310][1651669] Updated weights for policy 0, policy_version 781168 (0.0118) [2024-06-15 20:49:34,400][1651669] Updated weights for policy 0, policy_version 781216 (0.0062) [2024-06-15 20:49:35,766][1648981] Fps is (10 sec: 52511.6, 60 sec: 45875.1, 300 sec: 47099.6). Total num frames: 1599995904. Throughput: 0: 11663.8. Samples: 400062464. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:35,767][1648981] Avg episode reward: [(0, '893.020')] [2024-06-15 20:49:39,343][1651669] Updated weights for policy 0, policy_version 781302 (0.0016) [2024-06-15 20:49:40,724][1651669] Updated weights for policy 0, policy_version 781348 (0.0016) [2024-06-15 20:49:40,788][1648981] Fps is (10 sec: 42507.3, 60 sec: 46950.6, 300 sec: 46873.0). Total num frames: 1600192512. Throughput: 0: 11623.3. Samples: 400105984. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:40,789][1648981] Avg episode reward: [(0, '898.820')] [2024-06-15 20:49:42,960][1651669] Updated weights for policy 0, policy_version 781437 (0.0012) [2024-06-15 20:49:45,779][1648981] Fps is (10 sec: 42546.6, 60 sec: 45865.8, 300 sec: 46763.3). Total num frames: 1600421888. Throughput: 0: 11670.4. Samples: 400167424. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:45,779][1648981] Avg episode reward: [(0, '933.760')] [2024-06-15 20:49:45,923][1651669] Updated weights for policy 0, policy_version 781473 (0.0014) [2024-06-15 20:49:50,386][1651669] Updated weights for policy 0, policy_version 781561 (0.0013) [2024-06-15 20:49:50,766][1648981] Fps is (10 sec: 45974.9, 60 sec: 45879.9, 300 sec: 46986.0). Total num frames: 1600651264. Throughput: 0: 11502.9. Samples: 400245248. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:50,767][1648981] Avg episode reward: [(0, '941.920')] [2024-06-15 20:49:51,939][1651669] Updated weights for policy 0, policy_version 781616 (0.0012) [2024-06-15 20:49:53,557][1651669] Updated weights for policy 0, policy_version 781680 (0.0075) [2024-06-15 20:49:55,767][1648981] Fps is (10 sec: 49211.0, 60 sec: 48062.0, 300 sec: 46987.2). Total num frames: 1600913408. Throughput: 0: 11628.1. Samples: 400273408. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:49:55,767][1648981] Avg episode reward: [(0, '897.350')] [2024-06-15 20:49:55,949][1651669] Updated weights for policy 0, policy_version 781698 (0.0025) [2024-06-15 20:49:56,822][1651669] Updated weights for policy 0, policy_version 781758 (0.0013) [2024-06-15 20:50:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45329.0, 300 sec: 47098.9). Total num frames: 1601110016. Throughput: 0: 12003.5. Samples: 400358400. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:00,767][1648981] Avg episode reward: [(0, '866.500')] [2024-06-15 20:50:01,215][1651669] Updated weights for policy 0, policy_version 781812 (0.0014) [2024-06-15 20:50:02,161][1651669] Updated weights for policy 0, policy_version 781846 (0.0012) [2024-06-15 20:50:03,996][1651669] Updated weights for policy 0, policy_version 781923 (0.0134) [2024-06-15 20:50:05,767][1648981] Fps is (10 sec: 52429.1, 60 sec: 49151.8, 300 sec: 47319.2). Total num frames: 1601437696. 
Throughput: 0: 12014.9. Samples: 400427520. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:05,767][1648981] Avg episode reward: [(0, '830.200')] [2024-06-15 20:50:06,315][1651669] Updated weights for policy 0, policy_version 781968 (0.0010) [2024-06-15 20:50:07,141][1651669] Updated weights for policy 0, policy_version 782006 (0.0012) [2024-06-15 20:50:10,775][1648981] Fps is (10 sec: 45837.7, 60 sec: 45869.0, 300 sec: 47095.7). Total num frames: 1601568768. Throughput: 0: 12062.5. Samples: 400468992. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:10,775][1648981] Avg episode reward: [(0, '831.820')] [2024-06-15 20:50:12,119][1651669] Updated weights for policy 0, policy_version 782064 (0.0011) [2024-06-15 20:50:13,085][1651274] Signal inference workers to stop experience collection... (41000 times) [2024-06-15 20:50:13,172][1651669] InferenceWorker_p0-w0: stopping experience collection (41000 times) [2024-06-15 20:50:13,339][1651274] Signal inference workers to resume experience collection... (41000 times) [2024-06-15 20:50:13,340][1651669] InferenceWorker_p0-w0: resuming experience collection (41000 times) [2024-06-15 20:50:13,497][1651669] Updated weights for policy 0, policy_version 782115 (0.0012) [2024-06-15 20:50:14,802][1651669] Updated weights for policy 0, policy_version 782176 (0.0010) [2024-06-15 20:50:15,770][1648981] Fps is (10 sec: 52413.4, 60 sec: 50241.6, 300 sec: 47540.9). Total num frames: 1601961984. Throughput: 0: 11877.6. Samples: 400534528. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:15,770][1648981] Avg episode reward: [(0, '831.980')] [2024-06-15 20:50:17,711][1651669] Updated weights for policy 0, policy_version 782227 (0.0012) [2024-06-15 20:50:20,778][1648981] Fps is (10 sec: 52410.2, 60 sec: 45866.2, 300 sec: 47095.2). Total num frames: 1602093056. Throughput: 0: 12125.6. Samples: 400608256. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:20,779][1648981] Avg episode reward: [(0, '851.950')] [2024-06-15 20:50:22,115][1651669] Updated weights for policy 0, policy_version 782278 (0.0013) [2024-06-15 20:50:24,238][1651669] Updated weights for policy 0, policy_version 782352 (0.0012) [2024-06-15 20:50:25,766][1648981] Fps is (10 sec: 42611.9, 60 sec: 48618.7, 300 sec: 47320.8). Total num frames: 1602387968. Throughput: 0: 11963.8. Samples: 400644096. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:25,767][1648981] Avg episode reward: [(0, '874.060')] [2024-06-15 20:50:25,943][1651669] Updated weights for policy 0, policy_version 782417 (0.0012) [2024-06-15 20:50:29,527][1651669] Updated weights for policy 0, policy_version 782497 (0.0012) [2024-06-15 20:50:30,067][1651669] Updated weights for policy 0, policy_version 782524 (0.0010) [2024-06-15 20:50:30,766][1648981] Fps is (10 sec: 52490.7, 60 sec: 47513.8, 300 sec: 47319.2). Total num frames: 1602617344. Throughput: 0: 11961.3. Samples: 400705536. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:30,767][1648981] Avg episode reward: [(0, '835.930')] [2024-06-15 20:50:34,151][1651669] Updated weights for policy 0, policy_version 782580 (0.0012) [2024-06-15 20:50:35,721][1651669] Updated weights for policy 0, policy_version 782654 (0.0223) [2024-06-15 20:50:35,767][1648981] Fps is (10 sec: 49149.2, 60 sec: 48059.4, 300 sec: 47430.2). Total num frames: 1602879488. Throughput: 0: 11878.3. Samples: 400779776. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:35,768][1648981] Avg episode reward: [(0, '849.090')] [2024-06-15 20:50:37,545][1651669] Updated weights for policy 0, policy_version 782716 (0.0115) [2024-06-15 20:50:40,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48077.2, 300 sec: 47319.2). Total num frames: 1603076096. Throughput: 0: 11924.0. Samples: 400809984. 
Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:40,767][1648981] Avg episode reward: [(0, '830.450')] [2024-06-15 20:50:41,219][1651669] Updated weights for policy 0, policy_version 782772 (0.0015) [2024-06-15 20:50:43,666][1651669] Updated weights for policy 0, policy_version 782800 (0.0013) [2024-06-15 20:50:45,053][1651669] Updated weights for policy 0, policy_version 782852 (0.0012) [2024-06-15 20:50:45,766][1648981] Fps is (10 sec: 45877.6, 60 sec: 48615.8, 300 sec: 47430.3). Total num frames: 1603338240. Throughput: 0: 12003.6. Samples: 400898560. Policy #0 lag: (min: 55.0, avg: 212.3, max: 311.0) [2024-06-15 20:50:45,767][1648981] Avg episode reward: [(0, '837.150')] [2024-06-15 20:50:46,291][1651669] Updated weights for policy 0, policy_version 782907 (0.0016) [2024-06-15 20:50:47,957][1651669] Updated weights for policy 0, policy_version 782970 (0.0012) [2024-06-15 20:50:50,767][1648981] Fps is (10 sec: 49147.9, 60 sec: 48605.3, 300 sec: 47541.2). Total num frames: 1603567616. Throughput: 0: 11969.3. Samples: 400966144. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:50:50,768][1648981] Avg episode reward: [(0, '827.820')] [2024-06-15 20:50:51,120][1651669] Updated weights for policy 0, policy_version 783010 (0.0013) [2024-06-15 20:50:54,077][1651669] Updated weights for policy 0, policy_version 783048 (0.0013) [2024-06-15 20:50:54,315][1651274] Signal inference workers to stop experience collection... (41050 times) [2024-06-15 20:50:54,359][1651669] InferenceWorker_p0-w0: stopping experience collection (41050 times) [2024-06-15 20:50:54,533][1651274] Signal inference workers to resume experience collection... (41050 times) [2024-06-15 20:50:54,533][1651669] InferenceWorker_p0-w0: resuming experience collection (41050 times) [2024-06-15 20:50:55,484][1651669] Updated weights for policy 0, policy_version 783106 (0.0012) [2024-06-15 20:50:55,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.9, 300 sec: 47542.1). 
Total num frames: 1603796992. Throughput: 0: 12074.0. Samples: 401012224. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:50:55,767][1648981] Avg episode reward: [(0, '866.500')] [2024-06-15 20:50:56,160][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000783136_1603862528.pth... [2024-06-15 20:50:56,319][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000777568_1592459264.pth [2024-06-15 20:50:56,324][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000783136_1603862528.pth [2024-06-15 20:50:57,539][1651669] Updated weights for policy 0, policy_version 783184 (0.0019) [2024-06-15 20:50:58,695][1651669] Updated weights for policy 0, policy_version 783232 (0.0014) [2024-06-15 20:51:00,766][1648981] Fps is (10 sec: 49155.8, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1604059136. Throughput: 0: 11947.5. Samples: 401072128. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:00,767][1648981] Avg episode reward: [(0, '910.490')] [2024-06-15 20:51:05,766][1648981] Fps is (10 sec: 39322.0, 60 sec: 45875.4, 300 sec: 47097.1). Total num frames: 1604190208. Throughput: 0: 12177.4. Samples: 401156096. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:05,767][1648981] Avg episode reward: [(0, '942.550')] [2024-06-15 20:51:05,788][1651669] Updated weights for policy 0, policy_version 783312 (0.0013) [2024-06-15 20:51:07,876][1651669] Updated weights for policy 0, policy_version 783392 (0.0014) [2024-06-15 20:51:09,333][1651669] Updated weights for policy 0, policy_version 783456 (0.0012) [2024-06-15 20:51:10,770][1648981] Fps is (10 sec: 52409.1, 60 sec: 50248.0, 300 sec: 47541.1). Total num frames: 1604583424. Throughput: 0: 11911.5. Samples: 401180160. 
Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:10,771][1648981] Avg episode reward: [(0, '940.890')] [2024-06-15 20:51:12,106][1651669] Updated weights for policy 0, policy_version 783493 (0.0011) [2024-06-15 20:51:13,177][1651669] Updated weights for policy 0, policy_version 783548 (0.0011) [2024-06-15 20:51:15,769][1648981] Fps is (10 sec: 52417.1, 60 sec: 45875.9, 300 sec: 47096.7). Total num frames: 1604714496. Throughput: 0: 12196.4. Samples: 401254400. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:15,769][1648981] Avg episode reward: [(0, '938.550')] [2024-06-15 20:51:18,551][1651669] Updated weights for policy 0, policy_version 783603 (0.0012) [2024-06-15 20:51:20,315][1651669] Updated weights for policy 0, policy_version 783668 (0.0036) [2024-06-15 20:51:20,767][1648981] Fps is (10 sec: 39335.0, 60 sec: 48068.9, 300 sec: 47097.0). Total num frames: 1604976640. Throughput: 0: 12049.1. Samples: 401321984. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:20,768][1648981] Avg episode reward: [(0, '960.360')] [2024-06-15 20:51:21,983][1651669] Updated weights for policy 0, policy_version 783739 (0.0124) [2024-06-15 20:51:24,548][1651669] Updated weights for policy 0, policy_version 783798 (0.0011) [2024-06-15 20:51:25,782][1648981] Fps is (10 sec: 52356.2, 60 sec: 47500.9, 300 sec: 47317.0). Total num frames: 1605238784. Throughput: 0: 12113.0. Samples: 401355264. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:25,783][1648981] Avg episode reward: [(0, '992.430')] [2024-06-15 20:51:30,150][1651669] Updated weights for policy 0, policy_version 783858 (0.0015) [2024-06-15 20:51:30,766][1648981] Fps is (10 sec: 42600.0, 60 sec: 46421.3, 300 sec: 47097.1). Total num frames: 1605402624. Throughput: 0: 11832.9. Samples: 401431040. 
Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:30,767][1648981] Avg episode reward: [(0, '975.900')] [2024-06-15 20:51:32,475][1651274] Signal inference workers to stop experience collection... (41100 times) [2024-06-15 20:51:32,545][1651669] Updated weights for policy 0, policy_version 783953 (0.0012) [2024-06-15 20:51:32,562][1651669] InferenceWorker_p0-w0: stopping experience collection (41100 times) [2024-06-15 20:51:32,850][1651274] Signal inference workers to resume experience collection... (41100 times) [2024-06-15 20:51:32,851][1651669] InferenceWorker_p0-w0: resuming experience collection (41100 times) [2024-06-15 20:51:33,623][1651669] Updated weights for policy 0, policy_version 784000 (0.0011) [2024-06-15 20:51:35,766][1648981] Fps is (10 sec: 45948.8, 60 sec: 46967.9, 300 sec: 47542.0). Total num frames: 1605697536. Throughput: 0: 11764.8. Samples: 401495552. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:35,767][1648981] Avg episode reward: [(0, '979.270')] [2024-06-15 20:51:36,071][1651669] Updated weights for policy 0, policy_version 784058 (0.0011) [2024-06-15 20:51:40,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1605828608. Throughput: 0: 11616.7. Samples: 401534976. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:40,767][1648981] Avg episode reward: [(0, '936.650')] [2024-06-15 20:51:41,693][1651669] Updated weights for policy 0, policy_version 784128 (0.0015) [2024-06-15 20:51:43,100][1651669] Updated weights for policy 0, policy_version 784181 (0.0012) [2024-06-15 20:51:44,603][1651669] Updated weights for policy 0, policy_version 784245 (0.0014) [2024-06-15 20:51:45,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1606156288. Throughput: 0: 11514.3. Samples: 401590272. 
Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:45,767][1648981] Avg episode reward: [(0, '926.900')] [2024-06-15 20:51:46,975][1651669] Updated weights for policy 0, policy_version 784290 (0.0123) [2024-06-15 20:51:50,798][1648981] Fps is (10 sec: 45729.3, 60 sec: 45305.6, 300 sec: 46980.9). Total num frames: 1606287360. Throughput: 0: 11528.9. Samples: 401675264. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:50,799][1648981] Avg episode reward: [(0, '938.820')] [2024-06-15 20:51:52,890][1651669] Updated weights for policy 0, policy_version 784384 (0.0011) [2024-06-15 20:51:54,575][1651669] Updated weights for policy 0, policy_version 784448 (0.0012) [2024-06-15 20:51:55,806][1648981] Fps is (10 sec: 48956.8, 60 sec: 47482.1, 300 sec: 47423.9). Total num frames: 1606647808. Throughput: 0: 11630.2. Samples: 401703936. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:51:55,807][1648981] Avg episode reward: [(0, '963.740')] [2024-06-15 20:51:55,983][1651669] Updated weights for policy 0, policy_version 784512 (0.0012) [2024-06-15 20:51:58,490][1651669] Updated weights for policy 0, policy_version 784570 (0.0015) [2024-06-15 20:52:00,767][1648981] Fps is (10 sec: 52592.5, 60 sec: 45874.7, 300 sec: 47097.3). Total num frames: 1606811648. Throughput: 0: 11514.7. Samples: 401772544. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:00,768][1648981] Avg episode reward: [(0, '960.060')] [2024-06-15 20:52:04,401][1651669] Updated weights for policy 0, policy_version 784627 (0.0015) [2024-06-15 20:52:05,803][1648981] Fps is (10 sec: 36056.4, 60 sec: 46938.8, 300 sec: 46869.1). Total num frames: 1607008256. Throughput: 0: 11470.9. Samples: 401838592. 
Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:05,803][1648981] Avg episode reward: [(0, '921.650')] [2024-06-15 20:52:06,140][1651669] Updated weights for policy 0, policy_version 784688 (0.0012) [2024-06-15 20:52:07,440][1651669] Updated weights for policy 0, policy_version 784736 (0.0118) [2024-06-15 20:52:10,766][1648981] Fps is (10 sec: 52432.3, 60 sec: 45878.1, 300 sec: 47208.1). Total num frames: 1607335936. Throughput: 0: 11381.8. Samples: 401867264. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:10,767][1648981] Avg episode reward: [(0, '921.650')] [2024-06-15 20:52:14,974][1651669] Updated weights for policy 0, policy_version 784848 (0.0013) [2024-06-15 20:52:15,766][1648981] Fps is (10 sec: 39466.1, 60 sec: 44784.6, 300 sec: 46435.0). Total num frames: 1607401472. Throughput: 0: 11548.4. Samples: 401950720. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:15,767][1648981] Avg episode reward: [(0, '956.310')] [2024-06-15 20:52:16,048][1651274] Signal inference workers to stop experience collection... (41150 times) [2024-06-15 20:52:16,110][1651669] InferenceWorker_p0-w0: stopping experience collection (41150 times) [2024-06-15 20:52:16,347][1651274] Signal inference workers to resume experience collection... (41150 times) [2024-06-15 20:52:16,347][1651669] InferenceWorker_p0-w0: resuming experience collection (41150 times) [2024-06-15 20:52:17,060][1651669] Updated weights for policy 0, policy_version 784916 (0.0012) [2024-06-15 20:52:19,059][1651669] Updated weights for policy 0, policy_version 784992 (0.0010) [2024-06-15 20:52:20,398][1651669] Updated weights for policy 0, policy_version 785041 (0.0012) [2024-06-15 20:52:20,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46967.8, 300 sec: 47208.2). Total num frames: 1607794688. Throughput: 0: 11434.7. Samples: 402010112. 
Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:20,767][1648981] Avg episode reward: [(0, '945.570')] [2024-06-15 20:52:25,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 43702.3, 300 sec: 46430.6). Total num frames: 1607860224. Throughput: 0: 11514.3. Samples: 402053120. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:25,767][1648981] Avg episode reward: [(0, '976.640')] [2024-06-15 20:52:25,772][1651669] Updated weights for policy 0, policy_version 785095 (0.0011) [2024-06-15 20:52:28,253][1651669] Updated weights for policy 0, policy_version 785186 (0.0012) [2024-06-15 20:52:29,909][1651669] Updated weights for policy 0, policy_version 785251 (0.0012) [2024-06-15 20:52:30,509][1651669] Updated weights for policy 0, policy_version 785280 (0.0011) [2024-06-15 20:52:30,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1608253440. Throughput: 0: 11502.9. Samples: 402107904. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:30,767][1648981] Avg episode reward: [(0, '1005.070')] [2024-06-15 20:52:32,474][1651669] Updated weights for policy 0, policy_version 785342 (0.0014) [2024-06-15 20:52:35,772][1648981] Fps is (10 sec: 52400.1, 60 sec: 44778.8, 300 sec: 46651.9). Total num frames: 1608384512. Throughput: 0: 11509.7. Samples: 402192896. Policy #0 lag: (min: 16.0, avg: 164.7, max: 271.0) [2024-06-15 20:52:35,772][1648981] Avg episode reward: [(0, '1024.170')] [2024-06-15 20:52:37,842][1651669] Updated weights for policy 0, policy_version 785394 (0.0013) [2024-06-15 20:52:39,208][1651669] Updated weights for policy 0, policy_version 785444 (0.0012) [2024-06-15 20:52:40,688][1651669] Updated weights for policy 0, policy_version 785504 (0.0013) [2024-06-15 20:52:40,769][1648981] Fps is (10 sec: 45861.7, 60 sec: 48057.3, 300 sec: 47318.7). Total num frames: 1608712192. Throughput: 0: 11603.5. Samples: 402225664. 
Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:52:40,770][1648981] Avg episode reward: [(0, '1054.730')] [2024-06-15 20:52:42,917][1651669] Updated weights for policy 0, policy_version 785559 (0.0013) [2024-06-15 20:52:45,766][1648981] Fps is (10 sec: 52457.9, 60 sec: 45875.2, 300 sec: 46763.8). Total num frames: 1608908800. Throughput: 0: 11571.4. Samples: 402293248. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:52:45,767][1648981] Avg episode reward: [(0, '1072.210')] [2024-06-15 20:52:47,650][1651669] Updated weights for policy 0, policy_version 785602 (0.0011) [2024-06-15 20:52:49,175][1651669] Updated weights for policy 0, policy_version 785680 (0.0022) [2024-06-15 20:52:50,412][1651669] Updated weights for policy 0, policy_version 785731 (0.0086) [2024-06-15 20:52:50,766][1648981] Fps is (10 sec: 49166.9, 60 sec: 48631.7, 300 sec: 47208.1). Total num frames: 1609203712. Throughput: 0: 11797.0. Samples: 402369024. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:52:50,767][1648981] Avg episode reward: [(0, '1085.130')] [2024-06-15 20:52:51,676][1651669] Updated weights for policy 0, policy_version 785785 (0.0018) [2024-06-15 20:52:52,848][1651274] Signal inference workers to stop experience collection... (41200 times) [2024-06-15 20:52:52,897][1651669] InferenceWorker_p0-w0: stopping experience collection (41200 times) [2024-06-15 20:52:53,129][1651274] Signal inference workers to resume experience collection... (41200 times) [2024-06-15 20:52:53,130][1651669] InferenceWorker_p0-w0: resuming experience collection (41200 times) [2024-06-15 20:52:53,309][1651669] Updated weights for policy 0, policy_version 785811 (0.0016) [2024-06-15 20:52:55,782][1648981] Fps is (10 sec: 52349.4, 60 sec: 46440.4, 300 sec: 47205.7). Total num frames: 1609433088. Throughput: 0: 12033.6. Samples: 402408960. 
Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:52:55,784][1648981] Avg episode reward: [(0, '1040.290')] [2024-06-15 20:52:55,790][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000785856_1609433088.pth... [2024-06-15 20:52:55,829][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000780320_1598095360.pth [2024-06-15 20:52:58,210][1651669] Updated weights for policy 0, policy_version 785888 (0.0011) [2024-06-15 20:52:59,483][1651669] Updated weights for policy 0, policy_version 785936 (0.0029) [2024-06-15 20:53:00,767][1648981] Fps is (10 sec: 49150.5, 60 sec: 48060.1, 300 sec: 47097.0). Total num frames: 1609695232. Throughput: 0: 12003.5. Samples: 402490880. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:00,767][1648981] Avg episode reward: [(0, '990.460')] [2024-06-15 20:53:00,768][1651669] Updated weights for policy 0, policy_version 785988 (0.0015) [2024-06-15 20:53:02,086][1651669] Updated weights for policy 0, policy_version 786048 (0.0015) [2024-06-15 20:53:04,108][1651669] Updated weights for policy 0, policy_version 786096 (0.0014) [2024-06-15 20:53:05,766][1648981] Fps is (10 sec: 52508.2, 60 sec: 49182.0, 300 sec: 47430.3). Total num frames: 1609957376. Throughput: 0: 12219.7. Samples: 402560000. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:05,767][1648981] Avg episode reward: [(0, '996.730')] [2024-06-15 20:53:08,332][1651669] Updated weights for policy 0, policy_version 786144 (0.0012) [2024-06-15 20:53:09,469][1651669] Updated weights for policy 0, policy_version 786193 (0.0098) [2024-06-15 20:53:10,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 48059.7, 300 sec: 47097.3). Total num frames: 1610219520. Throughput: 0: 12322.1. Samples: 402607616. 
Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:10,767][1648981] Avg episode reward: [(0, '960.170')] [2024-06-15 20:53:11,315][1651669] Updated weights for policy 0, policy_version 786273 (0.0013) [2024-06-15 20:53:13,441][1651669] Updated weights for policy 0, policy_version 786320 (0.0012) [2024-06-15 20:53:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 51336.5, 300 sec: 47763.5). Total num frames: 1610481664. Throughput: 0: 12504.2. Samples: 402670592. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:15,767][1648981] Avg episode reward: [(0, '928.010')] [2024-06-15 20:53:18,495][1651669] Updated weights for policy 0, policy_version 786386 (0.0029) [2024-06-15 20:53:19,280][1651669] Updated weights for policy 0, policy_version 786432 (0.0011) [2024-06-15 20:53:20,798][1648981] Fps is (10 sec: 48996.5, 60 sec: 48580.1, 300 sec: 47425.8). Total num frames: 1610711040. Throughput: 0: 12462.8. Samples: 402754048. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:20,799][1648981] Avg episode reward: [(0, '905.790')] [2024-06-15 20:53:21,131][1651669] Updated weights for policy 0, policy_version 786500 (0.0025) [2024-06-15 20:53:24,307][1651669] Updated weights for policy 0, policy_version 786578 (0.0012) [2024-06-15 20:53:25,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 52429.0, 300 sec: 47985.7). Total num frames: 1611005952. Throughput: 0: 12425.4. Samples: 402784768. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:25,766][1648981] Avg episode reward: [(0, '925.840')] [2024-06-15 20:53:29,054][1651669] Updated weights for policy 0, policy_version 786640 (0.0011) [2024-06-15 20:53:30,120][1651669] Updated weights for policy 0, policy_version 786688 (0.0012) [2024-06-15 20:53:30,766][1648981] Fps is (10 sec: 42734.3, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1611137024. Throughput: 0: 12629.3. Samples: 402861568. 
Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:30,767][1648981] Avg episode reward: [(0, '876.310')] [2024-06-15 20:53:31,829][1651274] Signal inference workers to stop experience collection... (41250 times) [2024-06-15 20:53:31,893][1651669] InferenceWorker_p0-w0: stopping experience collection (41250 times) [2024-06-15 20:53:31,994][1651274] Signal inference workers to resume experience collection... (41250 times) [2024-06-15 20:53:31,995][1651669] InferenceWorker_p0-w0: resuming experience collection (41250 times) [2024-06-15 20:53:32,145][1651669] Updated weights for policy 0, policy_version 786754 (0.0012) [2024-06-15 20:53:33,416][1651669] Updated weights for policy 0, policy_version 786816 (0.0011) [2024-06-15 20:53:35,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 51341.3, 300 sec: 47763.5). Total num frames: 1611464704. Throughput: 0: 12390.4. Samples: 402926592. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:35,767][1648981] Avg episode reward: [(0, '889.750')] [2024-06-15 20:53:36,098][1651669] Updated weights for policy 0, policy_version 786877 (0.0024) [2024-06-15 20:53:40,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48062.1, 300 sec: 47208.1). Total num frames: 1611595776. Throughput: 0: 12428.7. Samples: 402968064. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:40,767][1648981] Avg episode reward: [(0, '907.880')] [2024-06-15 20:53:40,959][1651669] Updated weights for policy 0, policy_version 786939 (0.0014) [2024-06-15 20:53:42,379][1651669] Updated weights for policy 0, policy_version 786978 (0.0012) [2024-06-15 20:53:43,844][1651669] Updated weights for policy 0, policy_version 787056 (0.0010) [2024-06-15 20:53:45,433][1651669] Updated weights for policy 0, policy_version 787090 (0.0013) [2024-06-15 20:53:45,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 51336.4, 300 sec: 47764.5). Total num frames: 1611988992. Throughput: 0: 12208.4. Samples: 403040256. 
Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:45,767][1648981] Avg episode reward: [(0, '911.660')] [2024-06-15 20:53:50,774][1648981] Fps is (10 sec: 52387.9, 60 sec: 48599.5, 300 sec: 47762.8). Total num frames: 1612120064. Throughput: 0: 12502.0. Samples: 403122688. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:50,775][1648981] Avg episode reward: [(0, '979.390')] [2024-06-15 20:53:50,867][1651669] Updated weights for policy 0, policy_version 787184 (0.0095) [2024-06-15 20:53:53,534][1651669] Updated weights for policy 0, policy_version 787259 (0.0015) [2024-06-15 20:53:55,106][1651669] Updated weights for policy 0, policy_version 787322 (0.0079) [2024-06-15 20:53:55,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 50257.0, 300 sec: 47652.4). Total num frames: 1612447744. Throughput: 0: 12049.1. Samples: 403149824. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:53:55,767][1648981] Avg episode reward: [(0, '1006.430')] [2024-06-15 20:53:56,922][1651669] Updated weights for policy 0, policy_version 787392 (0.0111) [2024-06-15 20:54:00,766][1648981] Fps is (10 sec: 45911.2, 60 sec: 48060.0, 300 sec: 47763.5). Total num frames: 1612578816. Throughput: 0: 12322.1. Samples: 403225088. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:54:00,767][1648981] Avg episode reward: [(0, '1017.650')] [2024-06-15 20:54:02,295][1651669] Updated weights for policy 0, policy_version 787449 (0.0035) [2024-06-15 20:54:04,055][1651669] Updated weights for policy 0, policy_version 787504 (0.0013) [2024-06-15 20:54:05,673][1651669] Updated weights for policy 0, policy_version 787552 (0.0012) [2024-06-15 20:54:05,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 1612906496. Throughput: 0: 12012.0. Samples: 403294208. 
Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:54:05,767][1648981] Avg episode reward: [(0, '984.230')] [2024-06-15 20:54:07,450][1651669] Updated weights for policy 0, policy_version 787624 (0.0011) [2024-06-15 20:54:10,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1613103104. Throughput: 0: 12026.3. Samples: 403325952. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:54:10,767][1648981] Avg episode reward: [(0, '963.520')] [2024-06-15 20:54:12,072][1651669] Updated weights for policy 0, policy_version 787664 (0.0015) [2024-06-15 20:54:13,065][1651669] Updated weights for policy 0, policy_version 787707 (0.0053) [2024-06-15 20:54:14,728][1651274] Signal inference workers to stop experience collection... (41300 times) [2024-06-15 20:54:14,739][1651669] Updated weights for policy 0, policy_version 787745 (0.0011) [2024-06-15 20:54:14,801][1651669] InferenceWorker_p0-w0: stopping experience collection (41300 times) [2024-06-15 20:54:15,062][1651274] Signal inference workers to resume experience collection... (41300 times) [2024-06-15 20:54:15,062][1651669] InferenceWorker_p0-w0: resuming experience collection (41300 times) [2024-06-15 20:54:15,767][1648981] Fps is (10 sec: 45874.2, 60 sec: 48059.5, 300 sec: 47541.3). Total num frames: 1613365248. Throughput: 0: 12094.5. Samples: 403405824. Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:54:15,768][1648981] Avg episode reward: [(0, '950.720')] [2024-06-15 20:54:15,951][1651669] Updated weights for policy 0, policy_version 787778 (0.0013) [2024-06-15 20:54:17,377][1651669] Updated weights for policy 0, policy_version 787832 (0.0012) [2024-06-15 20:54:18,944][1651669] Updated weights for policy 0, policy_version 787896 (0.0012) [2024-06-15 20:54:20,767][1648981] Fps is (10 sec: 52425.6, 60 sec: 48631.1, 300 sec: 47988.2). Total num frames: 1613627392. Throughput: 0: 12151.3. Samples: 403473408. 
Policy #0 lag: (min: 0.0, avg: 61.0, max: 256.0) [2024-06-15 20:54:20,768][1648981] Avg episode reward: [(0, '923.560')] [2024-06-15 20:54:23,000][1651669] Updated weights for policy 0, policy_version 787936 (0.0201) [2024-06-15 20:54:24,582][1651669] Updated weights for policy 0, policy_version 787984 (0.0037) [2024-06-15 20:54:25,590][1651669] Updated weights for policy 0, policy_version 788025 (0.0015) [2024-06-15 20:54:25,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 47513.5, 300 sec: 47763.6). Total num frames: 1613856768. Throughput: 0: 12276.6. Samples: 403520512. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:54:25,767][1648981] Avg episode reward: [(0, '918.290')] [2024-06-15 20:54:27,373][1651669] Updated weights for policy 0, policy_version 788080 (0.0027) [2024-06-15 20:54:28,743][1651669] Updated weights for policy 0, policy_version 788116 (0.0097) [2024-06-15 20:54:30,766][1648981] Fps is (10 sec: 52432.4, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 1614151680. Throughput: 0: 12003.6. Samples: 403580416. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:54:30,767][1648981] Avg episode reward: [(0, '889.330')] [2024-06-15 20:54:33,599][1651669] Updated weights for policy 0, policy_version 788176 (0.0089) [2024-06-15 20:54:34,507][1651669] Updated weights for policy 0, policy_version 788221 (0.0019) [2024-06-15 20:54:35,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47878.1). Total num frames: 1614315520. Throughput: 0: 12051.2. Samples: 403664896. 
Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:54:35,767][1648981] Avg episode reward: [(0, '933.780')] [2024-06-15 20:54:36,530][1651669] Updated weights for policy 0, policy_version 788283 (0.0012) [2024-06-15 20:54:38,392][1651669] Updated weights for policy 0, policy_version 788336 (0.0121) [2024-06-15 20:54:39,593][1651669] Updated weights for policy 0, policy_version 788370 (0.0013) [2024-06-15 20:54:40,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 50790.3, 300 sec: 48209.8). Total num frames: 1614643200. Throughput: 0: 12060.4. Samples: 403692544. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:54:40,767][1648981] Avg episode reward: [(0, '937.170')] [2024-06-15 20:54:40,835][1651669] Updated weights for policy 0, policy_version 788416 (0.0011) [2024-06-15 20:54:45,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 46421.3, 300 sec: 47874.6). Total num frames: 1614774272. Throughput: 0: 12094.5. Samples: 403769344. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:54:45,767][1648981] Avg episode reward: [(0, '926.500')] [2024-06-15 20:54:45,832][1651669] Updated weights for policy 0, policy_version 788475 (0.0012) [2024-06-15 20:54:46,768][1651669] Updated weights for policy 0, policy_version 788514 (0.0012) [2024-06-15 20:54:48,190][1651669] Updated weights for policy 0, policy_version 788560 (0.0009) [2024-06-15 20:54:49,383][1651669] Updated weights for policy 0, policy_version 788605 (0.0011) [2024-06-15 20:54:50,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 49158.4, 300 sec: 47985.7). Total num frames: 1615069184. Throughput: 0: 12219.8. Samples: 403844096. 
Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:54:50,767][1648981] Avg episode reward: [(0, '935.340')] [2024-06-15 20:54:51,630][1651669] Updated weights for policy 0, policy_version 788641 (0.0012) [2024-06-15 20:54:55,469][1651669] Updated weights for policy 0, policy_version 788693 (0.0012) [2024-06-15 20:54:55,794][1648981] Fps is (10 sec: 49018.5, 60 sec: 46946.0, 300 sec: 47981.2). Total num frames: 1615265792. Throughput: 0: 12212.3. Samples: 403875840. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:54:55,794][1648981] Avg episode reward: [(0, '949.840')] [2024-06-15 20:54:56,100][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000788736_1615331328.pth... [2024-06-15 20:54:56,359][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000783136_1603862528.pth [2024-06-15 20:54:56,766][1651669] Updated weights for policy 0, policy_version 788758 (0.0013) [2024-06-15 20:54:56,998][1651274] Signal inference workers to stop experience collection... (41350 times) [2024-06-15 20:54:57,046][1651669] InferenceWorker_p0-w0: stopping experience collection (41350 times) [2024-06-15 20:54:57,178][1651274] Signal inference workers to resume experience collection... (41350 times) [2024-06-15 20:54:57,179][1651669] InferenceWorker_p0-w0: resuming experience collection (41350 times) [2024-06-15 20:54:59,338][1651669] Updated weights for policy 0, policy_version 788836 (0.0012) [2024-06-15 20:55:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 1615593472. Throughput: 0: 12083.3. Samples: 403949568. 
Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:00,767][1648981] Avg episode reward: [(0, '911.420')] [2024-06-15 20:55:01,239][1651669] Updated weights for policy 0, policy_version 788867 (0.0009) [2024-06-15 20:55:02,604][1651669] Updated weights for policy 0, policy_version 788927 (0.0016) [2024-06-15 20:55:05,767][1648981] Fps is (10 sec: 46000.9, 60 sec: 46967.4, 300 sec: 47987.0). Total num frames: 1615724544. Throughput: 0: 12345.0. Samples: 404028928. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:05,769][1648981] Avg episode reward: [(0, '897.070')] [2024-06-15 20:55:06,723][1651669] Updated weights for policy 0, policy_version 788976 (0.0011) [2024-06-15 20:55:07,788][1651669] Updated weights for policy 0, policy_version 789028 (0.0014) [2024-06-15 20:55:09,312][1651669] Updated weights for policy 0, policy_version 789075 (0.0013) [2024-06-15 20:55:10,774][1648981] Fps is (10 sec: 52390.6, 60 sec: 50238.2, 300 sec: 47985.0). Total num frames: 1616117760. Throughput: 0: 12172.3. Samples: 404068352. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:10,774][1648981] Avg episode reward: [(0, '911.670')] [2024-06-15 20:55:11,576][1651669] Updated weights for policy 0, policy_version 789125 (0.0012) [2024-06-15 20:55:12,726][1651669] Updated weights for policy 0, policy_version 789175 (0.0011) [2024-06-15 20:55:15,486][1651669] Updated weights for policy 0, policy_version 789216 (0.0014) [2024-06-15 20:55:15,766][1648981] Fps is (10 sec: 58983.0, 60 sec: 49152.2, 300 sec: 48209.8). Total num frames: 1616314368. Throughput: 0: 12743.1. Samples: 404153856. 
Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:15,767][1648981] Avg episode reward: [(0, '908.310')] [2024-06-15 20:55:17,668][1651669] Updated weights for policy 0, policy_version 789296 (0.0012) [2024-06-15 20:55:20,420][1651669] Updated weights for policy 0, policy_version 789368 (0.0018) [2024-06-15 20:55:20,766][1648981] Fps is (10 sec: 52466.9, 60 sec: 50244.8, 300 sec: 48318.9). Total num frames: 1616642048. Throughput: 0: 12208.3. Samples: 404214272. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:20,767][1648981] Avg episode reward: [(0, '881.930')] [2024-06-15 20:55:23,281][1651669] Updated weights for policy 0, policy_version 789396 (0.0011) [2024-06-15 20:55:24,333][1651669] Updated weights for policy 0, policy_version 789439 (0.0013) [2024-06-15 20:55:25,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1616773120. Throughput: 0: 12674.9. Samples: 404262912. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:25,767][1648981] Avg episode reward: [(0, '893.480')] [2024-06-15 20:55:27,084][1651669] Updated weights for policy 0, policy_version 789488 (0.0011) [2024-06-15 20:55:29,038][1651669] Updated weights for policy 0, policy_version 789558 (0.0111) [2024-06-15 20:55:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 48207.9). Total num frames: 1617100800. Throughput: 0: 12299.4. Samples: 404322816. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:30,767][1648981] Avg episode reward: [(0, '872.940')] [2024-06-15 20:55:30,767][1651669] Updated weights for policy 0, policy_version 789604 (0.0013) [2024-06-15 20:55:35,135][1651669] Updated weights for policy 0, policy_version 789680 (0.0016) [2024-06-15 20:55:35,770][1648981] Fps is (10 sec: 52407.9, 60 sec: 49694.8, 300 sec: 48207.2). Total num frames: 1617297408. Throughput: 0: 12434.8. Samples: 404403712. 
Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:35,771][1648981] Avg episode reward: [(0, '855.480')] [2024-06-15 20:55:37,175][1651669] Updated weights for policy 0, policy_version 789713 (0.0013) [2024-06-15 20:55:38,176][1651274] Signal inference workers to stop experience collection... (41400 times) [2024-06-15 20:55:38,281][1651669] InferenceWorker_p0-w0: stopping experience collection (41400 times) [2024-06-15 20:55:38,437][1651274] Signal inference workers to resume experience collection... (41400 times) [2024-06-15 20:55:38,438][1651669] InferenceWorker_p0-w0: resuming experience collection (41400 times) [2024-06-15 20:55:39,119][1651669] Updated weights for policy 0, policy_version 789798 (0.0103) [2024-06-15 20:55:40,767][1648981] Fps is (10 sec: 45873.2, 60 sec: 48605.7, 300 sec: 48207.8). Total num frames: 1617559552. Throughput: 0: 12443.4. Samples: 404435456. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:40,767][1648981] Avg episode reward: [(0, '841.860')] [2024-06-15 20:55:41,180][1651669] Updated weights for policy 0, policy_version 789840 (0.0011) [2024-06-15 20:55:42,084][1651669] Updated weights for policy 0, policy_version 789886 (0.0012) [2024-06-15 20:55:45,766][1648981] Fps is (10 sec: 42615.5, 60 sec: 49152.2, 300 sec: 47985.8). Total num frames: 1617723392. Throughput: 0: 12435.9. Samples: 404509184. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:45,767][1648981] Avg episode reward: [(0, '837.800')] [2024-06-15 20:55:46,411][1651669] Updated weights for policy 0, policy_version 789942 (0.0011) [2024-06-15 20:55:48,937][1651669] Updated weights for policy 0, policy_version 790005 (0.0013) [2024-06-15 20:55:50,663][1651669] Updated weights for policy 0, policy_version 790074 (0.0012) [2024-06-15 20:55:50,767][1648981] Fps is (10 sec: 49152.3, 60 sec: 49697.8, 300 sec: 48318.9). Total num frames: 1618051072. Throughput: 0: 12253.8. Samples: 404580352. 
Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:50,769][1648981] Avg episode reward: [(0, '852.340')] [2024-06-15 20:55:52,114][1651669] Updated weights for policy 0, policy_version 790137 (0.0028) [2024-06-15 20:55:55,788][1648981] Fps is (10 sec: 49045.1, 60 sec: 49156.6, 300 sec: 47982.1). Total num frames: 1618214912. Throughput: 0: 12170.3. Samples: 404616192. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:55:55,789][1648981] Avg episode reward: [(0, '862.010')] [2024-06-15 20:55:56,318][1651669] Updated weights for policy 0, policy_version 790176 (0.0036) [2024-06-15 20:55:58,969][1651669] Updated weights for policy 0, policy_version 790224 (0.0013) [2024-06-15 20:56:00,388][1651669] Updated weights for policy 0, policy_version 790288 (0.0112) [2024-06-15 20:56:00,766][1648981] Fps is (10 sec: 45876.7, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1618509824. Throughput: 0: 12083.2. Samples: 404697600. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:56:00,767][1648981] Avg episode reward: [(0, '811.820')] [2024-06-15 20:56:01,904][1651669] Updated weights for policy 0, policy_version 790338 (0.0011) [2024-06-15 20:56:03,096][1651669] Updated weights for policy 0, policy_version 790400 (0.0014) [2024-06-15 20:56:05,766][1648981] Fps is (10 sec: 52543.8, 60 sec: 50244.4, 300 sec: 47986.3). Total num frames: 1618739200. Throughput: 0: 12333.5. Samples: 404769280. Policy #0 lag: (min: 31.0, avg: 114.7, max: 287.0) [2024-06-15 20:56:05,767][1648981] Avg episode reward: [(0, '810.230')] [2024-06-15 20:56:07,401][1651669] Updated weights for policy 0, policy_version 790459 (0.0014) [2024-06-15 20:56:10,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46973.1, 300 sec: 48208.2). Total num frames: 1618935808. Throughput: 0: 12049.1. Samples: 404805120. 
Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:10,767][1648981] Avg episode reward: [(0, '816.670')] [2024-06-15 20:56:11,312][1651669] Updated weights for policy 0, policy_version 790529 (0.0030) [2024-06-15 20:56:12,839][1651669] Updated weights for policy 0, policy_version 790592 (0.0057) [2024-06-15 20:56:15,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49152.0, 300 sec: 48430.1). Total num frames: 1619263488. Throughput: 0: 12105.9. Samples: 404867584. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:15,767][1648981] Avg episode reward: [(0, '851.220')] [2024-06-15 20:56:17,098][1651669] Updated weights for policy 0, policy_version 790672 (0.0097) [2024-06-15 20:56:17,674][1651274] Signal inference workers to stop experience collection... (41450 times) [2024-06-15 20:56:17,739][1651669] InferenceWorker_p0-w0: stopping experience collection (41450 times) [2024-06-15 20:56:17,960][1651274] Signal inference workers to resume experience collection... (41450 times) [2024-06-15 20:56:17,961][1651669] InferenceWorker_p0-w0: resuming experience collection (41450 times) [2024-06-15 20:56:18,303][1651669] Updated weights for policy 0, policy_version 790715 (0.0012) [2024-06-15 20:56:20,775][1648981] Fps is (10 sec: 45833.6, 60 sec: 45868.2, 300 sec: 47986.8). Total num frames: 1619394560. Throughput: 0: 12059.1. Samples: 404946432. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:20,776][1648981] Avg episode reward: [(0, '837.870')] [2024-06-15 20:56:21,642][1651669] Updated weights for policy 0, policy_version 790754 (0.0140) [2024-06-15 20:56:22,898][1651669] Updated weights for policy 0, policy_version 790804 (0.0011) [2024-06-15 20:56:24,567][1651669] Updated weights for policy 0, policy_version 790881 (0.0011) [2024-06-15 20:56:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 1619787776. Throughput: 0: 12106.0. Samples: 404980224. 
Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:25,767][1648981] Avg episode reward: [(0, '864.710')] [2024-06-15 20:56:27,601][1651669] Updated weights for policy 0, policy_version 790916 (0.0012) [2024-06-15 20:56:29,146][1651669] Updated weights for policy 0, policy_version 790969 (0.0020) [2024-06-15 20:56:30,766][1648981] Fps is (10 sec: 52476.6, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 1619918848. Throughput: 0: 12060.4. Samples: 405051904. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:30,767][1648981] Avg episode reward: [(0, '843.750')] [2024-06-15 20:56:32,102][1651669] Updated weights for policy 0, policy_version 791008 (0.0029) [2024-06-15 20:56:33,091][1651669] Updated weights for policy 0, policy_version 791043 (0.0013) [2024-06-15 20:56:34,560][1651669] Updated weights for policy 0, policy_version 791120 (0.0014) [2024-06-15 20:56:35,470][1651669] Updated weights for policy 0, policy_version 791168 (0.0012) [2024-06-15 20:56:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50247.6, 300 sec: 49096.4). Total num frames: 1620312064. Throughput: 0: 12151.6. Samples: 405127168. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:35,767][1648981] Avg episode reward: [(0, '840.230')] [2024-06-15 20:56:39,455][1651669] Updated weights for policy 0, policy_version 791226 (0.0012) [2024-06-15 20:56:40,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 48060.0, 300 sec: 48430.0). Total num frames: 1620443136. Throughput: 0: 12464.7. Samples: 405176832. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:40,767][1648981] Avg episode reward: [(0, '865.170')] [2024-06-15 20:56:42,588][1651669] Updated weights for policy 0, policy_version 791280 (0.0072) [2024-06-15 20:56:44,562][1651669] Updated weights for policy 0, policy_version 791362 (0.0013) [2024-06-15 20:56:45,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 51882.7, 300 sec: 49323.9). Total num frames: 1620836352. 
Throughput: 0: 12003.6. Samples: 405237760. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:45,767][1648981] Avg episode reward: [(0, '868.860')] [2024-06-15 20:56:49,520][1651669] Updated weights for policy 0, policy_version 791440 (0.0024) [2024-06-15 20:56:50,513][1651669] Updated weights for policy 0, policy_version 791488 (0.0012) [2024-06-15 20:56:50,770][1648981] Fps is (10 sec: 52409.2, 60 sec: 48603.0, 300 sec: 48547.0). Total num frames: 1620967424. Throughput: 0: 12218.7. Samples: 405319168. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:50,771][1648981] Avg episode reward: [(0, '871.810')] [2024-06-15 20:56:53,144][1651669] Updated weights for policy 0, policy_version 791540 (0.0011) [2024-06-15 20:56:54,076][1651669] Updated weights for policy 0, policy_version 791571 (0.0041) [2024-06-15 20:56:54,751][1651274] Signal inference workers to stop experience collection... (41500 times) [2024-06-15 20:56:54,813][1651669] InferenceWorker_p0-w0: stopping experience collection (41500 times) [2024-06-15 20:56:54,965][1651274] Signal inference workers to resume experience collection... (41500 times) [2024-06-15 20:56:54,966][1651669] InferenceWorker_p0-w0: resuming experience collection (41500 times) [2024-06-15 20:56:55,462][1651669] Updated weights for policy 0, policy_version 791636 (0.0013) [2024-06-15 20:56:55,767][1648981] Fps is (10 sec: 45873.3, 60 sec: 51354.9, 300 sec: 49096.5). Total num frames: 1621295104. Throughput: 0: 12299.3. Samples: 405358592. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:56:55,767][1648981] Avg episode reward: [(0, '891.960')] [2024-06-15 20:56:56,347][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000791680_1621360640.pth... 
[2024-06-15 20:56:56,389][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000785856_1609433088.pth [2024-06-15 20:56:59,132][1651669] Updated weights for policy 0, policy_version 791681 (0.0013) [2024-06-15 20:57:00,482][1651669] Updated weights for policy 0, policy_version 791739 (0.0012) [2024-06-15 20:57:00,766][1648981] Fps is (10 sec: 52449.2, 60 sec: 49698.2, 300 sec: 49102.6). Total num frames: 1621491712. Throughput: 0: 12686.2. Samples: 405438464. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:00,767][1648981] Avg episode reward: [(0, '912.790')] [2024-06-15 20:57:03,137][1651669] Updated weights for policy 0, policy_version 791779 (0.0011) [2024-06-15 20:57:05,262][1651669] Updated weights for policy 0, policy_version 791872 (0.0010) [2024-06-15 20:57:05,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 50790.3, 300 sec: 48985.4). Total num frames: 1621786624. Throughput: 0: 12392.9. Samples: 405504000. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:05,767][1648981] Avg episode reward: [(0, '884.810')] [2024-06-15 20:57:10,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 49698.2, 300 sec: 49207.5). Total num frames: 1621917696. Throughput: 0: 12447.3. Samples: 405540352. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:10,767][1648981] Avg episode reward: [(0, '862.980')] [2024-06-15 20:57:10,932][1651669] Updated weights for policy 0, policy_version 791954 (0.0020) [2024-06-15 20:57:11,957][1651669] Updated weights for policy 0, policy_version 791996 (0.0015) [2024-06-15 20:57:14,480][1651669] Updated weights for policy 0, policy_version 792048 (0.0012) [2024-06-15 20:57:15,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1622212608. Throughput: 0: 12629.3. Samples: 405620224. 
Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:15,767][1648981] Avg episode reward: [(0, '897.490')] [2024-06-15 20:57:16,588][1651669] Updated weights for policy 0, policy_version 792132 (0.0012) [2024-06-15 20:57:20,767][1648981] Fps is (10 sec: 49150.5, 60 sec: 50251.7, 300 sec: 49318.6). Total num frames: 1622409216. Throughput: 0: 12287.9. Samples: 405680128. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:20,767][1648981] Avg episode reward: [(0, '872.750')] [2024-06-15 20:57:22,072][1651669] Updated weights for policy 0, policy_version 792195 (0.0015) [2024-06-15 20:57:23,332][1651669] Updated weights for policy 0, policy_version 792245 (0.0012) [2024-06-15 20:57:25,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 47513.7, 300 sec: 48763.2). Total num frames: 1622638592. Throughput: 0: 12037.7. Samples: 405718528. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:25,767][1648981] Avg episode reward: [(0, '900.690')] [2024-06-15 20:57:25,836][1651669] Updated weights for policy 0, policy_version 792306 (0.0011) [2024-06-15 20:57:27,483][1651669] Updated weights for policy 0, policy_version 792385 (0.0016) [2024-06-15 20:57:28,711][1651669] Updated weights for policy 0, policy_version 792444 (0.0013) [2024-06-15 20:57:30,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 50244.2, 300 sec: 49319.5). Total num frames: 1622933504. Throughput: 0: 12151.4. Samples: 405784576. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:30,767][1648981] Avg episode reward: [(0, '887.360')] [2024-06-15 20:57:33,753][1651669] Updated weights for policy 0, policy_version 792501 (0.0012) [2024-06-15 20:57:35,624][1651274] Signal inference workers to stop experience collection... (41550 times) [2024-06-15 20:57:35,686][1651669] InferenceWorker_p0-w0: stopping experience collection (41550 times) [2024-06-15 20:57:35,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46421.3, 300 sec: 48763.7). 
Total num frames: 1623097344. Throughput: 0: 12175.3. Samples: 405867008. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:35,767][1648981] Avg episode reward: [(0, '864.560')] [2024-06-15 20:57:35,831][1651274] Signal inference workers to resume experience collection... (41550 times) [2024-06-15 20:57:35,833][1651669] InferenceWorker_p0-w0: resuming experience collection (41550 times) [2024-06-15 20:57:36,203][1651669] Updated weights for policy 0, policy_version 792560 (0.0011) [2024-06-15 20:57:37,934][1651669] Updated weights for policy 0, policy_version 792636 (0.0013) [2024-06-15 20:57:40,224][1651669] Updated weights for policy 0, policy_version 792698 (0.0012) [2024-06-15 20:57:40,770][1648981] Fps is (10 sec: 52408.2, 60 sec: 50241.1, 300 sec: 49318.0). Total num frames: 1623457792. Throughput: 0: 11854.7. Samples: 405892096. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:40,771][1648981] Avg episode reward: [(0, '899.310')] [2024-06-15 20:57:45,248][1651669] Updated weights for policy 0, policy_version 792757 (0.0013) [2024-06-15 20:57:45,769][1648981] Fps is (10 sec: 49139.4, 60 sec: 45873.2, 300 sec: 48762.8). Total num frames: 1623588864. Throughput: 0: 11798.1. Samples: 405969408. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:45,770][1648981] Avg episode reward: [(0, '905.970')] [2024-06-15 20:57:47,227][1651669] Updated weights for policy 0, policy_version 792800 (0.0014) [2024-06-15 20:57:48,722][1651669] Updated weights for policy 0, policy_version 792864 (0.0019) [2024-06-15 20:57:50,766][1648981] Fps is (10 sec: 42615.6, 60 sec: 48609.1, 300 sec: 48987.9). Total num frames: 1623883776. Throughput: 0: 11832.9. Samples: 406036480. 
Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:50,767][1648981] Avg episode reward: [(0, '873.090')] [2024-06-15 20:57:50,941][1651669] Updated weights for policy 0, policy_version 792928 (0.0012) [2024-06-15 20:57:55,215][1651669] Updated weights for policy 0, policy_version 792961 (0.0018) [2024-06-15 20:57:55,794][1648981] Fps is (10 sec: 42489.9, 60 sec: 45308.1, 300 sec: 48536.5). Total num frames: 1624014848. Throughput: 0: 11791.4. Samples: 406071296. Policy #0 lag: (min: 15.0, avg: 89.1, max: 255.0) [2024-06-15 20:57:55,795][1648981] Avg episode reward: [(0, '873.970')] [2024-06-15 20:57:56,387][1651669] Updated weights for policy 0, policy_version 793024 (0.0011) [2024-06-15 20:57:58,948][1651669] Updated weights for policy 0, policy_version 793090 (0.0012) [2024-06-15 20:58:00,774][1648981] Fps is (10 sec: 49113.2, 60 sec: 48053.5, 300 sec: 48873.0). Total num frames: 1624375296. Throughput: 0: 11523.7. Samples: 406138880. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:00,775][1648981] Avg episode reward: [(0, '885.160')] [2024-06-15 20:58:01,936][1651669] Updated weights for policy 0, policy_version 793156 (0.0013) [2024-06-15 20:58:03,293][1651669] Updated weights for policy 0, policy_version 793215 (0.0013) [2024-06-15 20:58:05,770][1648981] Fps is (10 sec: 49272.3, 60 sec: 45326.3, 300 sec: 48429.4). Total num frames: 1624506368. Throughput: 0: 11923.0. Samples: 406216704. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:05,770][1648981] Avg episode reward: [(0, '879.820')] [2024-06-15 20:58:07,091][1651669] Updated weights for policy 0, policy_version 793273 (0.0014) [2024-06-15 20:58:09,046][1651669] Updated weights for policy 0, policy_version 793328 (0.0104) [2024-06-15 20:58:10,669][1651669] Updated weights for policy 0, policy_version 793362 (0.0027) [2024-06-15 20:58:10,767][1648981] Fps is (10 sec: 42630.3, 60 sec: 48059.4, 300 sec: 48541.0). Total num frames: 1624801280. 
Throughput: 0: 11878.3. Samples: 406253056. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:10,768][1648981] Avg episode reward: [(0, '859.530')] [2024-06-15 20:58:13,145][1651669] Updated weights for policy 0, policy_version 793427 (0.0011) [2024-06-15 20:58:15,770][1648981] Fps is (10 sec: 52428.6, 60 sec: 46964.6, 300 sec: 48545.7). Total num frames: 1625030656. Throughput: 0: 11797.8. Samples: 406315520. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:15,771][1648981] Avg episode reward: [(0, '875.970')] [2024-06-15 20:58:16,956][1651669] Updated weights for policy 0, policy_version 793477 (0.0012) [2024-06-15 20:58:18,212][1651669] Updated weights for policy 0, policy_version 793532 (0.0012) [2024-06-15 20:58:19,472][1651274] Signal inference workers to stop experience collection... (41600 times) [2024-06-15 20:58:19,520][1651669] InferenceWorker_p0-w0: stopping experience collection (41600 times) [2024-06-15 20:58:19,840][1651274] Signal inference workers to resume experience collection... (41600 times) [2024-06-15 20:58:19,841][1651669] InferenceWorker_p0-w0: resuming experience collection (41600 times) [2024-06-15 20:58:20,593][1651669] Updated weights for policy 0, policy_version 793584 (0.0013) [2024-06-15 20:58:20,768][1648981] Fps is (10 sec: 45870.1, 60 sec: 47512.6, 300 sec: 48318.7). Total num frames: 1625260032. Throughput: 0: 11707.4. Samples: 406393856. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:20,769][1648981] Avg episode reward: [(0, '946.500')] [2024-06-15 20:58:22,665][1651669] Updated weights for policy 0, policy_version 793653 (0.0016) [2024-06-15 20:58:23,760][1651669] Updated weights for policy 0, policy_version 793696 (0.0011) [2024-06-15 20:58:25,786][1648981] Fps is (10 sec: 52344.5, 60 sec: 48589.8, 300 sec: 48871.0). Total num frames: 1625554944. Throughput: 0: 11851.5. Samples: 406425600. 
Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:25,787][1648981] Avg episode reward: [(0, '908.420')] [2024-06-15 20:58:27,454][1651669] Updated weights for policy 0, policy_version 793732 (0.0016) [2024-06-15 20:58:28,737][1651669] Updated weights for policy 0, policy_version 793787 (0.0014) [2024-06-15 20:58:30,766][1648981] Fps is (10 sec: 49158.9, 60 sec: 46967.4, 300 sec: 48430.0). Total num frames: 1625751552. Throughput: 0: 11947.3. Samples: 406507008. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:30,767][1648981] Avg episode reward: [(0, '901.850')] [2024-06-15 20:58:31,099][1651669] Updated weights for policy 0, policy_version 793845 (0.0011) [2024-06-15 20:58:33,283][1651669] Updated weights for policy 0, policy_version 793890 (0.0011) [2024-06-15 20:58:35,044][1651669] Updated weights for policy 0, policy_version 793968 (0.0079) [2024-06-15 20:58:35,766][1648981] Fps is (10 sec: 52533.1, 60 sec: 49698.2, 300 sec: 49096.5). Total num frames: 1626079232. Throughput: 0: 11844.2. Samples: 406569472. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:35,767][1648981] Avg episode reward: [(0, '904.900')] [2024-06-15 20:58:38,953][1651669] Updated weights for policy 0, policy_version 794000 (0.0013) [2024-06-15 20:58:40,144][1651669] Updated weights for policy 0, policy_version 794046 (0.0011) [2024-06-15 20:58:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 45878.2, 300 sec: 48207.8). Total num frames: 1626210304. Throughput: 0: 12022.4. Samples: 406611968. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:40,767][1648981] Avg episode reward: [(0, '891.790')] [2024-06-15 20:58:42,365][1651669] Updated weights for policy 0, policy_version 794104 (0.0014) [2024-06-15 20:58:44,691][1651669] Updated weights for policy 0, policy_version 794164 (0.0012) [2024-06-15 20:58:45,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 49154.0, 300 sec: 48875.6). Total num frames: 1626537984. 
Throughput: 0: 12164.9. Samples: 406686208. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:45,767][1648981] Avg episode reward: [(0, '882.430')] [2024-06-15 20:58:46,072][1651669] Updated weights for policy 0, policy_version 794227 (0.0012) [2024-06-15 20:58:50,220][1651669] Updated weights for policy 0, policy_version 794288 (0.0138) [2024-06-15 20:58:50,624][1651669] Updated weights for policy 0, policy_version 794304 (0.0010) [2024-06-15 20:58:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.5, 300 sec: 48430.0). Total num frames: 1626734592. Throughput: 0: 12027.3. Samples: 406757888. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:50,767][1648981] Avg episode reward: [(0, '878.940')] [2024-06-15 20:58:52,922][1651669] Updated weights for policy 0, policy_version 794366 (0.0089) [2024-06-15 20:58:54,612][1651669] Updated weights for policy 0, policy_version 794423 (0.0012) [2024-06-15 20:58:55,767][1648981] Fps is (10 sec: 52428.4, 60 sec: 50814.1, 300 sec: 49096.4). Total num frames: 1627062272. Throughput: 0: 12106.0. Samples: 406797824. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:58:55,767][1648981] Avg episode reward: [(0, '878.850')] [2024-06-15 20:58:56,038][1651669] Updated weights for policy 0, policy_version 794485 (0.0097) [2024-06-15 20:58:56,253][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000794496_1627127808.pth... [2024-06-15 20:58:56,301][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000788736_1615331328.pth [2024-06-15 20:59:00,290][1651669] Updated weights for policy 0, policy_version 794513 (0.0014) [2024-06-15 20:59:00,629][1651274] Signal inference workers to stop experience collection... (41650 times) [2024-06-15 20:59:00,687][1651669] InferenceWorker_p0-w0: stopping experience collection (41650 times) [2024-06-15 20:59:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46973.6, 300 sec: 48430.0). 
Total num frames: 1627193344. Throughput: 0: 12334.5. Samples: 406870528. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:00,767][1648981] Avg episode reward: [(0, '901.230')] [2024-06-15 20:59:00,855][1651274] Signal inference workers to resume experience collection... (41650 times) [2024-06-15 20:59:00,856][1651669] InferenceWorker_p0-w0: resuming experience collection (41650 times) [2024-06-15 20:59:02,677][1651669] Updated weights for policy 0, policy_version 794565 (0.0012) [2024-06-15 20:59:04,024][1651669] Updated weights for policy 0, policy_version 794623 (0.0011) [2024-06-15 20:59:05,360][1651669] Updated weights for policy 0, policy_version 794676 (0.0020) [2024-06-15 20:59:05,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 50247.3, 300 sec: 48874.3). Total num frames: 1627521024. Throughput: 0: 12129.1. Samples: 406939648. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:05,767][1648981] Avg episode reward: [(0, '909.200')] [2024-06-15 20:59:06,912][1651669] Updated weights for policy 0, policy_version 794720 (0.0012) [2024-06-15 20:59:10,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.9, 300 sec: 48430.0). Total num frames: 1627652096. Throughput: 0: 12213.7. Samples: 406974976. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:10,767][1648981] Avg episode reward: [(0, '928.380')] [2024-06-15 20:59:11,077][1651669] Updated weights for policy 0, policy_version 794768 (0.0012) [2024-06-15 20:59:13,724][1651669] Updated weights for policy 0, policy_version 794832 (0.0015) [2024-06-15 20:59:15,606][1651669] Updated weights for policy 0, policy_version 794915 (0.0012) [2024-06-15 20:59:15,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49155.1, 300 sec: 48652.3). Total num frames: 1627979776. Throughput: 0: 12060.5. Samples: 407049728. 
Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:15,767][1648981] Avg episode reward: [(0, '929.460')] [2024-06-15 20:59:17,765][1651669] Updated weights for policy 0, policy_version 794966 (0.0012) [2024-06-15 20:59:20,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48607.0, 300 sec: 48541.1). Total num frames: 1628176384. Throughput: 0: 12185.6. Samples: 407117824. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:20,767][1648981] Avg episode reward: [(0, '906.440')] [2024-06-15 20:59:22,782][1651669] Updated weights for policy 0, policy_version 795044 (0.0142) [2024-06-15 20:59:25,570][1651669] Updated weights for policy 0, policy_version 795104 (0.0090) [2024-06-15 20:59:25,784][1648981] Fps is (10 sec: 42521.7, 60 sec: 47515.1, 300 sec: 48316.0). Total num frames: 1628405760. Throughput: 0: 11987.4. Samples: 407151616. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:25,785][1648981] Avg episode reward: [(0, '870.790')] [2024-06-15 20:59:27,029][1651669] Updated weights for policy 0, policy_version 795174 (0.0011) [2024-06-15 20:59:28,434][1651669] Updated weights for policy 0, policy_version 795217 (0.0014) [2024-06-15 20:59:29,393][1651669] Updated weights for policy 0, policy_version 795258 (0.0012) [2024-06-15 20:59:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49152.1, 300 sec: 48763.2). Total num frames: 1628700672. Throughput: 0: 12071.9. Samples: 407229440. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:30,767][1648981] Avg episode reward: [(0, '863.780')] [2024-06-15 20:59:32,980][1651669] Updated weights for policy 0, policy_version 795303 (0.0036) [2024-06-15 20:59:35,148][1651669] Updated weights for policy 0, policy_version 795344 (0.0010) [2024-06-15 20:59:35,770][1648981] Fps is (10 sec: 52505.8, 60 sec: 47510.9, 300 sec: 48429.5). Total num frames: 1628930048. Throughput: 0: 12264.3. Samples: 407309824. 
Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:35,770][1648981] Avg episode reward: [(0, '861.000')] [2024-06-15 20:59:36,425][1651669] Updated weights for policy 0, policy_version 795414 (0.0110) [2024-06-15 20:59:38,550][1651669] Updated weights for policy 0, policy_version 795458 (0.0012) [2024-06-15 20:59:40,123][1651669] Updated weights for policy 0, policy_version 795520 (0.0013) [2024-06-15 20:59:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1629224960. Throughput: 0: 12265.3. Samples: 407349760. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:40,767][1648981] Avg episode reward: [(0, '862.900')] [2024-06-15 20:59:43,161][1651274] Signal inference workers to stop experience collection... (41700 times) [2024-06-15 20:59:43,219][1651669] InferenceWorker_p0-w0: stopping experience collection (41700 times) [2024-06-15 20:59:43,422][1651274] Signal inference workers to resume experience collection... (41700 times) [2024-06-15 20:59:43,433][1651669] InferenceWorker_p0-w0: resuming experience collection (41700 times) [2024-06-15 20:59:43,825][1651669] Updated weights for policy 0, policy_version 795568 (0.0011) [2024-06-15 20:59:45,766][1648981] Fps is (10 sec: 42612.6, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1629356032. Throughput: 0: 12174.2. Samples: 407418368. Policy #0 lag: (min: 15.0, avg: 118.4, max: 271.0) [2024-06-15 20:59:45,767][1648981] Avg episode reward: [(0, '891.440')] [2024-06-15 20:59:46,115][1651669] Updated weights for policy 0, policy_version 795600 (0.0022) [2024-06-15 20:59:47,359][1651669] Updated weights for policy 0, policy_version 795664 (0.0029) [2024-06-15 20:59:48,145][1651669] Updated weights for policy 0, policy_version 795710 (0.0011) [2024-06-15 20:59:50,127][1651669] Updated weights for policy 0, policy_version 795772 (0.0012) [2024-06-15 20:59:50,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.4, 300 sec: 49101.0). 
Total num frames: 1629749248. Throughput: 0: 12401.8. Samples: 407497728. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 20:59:50,767][1648981] Avg episode reward: [(0, '894.290')] [2024-06-15 20:59:53,938][1651669] Updated weights for policy 0, policy_version 795832 (0.0017) [2024-06-15 20:59:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.6, 300 sec: 48430.0). Total num frames: 1629880320. Throughput: 0: 12526.9. Samples: 407538688. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 20:59:55,767][1648981] Avg episode reward: [(0, '843.040')] [2024-06-15 20:59:56,319][1651669] Updated weights for policy 0, policy_version 795861 (0.0025) [2024-06-15 20:59:58,058][1651669] Updated weights for policy 0, policy_version 795939 (0.0015) [2024-06-15 20:59:59,967][1651669] Updated weights for policy 0, policy_version 796030 (0.0011) [2024-06-15 21:00:00,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 51336.5, 300 sec: 49318.6). Total num frames: 1630273536. Throughput: 0: 12379.0. Samples: 407606784. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:00,767][1648981] Avg episode reward: [(0, '850.670')] [2024-06-15 21:00:05,778][1648981] Fps is (10 sec: 52367.5, 60 sec: 48050.4, 300 sec: 48429.3). Total num frames: 1630404608. Throughput: 0: 12535.0. Samples: 407682048. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:05,779][1648981] Avg episode reward: [(0, '833.570')] [2024-06-15 21:00:06,594][1651669] Updated weights for policy 0, policy_version 796097 (0.0012) [2024-06-15 21:00:08,442][1651669] Updated weights for policy 0, policy_version 796180 (0.0014) [2024-06-15 21:00:10,473][1651669] Updated weights for policy 0, policy_version 796246 (0.0239) [2024-06-15 21:00:10,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 51336.4, 300 sec: 48874.3). Total num frames: 1630732288. Throughput: 0: 12520.5. Samples: 407714816. 
Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:10,767][1648981] Avg episode reward: [(0, '759.370')] [2024-06-15 21:00:14,577][1651669] Updated weights for policy 0, policy_version 796289 (0.0025) [2024-06-15 21:00:15,733][1651669] Updated weights for policy 0, policy_version 796346 (0.0016) [2024-06-15 21:00:15,766][1648981] Fps is (10 sec: 49210.2, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1630896128. Throughput: 0: 12663.5. Samples: 407799296. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:15,767][1648981] Avg episode reward: [(0, '778.350')] [2024-06-15 21:00:17,924][1651669] Updated weights for policy 0, policy_version 796400 (0.0016) [2024-06-15 21:00:19,535][1651669] Updated weights for policy 0, policy_version 796480 (0.0012) [2024-06-15 21:00:20,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1631191040. Throughput: 0: 12277.5. Samples: 407862272. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:20,767][1648981] Avg episode reward: [(0, '803.800')] [2024-06-15 21:00:21,269][1651274] Signal inference workers to stop experience collection... (41750 times) [2024-06-15 21:00:21,296][1651669] InferenceWorker_p0-w0: stopping experience collection (41750 times) [2024-06-15 21:00:21,460][1651274] Signal inference workers to resume experience collection... (41750 times) [2024-06-15 21:00:21,461][1651669] InferenceWorker_p0-w0: resuming experience collection (41750 times) [2024-06-15 21:00:21,767][1651669] Updated weights for policy 0, policy_version 796544 (0.0109) [2024-06-15 21:00:25,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49166.8, 300 sec: 48318.9). Total num frames: 1631354880. Throughput: 0: 12367.7. Samples: 407906304. 
Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:25,767][1648981] Avg episode reward: [(0, '792.670')] [2024-06-15 21:00:27,699][1651669] Updated weights for policy 0, policy_version 796611 (0.0012) [2024-06-15 21:00:29,370][1651669] Updated weights for policy 0, policy_version 796688 (0.0175) [2024-06-15 21:00:30,206][1651669] Updated weights for policy 0, policy_version 796731 (0.0012) [2024-06-15 21:00:30,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 50244.0, 300 sec: 48874.9). Total num frames: 1631715328. Throughput: 0: 12322.1. Samples: 407972864. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:30,767][1648981] Avg episode reward: [(0, '799.130')] [2024-06-15 21:00:33,124][1651669] Updated weights for policy 0, policy_version 796789 (0.0016) [2024-06-15 21:00:35,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48608.5, 300 sec: 48430.1). Total num frames: 1631846400. Throughput: 0: 12390.4. Samples: 408055296. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:35,767][1648981] Avg episode reward: [(0, '838.760')] [2024-06-15 21:00:36,671][1651669] Updated weights for policy 0, policy_version 796848 (0.0033) [2024-06-15 21:00:39,631][1651669] Updated weights for policy 0, policy_version 796912 (0.0014) [2024-06-15 21:00:40,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 1632174080. Throughput: 0: 12231.1. Samples: 408089088. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:40,767][1648981] Avg episode reward: [(0, '834.140')] [2024-06-15 21:00:40,979][1651669] Updated weights for policy 0, policy_version 796976 (0.0013) [2024-06-15 21:00:43,518][1651669] Updated weights for policy 0, policy_version 797012 (0.0013) [2024-06-15 21:00:44,517][1651669] Updated weights for policy 0, policy_version 797056 (0.0012) [2024-06-15 21:00:45,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 50244.3, 300 sec: 48541.1). Total num frames: 1632370688. 
Throughput: 0: 12185.6. Samples: 408155136. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:45,767][1648981] Avg episode reward: [(0, '844.330')] [2024-06-15 21:00:47,070][1651669] Updated weights for policy 0, policy_version 797104 (0.0017) [2024-06-15 21:00:50,327][1651669] Updated weights for policy 0, policy_version 797162 (0.0133) [2024-06-15 21:00:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 47513.5, 300 sec: 48766.8). Total num frames: 1632600064. Throughput: 0: 12314.0. Samples: 408236032. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:50,767][1648981] Avg episode reward: [(0, '844.330')] [2024-06-15 21:00:52,169][1651669] Updated weights for policy 0, policy_version 797244 (0.0012) [2024-06-15 21:00:54,799][1651669] Updated weights for policy 0, policy_version 797299 (0.0012) [2024-06-15 21:00:55,783][1648981] Fps is (10 sec: 52344.1, 60 sec: 50230.8, 300 sec: 48760.6). Total num frames: 1632894976. Throughput: 0: 12283.7. Samples: 408267776. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:00:55,783][1648981] Avg episode reward: [(0, '847.640')] [2024-06-15 21:00:55,790][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000797312_1632894976.pth... [2024-06-15 21:00:55,868][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000791680_1621360640.pth [2024-06-15 21:00:56,892][1651669] Updated weights for policy 0, policy_version 797328 (0.0011) [2024-06-15 21:00:59,844][1651669] Updated weights for policy 0, policy_version 797379 (0.0012) [2024-06-15 21:01:00,778][1648981] Fps is (10 sec: 49094.3, 60 sec: 46958.3, 300 sec: 48650.2). Total num frames: 1633091584. Throughput: 0: 12239.3. Samples: 408350208. 
Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:01:00,779][1648981] Avg episode reward: [(0, '828.090')] [2024-06-15 21:01:01,770][1651669] Updated weights for policy 0, policy_version 797456 (0.0011) [2024-06-15 21:01:02,869][1651669] Updated weights for policy 0, policy_version 797504 (0.0013) [2024-06-15 21:01:04,080][1651274] Signal inference workers to stop experience collection... (41800 times) [2024-06-15 21:01:04,138][1651669] InferenceWorker_p0-w0: stopping experience collection (41800 times) [2024-06-15 21:01:04,441][1651274] Signal inference workers to resume experience collection... (41800 times) [2024-06-15 21:01:04,442][1651669] InferenceWorker_p0-w0: resuming experience collection (41800 times) [2024-06-15 21:01:05,442][1651669] Updated weights for policy 0, policy_version 797564 (0.0012) [2024-06-15 21:01:05,766][1648981] Fps is (10 sec: 52513.2, 60 sec: 50254.1, 300 sec: 49096.5). Total num frames: 1633419264. Throughput: 0: 12208.4. Samples: 408411648. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:01:05,767][1648981] Avg episode reward: [(0, '849.680')] [2024-06-15 21:01:08,216][1651669] Updated weights for policy 0, policy_version 797616 (0.0024) [2024-06-15 21:01:10,790][1648981] Fps is (10 sec: 45820.0, 60 sec: 46948.9, 300 sec: 48426.1). Total num frames: 1633550336. Throughput: 0: 12167.8. Samples: 408454144. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:01:10,791][1648981] Avg episode reward: [(0, '850.130')] [2024-06-15 21:01:11,334][1651669] Updated weights for policy 0, policy_version 797664 (0.0014) [2024-06-15 21:01:12,538][1651669] Updated weights for policy 0, policy_version 797714 (0.0014) [2024-06-15 21:01:13,519][1651669] Updated weights for policy 0, policy_version 797756 (0.0017) [2024-06-15 21:01:15,525][1651669] Updated weights for policy 0, policy_version 797813 (0.0015) [2024-06-15 21:01:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50790.3, 300 sec: 49320.1). 
Total num frames: 1633943552. Throughput: 0: 12413.2. Samples: 408531456. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:01:15,767][1648981] Avg episode reward: [(0, '865.810')] [2024-06-15 21:01:18,318][1651669] Updated weights for policy 0, policy_version 797872 (0.0010) [2024-06-15 21:01:20,782][1648981] Fps is (10 sec: 52470.5, 60 sec: 48047.0, 300 sec: 48427.4). Total num frames: 1634074624. Throughput: 0: 12340.5. Samples: 408610816. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:01:20,783][1648981] Avg episode reward: [(0, '880.090')] [2024-06-15 21:01:22,200][1651669] Updated weights for policy 0, policy_version 797952 (0.0106) [2024-06-15 21:01:25,355][1651669] Updated weights for policy 0, policy_version 798034 (0.0053) [2024-06-15 21:01:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 50790.4, 300 sec: 49096.5). Total num frames: 1634402304. Throughput: 0: 12276.6. Samples: 408641536. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:01:25,767][1648981] Avg episode reward: [(0, '881.450')] [2024-06-15 21:01:26,419][1651669] Updated weights for policy 0, policy_version 798080 (0.0012) [2024-06-15 21:01:30,766][1648981] Fps is (10 sec: 52512.3, 60 sec: 48060.0, 300 sec: 48430.0). Total num frames: 1634598912. Throughput: 0: 12470.0. Samples: 408716288. Policy #0 lag: (min: 2.0, avg: 93.7, max: 258.0) [2024-06-15 21:01:30,767][1648981] Avg episode reward: [(0, '863.710')] [2024-06-15 21:01:31,731][1651669] Updated weights for policy 0, policy_version 798151 (0.0011) [2024-06-15 21:01:32,744][1651669] Updated weights for policy 0, policy_version 798198 (0.0011) [2024-06-15 21:01:34,131][1651669] Updated weights for policy 0, policy_version 798272 (0.0014) [2024-06-15 21:01:35,773][1648981] Fps is (10 sec: 45844.8, 60 sec: 50238.7, 300 sec: 48873.2). Total num frames: 1634861056. Throughput: 0: 12354.4. Samples: 408792064. 
Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:01:35,776][1648981] Avg episode reward: [(0, '846.340')] [2024-06-15 21:01:36,919][1651669] Updated weights for policy 0, policy_version 798320 (0.0012) [2024-06-15 21:01:40,360][1651669] Updated weights for policy 0, policy_version 798400 (0.0016) [2024-06-15 21:01:40,767][1648981] Fps is (10 sec: 52427.4, 60 sec: 49151.8, 300 sec: 48429.9). Total num frames: 1635123200. Throughput: 0: 12428.9. Samples: 408826880. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:01:40,767][1648981] Avg episode reward: [(0, '838.460')] [2024-06-15 21:01:43,448][1651669] Updated weights for policy 0, policy_version 798451 (0.0011) [2024-06-15 21:01:44,079][1651274] Signal inference workers to stop experience collection... (41850 times) [2024-06-15 21:01:44,146][1651669] InferenceWorker_p0-w0: stopping experience collection (41850 times) [2024-06-15 21:01:44,423][1651274] Signal inference workers to resume experience collection... (41850 times) [2024-06-15 21:01:44,423][1651669] InferenceWorker_p0-w0: resuming experience collection (41850 times) [2024-06-15 21:01:45,015][1651669] Updated weights for policy 0, policy_version 798515 (0.0022) [2024-06-15 21:01:45,766][1648981] Fps is (10 sec: 52463.4, 60 sec: 50244.1, 300 sec: 48874.9). Total num frames: 1635385344. Throughput: 0: 12143.3. Samples: 408896512. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:01:45,767][1648981] Avg episode reward: [(0, '870.770')] [2024-06-15 21:01:46,517][1651669] Updated weights for policy 0, policy_version 798544 (0.0011) [2024-06-15 21:01:50,727][1651669] Updated weights for policy 0, policy_version 798624 (0.0105) [2024-06-15 21:01:50,767][1648981] Fps is (10 sec: 45875.2, 60 sec: 49698.0, 300 sec: 48430.0). Total num frames: 1635581952. Throughput: 0: 12583.8. Samples: 408977920. 
Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:01:50,767][1648981] Avg episode reward: [(0, '873.160')] [2024-06-15 21:01:53,010][1651669] Updated weights for policy 0, policy_version 798672 (0.0012) [2024-06-15 21:01:54,941][1651669] Updated weights for policy 0, policy_version 798737 (0.0013) [2024-06-15 21:01:55,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49711.4, 300 sec: 48763.2). Total num frames: 1635876864. Throughput: 0: 12499.4. Samples: 409016320. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:01:55,767][1648981] Avg episode reward: [(0, '891.920')] [2024-06-15 21:01:56,003][1651669] Updated weights for policy 0, policy_version 798783 (0.0011) [2024-06-15 21:01:58,242][1651669] Updated weights for policy 0, policy_version 798848 (0.0015) [2024-06-15 21:02:00,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 49161.6, 300 sec: 48318.9). Total num frames: 1636040704. Throughput: 0: 12242.5. Samples: 409082368. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:00,767][1648981] Avg episode reward: [(0, '871.200')] [2024-06-15 21:02:02,087][1651669] Updated weights for policy 0, policy_version 798905 (0.0016) [2024-06-15 21:02:05,497][1651669] Updated weights for policy 0, policy_version 798979 (0.0035) [2024-06-15 21:02:05,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 1636335616. Throughput: 0: 12155.7. Samples: 409157632. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:05,767][1648981] Avg episode reward: [(0, '854.480')] [2024-06-15 21:02:06,602][1651669] Updated weights for policy 0, policy_version 799035 (0.0015) [2024-06-15 21:02:08,765][1651669] Updated weights for policy 0, policy_version 799091 (0.0012) [2024-06-15 21:02:10,769][1648981] Fps is (10 sec: 52417.8, 60 sec: 50262.4, 300 sec: 48651.8). Total num frames: 1636564992. Throughput: 0: 12219.1. Samples: 409191424. 
Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:10,770][1648981] Avg episode reward: [(0, '874.360')] [2024-06-15 21:02:12,989][1651669] Updated weights for policy 0, policy_version 799160 (0.0012) [2024-06-15 21:02:15,715][1651669] Updated weights for policy 0, policy_version 799219 (0.0108) [2024-06-15 21:02:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 48763.3). Total num frames: 1636794368. Throughput: 0: 12367.6. Samples: 409272832. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:15,767][1648981] Avg episode reward: [(0, '901.790')] [2024-06-15 21:02:17,355][1651669] Updated weights for policy 0, policy_version 799286 (0.0011) [2024-06-15 21:02:19,721][1651669] Updated weights for policy 0, policy_version 799344 (0.0013) [2024-06-15 21:02:20,766][1648981] Fps is (10 sec: 52440.0, 60 sec: 50257.5, 300 sec: 48985.4). Total num frames: 1637089280. Throughput: 0: 11993.9. Samples: 409331712. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:20,767][1648981] Avg episode reward: [(0, '911.370')] [2024-06-15 21:02:24,083][1651669] Updated weights for policy 0, policy_version 799398 (0.0010) [2024-06-15 21:02:25,519][1651669] Updated weights for policy 0, policy_version 799442 (0.0020) [2024-06-15 21:02:25,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 48059.6, 300 sec: 48652.1). Total num frames: 1637285888. Throughput: 0: 12231.1. Samples: 409377280. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:25,767][1648981] Avg episode reward: [(0, '884.910')] [2024-06-15 21:02:26,141][1651274] Signal inference workers to stop experience collection... (41900 times) [2024-06-15 21:02:26,196][1651669] InferenceWorker_p0-w0: stopping experience collection (41900 times) [2024-06-15 21:02:26,450][1651274] Signal inference workers to resume experience collection... 
(41900 times) [2024-06-15 21:02:26,451][1651669] InferenceWorker_p0-w0: resuming experience collection (41900 times) [2024-06-15 21:02:27,691][1651669] Updated weights for policy 0, policy_version 799536 (0.0110) [2024-06-15 21:02:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 1637548032. Throughput: 0: 12162.9. Samples: 409443840. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:30,767][1648981] Avg episode reward: [(0, '862.280')] [2024-06-15 21:02:30,823][1651669] Updated weights for policy 0, policy_version 799600 (0.0029) [2024-06-15 21:02:35,326][1651669] Updated weights for policy 0, policy_version 799651 (0.0015) [2024-06-15 21:02:35,767][1648981] Fps is (10 sec: 42595.7, 60 sec: 47518.2, 300 sec: 48319.4). Total num frames: 1637711872. Throughput: 0: 12060.3. Samples: 409520640. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:35,768][1648981] Avg episode reward: [(0, '847.450')] [2024-06-15 21:02:36,717][1651669] Updated weights for policy 0, policy_version 799713 (0.0013) [2024-06-15 21:02:38,138][1651669] Updated weights for policy 0, policy_version 799776 (0.0013) [2024-06-15 21:02:40,224][1651669] Updated weights for policy 0, policy_version 799810 (0.0012) [2024-06-15 21:02:40,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48606.1, 300 sec: 48985.8). Total num frames: 1638039552. Throughput: 0: 11901.2. Samples: 409551872. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:40,767][1648981] Avg episode reward: [(0, '862.090')] [2024-06-15 21:02:41,542][1651669] Updated weights for policy 0, policy_version 799868 (0.0024) [2024-06-15 21:02:45,779][1648981] Fps is (10 sec: 45820.6, 60 sec: 46411.5, 300 sec: 48427.9). Total num frames: 1638170624. Throughput: 0: 12136.7. Samples: 409628672. 
Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:45,780][1648981] Avg episode reward: [(0, '855.420')] [2024-06-15 21:02:46,537][1651669] Updated weights for policy 0, policy_version 799921 (0.0015) [2024-06-15 21:02:47,768][1651669] Updated weights for policy 0, policy_version 799984 (0.0011) [2024-06-15 21:02:49,311][1651669] Updated weights for policy 0, policy_version 800032 (0.0011) [2024-06-15 21:02:50,203][1651669] Updated weights for policy 0, policy_version 800064 (0.0019) [2024-06-15 21:02:50,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49152.2, 300 sec: 49212.2). Total num frames: 1638531072. Throughput: 0: 11923.9. Samples: 409694208. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:50,767][1648981] Avg episode reward: [(0, '872.500')] [2024-06-15 21:02:53,215][1651669] Updated weights for policy 0, policy_version 800128 (0.0011) [2024-06-15 21:02:55,767][1648981] Fps is (10 sec: 49212.2, 60 sec: 46421.0, 300 sec: 48431.2). Total num frames: 1638662144. Throughput: 0: 11935.7. Samples: 409728512. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:02:55,767][1648981] Avg episode reward: [(0, '868.560')] [2024-06-15 21:02:55,787][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000800128_1638662144.pth... [2024-06-15 21:02:56,144][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000794496_1627127808.pth [2024-06-15 21:02:57,717][1651669] Updated weights for policy 0, policy_version 800194 (0.0011) [2024-06-15 21:03:00,270][1651669] Updated weights for policy 0, policy_version 800273 (0.0011) [2024-06-15 21:03:00,767][1648981] Fps is (10 sec: 45874.8, 60 sec: 49152.0, 300 sec: 49097.1). Total num frames: 1638989824. Throughput: 0: 11867.0. Samples: 409806848. 
Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:03:00,767][1648981] Avg episode reward: [(0, '868.560')] [2024-06-15 21:03:01,317][1651669] Updated weights for policy 0, policy_version 800319 (0.0109) [2024-06-15 21:03:04,115][1651669] Updated weights for policy 0, policy_version 800372 (0.0012) [2024-06-15 21:03:05,770][1648981] Fps is (10 sec: 52411.5, 60 sec: 47510.6, 300 sec: 48762.7). Total num frames: 1639186432. Throughput: 0: 12161.8. Samples: 409879040. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:03:05,771][1648981] Avg episode reward: [(0, '871.580')] [2024-06-15 21:03:07,518][1651669] Updated weights for policy 0, policy_version 800418 (0.0012) [2024-06-15 21:03:07,820][1651274] Signal inference workers to stop experience collection... (41950 times) [2024-06-15 21:03:07,902][1651669] InferenceWorker_p0-w0: stopping experience collection (41950 times) [2024-06-15 21:03:08,058][1651274] Signal inference workers to resume experience collection... (41950 times) [2024-06-15 21:03:08,059][1651669] InferenceWorker_p0-w0: resuming experience collection (41950 times) [2024-06-15 21:03:08,608][1651669] Updated weights for policy 0, policy_version 800465 (0.0012) [2024-06-15 21:03:10,753][1651669] Updated weights for policy 0, policy_version 800517 (0.0013) [2024-06-15 21:03:10,776][1648981] Fps is (10 sec: 45830.3, 60 sec: 48053.5, 300 sec: 48873.3). Total num frames: 1639448576. Throughput: 0: 11966.8. Samples: 409915904. Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:03:10,777][1648981] Avg episode reward: [(0, '850.220')] [2024-06-15 21:03:11,874][1651669] Updated weights for policy 0, policy_version 800568 (0.0013) [2024-06-15 21:03:14,232][1651669] Updated weights for policy 0, policy_version 800617 (0.0036) [2024-06-15 21:03:15,766][1648981] Fps is (10 sec: 52448.6, 60 sec: 48605.9, 300 sec: 48985.6). Total num frames: 1639710720. Throughput: 0: 12106.0. Samples: 409988608. 
Policy #0 lag: (min: 63.0, avg: 151.6, max: 319.0) [2024-06-15 21:03:15,767][1648981] Avg episode reward: [(0, '868.630')] [2024-06-15 21:03:18,118][1651669] Updated weights for policy 0, policy_version 800672 (0.0168) [2024-06-15 21:03:19,174][1651669] Updated weights for policy 0, policy_version 800724 (0.0040) [2024-06-15 21:03:20,074][1651669] Updated weights for policy 0, policy_version 800767 (0.0011) [2024-06-15 21:03:20,769][1648981] Fps is (10 sec: 52465.8, 60 sec: 48057.5, 300 sec: 48877.1). Total num frames: 1639972864. Throughput: 0: 12082.7. Samples: 410064384. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:20,770][1648981] Avg episode reward: [(0, '849.300')] [2024-06-15 21:03:22,225][1651669] Updated weights for policy 0, policy_version 800827 (0.0011) [2024-06-15 21:03:25,335][1651669] Updated weights for policy 0, policy_version 800885 (0.0014) [2024-06-15 21:03:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49152.1, 300 sec: 49096.5). Total num frames: 1640235008. Throughput: 0: 12299.4. Samples: 410105344. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:25,767][1648981] Avg episode reward: [(0, '868.420')] [2024-06-15 21:03:27,608][1651669] Updated weights for policy 0, policy_version 800919 (0.0011) [2024-06-15 21:03:29,487][1651669] Updated weights for policy 0, policy_version 800997 (0.0123) [2024-06-15 21:03:30,766][1648981] Fps is (10 sec: 52443.9, 60 sec: 49152.1, 300 sec: 48874.3). Total num frames: 1640497152. Throughput: 0: 12177.7. Samples: 410176512. 
Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:30,767][1648981] Avg episode reward: [(0, '880.570')] [2024-06-15 21:03:31,059][1651669] Updated weights for policy 0, policy_version 801025 (0.0017) [2024-06-15 21:03:32,107][1651669] Updated weights for policy 0, policy_version 801079 (0.0011) [2024-06-15 21:03:35,245][1651669] Updated weights for policy 0, policy_version 801125 (0.0014) [2024-06-15 21:03:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50791.1, 300 sec: 49318.6). Total num frames: 1640759296. Throughput: 0: 12515.6. Samples: 410257408. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:35,767][1648981] Avg episode reward: [(0, '889.820')] [2024-06-15 21:03:37,776][1651669] Updated weights for policy 0, policy_version 801156 (0.0010) [2024-06-15 21:03:39,156][1651669] Updated weights for policy 0, policy_version 801216 (0.0012) [2024-06-15 21:03:40,733][1651669] Updated weights for policy 0, policy_version 801276 (0.0013) [2024-06-15 21:03:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 1640988672. Throughput: 0: 12618.1. Samples: 410296320. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:40,767][1648981] Avg episode reward: [(0, '909.220')] [2024-06-15 21:03:43,058][1651669] Updated weights for policy 0, policy_version 801335 (0.0013) [2024-06-15 21:03:45,769][1648981] Fps is (10 sec: 42585.4, 60 sec: 50252.4, 300 sec: 48984.9). Total num frames: 1641185280. Throughput: 0: 12309.9. Samples: 410360832. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:45,770][1648981] Avg episode reward: [(0, '892.880')] [2024-06-15 21:03:46,459][1651669] Updated weights for policy 0, policy_version 801398 (0.0014) [2024-06-15 21:03:49,333][1651274] Signal inference workers to stop experience collection... 
(42000 times) [2024-06-15 21:03:49,400][1651669] InferenceWorker_p0-w0: stopping experience collection (42000 times) [2024-06-15 21:03:49,401][1651669] Updated weights for policy 0, policy_version 801427 (0.0012) [2024-06-15 21:03:49,674][1651274] Signal inference workers to resume experience collection... (42000 times) [2024-06-15 21:03:49,690][1651669] InferenceWorker_p0-w0: resuming experience collection (42000 times) [2024-06-15 21:03:50,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48059.8, 300 sec: 48652.2). Total num frames: 1641414656. Throughput: 0: 12368.7. Samples: 410435584. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:50,767][1648981] Avg episode reward: [(0, '922.070')] [2024-06-15 21:03:51,712][1651669] Updated weights for policy 0, policy_version 801520 (0.0012) [2024-06-15 21:03:53,510][1651669] Updated weights for policy 0, policy_version 801568 (0.0013) [2024-06-15 21:03:55,766][1648981] Fps is (10 sec: 49167.1, 60 sec: 50244.7, 300 sec: 49096.5). Total num frames: 1641676800. Throughput: 0: 12222.4. Samples: 410465792. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:03:55,767][1648981] Avg episode reward: [(0, '926.540')] [2024-06-15 21:03:58,248][1651669] Updated weights for policy 0, policy_version 801664 (0.0012) [2024-06-15 21:04:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 47513.7, 300 sec: 48541.1). Total num frames: 1641840640. Throughput: 0: 12265.2. Samples: 410540544. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:00,767][1648981] Avg episode reward: [(0, '921.820')] [2024-06-15 21:04:02,679][1651669] Updated weights for policy 0, policy_version 801762 (0.0012) [2024-06-15 21:04:04,726][1651669] Updated weights for policy 0, policy_version 801808 (0.0017) [2024-06-15 21:04:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50247.5, 300 sec: 49318.6). Total num frames: 1642201088. Throughput: 0: 11913.3. Samples: 410600448. 
Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:05,767][1648981] Avg episode reward: [(0, '899.070')] [2024-06-15 21:04:08,776][1651669] Updated weights for policy 0, policy_version 801872 (0.0012) [2024-06-15 21:04:10,771][1648981] Fps is (10 sec: 49132.4, 60 sec: 48064.5, 300 sec: 48651.5). Total num frames: 1642332160. Throughput: 0: 11968.4. Samples: 410643968. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:10,772][1648981] Avg episode reward: [(0, '918.900')] [2024-06-15 21:04:11,952][1651669] Updated weights for policy 0, policy_version 801926 (0.0012) [2024-06-15 21:04:13,825][1651669] Updated weights for policy 0, policy_version 802003 (0.0129) [2024-06-15 21:04:15,592][1651669] Updated weights for policy 0, policy_version 802050 (0.0012) [2024-06-15 21:04:15,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1642594304. Throughput: 0: 11798.7. Samples: 410707456. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:15,767][1648981] Avg episode reward: [(0, '932.310')] [2024-06-15 21:04:16,876][1651669] Updated weights for policy 0, policy_version 802102 (0.0021) [2024-06-15 21:04:20,309][1651669] Updated weights for policy 0, policy_version 802147 (0.0011) [2024-06-15 21:04:20,770][1648981] Fps is (10 sec: 52429.9, 60 sec: 48059.0, 300 sec: 48987.7). Total num frames: 1642856448. Throughput: 0: 11809.1. Samples: 410788864. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:20,771][1648981] Avg episode reward: [(0, '911.770')] [2024-06-15 21:04:22,555][1651669] Updated weights for policy 0, policy_version 802195 (0.0022) [2024-06-15 21:04:24,141][1651669] Updated weights for policy 0, policy_version 802260 (0.0099) [2024-06-15 21:04:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1643118592. Throughput: 0: 11685.0. Samples: 410822144. 
Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:25,767][1648981] Avg episode reward: [(0, '902.820')] [2024-06-15 21:04:25,972][1651669] Updated weights for policy 0, policy_version 802305 (0.0012) [2024-06-15 21:04:27,225][1651669] Updated weights for policy 0, policy_version 802363 (0.0117) [2024-06-15 21:04:30,766][1648981] Fps is (10 sec: 42614.3, 60 sec: 46421.3, 300 sec: 48652.7). Total num frames: 1643282432. Throughput: 0: 11936.1. Samples: 410897920. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:30,767][1648981] Avg episode reward: [(0, '890.800')] [2024-06-15 21:04:31,302][1651274] Signal inference workers to stop experience collection... (42050 times) [2024-06-15 21:04:31,373][1651669] InferenceWorker_p0-w0: stopping experience collection (42050 times) [2024-06-15 21:04:31,545][1651274] Signal inference workers to resume experience collection... (42050 times) [2024-06-15 21:04:31,547][1651669] InferenceWorker_p0-w0: resuming experience collection (42050 times) [2024-06-15 21:04:31,772][1651669] Updated weights for policy 0, policy_version 802428 (0.0012) [2024-06-15 21:04:34,793][1651669] Updated weights for policy 0, policy_version 802496 (0.0012) [2024-06-15 21:04:35,767][1648981] Fps is (10 sec: 45874.2, 60 sec: 46967.3, 300 sec: 48652.1). Total num frames: 1643577344. Throughput: 0: 11707.7. Samples: 410962432. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:35,767][1648981] Avg episode reward: [(0, '849.760')] [2024-06-15 21:04:36,122][1651669] Updated weights for policy 0, policy_version 802557 (0.0014) [2024-06-15 21:04:38,331][1651669] Updated weights for policy 0, policy_version 802624 (0.0011) [2024-06-15 21:04:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46421.3, 300 sec: 48874.3). Total num frames: 1643773952. Throughput: 0: 11810.1. Samples: 410997248. 
Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:40,767][1648981] Avg episode reward: [(0, '867.440')] [2024-06-15 21:04:42,834][1651669] Updated weights for policy 0, policy_version 802679 (0.0020) [2024-06-15 21:04:44,307][1651669] Updated weights for policy 0, policy_version 802704 (0.0011) [2024-06-15 21:04:45,615][1651669] Updated weights for policy 0, policy_version 802755 (0.0020) [2024-06-15 21:04:45,777][1648981] Fps is (10 sec: 45826.9, 60 sec: 47507.5, 300 sec: 48428.2). Total num frames: 1644036096. Throughput: 0: 11943.8. Samples: 411078144. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:45,778][1648981] Avg episode reward: [(0, '890.740')] [2024-06-15 21:04:46,657][1651669] Updated weights for policy 0, policy_version 802807 (0.0012) [2024-06-15 21:04:48,265][1651669] Updated weights for policy 0, policy_version 802852 (0.0099) [2024-06-15 21:04:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1644298240. Throughput: 0: 12424.5. Samples: 411159552. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:50,767][1648981] Avg episode reward: [(0, '920.680')] [2024-06-15 21:04:51,974][1651669] Updated weights for policy 0, policy_version 802897 (0.0012) [2024-06-15 21:04:52,782][1651669] Updated weights for policy 0, policy_version 802941 (0.0016) [2024-06-15 21:04:55,771][1648981] Fps is (10 sec: 52461.8, 60 sec: 48056.1, 300 sec: 48429.3). Total num frames: 1644560384. Throughput: 0: 12333.4. Samples: 411198976. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:04:55,771][1648981] Avg episode reward: [(0, '860.270')] [2024-06-15 21:04:55,919][1651669] Updated weights for policy 0, policy_version 803024 (0.0029) [2024-06-15 21:04:56,247][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000803040_1644625920.pth... 
[2024-06-15 21:04:56,422][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000797312_1632894976.pth [2024-06-15 21:04:57,051][1651669] Updated weights for policy 0, policy_version 803072 (0.0015) [2024-06-15 21:04:59,084][1651669] Updated weights for policy 0, policy_version 803135 (0.0012) [2024-06-15 21:05:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 48876.3). Total num frames: 1644822528. Throughput: 0: 12413.2. Samples: 411266048. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:05:00,767][1648981] Avg episode reward: [(0, '864.330')] [2024-06-15 21:05:03,503][1651669] Updated weights for policy 0, policy_version 803200 (0.0020) [2024-06-15 21:05:05,587][1651669] Updated weights for policy 0, policy_version 803264 (0.0011) [2024-06-15 21:05:05,767][1648981] Fps is (10 sec: 52451.0, 60 sec: 48059.5, 300 sec: 48652.1). Total num frames: 1645084672. Throughput: 0: 12345.9. Samples: 411344384. Policy #0 lag: (min: 115.0, avg: 198.7, max: 371.0) [2024-06-15 21:05:05,768][1648981] Avg episode reward: [(0, '858.480')] [2024-06-15 21:05:07,220][1651669] Updated weights for policy 0, policy_version 803327 (0.0012) [2024-06-15 21:05:09,937][1651669] Updated weights for policy 0, policy_version 803381 (0.0010) [2024-06-15 21:05:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50247.6, 300 sec: 48985.4). Total num frames: 1645346816. Throughput: 0: 12265.2. Samples: 411374080. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:10,767][1648981] Avg episode reward: [(0, '856.220')] [2024-06-15 21:05:12,961][1651274] Signal inference workers to stop experience collection... (42100 times) [2024-06-15 21:05:13,006][1651669] InferenceWorker_p0-w0: stopping experience collection (42100 times) [2024-06-15 21:05:13,234][1651274] Signal inference workers to resume experience collection... 
(42100 times) [2024-06-15 21:05:13,236][1651669] InferenceWorker_p0-w0: resuming experience collection (42100 times) [2024-06-15 21:05:13,713][1651669] Updated weights for policy 0, policy_version 803424 (0.0012) [2024-06-15 21:05:15,061][1651669] Updated weights for policy 0, policy_version 803458 (0.0011) [2024-06-15 21:05:15,767][1648981] Fps is (10 sec: 42594.5, 60 sec: 48604.9, 300 sec: 48540.9). Total num frames: 1645510656. Throughput: 0: 12367.3. Samples: 411454464. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:15,768][1648981] Avg episode reward: [(0, '852.540')] [2024-06-15 21:05:16,767][1651669] Updated weights for policy 0, policy_version 803536 (0.0032) [2024-06-15 21:05:19,596][1651669] Updated weights for policy 0, policy_version 803603 (0.0012) [2024-06-15 21:05:20,770][1648981] Fps is (10 sec: 52408.9, 60 sec: 50244.2, 300 sec: 49206.9). Total num frames: 1645871104. Throughput: 0: 12343.9. Samples: 411517952. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:20,771][1648981] Avg episode reward: [(0, '828.590')] [2024-06-15 21:05:24,066][1651669] Updated weights for policy 0, policy_version 803664 (0.0012) [2024-06-15 21:05:25,767][1648981] Fps is (10 sec: 52434.4, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1646034944. Throughput: 0: 12697.6. Samples: 411568640. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:25,767][1648981] Avg episode reward: [(0, '825.970')] [2024-06-15 21:05:25,804][1651669] Updated weights for policy 0, policy_version 803728 (0.0084) [2024-06-15 21:05:27,066][1651669] Updated weights for policy 0, policy_version 803778 (0.0129) [2024-06-15 21:05:28,330][1651669] Updated weights for policy 0, policy_version 803840 (0.0012) [2024-06-15 21:05:30,774][1648981] Fps is (10 sec: 45857.3, 60 sec: 50783.9, 300 sec: 49095.2). Total num frames: 1646329856. Throughput: 0: 12323.0. Samples: 411632640. 
Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:30,775][1648981] Avg episode reward: [(0, '815.450')] [2024-06-15 21:05:31,263][1651669] Updated weights for policy 0, policy_version 803902 (0.0012) [2024-06-15 21:05:35,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 48059.9, 300 sec: 48430.0). Total num frames: 1646460928. Throughput: 0: 12265.2. Samples: 411711488. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:35,767][1648981] Avg episode reward: [(0, '820.440')] [2024-06-15 21:05:36,185][1651669] Updated weights for policy 0, policy_version 803968 (0.0011) [2024-06-15 21:05:37,476][1651669] Updated weights for policy 0, policy_version 804020 (0.0010) [2024-06-15 21:05:38,997][1651669] Updated weights for policy 0, policy_version 804088 (0.0127) [2024-06-15 21:05:40,776][1648981] Fps is (10 sec: 45864.3, 60 sec: 50235.8, 300 sec: 48872.6). Total num frames: 1646788608. Throughput: 0: 12036.2. Samples: 411740672. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:40,777][1648981] Avg episode reward: [(0, '812.520')] [2024-06-15 21:05:41,871][1651669] Updated weights for policy 0, policy_version 804130 (0.0018) [2024-06-15 21:05:45,033][1651669] Updated weights for policy 0, policy_version 804179 (0.0012) [2024-06-15 21:05:45,766][1648981] Fps is (10 sec: 55705.1, 60 sec: 49707.0, 300 sec: 48874.3). Total num frames: 1647017984. Throughput: 0: 12424.5. Samples: 411825152. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:45,767][1648981] Avg episode reward: [(0, '793.780')] [2024-06-15 21:05:45,867][1651669] Updated weights for policy 0, policy_version 804218 (0.0013) [2024-06-15 21:05:47,658][1651669] Updated weights for policy 0, policy_version 804272 (0.0144) [2024-06-15 21:05:48,845][1651274] Signal inference workers to stop experience collection... 
(42150 times) [2024-06-15 21:05:48,913][1651669] InferenceWorker_p0-w0: stopping experience collection (42150 times) [2024-06-15 21:05:48,915][1651669] Updated weights for policy 0, policy_version 804324 (0.0012) [2024-06-15 21:05:49,125][1651274] Signal inference workers to resume experience collection... (42150 times) [2024-06-15 21:05:49,126][1651669] InferenceWorker_p0-w0: resuming experience collection (42150 times) [2024-06-15 21:05:50,767][1648981] Fps is (10 sec: 52481.0, 60 sec: 50244.1, 300 sec: 48876.9). Total num frames: 1647312896. Throughput: 0: 12356.3. Samples: 411900416. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:50,767][1648981] Avg episode reward: [(0, '793.780')] [2024-06-15 21:05:50,882][1651669] Updated weights for policy 0, policy_version 804353 (0.0013) [2024-06-15 21:05:51,850][1651669] Updated weights for policy 0, policy_version 804414 (0.0022) [2024-06-15 21:05:55,777][1648981] Fps is (10 sec: 45824.9, 60 sec: 48600.5, 300 sec: 48763.3). Total num frames: 1647476736. Throughput: 0: 12512.5. Samples: 411937280. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:05:55,778][1648981] Avg episode reward: [(0, '829.430')] [2024-06-15 21:05:56,319][1651669] Updated weights for policy 0, policy_version 804464 (0.0011) [2024-06-15 21:05:57,484][1651669] Updated weights for policy 0, policy_version 804496 (0.0013) [2024-06-15 21:05:59,109][1651669] Updated weights for policy 0, policy_version 804576 (0.0023) [2024-06-15 21:06:00,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1647837184. Throughput: 0: 12288.3. Samples: 412007424. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:00,767][1648981] Avg episode reward: [(0, '855.150')] [2024-06-15 21:06:01,288][1651669] Updated weights for policy 0, policy_version 804627 (0.0057) [2024-06-15 21:06:05,767][1648981] Fps is (10 sec: 52486.3, 60 sec: 48606.0, 300 sec: 48989.3). 
Total num frames: 1648001024. Throughput: 0: 12721.4. Samples: 412090368. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:05,767][1648981] Avg episode reward: [(0, '882.470')] [2024-06-15 21:06:06,206][1651669] Updated weights for policy 0, policy_version 804707 (0.0013) [2024-06-15 21:06:08,510][1651669] Updated weights for policy 0, policy_version 804759 (0.0011) [2024-06-15 21:06:10,356][1651669] Updated weights for policy 0, policy_version 804848 (0.0012) [2024-06-15 21:06:10,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 50244.1, 300 sec: 48874.3). Total num frames: 1648361472. Throughput: 0: 12367.6. Samples: 412125184. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:10,767][1648981] Avg episode reward: [(0, '869.030')] [2024-06-15 21:06:12,143][1651669] Updated weights for policy 0, policy_version 804867 (0.0016) [2024-06-15 21:06:13,421][1651669] Updated weights for policy 0, policy_version 804926 (0.0012) [2024-06-15 21:06:15,771][1648981] Fps is (10 sec: 49132.3, 60 sec: 49695.7, 300 sec: 48876.3). Total num frames: 1648492544. Throughput: 0: 12630.4. Samples: 412200960. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:15,771][1648981] Avg episode reward: [(0, '936.660')] [2024-06-15 21:06:17,567][1651669] Updated weights for policy 0, policy_version 804987 (0.0011) [2024-06-15 21:06:18,946][1651669] Updated weights for policy 0, policy_version 805028 (0.0012) [2024-06-15 21:06:20,740][1651669] Updated weights for policy 0, policy_version 805117 (0.0011) [2024-06-15 21:06:20,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 49701.3, 300 sec: 48985.4). Total num frames: 1648852992. Throughput: 0: 12367.6. Samples: 412268032. 
Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:20,767][1648981] Avg episode reward: [(0, '1014.790')] [2024-06-15 21:06:23,913][1651669] Updated weights for policy 0, policy_version 805180 (0.0016) [2024-06-15 21:06:25,766][1648981] Fps is (10 sec: 52450.4, 60 sec: 49698.2, 300 sec: 48874.3). Total num frames: 1649016832. Throughput: 0: 12677.7. Samples: 412311040. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:25,767][1648981] Avg episode reward: [(0, '960.610')] [2024-06-15 21:06:27,513][1651669] Updated weights for policy 0, policy_version 805220 (0.0013) [2024-06-15 21:06:29,036][1651669] Updated weights for policy 0, policy_version 805253 (0.0013) [2024-06-15 21:06:30,382][1651274] Signal inference workers to stop experience collection... (42200 times) [2024-06-15 21:06:30,475][1651669] InferenceWorker_p0-w0: stopping experience collection (42200 times) [2024-06-15 21:06:30,693][1651274] Signal inference workers to resume experience collection... (42200 times) [2024-06-15 21:06:30,695][1651669] InferenceWorker_p0-w0: resuming experience collection (42200 times) [2024-06-15 21:06:30,697][1651669] Updated weights for policy 0, policy_version 805328 (0.0012) [2024-06-15 21:06:30,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49704.5, 300 sec: 48986.5). Total num frames: 1649311744. Throughput: 0: 12492.8. Samples: 412387328. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:30,767][1648981] Avg episode reward: [(0, '981.470')] [2024-06-15 21:06:33,567][1651669] Updated weights for policy 0, policy_version 805379 (0.0013) [2024-06-15 21:06:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 51336.5, 300 sec: 48874.3). Total num frames: 1649541120. Throughput: 0: 12390.4. Samples: 412457984. 
Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:35,767][1648981] Avg episode reward: [(0, '1002.400')] [2024-06-15 21:06:37,485][1651669] Updated weights for policy 0, policy_version 805442 (0.0020) [2024-06-15 21:06:38,617][1651669] Updated weights for policy 0, policy_version 805490 (0.0011) [2024-06-15 21:06:39,810][1651669] Updated weights for policy 0, policy_version 805520 (0.0011) [2024-06-15 21:06:40,774][1648981] Fps is (10 sec: 45839.2, 60 sec: 49700.0, 300 sec: 48761.9). Total num frames: 1649770496. Throughput: 0: 12425.4. Samples: 412496384. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:40,775][1648981] Avg episode reward: [(0, '1020.210')] [2024-06-15 21:06:41,379][1651669] Updated weights for policy 0, policy_version 805584 (0.0012) [2024-06-15 21:06:44,026][1651669] Updated weights for policy 0, policy_version 805642 (0.0012) [2024-06-15 21:06:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50790.5, 300 sec: 49096.5). Total num frames: 1650065408. Throughput: 0: 12413.2. Samples: 412566016. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:45,767][1648981] Avg episode reward: [(0, '995.850')] [2024-06-15 21:06:48,371][1651669] Updated weights for policy 0, policy_version 805699 (0.0015) [2024-06-15 21:06:50,485][1651669] Updated weights for policy 0, policy_version 805776 (0.0011) [2024-06-15 21:06:50,766][1648981] Fps is (10 sec: 45911.8, 60 sec: 48606.1, 300 sec: 48652.2). Total num frames: 1650229248. Throughput: 0: 12242.5. Samples: 412641280. Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:50,767][1648981] Avg episode reward: [(0, '955.550')] [2024-06-15 21:06:52,461][1651669] Updated weights for policy 0, policy_version 805842 (0.0020) [2024-06-15 21:06:55,767][1648981] Fps is (10 sec: 39319.6, 60 sec: 49706.9, 300 sec: 48874.2). Total num frames: 1650458624. Throughput: 0: 12049.0. Samples: 412667392. 
Policy #0 lag: (min: 102.0, avg: 212.1, max: 374.0) [2024-06-15 21:06:55,768][1648981] Avg episode reward: [(0, '979.230')] [2024-06-15 21:06:55,844][1651669] Updated weights for policy 0, policy_version 805904 (0.0011) [2024-06-15 21:06:56,326][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000805920_1650524160.pth... [2024-06-15 21:06:56,486][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000800128_1638662144.pth [2024-06-15 21:07:00,049][1651669] Updated weights for policy 0, policy_version 805984 (0.0019) [2024-06-15 21:07:00,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48059.7, 300 sec: 48763.2). Total num frames: 1650720768. Throughput: 0: 12118.4. Samples: 412746240. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:00,767][1648981] Avg episode reward: [(0, '964.550')] [2024-06-15 21:07:01,718][1651669] Updated weights for policy 0, policy_version 806032 (0.0011) [2024-06-15 21:07:02,945][1651669] Updated weights for policy 0, policy_version 806079 (0.0010) [2024-06-15 21:07:04,742][1651669] Updated weights for policy 0, policy_version 806135 (0.0113) [2024-06-15 21:07:05,766][1648981] Fps is (10 sec: 52431.2, 60 sec: 49698.2, 300 sec: 48874.7). Total num frames: 1650982912. Throughput: 0: 12071.8. Samples: 412811264. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:05,767][1648981] Avg episode reward: [(0, '974.620')] [2024-06-15 21:07:06,550][1651669] Updated weights for policy 0, policy_version 806176 (0.0149) [2024-06-15 21:07:10,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 45875.4, 300 sec: 48541.1). Total num frames: 1651113984. Throughput: 0: 11867.0. Samples: 412845056. 
Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:10,767][1648981] Avg episode reward: [(0, '972.760')] [2024-06-15 21:07:11,514][1651669] Updated weights for policy 0, policy_version 806231 (0.0019) [2024-06-15 21:07:13,091][1651669] Updated weights for policy 0, policy_version 806273 (0.0044) [2024-06-15 21:07:14,409][1651669] Updated weights for policy 0, policy_version 806327 (0.0011) [2024-06-15 21:07:14,605][1651274] Signal inference workers to stop experience collection... (42250 times) [2024-06-15 21:07:14,641][1651669] InferenceWorker_p0-w0: stopping experience collection (42250 times) [2024-06-15 21:07:14,651][1651274] Signal inference workers to resume experience collection... (42250 times) [2024-06-15 21:07:14,667][1651669] InferenceWorker_p0-w0: resuming experience collection (42250 times) [2024-06-15 21:07:15,700][1651669] Updated weights for policy 0, policy_version 806370 (0.0010) [2024-06-15 21:07:15,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49155.4, 300 sec: 48652.1). Total num frames: 1651441664. Throughput: 0: 11787.4. Samples: 412917760. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:15,767][1648981] Avg episode reward: [(0, '952.400')] [2024-06-15 21:07:17,461][1651669] Updated weights for policy 0, policy_version 806418 (0.0011) [2024-06-15 21:07:18,308][1651669] Updated weights for policy 0, policy_version 806457 (0.0011) [2024-06-15 21:07:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46421.4, 300 sec: 48652.2). Total num frames: 1651638272. Throughput: 0: 11787.4. Samples: 412988416. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:20,767][1648981] Avg episode reward: [(0, '921.670')] [2024-06-15 21:07:22,873][1651669] Updated weights for policy 0, policy_version 806497 (0.0037) [2024-06-15 21:07:25,034][1651669] Updated weights for policy 0, policy_version 806560 (0.0012) [2024-06-15 21:07:25,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 47513.6, 300 sec: 48541.1). 
Total num frames: 1651867648. Throughput: 0: 11743.9. Samples: 413024768. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:25,767][1648981] Avg episode reward: [(0, '921.160')] [2024-06-15 21:07:26,345][1651669] Updated weights for policy 0, policy_version 806593 (0.0010) [2024-06-15 21:07:27,844][1651669] Updated weights for policy 0, policy_version 806661 (0.0030) [2024-06-15 21:07:29,440][1651669] Updated weights for policy 0, policy_version 806720 (0.0013) [2024-06-15 21:07:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 48985.5). Total num frames: 1652162560. Throughput: 0: 11525.7. Samples: 413084672. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:30,767][1648981] Avg episode reward: [(0, '921.820')] [2024-06-15 21:07:35,354][1651669] Updated weights for policy 0, policy_version 806777 (0.0012) [2024-06-15 21:07:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 48318.9). Total num frames: 1652293632. Throughput: 0: 11525.7. Samples: 413159936. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:35,767][1648981] Avg episode reward: [(0, '932.020')] [2024-06-15 21:07:36,920][1651669] Updated weights for policy 0, policy_version 806840 (0.0013) [2024-06-15 21:07:38,431][1651669] Updated weights for policy 0, policy_version 806883 (0.0027) [2024-06-15 21:07:40,136][1651669] Updated weights for policy 0, policy_version 806960 (0.0011) [2024-06-15 21:07:40,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48612.2, 300 sec: 49209.6). Total num frames: 1652686848. Throughput: 0: 11742.0. Samples: 413195776. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:40,767][1648981] Avg episode reward: [(0, '889.920')] [2024-06-15 21:07:45,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 43690.7, 300 sec: 47985.7). Total num frames: 1652686848. Throughput: 0: 11628.1. Samples: 413269504. 
Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:45,767][1648981] Avg episode reward: [(0, '846.300')] [2024-06-15 21:07:47,145][1651669] Updated weights for policy 0, policy_version 807040 (0.0014) [2024-06-15 21:07:48,254][1651669] Updated weights for policy 0, policy_version 807099 (0.0010) [2024-06-15 21:07:49,460][1651669] Updated weights for policy 0, policy_version 807137 (0.0009) [2024-06-15 21:07:50,743][1651669] Updated weights for policy 0, policy_version 807200 (0.0011) [2024-06-15 21:07:50,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 48605.9, 300 sec: 49096.6). Total num frames: 1653145600. Throughput: 0: 11582.6. Samples: 413332480. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:50,767][1648981] Avg episode reward: [(0, '872.260')] [2024-06-15 21:07:55,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 45875.5, 300 sec: 48207.8). Total num frames: 1653211136. Throughput: 0: 11787.4. Samples: 413375488. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:07:55,767][1648981] Avg episode reward: [(0, '910.450')] [2024-06-15 21:07:56,593][1651669] Updated weights for policy 0, policy_version 807234 (0.0017) [2024-06-15 21:07:57,613][1651274] Signal inference workers to stop experience collection... (42300 times) [2024-06-15 21:07:57,665][1651669] InferenceWorker_p0-w0: stopping experience collection (42300 times) [2024-06-15 21:07:57,814][1651274] Signal inference workers to resume experience collection... (42300 times) [2024-06-15 21:07:57,815][1651669] InferenceWorker_p0-w0: resuming experience collection (42300 times) [2024-06-15 21:07:57,992][1651669] Updated weights for policy 0, policy_version 807299 (0.0040) [2024-06-15 21:07:59,231][1651669] Updated weights for policy 0, policy_version 807357 (0.0124) [2024-06-15 21:08:00,622][1651669] Updated weights for policy 0, policy_version 807410 (0.0011) [2024-06-15 21:08:00,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 48763.9). 
Total num frames: 1653571584. Throughput: 0: 11810.1. Samples: 413449216. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:00,767][1648981] Avg episode reward: [(0, '906.550')] [2024-06-15 21:08:02,000][1651669] Updated weights for policy 0, policy_version 807476 (0.0012) [2024-06-15 21:08:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 48431.6). Total num frames: 1653735424. Throughput: 0: 12060.4. Samples: 413531136. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:05,767][1648981] Avg episode reward: [(0, '906.950')] [2024-06-15 21:08:07,504][1651669] Updated weights for policy 0, policy_version 807520 (0.0022) [2024-06-15 21:08:09,386][1651669] Updated weights for policy 0, policy_version 807600 (0.0017) [2024-06-15 21:08:10,528][1651669] Updated weights for policy 0, policy_version 807649 (0.0012) [2024-06-15 21:08:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1654095872. Throughput: 0: 12003.6. Samples: 413564928. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:10,767][1648981] Avg episode reward: [(0, '901.840')] [2024-06-15 21:08:11,583][1651669] Updated weights for policy 0, policy_version 807712 (0.0015) [2024-06-15 21:08:12,255][1651669] Updated weights for policy 0, policy_version 807744 (0.0021) [2024-06-15 21:08:15,773][1648981] Fps is (10 sec: 52395.4, 60 sec: 46962.5, 300 sec: 48429.4). Total num frames: 1654259712. Throughput: 0: 12422.8. Samples: 413643776. 
Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:15,773][1648981] Avg episode reward: [(0, '901.840')] [2024-06-15 21:08:17,721][1651669] Updated weights for policy 0, policy_version 807799 (0.0099) [2024-06-15 21:08:19,068][1651669] Updated weights for policy 0, policy_version 807856 (0.0011) [2024-06-15 21:08:20,294][1651669] Updated weights for policy 0, policy_version 807907 (0.0012) [2024-06-15 21:08:20,766][1648981] Fps is (10 sec: 55706.0, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1654652928. Throughput: 0: 12435.9. Samples: 413719552. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:20,767][1648981] Avg episode reward: [(0, '880.880')] [2024-06-15 21:08:21,696][1651669] Updated weights for policy 0, policy_version 807984 (0.0124) [2024-06-15 21:08:25,766][1648981] Fps is (10 sec: 52462.4, 60 sec: 48605.9, 300 sec: 48430.0). Total num frames: 1654784000. Throughput: 0: 12447.3. Samples: 413755904. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:25,767][1648981] Avg episode reward: [(0, '903.670')] [2024-06-15 21:08:28,603][1651669] Updated weights for policy 0, policy_version 808048 (0.0141) [2024-06-15 21:08:30,286][1651669] Updated weights for policy 0, policy_version 808114 (0.0012) [2024-06-15 21:08:30,775][1648981] Fps is (10 sec: 42560.7, 60 sec: 48598.7, 300 sec: 48539.6). Total num frames: 1655078912. Throughput: 0: 12638.2. Samples: 413838336. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:30,776][1648981] Avg episode reward: [(0, '895.820')] [2024-06-15 21:08:30,827][1651274] Signal inference workers to stop experience collection... (42350 times) [2024-06-15 21:08:30,871][1651669] InferenceWorker_p0-w0: stopping experience collection (42350 times) [2024-06-15 21:08:31,077][1651274] Signal inference workers to resume experience collection... 
(42350 times) [2024-06-15 21:08:31,082][1651669] InferenceWorker_p0-w0: resuming experience collection (42350 times) [2024-06-15 21:08:31,843][1651669] Updated weights for policy 0, policy_version 808192 (0.0011) [2024-06-15 21:08:32,957][1651669] Updated weights for policy 0, policy_version 808253 (0.0013) [2024-06-15 21:08:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.3, 300 sec: 48541.1). Total num frames: 1655308288. Throughput: 0: 12788.6. Samples: 413907968. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:35,767][1648981] Avg episode reward: [(0, '910.610')] [2024-06-15 21:08:39,716][1651669] Updated weights for policy 0, policy_version 808308 (0.0019) [2024-06-15 21:08:40,766][1648981] Fps is (10 sec: 42635.8, 60 sec: 46967.5, 300 sec: 48541.6). Total num frames: 1655504896. Throughput: 0: 12947.9. Samples: 413958144. Policy #0 lag: (min: 15.0, avg: 143.5, max: 271.0) [2024-06-15 21:08:40,767][1648981] Avg episode reward: [(0, '902.870')] [2024-06-15 21:08:41,937][1651669] Updated weights for policy 0, policy_version 808400 (0.0036) [2024-06-15 21:08:43,512][1651669] Updated weights for policy 0, policy_version 808465 (0.0218) [2024-06-15 21:08:44,146][1651669] Updated weights for policy 0, policy_version 808507 (0.0012) [2024-06-15 21:08:45,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 52428.7, 300 sec: 48874.3). Total num frames: 1655832576. Throughput: 0: 12390.4. Samples: 414006784. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:08:45,769][1648981] Avg episode reward: [(0, '881.250')] [2024-06-15 21:08:50,574][1651669] Updated weights for policy 0, policy_version 808562 (0.0013) [2024-06-15 21:08:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46421.3, 300 sec: 48318.9). Total num frames: 1655930880. Throughput: 0: 12356.3. Samples: 414087168. 
Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:08:50,767][1648981] Avg episode reward: [(0, '891.510')] [2024-06-15 21:08:52,495][1651669] Updated weights for policy 0, policy_version 808628 (0.0011) [2024-06-15 21:08:54,017][1651669] Updated weights for policy 0, policy_version 808704 (0.0013) [2024-06-15 21:08:55,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 52428.6, 300 sec: 49207.5). Total num frames: 1656356864. Throughput: 0: 12162.8. Samples: 414112256. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:08:55,767][1648981] Avg episode reward: [(0, '912.070')] [2024-06-15 21:08:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000808768_1656356864.pth... [2024-06-15 21:08:55,887][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000803040_1644625920.pth [2024-06-15 21:09:00,662][1651669] Updated weights for policy 0, policy_version 808784 (0.0013) [2024-06-15 21:09:00,769][1648981] Fps is (10 sec: 45862.8, 60 sec: 46965.3, 300 sec: 48096.3). Total num frames: 1656389632. Throughput: 0: 12175.2. Samples: 414191616. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:00,770][1648981] Avg episode reward: [(0, '975.540')] [2024-06-15 21:09:02,691][1651669] Updated weights for policy 0, policy_version 808849 (0.0010) [2024-06-15 21:09:04,840][1651669] Updated weights for policy 0, policy_version 808928 (0.0129) [2024-06-15 21:09:05,766][1648981] Fps is (10 sec: 39322.7, 60 sec: 50244.3, 300 sec: 48875.0). Total num frames: 1656750080. Throughput: 0: 11719.1. Samples: 414246912. 
Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:05,767][1648981] Avg episode reward: [(0, '998.600')] [2024-06-15 21:09:05,806][1651669] Updated weights for policy 0, policy_version 808964 (0.0012) [2024-06-15 21:09:06,986][1651669] Updated weights for policy 0, policy_version 809023 (0.0012) [2024-06-15 21:09:10,766][1648981] Fps is (10 sec: 49165.3, 60 sec: 46421.3, 300 sec: 48430.0). Total num frames: 1656881152. Throughput: 0: 11719.1. Samples: 414283264. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:10,767][1648981] Avg episode reward: [(0, '1015.650')] [2024-06-15 21:09:11,890][1651274] Signal inference workers to stop experience collection... (42400 times) [2024-06-15 21:09:11,975][1651669] InferenceWorker_p0-w0: stopping experience collection (42400 times) [2024-06-15 21:09:12,177][1651274] Signal inference workers to resume experience collection... (42400 times) [2024-06-15 21:09:12,179][1651669] InferenceWorker_p0-w0: resuming experience collection (42400 times) [2024-06-15 21:09:13,284][1651669] Updated weights for policy 0, policy_version 809073 (0.0012) [2024-06-15 21:09:15,368][1651669] Updated weights for policy 0, policy_version 809152 (0.0010) [2024-06-15 21:09:15,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48611.1, 300 sec: 48541.7). Total num frames: 1657176064. Throughput: 0: 11482.4. Samples: 414354944. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:15,767][1648981] Avg episode reward: [(0, '993.480')] [2024-06-15 21:09:16,554][1651669] Updated weights for policy 0, policy_version 809205 (0.0012) [2024-06-15 21:09:17,690][1651669] Updated weights for policy 0, policy_version 809232 (0.0011) [2024-06-15 21:09:18,626][1651669] Updated weights for policy 0, policy_version 809280 (0.0017) [2024-06-15 21:09:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 48430.0). Total num frames: 1657405440. Throughput: 0: 11525.7. Samples: 414426624. 
Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:20,767][1648981] Avg episode reward: [(0, '976.650')] [2024-06-15 21:09:24,817][1651669] Updated weights for policy 0, policy_version 809345 (0.0012) [2024-06-15 21:09:25,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 46967.4, 300 sec: 48541.1). Total num frames: 1657602048. Throughput: 0: 11434.6. Samples: 414472704. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:25,769][1648981] Avg episode reward: [(0, '972.620')] [2024-06-15 21:09:26,220][1651669] Updated weights for policy 0, policy_version 809408 (0.0012) [2024-06-15 21:09:27,387][1651669] Updated weights for policy 0, policy_version 809466 (0.0011) [2024-06-15 21:09:29,000][1651669] Updated weights for policy 0, policy_version 809523 (0.0103) [2024-06-15 21:09:30,770][1648981] Fps is (10 sec: 52408.9, 60 sec: 47517.5, 300 sec: 48651.6). Total num frames: 1657929728. Throughput: 0: 11786.4. Samples: 414537216. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:30,771][1648981] Avg episode reward: [(0, '906.710')] [2024-06-15 21:09:33,980][1651669] Updated weights for policy 0, policy_version 809568 (0.0012) [2024-06-15 21:09:35,257][1651669] Updated weights for policy 0, policy_version 809618 (0.0016) [2024-06-15 21:09:35,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 46967.4, 300 sec: 48652.2). Total num frames: 1658126336. Throughput: 0: 11832.9. Samples: 414619648. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:35,767][1648981] Avg episode reward: [(0, '866.810')] [2024-06-15 21:09:37,321][1651669] Updated weights for policy 0, policy_version 809701 (0.0013) [2024-06-15 21:09:39,139][1651669] Updated weights for policy 0, policy_version 809760 (0.0010) [2024-06-15 21:09:40,799][1648981] Fps is (10 sec: 52279.0, 60 sec: 49125.4, 300 sec: 48870.7). Total num frames: 1658454016. Throughput: 0: 11983.6. Samples: 414651904. 
Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:40,799][1648981] Avg episode reward: [(0, '880.540')] [2024-06-15 21:09:43,795][1651669] Updated weights for policy 0, policy_version 809808 (0.0010) [2024-06-15 21:09:45,305][1651669] Updated weights for policy 0, policy_version 809872 (0.0010) [2024-06-15 21:09:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46967.6, 300 sec: 48652.2). Total num frames: 1658650624. Throughput: 0: 12061.2. Samples: 414734336. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:45,767][1648981] Avg episode reward: [(0, '906.290')] [2024-06-15 21:09:46,247][1651669] Updated weights for policy 0, policy_version 809920 (0.0012) [2024-06-15 21:09:46,404][1651274] Signal inference workers to stop experience collection... (42450 times) [2024-06-15 21:09:46,468][1651669] InferenceWorker_p0-w0: stopping experience collection (42450 times) [2024-06-15 21:09:46,608][1651274] Signal inference workers to resume experience collection... (42450 times) [2024-06-15 21:09:46,609][1651669] InferenceWorker_p0-w0: resuming experience collection (42450 times) [2024-06-15 21:09:47,620][1651669] Updated weights for policy 0, policy_version 809973 (0.0091) [2024-06-15 21:09:49,751][1651669] Updated weights for policy 0, policy_version 810020 (0.0010) [2024-06-15 21:09:50,766][1648981] Fps is (10 sec: 52599.7, 60 sec: 50790.4, 300 sec: 48875.0). Total num frames: 1658978304. Throughput: 0: 12333.5. Samples: 414801920. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:50,767][1648981] Avg episode reward: [(0, '883.320')] [2024-06-15 21:09:54,722][1651669] Updated weights for policy 0, policy_version 810080 (0.0014) [2024-06-15 21:09:55,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 45875.4, 300 sec: 48430.0). Total num frames: 1659109376. Throughput: 0: 12481.4. Samples: 414844928. 
Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:09:55,767][1648981] Avg episode reward: [(0, '886.340')] [2024-06-15 21:09:56,177][1651669] Updated weights for policy 0, policy_version 810138 (0.0014) [2024-06-15 21:09:57,295][1651669] Updated weights for policy 0, policy_version 810176 (0.0013) [2024-06-15 21:09:58,455][1651669] Updated weights for policy 0, policy_version 810233 (0.0012) [2024-06-15 21:10:00,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 50792.8, 300 sec: 48652.2). Total num frames: 1659437056. Throughput: 0: 12310.8. Samples: 414908928. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:10:00,767][1648981] Avg episode reward: [(0, '889.950')] [2024-06-15 21:10:00,987][1651669] Updated weights for policy 0, policy_version 810288 (0.0014) [2024-06-15 21:10:05,024][1651669] Updated weights for policy 0, policy_version 810325 (0.0013) [2024-06-15 21:10:05,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1659600896. Throughput: 0: 12561.1. Samples: 414991872. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:10:05,767][1648981] Avg episode reward: [(0, '899.500')] [2024-06-15 21:10:06,967][1651669] Updated weights for policy 0, policy_version 810400 (0.0094) [2024-06-15 21:10:08,650][1651669] Updated weights for policy 0, policy_version 810464 (0.0026) [2024-06-15 21:10:10,767][1648981] Fps is (10 sec: 49150.8, 60 sec: 50790.3, 300 sec: 48874.5). Total num frames: 1659928576. Throughput: 0: 12231.1. Samples: 415023104. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:10:10,767][1648981] Avg episode reward: [(0, '900.380')] [2024-06-15 21:10:11,080][1651669] Updated weights for policy 0, policy_version 810531 (0.0013) [2024-06-15 21:10:15,166][1651669] Updated weights for policy 0, policy_version 810562 (0.0010) [2024-06-15 21:10:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.7, 300 sec: 48097.4). Total num frames: 1660059648. 
Throughput: 0: 12584.9. Samples: 415103488. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:10:15,767][1648981] Avg episode reward: [(0, '866.640')] [2024-06-15 21:10:16,931][1651669] Updated weights for policy 0, policy_version 810625 (0.0012) [2024-06-15 21:10:18,225][1651669] Updated weights for policy 0, policy_version 810680 (0.0012) [2024-06-15 21:10:19,175][1651669] Updated weights for policy 0, policy_version 810705 (0.0016) [2024-06-15 21:10:20,772][1648981] Fps is (10 sec: 49125.9, 60 sec: 50239.7, 300 sec: 48762.3). Total num frames: 1660420096. Throughput: 0: 12275.1. Samples: 415172096. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:10:20,772][1648981] Avg episode reward: [(0, '858.610')] [2024-06-15 21:10:20,989][1651669] Updated weights for policy 0, policy_version 810754 (0.0013) [2024-06-15 21:10:25,781][1648981] Fps is (10 sec: 49082.0, 60 sec: 49140.4, 300 sec: 48206.8). Total num frames: 1660551168. Throughput: 0: 12361.3. Samples: 415207936. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:10:25,781][1648981] Avg episode reward: [(0, '856.990')] [2024-06-15 21:10:26,377][1651669] Updated weights for policy 0, policy_version 810834 (0.0149) [2024-06-15 21:10:27,634][1651274] Signal inference workers to stop experience collection... (42500 times) [2024-06-15 21:10:27,675][1651669] InferenceWorker_p0-w0: stopping experience collection (42500 times) [2024-06-15 21:10:27,866][1651274] Signal inference workers to resume experience collection... (42500 times) [2024-06-15 21:10:27,867][1651669] InferenceWorker_p0-w0: resuming experience collection (42500 times) [2024-06-15 21:10:28,258][1651669] Updated weights for policy 0, policy_version 810912 (0.0010) [2024-06-15 21:10:30,312][1651669] Updated weights for policy 0, policy_version 810961 (0.0012) [2024-06-15 21:10:30,770][1648981] Fps is (10 sec: 45881.8, 60 sec: 49151.8, 300 sec: 48873.6). Total num frames: 1660878848. Throughput: 0: 12116.2. 
Samples: 415279616. Policy #0 lag: (min: 197.0, avg: 252.3, max: 453.0) [2024-06-15 21:10:30,771][1648981] Avg episode reward: [(0, '904.510')] [2024-06-15 21:10:31,088][1651669] Updated weights for policy 0, policy_version 811004 (0.0012) [2024-06-15 21:10:33,337][1651669] Updated weights for policy 0, policy_version 811063 (0.0015) [2024-06-15 21:10:35,767][1648981] Fps is (10 sec: 52503.1, 60 sec: 49151.8, 300 sec: 48431.6). Total num frames: 1661075456. Throughput: 0: 12310.7. Samples: 415355904. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:10:35,767][1648981] Avg episode reward: [(0, '921.170')] [2024-06-15 21:10:36,951][1651669] Updated weights for policy 0, policy_version 811091 (0.0011) [2024-06-15 21:10:38,208][1651669] Updated weights for policy 0, policy_version 811141 (0.0012) [2024-06-15 21:10:40,767][1648981] Fps is (10 sec: 45892.7, 60 sec: 48085.6, 300 sec: 48541.0). Total num frames: 1661337600. Throughput: 0: 12083.1. Samples: 415388672. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:10:40,768][1648981] Avg episode reward: [(0, '922.250')] [2024-06-15 21:10:40,885][1651669] Updated weights for policy 0, policy_version 811209 (0.0015) [2024-06-15 21:10:41,961][1651669] Updated weights for policy 0, policy_version 811260 (0.0012) [2024-06-15 21:10:43,457][1651669] Updated weights for policy 0, policy_version 811318 (0.0034) [2024-06-15 21:10:45,770][1648981] Fps is (10 sec: 52409.8, 60 sec: 49148.9, 300 sec: 48429.4). Total num frames: 1661599744. Throughput: 0: 12355.2. Samples: 415464960. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:10:45,771][1648981] Avg episode reward: [(0, '887.160')] [2024-06-15 21:10:48,238][1651669] Updated weights for policy 0, policy_version 811376 (0.0021) [2024-06-15 21:10:49,757][1651669] Updated weights for policy 0, policy_version 811427 (0.0012) [2024-06-15 21:10:50,774][1648981] Fps is (10 sec: 52391.7, 60 sec: 48053.8, 300 sec: 48763.8). 
Total num frames: 1661861888. Throughput: 0: 12081.2. Samples: 415535616. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:10:50,774][1648981] Avg episode reward: [(0, '870.580')] [2024-06-15 21:10:51,457][1651669] Updated weights for policy 0, policy_version 811460 (0.0012) [2024-06-15 21:10:52,736][1651669] Updated weights for policy 0, policy_version 811520 (0.0012) [2024-06-15 21:10:54,532][1651669] Updated weights for policy 0, policy_version 811573 (0.0012) [2024-06-15 21:10:55,767][1648981] Fps is (10 sec: 52447.3, 60 sec: 50244.1, 300 sec: 48429.9). Total num frames: 1662124032. Throughput: 0: 12344.9. Samples: 415578624. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:10:55,768][1648981] Avg episode reward: [(0, '871.410')] [2024-06-15 21:10:55,788][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000811584_1662124032.pth... [2024-06-15 21:10:55,827][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000805920_1650524160.pth [2024-06-15 21:10:55,831][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000811584_1662124032.pth [2024-06-15 21:10:58,297][1651669] Updated weights for policy 0, policy_version 811632 (0.0030) [2024-06-15 21:10:59,564][1651669] Updated weights for policy 0, policy_version 811670 (0.0011) [2024-06-15 21:11:00,795][1648981] Fps is (10 sec: 52319.9, 60 sec: 49128.9, 300 sec: 48758.6). Total num frames: 1662386176. Throughput: 0: 12132.5. Samples: 415649792. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:00,795][1648981] Avg episode reward: [(0, '877.170')] [2024-06-15 21:11:03,143][1651669] Updated weights for policy 0, policy_version 811744 (0.0012) [2024-06-15 21:11:04,916][1651669] Updated weights for policy 0, policy_version 811824 (0.0019) [2024-06-15 21:11:05,770][1648981] Fps is (10 sec: 52411.7, 60 sec: 50787.4, 300 sec: 48429.5). Total num frames: 1662648320. 
Throughput: 0: 12208.9. Samples: 415721472. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:05,770][1648981] Avg episode reward: [(0, '866.230')] [2024-06-15 21:11:08,959][1651274] Signal inference workers to stop experience collection... (42550 times) [2024-06-15 21:11:08,980][1651669] Updated weights for policy 0, policy_version 811873 (0.0017) [2024-06-15 21:11:08,993][1651669] InferenceWorker_p0-w0: stopping experience collection (42550 times) [2024-06-15 21:11:09,270][1651274] Signal inference workers to resume experience collection... (42550 times) [2024-06-15 21:11:09,271][1651669] InferenceWorker_p0-w0: resuming experience collection (42550 times) [2024-06-15 21:11:10,338][1651669] Updated weights for policy 0, policy_version 811921 (0.0014) [2024-06-15 21:11:10,766][1648981] Fps is (10 sec: 46004.5, 60 sec: 48606.0, 300 sec: 48652.8). Total num frames: 1662844928. Throughput: 0: 12371.6. Samples: 415764480. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:10,767][1648981] Avg episode reward: [(0, '876.070')] [2024-06-15 21:11:13,388][1651669] Updated weights for policy 0, policy_version 811984 (0.0023) [2024-06-15 21:11:14,962][1651669] Updated weights for policy 0, policy_version 812049 (0.0088) [2024-06-15 21:11:15,754][1651669] Updated weights for policy 0, policy_version 812094 (0.0010) [2024-06-15 21:11:15,767][1648981] Fps is (10 sec: 52446.2, 60 sec: 51882.5, 300 sec: 48541.0). Total num frames: 1663172608. Throughput: 0: 12243.5. Samples: 415830528. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:15,767][1648981] Avg episode reward: [(0, '894.450')] [2024-06-15 21:11:20,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48064.1, 300 sec: 48430.0). Total num frames: 1663303680. Throughput: 0: 12208.4. Samples: 415905280. 
Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:20,767][1648981] Avg episode reward: [(0, '896.330')] [2024-06-15 21:11:21,080][1651669] Updated weights for policy 0, policy_version 812176 (0.0119) [2024-06-15 21:11:24,149][1651669] Updated weights for policy 0, policy_version 812240 (0.0122) [2024-06-15 21:11:25,766][1648981] Fps is (10 sec: 39322.4, 60 sec: 50256.3, 300 sec: 48318.9). Total num frames: 1663565824. Throughput: 0: 12197.1. Samples: 415937536. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:25,767][1648981] Avg episode reward: [(0, '905.140')] [2024-06-15 21:11:26,071][1651669] Updated weights for policy 0, policy_version 812309 (0.0021) [2024-06-15 21:11:26,984][1651669] Updated weights for policy 0, policy_version 812352 (0.0012) [2024-06-15 21:11:30,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 48062.9, 300 sec: 48207.8). Total num frames: 1663762432. Throughput: 0: 12436.9. Samples: 416024576. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:30,767][1648981] Avg episode reward: [(0, '929.800')] [2024-06-15 21:11:31,539][1651669] Updated weights for policy 0, policy_version 812432 (0.0012) [2024-06-15 21:11:32,573][1651669] Updated weights for policy 0, policy_version 812480 (0.0026) [2024-06-15 21:11:35,028][1651669] Updated weights for policy 0, policy_version 812535 (0.0108) [2024-06-15 21:11:35,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.4, 300 sec: 48542.4). Total num frames: 1664090112. Throughput: 0: 12369.7. Samples: 416092160. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:35,767][1648981] Avg episode reward: [(0, '930.620')] [2024-06-15 21:11:37,122][1651669] Updated weights for policy 0, policy_version 812592 (0.0012) [2024-06-15 21:11:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48606.0, 300 sec: 48096.7). Total num frames: 1664253952. Throughput: 0: 12333.6. Samples: 416133632. 
Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:40,767][1648981] Avg episode reward: [(0, '931.420')] [2024-06-15 21:11:41,222][1651669] Updated weights for policy 0, policy_version 812656 (0.0107) [2024-06-15 21:11:42,413][1651669] Updated weights for policy 0, policy_version 812704 (0.0011) [2024-06-15 21:11:43,244][1651669] Updated weights for policy 0, policy_version 812735 (0.0015) [2024-06-15 21:11:45,207][1651669] Updated weights for policy 0, policy_version 812784 (0.0013) [2024-06-15 21:11:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50247.4, 300 sec: 48763.2). Total num frames: 1664614400. Throughput: 0: 12318.4. Samples: 416203776. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:45,767][1648981] Avg episode reward: [(0, '931.320')] [2024-06-15 21:11:47,862][1651669] Updated weights for policy 0, policy_version 812819 (0.0011) [2024-06-15 21:11:48,151][1651274] Signal inference workers to stop experience collection... (42600 times) [2024-06-15 21:11:48,255][1651669] InferenceWorker_p0-w0: stopping experience collection (42600 times) [2024-06-15 21:11:48,478][1651274] Signal inference workers to resume experience collection... (42600 times) [2024-06-15 21:11:48,478][1651669] InferenceWorker_p0-w0: resuming experience collection (42600 times) [2024-06-15 21:11:48,792][1651669] Updated weights for policy 0, policy_version 812864 (0.0013) [2024-06-15 21:11:50,767][1648981] Fps is (10 sec: 49151.8, 60 sec: 48065.5, 300 sec: 48430.1). Total num frames: 1664745472. Throughput: 0: 12539.3. Samples: 416285696. 
Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:50,767][1648981] Avg episode reward: [(0, '866.780')] [2024-06-15 21:11:51,984][1651669] Updated weights for policy 0, policy_version 812916 (0.0014) [2024-06-15 21:11:53,899][1651669] Updated weights for policy 0, policy_version 812991 (0.0096) [2024-06-15 21:11:55,732][1651669] Updated weights for policy 0, policy_version 813040 (0.0012) [2024-06-15 21:11:55,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49698.4, 300 sec: 48763.2). Total num frames: 1665105920. Throughput: 0: 12162.9. Samples: 416311808. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:11:55,767][1648981] Avg episode reward: [(0, '862.510')] [2024-06-15 21:11:59,153][1651669] Updated weights for policy 0, policy_version 813088 (0.0011) [2024-06-15 21:12:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48082.3, 300 sec: 48430.0). Total num frames: 1665269760. Throughput: 0: 12276.7. Samples: 416382976. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:12:00,767][1648981] Avg episode reward: [(0, '885.020')] [2024-06-15 21:12:02,564][1651669] Updated weights for policy 0, policy_version 813152 (0.0058) [2024-06-15 21:12:04,679][1651669] Updated weights for policy 0, policy_version 813217 (0.0012) [2024-06-15 21:12:05,778][1648981] Fps is (10 sec: 42548.2, 60 sec: 48053.1, 300 sec: 48872.4). Total num frames: 1665531904. Throughput: 0: 12182.4. Samples: 416453632. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:12:05,779][1648981] Avg episode reward: [(0, '883.400')] [2024-06-15 21:12:07,046][1651669] Updated weights for policy 0, policy_version 813296 (0.0012) [2024-06-15 21:12:10,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1665728512. Throughput: 0: 12276.6. Samples: 416489984. 
Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:12:10,767][1648981] Avg episode reward: [(0, '872.800')] [2024-06-15 21:12:11,247][1651669] Updated weights for policy 0, policy_version 813371 (0.0029) [2024-06-15 21:12:13,907][1651669] Updated weights for policy 0, policy_version 813424 (0.0135) [2024-06-15 21:12:15,766][1648981] Fps is (10 sec: 49209.9, 60 sec: 47513.8, 300 sec: 48763.2). Total num frames: 1666023424. Throughput: 0: 11889.8. Samples: 416559616. Policy #0 lag: (min: 63.0, avg: 190.0, max: 319.0) [2024-06-15 21:12:15,798][1648981] Avg episode reward: [(0, '909.270')] [2024-06-15 21:12:15,860][1651669] Updated weights for policy 0, policy_version 813494 (0.0142) [2024-06-15 21:12:18,461][1651669] Updated weights for policy 0, policy_version 813536 (0.0013) [2024-06-15 21:12:20,781][1648981] Fps is (10 sec: 45809.4, 60 sec: 48048.3, 300 sec: 48538.7). Total num frames: 1666187264. Throughput: 0: 11942.9. Samples: 416629760. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:20,781][1648981] Avg episode reward: [(0, '928.900')] [2024-06-15 21:12:21,935][1651669] Updated weights for policy 0, policy_version 813625 (0.0011) [2024-06-15 21:12:25,097][1651669] Updated weights for policy 0, policy_version 813666 (0.0011) [2024-06-15 21:12:25,767][1648981] Fps is (10 sec: 42597.9, 60 sec: 48059.6, 300 sec: 48430.0). Total num frames: 1666449408. Throughput: 0: 11832.9. Samples: 416666112. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:25,767][1648981] Avg episode reward: [(0, '953.540')] [2024-06-15 21:12:26,945][1651669] Updated weights for policy 0, policy_version 813751 (0.0012) [2024-06-15 21:12:29,596][1651669] Updated weights for policy 0, policy_version 813792 (0.0011) [2024-06-15 21:12:30,766][1648981] Fps is (10 sec: 52504.4, 60 sec: 49152.1, 300 sec: 48874.3). Total num frames: 1666711552. Throughput: 0: 11810.2. Samples: 416735232. 
Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:30,767][1648981] Avg episode reward: [(0, '928.160')] [2024-06-15 21:12:31,369][1651274] Signal inference workers to stop experience collection... (42650 times) [2024-06-15 21:12:31,406][1651669] InferenceWorker_p0-w0: stopping experience collection (42650 times) [2024-06-15 21:12:31,590][1651274] Signal inference workers to resume experience collection... (42650 times) [2024-06-15 21:12:31,590][1651669] InferenceWorker_p0-w0: resuming experience collection (42650 times) [2024-06-15 21:12:31,592][1651669] Updated weights for policy 0, policy_version 813840 (0.0012) [2024-06-15 21:12:35,774][1648981] Fps is (10 sec: 42568.3, 60 sec: 46415.8, 300 sec: 48095.6). Total num frames: 1666875392. Throughput: 0: 11717.3. Samples: 416813056. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:35,774][1648981] Avg episode reward: [(0, '927.220')] [2024-06-15 21:12:35,878][1651669] Updated weights for policy 0, policy_version 813905 (0.0013) [2024-06-15 21:12:37,067][1651669] Updated weights for policy 0, policy_version 813970 (0.0013) [2024-06-15 21:12:39,848][1651669] Updated weights for policy 0, policy_version 814032 (0.0015) [2024-06-15 21:12:40,768][1648981] Fps is (10 sec: 52421.5, 60 sec: 49697.1, 300 sec: 49318.4). Total num frames: 1667235840. Throughput: 0: 11934.9. Samples: 416848896. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:40,773][1648981] Avg episode reward: [(0, '930.360')] [2024-06-15 21:12:41,826][1651669] Updated weights for policy 0, policy_version 814081 (0.0015) [2024-06-15 21:12:43,043][1651669] Updated weights for policy 0, policy_version 814137 (0.0024) [2024-06-15 21:12:45,733][1651669] Updated weights for policy 0, policy_version 814180 (0.0135) [2024-06-15 21:12:45,766][1648981] Fps is (10 sec: 55745.8, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1667432448. Throughput: 0: 12219.7. Samples: 416932864. 
Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:45,767][1648981] Avg episode reward: [(0, '957.820')] [2024-06-15 21:12:47,383][1651669] Updated weights for policy 0, policy_version 814256 (0.0014) [2024-06-15 21:12:50,774][1648981] Fps is (10 sec: 42571.2, 60 sec: 48599.7, 300 sec: 48984.1). Total num frames: 1667661824. Throughput: 0: 12255.0. Samples: 417005056. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:50,775][1648981] Avg episode reward: [(0, '957.750')] [2024-06-15 21:12:51,263][1651669] Updated weights for policy 0, policy_version 814320 (0.0015) [2024-06-15 21:12:53,330][1651669] Updated weights for policy 0, policy_version 814368 (0.0012) [2024-06-15 21:12:55,330][1651669] Updated weights for policy 0, policy_version 814416 (0.0015) [2024-06-15 21:12:55,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 46967.4, 300 sec: 48652.1). Total num frames: 1667923968. Throughput: 0: 12162.8. Samples: 417037312. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:12:55,767][1648981] Avg episode reward: [(0, '1008.490')] [2024-06-15 21:12:56,219][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000814448_1667989504.pth... [2024-06-15 21:12:56,294][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000808768_1656356864.pth [2024-06-15 21:12:57,162][1651669] Updated weights for policy 0, policy_version 814480 (0.0014) [2024-06-15 21:13:00,766][1648981] Fps is (10 sec: 49190.1, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1668153344. Throughput: 0: 12140.1. Samples: 417105920. 
Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:00,767][1648981] Avg episode reward: [(0, '1011.560')] [2024-06-15 21:13:01,949][1651669] Updated weights for policy 0, policy_version 814531 (0.0011) [2024-06-15 21:13:03,936][1651669] Updated weights for policy 0, policy_version 814609 (0.0012) [2024-06-15 21:13:04,815][1651669] Updated weights for policy 0, policy_version 814648 (0.0011) [2024-06-15 21:13:05,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 48068.9, 300 sec: 48541.0). Total num frames: 1668415488. Throughput: 0: 12143.9. Samples: 417176064. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:05,767][1648981] Avg episode reward: [(0, '967.790')] [2024-06-15 21:13:08,030][1651669] Updated weights for policy 0, policy_version 814712 (0.0010) [2024-06-15 21:13:09,282][1651669] Updated weights for policy 0, policy_version 814768 (0.0012) [2024-06-15 21:13:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49152.0, 300 sec: 48875.4). Total num frames: 1668677632. Throughput: 0: 12174.3. Samples: 417213952. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:10,767][1648981] Avg episode reward: [(0, '948.150')] [2024-06-15 21:13:12,949][1651274] Signal inference workers to stop experience collection... (42700 times) [2024-06-15 21:13:12,998][1651669] InferenceWorker_p0-w0: stopping experience collection (42700 times) [2024-06-15 21:13:13,000][1651669] Updated weights for policy 0, policy_version 814786 (0.0012) [2024-06-15 21:13:13,206][1651274] Signal inference workers to resume experience collection... (42700 times) [2024-06-15 21:13:13,207][1651669] InferenceWorker_p0-w0: resuming experience collection (42700 times) [2024-06-15 21:13:14,815][1651669] Updated weights for policy 0, policy_version 814864 (0.0012) [2024-06-15 21:13:15,766][1648981] Fps is (10 sec: 49153.5, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 1668907008. Throughput: 0: 12208.4. Samples: 417284608. 
Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:15,767][1648981] Avg episode reward: [(0, '935.080')] [2024-06-15 21:13:15,954][1651669] Updated weights for policy 0, policy_version 814905 (0.0149) [2024-06-15 21:13:18,468][1651669] Updated weights for policy 0, policy_version 814945 (0.0010) [2024-06-15 21:13:19,884][1651669] Updated weights for policy 0, policy_version 815012 (0.0009) [2024-06-15 21:13:20,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50256.2, 300 sec: 48874.3). Total num frames: 1669201920. Throughput: 0: 12153.4. Samples: 417359872. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:20,767][1648981] Avg episode reward: [(0, '938.930')] [2024-06-15 21:13:24,248][1651669] Updated weights for policy 0, policy_version 815059 (0.0012) [2024-06-15 21:13:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48605.9, 300 sec: 48431.4). Total num frames: 1669365760. Throughput: 0: 12254.2. Samples: 417400320. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:25,767][1648981] Avg episode reward: [(0, '981.870')] [2024-06-15 21:13:26,106][1651669] Updated weights for policy 0, policy_version 815136 (0.0013) [2024-06-15 21:13:27,840][1651669] Updated weights for policy 0, policy_version 815170 (0.0012) [2024-06-15 21:13:29,143][1651669] Updated weights for policy 0, policy_version 815223 (0.0032) [2024-06-15 21:13:30,453][1651669] Updated weights for policy 0, policy_version 815290 (0.0011) [2024-06-15 21:13:30,771][1648981] Fps is (10 sec: 52406.5, 60 sec: 50240.6, 300 sec: 48873.6). Total num frames: 1669726208. Throughput: 0: 11865.9. Samples: 417466880. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:30,773][1648981] Avg episode reward: [(0, '984.040')] [2024-06-15 21:13:35,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 48611.5, 300 sec: 48430.0). Total num frames: 1669791744. Throughput: 0: 12164.9. Samples: 417552384. 
Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:35,768][1648981] Avg episode reward: [(0, '974.220')] [2024-06-15 21:13:35,806][1651669] Updated weights for policy 0, policy_version 815331 (0.0031) [2024-06-15 21:13:37,523][1651669] Updated weights for policy 0, policy_version 815408 (0.0011) [2024-06-15 21:13:39,675][1651669] Updated weights for policy 0, policy_version 815480 (0.0011) [2024-06-15 21:13:40,766][1648981] Fps is (10 sec: 45895.2, 60 sec: 49153.1, 300 sec: 48652.2). Total num frames: 1670184960. Throughput: 0: 11946.7. Samples: 417574912. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:40,767][1648981] Avg episode reward: [(0, '973.880')] [2024-06-15 21:13:41,011][1651669] Updated weights for policy 0, policy_version 815544 (0.0009) [2024-06-15 21:13:45,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 46967.4, 300 sec: 48541.1). Total num frames: 1670250496. Throughput: 0: 12265.2. Samples: 417657856. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:45,767][1648981] Avg episode reward: [(0, '1016.060')] [2024-06-15 21:13:47,291][1651669] Updated weights for policy 0, policy_version 815610 (0.0120) [2024-06-15 21:13:48,211][1651669] Updated weights for policy 0, policy_version 815654 (0.0011) [2024-06-15 21:13:48,538][1651274] Signal inference workers to stop experience collection... (42750 times) [2024-06-15 21:13:48,606][1651669] InferenceWorker_p0-w0: stopping experience collection (42750 times) [2024-06-15 21:13:48,882][1651274] Signal inference workers to resume experience collection... (42750 times) [2024-06-15 21:13:48,883][1651669] InferenceWorker_p0-w0: resuming experience collection (42750 times) [2024-06-15 21:13:50,231][1651669] Updated weights for policy 0, policy_version 815736 (0.0036) [2024-06-15 21:13:50,767][1648981] Fps is (10 sec: 45874.2, 60 sec: 49704.4, 300 sec: 48430.0). Total num frames: 1670643712. Throughput: 0: 12026.3. Samples: 417717248. 
Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:50,767][1648981] Avg episode reward: [(0, '981.740')] [2024-06-15 21:13:51,586][1651669] Updated weights for policy 0, policy_version 815779 (0.0021) [2024-06-15 21:13:55,778][1648981] Fps is (10 sec: 52369.4, 60 sec: 47504.7, 300 sec: 48761.8). Total num frames: 1670774784. Throughput: 0: 12102.9. Samples: 417758720. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:13:55,778][1648981] Avg episode reward: [(0, '976.880')] [2024-06-15 21:13:57,208][1651669] Updated weights for policy 0, policy_version 815824 (0.0011) [2024-06-15 21:13:58,391][1651669] Updated weights for policy 0, policy_version 815878 (0.0019) [2024-06-15 21:13:59,898][1651669] Updated weights for policy 0, policy_version 815952 (0.0014) [2024-06-15 21:14:00,785][1648981] Fps is (10 sec: 49062.8, 60 sec: 49682.9, 300 sec: 48760.2). Total num frames: 1671135232. Throughput: 0: 12271.6. Samples: 417837056. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:14:00,785][1648981] Avg episode reward: [(0, '956.540')] [2024-06-15 21:14:01,479][1651669] Updated weights for policy 0, policy_version 816003 (0.0012) [2024-06-15 21:14:02,655][1651669] Updated weights for policy 0, policy_version 816055 (0.0012) [2024-06-15 21:14:05,766][1648981] Fps is (10 sec: 52488.5, 60 sec: 48060.0, 300 sec: 48874.3). Total num frames: 1671299072. Throughput: 0: 12401.8. Samples: 417917952. Policy #0 lag: (min: 63.0, avg: 194.5, max: 319.0) [2024-06-15 21:14:05,767][1648981] Avg episode reward: [(0, '978.000')] [2024-06-15 21:14:07,853][1651669] Updated weights for policy 0, policy_version 816096 (0.0014) [2024-06-15 21:14:09,063][1651669] Updated weights for policy 0, policy_version 816148 (0.0015) [2024-06-15 21:14:10,579][1651669] Updated weights for policy 0, policy_version 816224 (0.0016) [2024-06-15 21:14:10,768][1648981] Fps is (10 sec: 49236.2, 60 sec: 49150.9, 300 sec: 48985.2). Total num frames: 1671626752. 
Throughput: 0: 12299.0. Samples: 417953792. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:10,769][1648981] Avg episode reward: [(0, '992.080')] [2024-06-15 21:14:13,111][1651669] Updated weights for policy 0, policy_version 816288 (0.0011) [2024-06-15 21:14:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 1671823360. Throughput: 0: 12243.7. Samples: 418017792. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:15,767][1648981] Avg episode reward: [(0, '967.010')] [2024-06-15 21:14:17,289][1651669] Updated weights for policy 0, policy_version 816322 (0.0047) [2024-06-15 21:14:19,471][1651669] Updated weights for policy 0, policy_version 816404 (0.0112) [2024-06-15 21:14:20,628][1651669] Updated weights for policy 0, policy_version 816464 (0.0060) [2024-06-15 21:14:20,766][1648981] Fps is (10 sec: 49158.5, 60 sec: 48606.0, 300 sec: 49207.6). Total num frames: 1672118272. Throughput: 0: 11992.2. Samples: 418092032. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:20,767][1648981] Avg episode reward: [(0, '983.930')] [2024-06-15 21:14:21,608][1651669] Updated weights for policy 0, policy_version 816511 (0.0012) [2024-06-15 21:14:24,545][1651669] Updated weights for policy 0, policy_version 816565 (0.0019) [2024-06-15 21:14:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49698.1, 300 sec: 48874.9). Total num frames: 1672347648. Throughput: 0: 12390.4. Samples: 418132480. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:25,767][1648981] Avg episode reward: [(0, '960.820')] [2024-06-15 21:14:28,479][1651274] Signal inference workers to stop experience collection... (42800 times) [2024-06-15 21:14:28,514][1651669] InferenceWorker_p0-w0: stopping experience collection (42800 times) [2024-06-15 21:14:28,765][1651274] Signal inference workers to resume experience collection... 
(42800 times) [2024-06-15 21:14:28,810][1651669] InferenceWorker_p0-w0: resuming experience collection (42800 times) [2024-06-15 21:14:29,161][1651669] Updated weights for policy 0, policy_version 816624 (0.0012) [2024-06-15 21:14:30,281][1651669] Updated weights for policy 0, policy_version 816672 (0.0013) [2024-06-15 21:14:30,786][1648981] Fps is (10 sec: 45784.3, 60 sec: 47501.3, 300 sec: 48982.1). Total num frames: 1672577024. Throughput: 0: 12180.2. Samples: 418206208. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:30,787][1648981] Avg episode reward: [(0, '902.980')] [2024-06-15 21:14:32,280][1651669] Updated weights for policy 0, policy_version 816752 (0.0120) [2024-06-15 21:14:35,406][1651669] Updated weights for policy 0, policy_version 816816 (0.0037) [2024-06-15 21:14:35,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 51336.6, 300 sec: 48879.7). Total num frames: 1672871936. Throughput: 0: 12276.6. Samples: 418269696. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:35,767][1648981] Avg episode reward: [(0, '854.830')] [2024-06-15 21:14:40,137][1651669] Updated weights for policy 0, policy_version 816865 (0.0171) [2024-06-15 21:14:40,766][1648981] Fps is (10 sec: 42682.7, 60 sec: 46967.4, 300 sec: 48652.1). Total num frames: 1673003008. Throughput: 0: 12450.4. Samples: 418318848. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:40,767][1648981] Avg episode reward: [(0, '814.250')] [2024-06-15 21:14:41,967][1651669] Updated weights for policy 0, policy_version 816949 (0.0014) [2024-06-15 21:14:43,403][1651669] Updated weights for policy 0, policy_version 816995 (0.0012) [2024-06-15 21:14:43,871][1651669] Updated weights for policy 0, policy_version 817021 (0.0010) [2024-06-15 21:14:45,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 51336.6, 300 sec: 48652.2). Total num frames: 1673330688. Throughput: 0: 12008.5. Samples: 418377216. 
Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:45,767][1648981] Avg episode reward: [(0, '763.770')] [2024-06-15 21:14:46,086][1651669] Updated weights for policy 0, policy_version 817077 (0.0012) [2024-06-15 21:14:50,499][1651669] Updated weights for policy 0, policy_version 817106 (0.0011) [2024-06-15 21:14:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46967.6, 300 sec: 48652.2). Total num frames: 1673461760. Throughput: 0: 12083.2. Samples: 418461696. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:50,767][1648981] Avg episode reward: [(0, '799.040')] [2024-06-15 21:14:52,599][1651669] Updated weights for policy 0, policy_version 817215 (0.0012) [2024-06-15 21:14:54,166][1651669] Updated weights for policy 0, policy_version 817252 (0.0013) [2024-06-15 21:14:54,799][1651669] Updated weights for policy 0, policy_version 817280 (0.0012) [2024-06-15 21:14:55,767][1648981] Fps is (10 sec: 45873.6, 60 sec: 50253.6, 300 sec: 48652.1). Total num frames: 1673789440. Throughput: 0: 11924.2. Samples: 418490368. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:14:55,767][1648981] Avg episode reward: [(0, '825.930')] [2024-06-15 21:14:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000817280_1673789440.pth... [2024-06-15 21:14:55,814][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000811584_1662124032.pth [2024-06-15 21:14:57,474][1651669] Updated weights for policy 0, policy_version 817333 (0.0045) [2024-06-15 21:15:00,783][1648981] Fps is (10 sec: 49070.0, 60 sec: 46968.7, 300 sec: 48649.4). Total num frames: 1673953280. Throughput: 0: 12294.8. Samples: 418571264. 
Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:00,784][1648981] Avg episode reward: [(0, '819.310')] [2024-06-15 21:15:01,109][1651669] Updated weights for policy 0, policy_version 817378 (0.0011) [2024-06-15 21:15:02,941][1651669] Updated weights for policy 0, policy_version 817456 (0.0012) [2024-06-15 21:15:04,675][1651669] Updated weights for policy 0, policy_version 817493 (0.0012) [2024-06-15 21:15:05,767][1648981] Fps is (10 sec: 52429.7, 60 sec: 50244.2, 300 sec: 48763.2). Total num frames: 1674313728. Throughput: 0: 12162.8. Samples: 418639360. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:05,767][1648981] Avg episode reward: [(0, '814.500')] [2024-06-15 21:15:05,988][1651669] Updated weights for policy 0, policy_version 817537 (0.0011) [2024-06-15 21:15:06,408][1651274] Signal inference workers to stop experience collection... (42850 times) [2024-06-15 21:15:06,459][1651669] InferenceWorker_p0-w0: stopping experience collection (42850 times) [2024-06-15 21:15:06,756][1651274] Signal inference workers to resume experience collection... (42850 times) [2024-06-15 21:15:06,757][1651669] InferenceWorker_p0-w0: resuming experience collection (42850 times) [2024-06-15 21:15:07,360][1651669] Updated weights for policy 0, policy_version 817600 (0.0097) [2024-06-15 21:15:10,766][1648981] Fps is (10 sec: 49234.5, 60 sec: 46968.5, 300 sec: 48763.2). Total num frames: 1674444800. Throughput: 0: 12094.6. Samples: 418676736. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:10,767][1648981] Avg episode reward: [(0, '840.380')] [2024-06-15 21:15:12,535][1651669] Updated weights for policy 0, policy_version 817663 (0.0017) [2024-06-15 21:15:14,402][1651669] Updated weights for policy 0, policy_version 817714 (0.0012) [2024-06-15 21:15:15,646][1651669] Updated weights for policy 0, policy_version 817771 (0.0013) [2024-06-15 21:15:15,773][1648981] Fps is (10 sec: 49120.7, 60 sec: 49692.8, 300 sec: 48763.1). 
Total num frames: 1674805248. Throughput: 0: 12109.6. Samples: 418750976. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:15,774][1648981] Avg episode reward: [(0, '804.350')] [2024-06-15 21:15:16,379][1651669] Updated weights for policy 0, policy_version 817808 (0.0011) [2024-06-15 21:15:17,419][1651669] Updated weights for policy 0, policy_version 817853 (0.0012) [2024-06-15 21:15:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47513.6, 300 sec: 48876.7). Total num frames: 1674969088. Throughput: 0: 12492.9. Samples: 418831872. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:20,767][1648981] Avg episode reward: [(0, '812.260')] [2024-06-15 21:15:22,452][1651669] Updated weights for policy 0, policy_version 817891 (0.0011) [2024-06-15 21:15:23,245][1651669] Updated weights for policy 0, policy_version 817924 (0.0094) [2024-06-15 21:15:24,940][1651669] Updated weights for policy 0, policy_version 818000 (0.0016) [2024-06-15 21:15:25,767][1648981] Fps is (10 sec: 52461.6, 60 sec: 49698.0, 300 sec: 48986.0). Total num frames: 1675329536. Throughput: 0: 12322.1. Samples: 418873344. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:25,767][1648981] Avg episode reward: [(0, '811.440')] [2024-06-15 21:15:26,500][1651669] Updated weights for policy 0, policy_version 818064 (0.0011) [2024-06-15 21:15:30,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48621.9, 300 sec: 48874.3). Total num frames: 1675493376. Throughput: 0: 12629.3. Samples: 418945536. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:30,767][1648981] Avg episode reward: [(0, '787.480')] [2024-06-15 21:15:32,703][1651669] Updated weights for policy 0, policy_version 818168 (0.0026) [2024-06-15 21:15:34,810][1651669] Updated weights for policy 0, policy_version 818240 (0.0011) [2024-06-15 21:15:35,787][1648981] Fps is (10 sec: 45783.9, 60 sec: 48589.7, 300 sec: 48982.1). Total num frames: 1675788288. Throughput: 0: 12396.2. 
Samples: 419019776. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:35,787][1648981] Avg episode reward: [(0, '782.140')] [2024-06-15 21:15:36,316][1651669] Updated weights for policy 0, policy_version 818291 (0.0017) [2024-06-15 21:15:37,971][1651669] Updated weights for policy 0, policy_version 818340 (0.0013) [2024-06-15 21:15:40,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 50244.2, 300 sec: 48874.9). Total num frames: 1676017664. Throughput: 0: 12299.4. Samples: 419043840. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:40,767][1648981] Avg episode reward: [(0, '775.060')] [2024-06-15 21:15:43,630][1651669] Updated weights for policy 0, policy_version 818405 (0.0012) [2024-06-15 21:15:44,928][1651274] Signal inference workers to stop experience collection... (42900 times) [2024-06-15 21:15:44,982][1651669] InferenceWorker_p0-w0: stopping experience collection (42900 times) [2024-06-15 21:15:45,235][1651274] Signal inference workers to resume experience collection... (42900 times) [2024-06-15 21:15:45,235][1651669] InferenceWorker_p0-w0: resuming experience collection (42900 times) [2024-06-15 21:15:45,392][1651669] Updated weights for policy 0, policy_version 818485 (0.0011) [2024-06-15 21:15:45,766][1648981] Fps is (10 sec: 49251.5, 60 sec: 49152.0, 300 sec: 48875.5). Total num frames: 1676279808. Throughput: 0: 12451.9. Samples: 419131392. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:45,767][1648981] Avg episode reward: [(0, '819.100')] [2024-06-15 21:15:46,303][1651669] Updated weights for policy 0, policy_version 818513 (0.0032) [2024-06-15 21:15:48,330][1651669] Updated weights for policy 0, policy_version 818595 (0.0187) [2024-06-15 21:15:50,767][1648981] Fps is (10 sec: 52428.6, 60 sec: 51336.4, 300 sec: 48874.3). Total num frames: 1676541952. Throughput: 0: 12276.6. Samples: 419191808. 
Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:50,767][1648981] Avg episode reward: [(0, '826.900')] [2024-06-15 21:15:53,985][1651669] Updated weights for policy 0, policy_version 818643 (0.0011) [2024-06-15 21:15:54,808][1651669] Updated weights for policy 0, policy_version 818688 (0.0044) [2024-06-15 21:15:55,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 49152.2, 300 sec: 48656.8). Total num frames: 1676738560. Throughput: 0: 12595.2. Samples: 419243520. Policy #0 lag: (min: 15.0, avg: 77.0, max: 271.0) [2024-06-15 21:15:55,767][1648981] Avg episode reward: [(0, '831.030')] [2024-06-15 21:15:56,582][1651669] Updated weights for policy 0, policy_version 818754 (0.0016) [2024-06-15 21:15:57,839][1651669] Updated weights for policy 0, policy_version 818802 (0.0011) [2024-06-15 21:15:59,623][1651669] Updated weights for policy 0, policy_version 818875 (0.0093) [2024-06-15 21:16:00,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 51897.2, 300 sec: 48874.9). Total num frames: 1677066240. Throughput: 0: 12289.8. Samples: 419303936. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:00,767][1648981] Avg episode reward: [(0, '820.330')] [2024-06-15 21:16:05,766][1648981] Fps is (10 sec: 39322.1, 60 sec: 46967.6, 300 sec: 48430.0). Total num frames: 1677131776. Throughput: 0: 12413.2. Samples: 419390464. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:05,767][1648981] Avg episode reward: [(0, '858.700')] [2024-06-15 21:16:06,677][1651669] Updated weights for policy 0, policy_version 818963 (0.0079) [2024-06-15 21:16:08,439][1651669] Updated weights for policy 0, policy_version 819040 (0.0084) [2024-06-15 21:16:10,288][1651669] Updated weights for policy 0, policy_version 819110 (0.0014) [2024-06-15 21:16:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 52428.8, 300 sec: 48874.3). Total num frames: 1677590528. Throughput: 0: 12083.2. Samples: 419417088. 
Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:10,767][1648981] Avg episode reward: [(0, '897.270')] [2024-06-15 21:16:15,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46972.6, 300 sec: 48541.1). Total num frames: 1677623296. Throughput: 0: 12265.3. Samples: 419497472. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:15,767][1648981] Avg episode reward: [(0, '927.380')] [2024-06-15 21:16:15,852][1651669] Updated weights for policy 0, policy_version 819168 (0.0076) [2024-06-15 21:16:18,748][1651669] Updated weights for policy 0, policy_version 819249 (0.0022) [2024-06-15 21:16:20,423][1651669] Updated weights for policy 0, policy_version 819328 (0.0037) [2024-06-15 21:16:20,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 50790.4, 300 sec: 48985.4). Total num frames: 1678016512. Throughput: 0: 11940.7. Samples: 419556864. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:20,767][1648981] Avg episode reward: [(0, '951.830')] [2024-06-15 21:16:21,291][1651274] Signal inference workers to stop experience collection... (42950 times) [2024-06-15 21:16:21,387][1651669] InferenceWorker_p0-w0: stopping experience collection (42950 times) [2024-06-15 21:16:21,639][1651274] Signal inference workers to resume experience collection... (42950 times) [2024-06-15 21:16:21,640][1651669] InferenceWorker_p0-w0: resuming experience collection (42950 times) [2024-06-15 21:16:21,985][1651669] Updated weights for policy 0, policy_version 819392 (0.0015) [2024-06-15 21:16:25,767][1648981] Fps is (10 sec: 49151.3, 60 sec: 46421.4, 300 sec: 48652.1). Total num frames: 1678114816. Throughput: 0: 12140.1. Samples: 419590144. 
Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:25,767][1648981] Avg episode reward: [(0, '952.530')] [2024-06-15 21:16:27,081][1651669] Updated weights for policy 0, policy_version 819440 (0.0014) [2024-06-15 21:16:29,848][1651669] Updated weights for policy 0, policy_version 819504 (0.0110) [2024-06-15 21:16:30,766][1648981] Fps is (10 sec: 39321.3, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1678409728. Throughput: 0: 11980.8. Samples: 419670528. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:30,767][1648981] Avg episode reward: [(0, '969.140')] [2024-06-15 21:16:31,568][1651669] Updated weights for policy 0, policy_version 819584 (0.0012) [2024-06-15 21:16:32,934][1651669] Updated weights for policy 0, policy_version 819642 (0.0012) [2024-06-15 21:16:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47529.5, 300 sec: 48763.2). Total num frames: 1678639104. Throughput: 0: 12219.8. Samples: 419741696. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:35,767][1648981] Avg episode reward: [(0, '1008.870')] [2024-06-15 21:16:37,503][1651669] Updated weights for policy 0, policy_version 819696 (0.0061) [2024-06-15 21:16:39,677][1651669] Updated weights for policy 0, policy_version 819744 (0.0011) [2024-06-15 21:16:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1678901248. Throughput: 0: 11969.4. Samples: 419782144. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:40,767][1648981] Avg episode reward: [(0, '1070.870')] [2024-06-15 21:16:41,450][1651669] Updated weights for policy 0, policy_version 819808 (0.0012) [2024-06-15 21:16:42,847][1651669] Updated weights for policy 0, policy_version 819872 (0.0091) [2024-06-15 21:16:45,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 48059.5, 300 sec: 48874.3). Total num frames: 1679163392. Throughput: 0: 11992.1. Samples: 419843584. 
Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:45,767][1648981] Avg episode reward: [(0, '1114.590')] [2024-06-15 21:16:47,937][1651669] Updated weights for policy 0, policy_version 819923 (0.0015) [2024-06-15 21:16:48,720][1651669] Updated weights for policy 0, policy_version 819968 (0.0011) [2024-06-15 21:16:50,709][1651669] Updated weights for policy 0, policy_version 820025 (0.0012) [2024-06-15 21:16:50,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 47513.7, 300 sec: 48430.0). Total num frames: 1679392768. Throughput: 0: 11878.4. Samples: 419924992. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:50,767][1648981] Avg episode reward: [(0, '1132.990')] [2024-06-15 21:16:53,895][1651669] Updated weights for policy 0, policy_version 820128 (0.0180) [2024-06-15 21:16:54,571][1651669] Updated weights for policy 0, policy_version 820159 (0.0012) [2024-06-15 21:16:55,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1679687680. Throughput: 0: 11980.8. Samples: 419956224. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:16:55,767][1648981] Avg episode reward: [(0, '1162.200')] [2024-06-15 21:16:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000820160_1679687680.pth... [2024-06-15 21:16:55,881][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000814448_1667989504.pth [2024-06-15 21:16:55,886][1651274] Saving new best policy, reward=1162.200! [2024-06-15 21:16:59,724][1651669] Updated weights for policy 0, policy_version 820220 (0.0011) [2024-06-15 21:17:00,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46421.4, 300 sec: 48543.0). Total num frames: 1679851520. Throughput: 0: 11923.9. Samples: 420034048. 
Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:00,767][1648981] Avg episode reward: [(0, '1183.680')] [2024-06-15 21:17:01,018][1651669] Updated weights for policy 0, policy_version 820258 (0.0035) [2024-06-15 21:17:01,267][1651274] Saving new best policy, reward=1183.680! [2024-06-15 21:17:01,646][1651669] Updated weights for policy 0, policy_version 820284 (0.0011) [2024-06-15 21:17:03,544][1651274] Signal inference workers to stop experience collection... (43000 times) [2024-06-15 21:17:03,656][1651669] InferenceWorker_p0-w0: stopping experience collection (43000 times) [2024-06-15 21:17:03,801][1651274] Signal inference workers to resume experience collection... (43000 times) [2024-06-15 21:17:03,802][1651669] InferenceWorker_p0-w0: resuming experience collection (43000 times) [2024-06-15 21:17:04,621][1651669] Updated weights for policy 0, policy_version 820384 (0.0094) [2024-06-15 21:17:05,775][1648981] Fps is (10 sec: 52381.9, 60 sec: 51328.8, 300 sec: 49095.0). Total num frames: 1680211968. Throughput: 0: 12035.3. Samples: 420098560. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:05,776][1648981] Avg episode reward: [(0, '1184.020')] [2024-06-15 21:17:05,777][1651274] Saving new best policy, reward=1184.020! [2024-06-15 21:17:09,939][1651669] Updated weights for policy 0, policy_version 820433 (0.0012) [2024-06-15 21:17:10,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 44782.8, 300 sec: 48318.9). Total num frames: 1680277504. Throughput: 0: 12276.6. Samples: 420142592. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:10,768][1648981] Avg episode reward: [(0, '1202.130')] [2024-06-15 21:17:11,154][1651274] Saving new best policy, reward=1202.130! 
[2024-06-15 21:17:11,643][1651669] Updated weights for policy 0, policy_version 820497 (0.0013) [2024-06-15 21:17:13,908][1651669] Updated weights for policy 0, policy_version 820560 (0.0013) [2024-06-15 21:17:15,057][1651669] Updated weights for policy 0, policy_version 820608 (0.0019) [2024-06-15 21:17:15,775][1648981] Fps is (10 sec: 42599.2, 60 sec: 50236.9, 300 sec: 48986.3). Total num frames: 1680637952. Throughput: 0: 12080.8. Samples: 420214272. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:15,776][1648981] Avg episode reward: [(0, '1195.810')] [2024-06-15 21:17:20,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 45329.0, 300 sec: 48430.0). Total num frames: 1680736256. Throughput: 0: 12049.1. Samples: 420283904. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:20,767][1648981] Avg episode reward: [(0, '1255.320')] [2024-06-15 21:17:20,768][1651274] Saving new best policy, reward=1255.320! [2024-06-15 21:17:21,544][1651669] Updated weights for policy 0, policy_version 820694 (0.0015) [2024-06-15 21:17:22,968][1651669] Updated weights for policy 0, policy_version 820754 (0.0011) [2024-06-15 21:17:23,956][1651669] Updated weights for policy 0, policy_version 820796 (0.0012) [2024-06-15 21:17:25,685][1651669] Updated weights for policy 0, policy_version 820858 (0.0013) [2024-06-15 21:17:25,766][1648981] Fps is (10 sec: 45915.4, 60 sec: 49698.2, 300 sec: 48763.2). Total num frames: 1681096704. Throughput: 0: 11810.1. Samples: 420313600. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:25,767][1648981] Avg episode reward: [(0, '1239.790')] [2024-06-15 21:17:27,724][1651669] Updated weights for policy 0, policy_version 820912 (0.0010) [2024-06-15 21:17:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 48764.4). Total num frames: 1681260544. Throughput: 0: 12106.0. Samples: 420388352. 
Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:30,767][1648981] Avg episode reward: [(0, '1213.290')] [2024-06-15 21:17:32,497][1651669] Updated weights for policy 0, policy_version 820976 (0.0016) [2024-06-15 21:17:34,077][1651669] Updated weights for policy 0, policy_version 821040 (0.0023) [2024-06-15 21:17:35,101][1651669] Updated weights for policy 0, policy_version 821057 (0.0015) [2024-06-15 21:17:35,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49152.0, 300 sec: 48652.4). Total num frames: 1681588224. Throughput: 0: 12026.3. Samples: 420466176. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:35,767][1648981] Avg episode reward: [(0, '1207.260')] [2024-06-15 21:17:36,305][1651669] Updated weights for policy 0, policy_version 821118 (0.0030) [2024-06-15 21:17:38,319][1651669] Updated weights for policy 0, policy_version 821173 (0.0028) [2024-06-15 21:17:40,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48059.6, 300 sec: 48652.1). Total num frames: 1681784832. Throughput: 0: 12105.9. Samples: 420500992. Policy #0 lag: (min: 59.0, avg: 145.8, max: 319.0) [2024-06-15 21:17:40,767][1648981] Avg episode reward: [(0, '1221.050')] [2024-06-15 21:17:43,170][1651669] Updated weights for policy 0, policy_version 821219 (0.0022) [2024-06-15 21:17:44,369][1651669] Updated weights for policy 0, policy_version 821280 (0.0033) [2024-06-15 21:17:45,132][1651669] Updated weights for policy 0, policy_version 821312 (0.0013) [2024-06-15 21:17:45,585][1651274] Signal inference workers to stop experience collection... (43050 times) [2024-06-15 21:17:45,673][1651669] InferenceWorker_p0-w0: stopping experience collection (43050 times) [2024-06-15 21:17:45,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48059.9, 300 sec: 48764.5). Total num frames: 1682046976. Throughput: 0: 12083.2. Samples: 420577792. 
Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:17:45,767][1648981] Avg episode reward: [(0, '1246.990')] [2024-06-15 21:17:45,837][1651274] Signal inference workers to resume experience collection... (43050 times) [2024-06-15 21:17:45,838][1651669] InferenceWorker_p0-w0: resuming experience collection (43050 times) [2024-06-15 21:17:46,826][1651669] Updated weights for policy 0, policy_version 821370 (0.0013) [2024-06-15 21:17:48,414][1651669] Updated weights for policy 0, policy_version 821432 (0.0011) [2024-06-15 21:17:50,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 1682309120. Throughput: 0: 12290.4. Samples: 420651520. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:17:50,767][1648981] Avg episode reward: [(0, '1233.630')] [2024-06-15 21:17:54,573][1651669] Updated weights for policy 0, policy_version 821491 (0.0017) [2024-06-15 21:17:55,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46967.4, 300 sec: 48652.1). Total num frames: 1682505728. Throughput: 0: 12242.5. Samples: 420693504. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:17:55,767][1648981] Avg episode reward: [(0, '1235.210')] [2024-06-15 21:17:56,907][1651669] Updated weights for policy 0, policy_version 821571 (0.0014) [2024-06-15 21:17:58,171][1651669] Updated weights for policy 0, policy_version 821624 (0.0015) [2024-06-15 21:17:59,606][1651669] Updated weights for policy 0, policy_version 821680 (0.0010) [2024-06-15 21:18:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.1, 300 sec: 48874.4). Total num frames: 1682833408. Throughput: 0: 12028.7. Samples: 420755456. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:00,767][1648981] Avg episode reward: [(0, '1250.120')] [2024-06-15 21:18:04,428][1651669] Updated weights for policy 0, policy_version 821713 (0.0016) [2024-06-15 21:18:05,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 45882.1, 300 sec: 48430.0). 
Total num frames: 1682964480. Throughput: 0: 12174.2. Samples: 420831744. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:05,767][1648981] Avg episode reward: [(0, '1227.370')] [2024-06-15 21:18:06,074][1651669] Updated weights for policy 0, policy_version 821778 (0.0092) [2024-06-15 21:18:07,584][1651669] Updated weights for policy 0, policy_version 821826 (0.0012) [2024-06-15 21:18:09,435][1651669] Updated weights for policy 0, policy_version 821889 (0.0012) [2024-06-15 21:18:10,729][1651669] Updated weights for policy 0, policy_version 821941 (0.0011) [2024-06-15 21:18:10,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 50790.6, 300 sec: 48874.3). Total num frames: 1683324928. Throughput: 0: 12231.1. Samples: 420864000. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:10,767][1648981] Avg episode reward: [(0, '1226.700')] [2024-06-15 21:18:15,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 46428.1, 300 sec: 48207.9). Total num frames: 1683423232. Throughput: 0: 12435.9. Samples: 420947968. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:15,767][1648981] Avg episode reward: [(0, '1232.690')] [2024-06-15 21:18:15,964][1651669] Updated weights for policy 0, policy_version 822002 (0.0012) [2024-06-15 21:18:17,151][1651669] Updated weights for policy 0, policy_version 822064 (0.0011) [2024-06-15 21:18:18,919][1651669] Updated weights for policy 0, policy_version 822096 (0.0011) [2024-06-15 21:18:19,906][1651669] Updated weights for policy 0, policy_version 822143 (0.0048) [2024-06-15 21:18:20,818][1648981] Fps is (10 sec: 45638.4, 60 sec: 50746.6, 300 sec: 48865.7). Total num frames: 1683783680. Throughput: 0: 12080.7. Samples: 421010432. 
Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:20,819][1648981] Avg episode reward: [(0, '1230.490')] [2024-06-15 21:18:21,550][1651669] Updated weights for policy 0, policy_version 822201 (0.0041) [2024-06-15 21:18:25,280][1651669] Updated weights for policy 0, policy_version 822226 (0.0010) [2024-06-15 21:18:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 47513.5, 300 sec: 48208.5). Total num frames: 1683947520. Throughput: 0: 12310.8. Samples: 421054976. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:25,767][1648981] Avg episode reward: [(0, '1228.780')] [2024-06-15 21:18:26,006][1651274] Signal inference workers to stop experience collection... (43100 times) [2024-06-15 21:18:26,072][1651669] InferenceWorker_p0-w0: stopping experience collection (43100 times) [2024-06-15 21:18:26,198][1651274] Signal inference workers to resume experience collection... (43100 times) [2024-06-15 21:18:26,199][1651669] InferenceWorker_p0-w0: resuming experience collection (43100 times) [2024-06-15 21:18:26,522][1651669] Updated weights for policy 0, policy_version 822288 (0.0014) [2024-06-15 21:18:28,816][1651669] Updated weights for policy 0, policy_version 822352 (0.0012) [2024-06-15 21:18:29,727][1651669] Updated weights for policy 0, policy_version 822392 (0.0017) [2024-06-15 21:18:30,770][1648981] Fps is (10 sec: 49389.2, 60 sec: 50241.1, 300 sec: 49095.9). Total num frames: 1684275200. Throughput: 0: 12173.2. Samples: 421125632. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:30,771][1648981] Avg episode reward: [(0, '1184.500')] [2024-06-15 21:18:31,728][1651669] Updated weights for policy 0, policy_version 822456 (0.0038) [2024-06-15 21:18:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1684471808. Throughput: 0: 12481.4. Samples: 421213184. 
Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:35,767][1648981] Avg episode reward: [(0, '1180.790')] [2024-06-15 21:18:36,177][1651669] Updated weights for policy 0, policy_version 822528 (0.0014) [2024-06-15 21:18:37,578][1651669] Updated weights for policy 0, policy_version 822591 (0.0103) [2024-06-15 21:18:39,813][1651669] Updated weights for policy 0, policy_version 822640 (0.0011) [2024-06-15 21:18:40,766][1648981] Fps is (10 sec: 52448.6, 60 sec: 50244.4, 300 sec: 49318.6). Total num frames: 1684799488. Throughput: 0: 12242.5. Samples: 421244416. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:40,767][1648981] Avg episode reward: [(0, '1110.940')] [2024-06-15 21:18:40,957][1651669] Updated weights for policy 0, policy_version 822672 (0.0018) [2024-06-15 21:18:45,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1684930560. Throughput: 0: 12561.1. Samples: 421320704. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:45,767][1648981] Avg episode reward: [(0, '1090.930')] [2024-06-15 21:18:46,559][1651669] Updated weights for policy 0, policy_version 822755 (0.0012) [2024-06-15 21:18:48,108][1651669] Updated weights for policy 0, policy_version 822817 (0.0015) [2024-06-15 21:18:49,938][1651669] Updated weights for policy 0, policy_version 822864 (0.0012) [2024-06-15 21:18:50,774][1648981] Fps is (10 sec: 49116.1, 60 sec: 49692.1, 300 sec: 49208.2). Total num frames: 1685291008. Throughput: 0: 12377.0. Samples: 421388800. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:50,774][1648981] Avg episode reward: [(0, '1056.670')] [2024-06-15 21:18:51,155][1651669] Updated weights for policy 0, policy_version 822912 (0.0012) [2024-06-15 21:18:52,379][1651669] Updated weights for policy 0, policy_version 822970 (0.0015) [2024-06-15 21:18:55,778][1648981] Fps is (10 sec: 52365.2, 60 sec: 49142.2, 300 sec: 48542.1). Total num frames: 1685454848. 
Throughput: 0: 12387.1. Samples: 421421568. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:18:55,779][1648981] Avg episode reward: [(0, '1061.890')] [2024-06-15 21:18:55,785][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000822976_1685454848.pth... [2024-06-15 21:18:55,820][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000817280_1673789440.pth [2024-06-15 21:18:58,327][1651669] Updated weights for policy 0, policy_version 823026 (0.0013) [2024-06-15 21:18:59,539][1651669] Updated weights for policy 0, policy_version 823094 (0.0012) [2024-06-15 21:19:00,767][1648981] Fps is (10 sec: 45907.2, 60 sec: 48605.6, 300 sec: 48985.3). Total num frames: 1685749760. Throughput: 0: 12162.7. Samples: 421495296. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:19:00,767][1648981] Avg episode reward: [(0, '1065.450')] [2024-06-15 21:19:01,084][1651669] Updated weights for policy 0, policy_version 823136 (0.0030) [2024-06-15 21:19:02,550][1651669] Updated weights for policy 0, policy_version 823185 (0.0013) [2024-06-15 21:19:02,956][1651274] Signal inference workers to stop experience collection... (43150 times) [2024-06-15 21:19:03,045][1651669] InferenceWorker_p0-w0: stopping experience collection (43150 times) [2024-06-15 21:19:03,188][1651274] Signal inference workers to resume experience collection... (43150 times) [2024-06-15 21:19:03,189][1651669] InferenceWorker_p0-w0: resuming experience collection (43150 times) [2024-06-15 21:19:05,776][1648981] Fps is (10 sec: 52441.8, 60 sec: 50236.2, 300 sec: 48650.8). Total num frames: 1685979136. Throughput: 0: 12299.5. Samples: 421563392. 
Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:19:05,776][1648981] Avg episode reward: [(0, '1106.680')] [2024-06-15 21:19:08,925][1651669] Updated weights for policy 0, policy_version 823251 (0.0011) [2024-06-15 21:19:10,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 47513.5, 300 sec: 48652.1). Total num frames: 1686175744. Throughput: 0: 12276.6. Samples: 421607424. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:19:10,767][1648981] Avg episode reward: [(0, '1094.150')] [2024-06-15 21:19:11,262][1651669] Updated weights for policy 0, policy_version 823354 (0.0230) [2024-06-15 21:19:14,137][1651669] Updated weights for policy 0, policy_version 823424 (0.0012) [2024-06-15 21:19:15,372][1651669] Updated weights for policy 0, policy_version 823480 (0.0030) [2024-06-15 21:19:15,767][1648981] Fps is (10 sec: 52478.5, 60 sec: 51336.4, 300 sec: 48763.2). Total num frames: 1686503424. Throughput: 0: 11799.7. Samples: 421656576. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:19:15,767][1648981] Avg episode reward: [(0, '1110.320')] [2024-06-15 21:19:20,404][1651669] Updated weights for policy 0, policy_version 823520 (0.0011) [2024-06-15 21:19:20,769][1648981] Fps is (10 sec: 42587.6, 60 sec: 47006.1, 300 sec: 48318.5). Total num frames: 1686601728. Throughput: 0: 11684.3. Samples: 421739008. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:19:20,770][1648981] Avg episode reward: [(0, '1068.490')] [2024-06-15 21:19:22,713][1651669] Updated weights for policy 0, policy_version 823600 (0.0011) [2024-06-15 21:19:24,507][1651669] Updated weights for policy 0, policy_version 823649 (0.0012) [2024-06-15 21:19:25,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 49698.2, 300 sec: 48655.4). Total num frames: 1686929408. Throughput: 0: 11685.0. Samples: 421770240. 
Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:19:25,767][1648981] Avg episode reward: [(0, '1058.100')] [2024-06-15 21:19:26,725][1651669] Updated weights for policy 0, policy_version 823736 (0.0128) [2024-06-15 21:19:30,766][1648981] Fps is (10 sec: 42609.3, 60 sec: 45878.1, 300 sec: 47985.7). Total num frames: 1687027712. Throughput: 0: 11514.3. Samples: 421838848. Policy #0 lag: (min: 95.0, avg: 159.0, max: 319.0) [2024-06-15 21:19:30,767][1648981] Avg episode reward: [(0, '1047.520')] [2024-06-15 21:19:32,165][1651669] Updated weights for policy 0, policy_version 823797 (0.0013) [2024-06-15 21:19:33,221][1651669] Updated weights for policy 0, policy_version 823840 (0.0016) [2024-06-15 21:19:34,002][1651669] Updated weights for policy 0, policy_version 823872 (0.0012) [2024-06-15 21:19:35,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 1687322624. Throughput: 0: 11607.2. Samples: 421911040. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:19:35,767][1648981] Avg episode reward: [(0, '1007.810')] [2024-06-15 21:19:36,167][1651669] Updated weights for policy 0, policy_version 823924 (0.0013) [2024-06-15 21:19:37,254][1651669] Updated weights for policy 0, policy_version 823968 (0.0044) [2024-06-15 21:19:40,787][1648981] Fps is (10 sec: 52323.4, 60 sec: 45859.8, 300 sec: 48204.5). Total num frames: 1687552000. Throughput: 0: 11694.3. Samples: 421947904. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:19:40,787][1648981] Avg episode reward: [(0, '999.650')] [2024-06-15 21:19:41,499][1651669] Updated weights for policy 0, policy_version 824016 (0.0017) [2024-06-15 21:19:43,221][1651669] Updated weights for policy 0, policy_version 824080 (0.0011) [2024-06-15 21:19:45,695][1651669] Updated weights for policy 0, policy_version 824129 (0.0010) [2024-06-15 21:19:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48652.2). Total num frames: 1687814144. 
Throughput: 0: 11719.2. Samples: 422022656. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:19:45,767][1648981] Avg episode reward: [(0, '990.950')] [2024-06-15 21:19:46,759][1651274] Signal inference workers to stop experience collection... (43200 times) [2024-06-15 21:19:46,962][1651274] Signal inference workers to resume experience collection... (43200 times) [2024-06-15 21:19:46,968][1651669] InferenceWorker_p0-w0: stopping experience collection (43200 times) [2024-06-15 21:19:47,011][1651669] Updated weights for policy 0, policy_version 824192 (0.0012) [2024-06-15 21:19:47,052][1651669] InferenceWorker_p0-w0: resuming experience collection (43200 times) [2024-06-15 21:19:47,997][1651669] Updated weights for policy 0, policy_version 824248 (0.0021) [2024-06-15 21:19:50,766][1648981] Fps is (10 sec: 52534.4, 60 sec: 46427.0, 300 sec: 48430.0). Total num frames: 1688076288. Throughput: 0: 12119.9. Samples: 422108672. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:19:50,767][1648981] Avg episode reward: [(0, '1014.940')] [2024-06-15 21:19:52,377][1651669] Updated weights for policy 0, policy_version 824304 (0.0012) [2024-06-15 21:19:53,601][1651669] Updated weights for policy 0, policy_version 824353 (0.0015) [2024-06-15 21:19:55,771][1648981] Fps is (10 sec: 52402.2, 60 sec: 48065.4, 300 sec: 48765.2). Total num frames: 1688338432. Throughput: 0: 11911.2. Samples: 422143488. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:19:55,772][1648981] Avg episode reward: [(0, '1035.580')] [2024-06-15 21:19:56,473][1651669] Updated weights for policy 0, policy_version 824401 (0.0016) [2024-06-15 21:19:57,928][1651669] Updated weights for policy 0, policy_version 824464 (0.0011) [2024-06-15 21:19:58,901][1651669] Updated weights for policy 0, policy_version 824512 (0.0014) [2024-06-15 21:20:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 47513.9, 300 sec: 48430.0). Total num frames: 1688600576. Throughput: 0: 12367.7. 
Samples: 422213120. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:00,767][1648981] Avg episode reward: [(0, '1091.220')] [2024-06-15 21:20:03,534][1651669] Updated weights for policy 0, policy_version 824576 (0.0039) [2024-06-15 21:20:04,901][1651669] Updated weights for policy 0, policy_version 824632 (0.0012) [2024-06-15 21:20:05,766][1648981] Fps is (10 sec: 52456.1, 60 sec: 48067.6, 300 sec: 48874.3). Total num frames: 1688862720. Throughput: 0: 12186.3. Samples: 422287360. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:05,767][1648981] Avg episode reward: [(0, '1109.230')] [2024-06-15 21:20:07,497][1651669] Updated weights for policy 0, policy_version 824675 (0.0013) [2024-06-15 21:20:08,866][1651669] Updated weights for policy 0, policy_version 824726 (0.0013) [2024-06-15 21:20:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 48542.1). Total num frames: 1689124864. Throughput: 0: 12288.0. Samples: 422323200. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:10,767][1648981] Avg episode reward: [(0, '1089.340')] [2024-06-15 21:20:12,929][1651669] Updated weights for policy 0, policy_version 824770 (0.0011) [2024-06-15 21:20:14,443][1651669] Updated weights for policy 0, policy_version 824832 (0.0013) [2024-06-15 21:20:15,766][1648981] Fps is (10 sec: 49151.2, 60 sec: 47513.7, 300 sec: 48763.2). Total num frames: 1689354240. Throughput: 0: 12538.3. Samples: 422403072. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:15,767][1648981] Avg episode reward: [(0, '1073.510')] [2024-06-15 21:20:16,024][1651669] Updated weights for policy 0, policy_version 824895 (0.0016) [2024-06-15 21:20:18,965][1651669] Updated weights for policy 0, policy_version 824962 (0.0109) [2024-06-15 21:20:20,145][1651669] Updated weights for policy 0, policy_version 825021 (0.0011) [2024-06-15 21:20:20,802][1648981] Fps is (10 sec: 52242.1, 60 sec: 50762.3, 300 sec: 48535.2). 
Total num frames: 1689649152. Throughput: 0: 12266.9. Samples: 422463488. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:20,803][1648981] Avg episode reward: [(0, '1086.420')] [2024-06-15 21:20:24,637][1651669] Updated weights for policy 0, policy_version 825072 (0.0012) [2024-06-15 21:20:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.7, 300 sec: 48541.1). Total num frames: 1689812992. Throughput: 0: 12543.9. Samples: 422512128. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:25,767][1648981] Avg episode reward: [(0, '1064.410')] [2024-06-15 21:20:25,842][1651274] Signal inference workers to stop experience collection... (43250 times) [2024-06-15 21:20:25,904][1651669] InferenceWorker_p0-w0: stopping experience collection (43250 times) [2024-06-15 21:20:26,183][1651274] Signal inference workers to resume experience collection... (43250 times) [2024-06-15 21:20:26,183][1651669] InferenceWorker_p0-w0: resuming experience collection (43250 times) [2024-06-15 21:20:26,782][1651669] Updated weights for policy 0, policy_version 825142 (0.0025) [2024-06-15 21:20:28,822][1651669] Updated weights for policy 0, policy_version 825168 (0.0011) [2024-06-15 21:20:30,544][1651669] Updated weights for policy 0, policy_version 825249 (0.0085) [2024-06-15 21:20:30,771][1648981] Fps is (10 sec: 49306.3, 60 sec: 51878.8, 300 sec: 48654.7). Total num frames: 1690140672. Throughput: 0: 12218.5. Samples: 422572544. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:30,771][1648981] Avg episode reward: [(0, '1058.560')] [2024-06-15 21:20:35,452][1651669] Updated weights for policy 0, policy_version 825313 (0.0012) [2024-06-15 21:20:35,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 48318.9). Total num frames: 1690271744. Throughput: 0: 12197.0. Samples: 422657536. 
Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:35,767][1648981] Avg episode reward: [(0, '1023.460')] [2024-06-15 21:20:36,540][1651669] Updated weights for policy 0, policy_version 825360 (0.0012) [2024-06-15 21:20:37,802][1651669] Updated weights for policy 0, policy_version 825407 (0.0012) [2024-06-15 21:20:40,524][1651669] Updated weights for policy 0, policy_version 825479 (0.0020) [2024-06-15 21:20:40,766][1648981] Fps is (10 sec: 45895.5, 60 sec: 50807.4, 300 sec: 48541.1). Total num frames: 1690599424. Throughput: 0: 12130.1. Samples: 422689280. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:40,767][1648981] Avg episode reward: [(0, '1041.960')] [2024-06-15 21:20:41,516][1651669] Updated weights for policy 0, policy_version 825529 (0.0113) [2024-06-15 21:20:45,778][1648981] Fps is (10 sec: 42548.0, 60 sec: 48050.2, 300 sec: 47983.8). Total num frames: 1690697728. Throughput: 0: 12216.5. Samples: 422763008. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:45,779][1648981] Avg episode reward: [(0, '1059.810')] [2024-06-15 21:20:46,560][1651669] Updated weights for policy 0, policy_version 825584 (0.0011) [2024-06-15 21:20:48,003][1651669] Updated weights for policy 0, policy_version 825632 (0.0018) [2024-06-15 21:20:50,137][1651669] Updated weights for policy 0, policy_version 825682 (0.0017) [2024-06-15 21:20:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49698.2, 300 sec: 48541.1). Total num frames: 1691058176. Throughput: 0: 12162.8. Samples: 422834688. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:50,767][1648981] Avg episode reward: [(0, '1046.860')] [2024-06-15 21:20:51,638][1651669] Updated weights for policy 0, policy_version 825760 (0.0018) [2024-06-15 21:20:55,766][1648981] Fps is (10 sec: 52490.9, 60 sec: 48063.8, 300 sec: 47985.7). Total num frames: 1691222016. Throughput: 0: 12140.1. Samples: 422869504. 
Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:20:55,767][1648981] Avg episode reward: [(0, '1058.120')] [2024-06-15 21:20:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000825792_1691222016.pth... [2024-06-15 21:20:56,028][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000820160_1679687680.pth [2024-06-15 21:20:56,504][1651669] Updated weights for policy 0, policy_version 825809 (0.0012) [2024-06-15 21:20:57,547][1651669] Updated weights for policy 0, policy_version 825856 (0.0016) [2024-06-15 21:20:59,103][1651669] Updated weights for policy 0, policy_version 825911 (0.0012) [2024-06-15 21:21:00,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 49151.9, 300 sec: 48874.3). Total num frames: 1691549696. Throughput: 0: 12003.5. Samples: 422943232. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:21:00,767][1648981] Avg episode reward: [(0, '1024.130')] [2024-06-15 21:21:00,840][1651669] Updated weights for policy 0, policy_version 825955 (0.0012) [2024-06-15 21:21:02,712][1651669] Updated weights for policy 0, policy_version 826032 (0.0151) [2024-06-15 21:21:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.6, 300 sec: 47985.7). Total num frames: 1691746304. Throughput: 0: 12218.1. Samples: 423012864. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:21:05,767][1648981] Avg episode reward: [(0, '1043.570')] [2024-06-15 21:21:07,710][1651274] Signal inference workers to stop experience collection... (43300 times) [2024-06-15 21:21:07,762][1651669] InferenceWorker_p0-w0: stopping experience collection (43300 times) [2024-06-15 21:21:08,033][1651274] Signal inference workers to resume experience collection... 
(43300 times) [2024-06-15 21:21:08,034][1651669] InferenceWorker_p0-w0: resuming experience collection (43300 times) [2024-06-15 21:21:08,037][1651669] Updated weights for policy 0, policy_version 826080 (0.0012) [2024-06-15 21:21:10,668][1651669] Updated weights for policy 0, policy_version 826160 (0.0015) [2024-06-15 21:21:10,767][1648981] Fps is (10 sec: 42598.3, 60 sec: 47513.5, 300 sec: 48652.1). Total num frames: 1691975680. Throughput: 0: 11912.5. Samples: 423048192. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:21:10,767][1648981] Avg episode reward: [(0, '1046.950')] [2024-06-15 21:21:12,911][1651669] Updated weights for policy 0, policy_version 826237 (0.0011) [2024-06-15 21:21:14,696][1651669] Updated weights for policy 0, policy_version 826275 (0.0013) [2024-06-15 21:21:15,806][1648981] Fps is (10 sec: 52221.2, 60 sec: 48573.6, 300 sec: 48312.4). Total num frames: 1692270592. Throughput: 0: 11869.1. Samples: 423107072. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:21:15,807][1648981] Avg episode reward: [(0, '1037.450')] [2024-06-15 21:21:19,734][1651669] Updated weights for policy 0, policy_version 826336 (0.0012) [2024-06-15 21:21:20,532][1651669] Updated weights for policy 0, policy_version 826368 (0.0011) [2024-06-15 21:21:20,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 45902.6, 300 sec: 48430.0). Total num frames: 1692401664. Throughput: 0: 11707.7. Samples: 423184384. Policy #0 lag: (min: 47.0, avg: 142.9, max: 287.0) [2024-06-15 21:21:20,767][1648981] Avg episode reward: [(0, '1036.920')] [2024-06-15 21:21:22,548][1651669] Updated weights for policy 0, policy_version 826422 (0.0012) [2024-06-15 21:21:23,899][1651669] Updated weights for policy 0, policy_version 826484 (0.0011) [2024-06-15 21:21:25,574][1651669] Updated weights for policy 0, policy_version 826544 (0.0012) [2024-06-15 21:21:25,766][1648981] Fps is (10 sec: 49348.5, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 1692762112. 
Throughput: 0: 11696.4. Samples: 423215616. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:21:25,767][1648981] Avg episode reward: [(0, '1002.670')] [2024-06-15 21:21:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 44786.3, 300 sec: 48096.8). Total num frames: 1692827648. Throughput: 0: 11767.7. Samples: 423292416. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:21:30,767][1648981] Avg episode reward: [(0, '1019.830')] [2024-06-15 21:21:31,377][1651669] Updated weights for policy 0, policy_version 826608 (0.0012) [2024-06-15 21:21:31,870][1651669] Updated weights for policy 0, policy_version 826624 (0.0010) [2024-06-15 21:21:34,758][1651669] Updated weights for policy 0, policy_version 826725 (0.0111) [2024-06-15 21:21:35,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 1693188096. Throughput: 0: 11525.7. Samples: 423353344. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:21:35,767][1648981] Avg episode reward: [(0, '1016.180')] [2024-06-15 21:21:36,255][1651669] Updated weights for policy 0, policy_version 826768 (0.0011) [2024-06-15 21:21:40,767][1648981] Fps is (10 sec: 49150.7, 60 sec: 45328.9, 300 sec: 47985.7). Total num frames: 1693319168. Throughput: 0: 11571.1. Samples: 423390208. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:21:40,767][1648981] Avg episode reward: [(0, '1031.500')] [2024-06-15 21:21:41,781][1651669] Updated weights for policy 0, policy_version 826834 (0.0013) [2024-06-15 21:21:42,594][1651669] Updated weights for policy 0, policy_version 826878 (0.0011) [2024-06-15 21:21:44,888][1651669] Updated weights for policy 0, policy_version 826946 (0.0014) [2024-06-15 21:21:45,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49161.7, 300 sec: 48318.9). Total num frames: 1693646848. Throughput: 0: 11730.5. Samples: 423471104. 
Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:21:45,767][1648981] Avg episode reward: [(0, '1024.130')] [2024-06-15 21:21:46,159][1651669] Updated weights for policy 0, policy_version 827001 (0.0009) [2024-06-15 21:21:46,918][1651669] Updated weights for policy 0, policy_version 827026 (0.0012) [2024-06-15 21:21:47,253][1651274] Signal inference workers to stop experience collection... (43350 times) [2024-06-15 21:21:47,285][1651669] InferenceWorker_p0-w0: stopping experience collection (43350 times) [2024-06-15 21:21:47,567][1651274] Signal inference workers to resume experience collection... (43350 times) [2024-06-15 21:21:47,568][1651669] InferenceWorker_p0-w0: resuming experience collection (43350 times) [2024-06-15 21:21:50,790][1648981] Fps is (10 sec: 52305.6, 60 sec: 46403.0, 300 sec: 47981.8). Total num frames: 1693843456. Throughput: 0: 11747.0. Samples: 423541760. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:21:50,791][1648981] Avg episode reward: [(0, '1028.960')] [2024-06-15 21:21:52,318][1651669] Updated weights for policy 0, policy_version 827080 (0.0012) [2024-06-15 21:21:53,600][1651669] Updated weights for policy 0, policy_version 827136 (0.0014) [2024-06-15 21:21:55,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46967.5, 300 sec: 48096.7). Total num frames: 1694040064. Throughput: 0: 11719.1. Samples: 423575552. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:21:55,767][1648981] Avg episode reward: [(0, '1029.620')] [2024-06-15 21:21:56,328][1651669] Updated weights for policy 0, policy_version 827200 (0.0015) [2024-06-15 21:21:57,937][1651669] Updated weights for policy 0, policy_version 827280 (0.0102) [2024-06-15 21:22:00,766][1648981] Fps is (10 sec: 52553.8, 60 sec: 46967.6, 300 sec: 47987.1). Total num frames: 1694367744. Throughput: 0: 11854.7. Samples: 423640064. 
Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:00,767][1648981] Avg episode reward: [(0, '1046.890')] [2024-06-15 21:22:03,317][1651669] Updated weights for policy 0, policy_version 827344 (0.0012) [2024-06-15 21:22:04,412][1651669] Updated weights for policy 0, policy_version 827392 (0.0014) [2024-06-15 21:22:05,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 45875.3, 300 sec: 48207.9). Total num frames: 1694498816. Throughput: 0: 11844.3. Samples: 423717376. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:05,767][1648981] Avg episode reward: [(0, '988.270')] [2024-06-15 21:22:07,996][1651669] Updated weights for policy 0, policy_version 827443 (0.0018) [2024-06-15 21:22:09,735][1651669] Updated weights for policy 0, policy_version 827521 (0.0011) [2024-06-15 21:22:10,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 47513.6, 300 sec: 48098.2). Total num frames: 1694826496. Throughput: 0: 11855.6. Samples: 423749120. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:10,767][1648981] Avg episode reward: [(0, '1009.460')] [2024-06-15 21:22:11,063][1651669] Updated weights for policy 0, policy_version 827577 (0.0011) [2024-06-15 21:22:15,039][1651669] Updated weights for policy 0, policy_version 827641 (0.0015) [2024-06-15 21:22:15,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 45905.6, 300 sec: 48430.0). Total num frames: 1695023104. Throughput: 0: 11855.6. Samples: 423825920. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:15,767][1648981] Avg episode reward: [(0, '1093.490')] [2024-06-15 21:22:19,416][1651669] Updated weights for policy 0, policy_version 827712 (0.0012) [2024-06-15 21:22:20,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1695252480. Throughput: 0: 11844.3. Samples: 423886336. 
Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:20,767][1648981] Avg episode reward: [(0, '1065.230')] [2024-06-15 21:22:21,228][1651669] Updated weights for policy 0, policy_version 827792 (0.0126) [2024-06-15 21:22:22,449][1651669] Updated weights for policy 0, policy_version 827839 (0.0011) [2024-06-15 21:22:25,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44782.9, 300 sec: 48096.7). Total num frames: 1695449088. Throughput: 0: 11844.3. Samples: 423923200. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:25,767][1648981] Avg episode reward: [(0, '1057.670')] [2024-06-15 21:22:26,254][1651669] Updated weights for policy 0, policy_version 827888 (0.0018) [2024-06-15 21:22:28,434][1651669] Updated weights for policy 0, policy_version 827927 (0.0016) [2024-06-15 21:22:29,377][1651274] Signal inference workers to stop experience collection... (43400 times) [2024-06-15 21:22:29,427][1651669] InferenceWorker_p0-w0: stopping experience collection (43400 times) [2024-06-15 21:22:29,738][1651274] Signal inference workers to resume experience collection... (43400 times) [2024-06-15 21:22:29,739][1651669] InferenceWorker_p0-w0: resuming experience collection (43400 times) [2024-06-15 21:22:30,233][1651669] Updated weights for policy 0, policy_version 828000 (0.0013) [2024-06-15 21:22:30,786][1648981] Fps is (10 sec: 52325.0, 60 sec: 49135.7, 300 sec: 48093.5). Total num frames: 1695776768. Throughput: 0: 11793.5. Samples: 424002048. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:30,787][1648981] Avg episode reward: [(0, '1057.670')] [2024-06-15 21:22:32,244][1651669] Updated weights for policy 0, policy_version 828089 (0.0014) [2024-06-15 21:22:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1695940608. Throughput: 0: 11896.1. Samples: 424076800. 
Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:35,767][1648981] Avg episode reward: [(0, '1074.810')] [2024-06-15 21:22:37,744][1651669] Updated weights for policy 0, policy_version 828144 (0.0012) [2024-06-15 21:22:39,656][1651669] Updated weights for policy 0, policy_version 828211 (0.0012) [2024-06-15 21:22:40,766][1648981] Fps is (10 sec: 45966.2, 60 sec: 48606.0, 300 sec: 48096.8). Total num frames: 1696235520. Throughput: 0: 11889.8. Samples: 424110592. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:40,767][1648981] Avg episode reward: [(0, '1065.400')] [2024-06-15 21:22:41,023][1651669] Updated weights for policy 0, policy_version 828258 (0.0024) [2024-06-15 21:22:42,900][1651669] Updated weights for policy 0, policy_version 828326 (0.0011) [2024-06-15 21:22:45,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 1696464896. Throughput: 0: 11901.2. Samples: 424175616. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:45,767][1648981] Avg episode reward: [(0, '1043.540')] [2024-06-15 21:22:48,287][1651669] Updated weights for policy 0, policy_version 828384 (0.0011) [2024-06-15 21:22:49,894][1651669] Updated weights for policy 0, policy_version 828448 (0.0155) [2024-06-15 21:22:50,770][1648981] Fps is (10 sec: 49133.6, 60 sec: 48075.8, 300 sec: 48207.2). Total num frames: 1696727040. Throughput: 0: 12048.0. Samples: 424259584. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:50,771][1648981] Avg episode reward: [(0, '1128.860')] [2024-06-15 21:22:51,303][1651669] Updated weights for policy 0, policy_version 828512 (0.0059) [2024-06-15 21:22:52,839][1651669] Updated weights for policy 0, policy_version 828576 (0.0025) [2024-06-15 21:22:53,418][1651669] Updated weights for policy 0, policy_version 828606 (0.0012) [2024-06-15 21:22:55,767][1648981] Fps is (10 sec: 52427.0, 60 sec: 49151.8, 300 sec: 47985.6). Total num frames: 1696989184. 
Throughput: 0: 11935.2. Samples: 424286208. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:22:55,767][1648981] Avg episode reward: [(0, '1094.900')] [2024-06-15 21:22:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000828608_1696989184.pth... [2024-06-15 21:22:55,808][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000822976_1685454848.pth [2024-06-15 21:22:58,799][1651669] Updated weights for policy 0, policy_version 828657 (0.0010) [2024-06-15 21:22:59,967][1651669] Updated weights for policy 0, policy_version 828706 (0.0011) [2024-06-15 21:23:00,766][1648981] Fps is (10 sec: 52448.7, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1697251328. Throughput: 0: 12219.7. Samples: 424375808. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:23:00,767][1648981] Avg episode reward: [(0, '1126.130')] [2024-06-15 21:23:01,229][1651669] Updated weights for policy 0, policy_version 828754 (0.0012) [2024-06-15 21:23:03,139][1651669] Updated weights for policy 0, policy_version 828816 (0.0013) [2024-06-15 21:23:05,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 50244.2, 300 sec: 48096.7). Total num frames: 1697513472. Throughput: 0: 12288.0. Samples: 424439296. Policy #0 lag: (min: 19.0, avg: 117.3, max: 275.0) [2024-06-15 21:23:05,767][1648981] Avg episode reward: [(0, '1129.070')] [2024-06-15 21:23:08,623][1651669] Updated weights for policy 0, policy_version 828875 (0.0025) [2024-06-15 21:23:09,533][1651669] Updated weights for policy 0, policy_version 828928 (0.0011) [2024-06-15 21:23:09,991][1651274] Signal inference workers to stop experience collection... (43450 times) [2024-06-15 21:23:10,093][1651669] InferenceWorker_p0-w0: stopping experience collection (43450 times) [2024-06-15 21:23:10,268][1651274] Signal inference workers to resume experience collection... 
(43450 times) [2024-06-15 21:23:10,268][1651669] InferenceWorker_p0-w0: resuming experience collection (43450 times) [2024-06-15 21:23:10,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1697710080. Throughput: 0: 12435.9. Samples: 424482816. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:10,767][1648981] Avg episode reward: [(0, '1129.640')] [2024-06-15 21:23:11,590][1651669] Updated weights for policy 0, policy_version 828994 (0.0013) [2024-06-15 21:23:12,795][1651669] Updated weights for policy 0, policy_version 829051 (0.0016) [2024-06-15 21:23:14,840][1651669] Updated weights for policy 0, policy_version 829088 (0.0012) [2024-06-15 21:23:15,622][1651669] Updated weights for policy 0, policy_version 829120 (0.0012) [2024-06-15 21:23:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 48327.4). Total num frames: 1698037760. Throughput: 0: 12259.3. Samples: 424553472. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:15,767][1648981] Avg episode reward: [(0, '1097.750')] [2024-06-15 21:23:19,834][1651669] Updated weights for policy 0, policy_version 829182 (0.0012) [2024-06-15 21:23:20,769][1648981] Fps is (10 sec: 45863.0, 60 sec: 48603.6, 300 sec: 48207.4). Total num frames: 1698168832. Throughput: 0: 12287.2. Samples: 424629760. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:20,770][1648981] Avg episode reward: [(0, '1088.820')] [2024-06-15 21:23:21,937][1651669] Updated weights for policy 0, policy_version 829232 (0.0013) [2024-06-15 21:23:23,982][1651669] Updated weights for policy 0, policy_version 829302 (0.0015) [2024-06-15 21:23:25,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 50244.2, 300 sec: 48097.3). Total num frames: 1698463744. Throughput: 0: 12219.7. Samples: 424660480. 
Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:25,767][1648981] Avg episode reward: [(0, '1072.640')] [2024-06-15 21:23:25,853][1651669] Updated weights for policy 0, policy_version 829330 (0.0014) [2024-06-15 21:23:29,063][1651669] Updated weights for policy 0, policy_version 829378 (0.0012) [2024-06-15 21:23:30,229][1651669] Updated weights for policy 0, policy_version 829440 (0.0025) [2024-06-15 21:23:30,766][1648981] Fps is (10 sec: 52442.9, 60 sec: 48621.9, 300 sec: 48207.8). Total num frames: 1698693120. Throughput: 0: 12617.9. Samples: 424743424. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:30,767][1648981] Avg episode reward: [(0, '1047.820')] [2024-06-15 21:23:32,933][1651669] Updated weights for policy 0, policy_version 829494 (0.0013) [2024-06-15 21:23:34,380][1651669] Updated weights for policy 0, policy_version 829562 (0.0016) [2024-06-15 21:23:35,767][1648981] Fps is (10 sec: 49151.0, 60 sec: 50244.0, 300 sec: 47985.6). Total num frames: 1698955264. Throughput: 0: 12209.3. Samples: 424808960. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:35,767][1648981] Avg episode reward: [(0, '1034.780')] [2024-06-15 21:23:37,025][1651669] Updated weights for policy 0, policy_version 829601 (0.0014) [2024-06-15 21:23:40,459][1651669] Updated weights for policy 0, policy_version 829658 (0.0116) [2024-06-15 21:23:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48605.8, 300 sec: 48207.8). Total num frames: 1699151872. Throughput: 0: 12470.1. Samples: 424847360. 
Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:40,767][1648981] Avg episode reward: [(0, '968.460')] [2024-06-15 21:23:42,347][1651669] Updated weights for policy 0, policy_version 829720 (0.0131) [2024-06-15 21:23:43,830][1651669] Updated weights for policy 0, policy_version 829782 (0.0077) [2024-06-15 21:23:44,639][1651669] Updated weights for policy 0, policy_version 829824 (0.0018) [2024-06-15 21:23:45,766][1648981] Fps is (10 sec: 52430.8, 60 sec: 50244.3, 300 sec: 48098.0). Total num frames: 1699479552. Throughput: 0: 12003.6. Samples: 424915968. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:45,767][1648981] Avg episode reward: [(0, '967.730')] [2024-06-15 21:23:48,387][1651669] Updated weights for policy 0, policy_version 829880 (0.0011) [2024-06-15 21:23:50,767][1648981] Fps is (10 sec: 45874.5, 60 sec: 48062.6, 300 sec: 47987.6). Total num frames: 1699610624. Throughput: 0: 12424.5. Samples: 424998400. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:50,767][1648981] Avg episode reward: [(0, '971.230')] [2024-06-15 21:23:50,959][1651669] Updated weights for policy 0, policy_version 829904 (0.0010) [2024-06-15 21:23:51,093][1651274] Signal inference workers to stop experience collection... (43500 times) [2024-06-15 21:23:51,163][1651669] InferenceWorker_p0-w0: stopping experience collection (43500 times) [2024-06-15 21:23:51,292][1651274] Signal inference workers to resume experience collection... (43500 times) [2024-06-15 21:23:51,292][1651669] InferenceWorker_p0-w0: resuming experience collection (43500 times) [2024-06-15 21:23:52,601][1651669] Updated weights for policy 0, policy_version 829955 (0.0012) [2024-06-15 21:23:53,978][1651669] Updated weights for policy 0, policy_version 830018 (0.0013) [2024-06-15 21:23:55,769][1648981] Fps is (10 sec: 52413.7, 60 sec: 50242.1, 300 sec: 48318.5). Total num frames: 1700003840. Throughput: 0: 12332.8. Samples: 425037824. 
Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:23:55,770][1648981] Avg episode reward: [(0, '930.290')] [2024-06-15 21:23:57,425][1651669] Updated weights for policy 0, policy_version 830081 (0.0020) [2024-06-15 21:23:58,949][1651669] Updated weights for policy 0, policy_version 830140 (0.0013) [2024-06-15 21:24:00,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 48059.7, 300 sec: 47987.2). Total num frames: 1700134912. Throughput: 0: 12242.5. Samples: 425104384. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:00,767][1648981] Avg episode reward: [(0, '970.560')] [2024-06-15 21:24:02,189][1651669] Updated weights for policy 0, policy_version 830208 (0.0015) [2024-06-15 21:24:04,297][1651669] Updated weights for policy 0, policy_version 830274 (0.0144) [2024-06-15 21:24:05,503][1651669] Updated weights for policy 0, policy_version 830331 (0.0013) [2024-06-15 21:24:05,766][1648981] Fps is (10 sec: 52443.8, 60 sec: 50244.3, 300 sec: 48652.2). Total num frames: 1700528128. Throughput: 0: 12197.7. Samples: 425178624. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:05,767][1648981] Avg episode reward: [(0, '983.020')] [2024-06-15 21:24:09,603][1651669] Updated weights for policy 0, policy_version 830392 (0.0028) [2024-06-15 21:24:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.1, 300 sec: 47985.7). Total num frames: 1700659200. Throughput: 0: 12447.3. Samples: 425220608. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:10,767][1648981] Avg episode reward: [(0, '949.290')] [2024-06-15 21:24:13,440][1651669] Updated weights for policy 0, policy_version 830453 (0.0014) [2024-06-15 21:24:14,573][1651669] Updated weights for policy 0, policy_version 830512 (0.0010) [2024-06-15 21:24:15,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 48763.6). Total num frames: 1700986880. Throughput: 0: 12253.9. Samples: 425294848. 
Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:15,767][1648981] Avg episode reward: [(0, '977.660')] [2024-06-15 21:24:16,138][1651669] Updated weights for policy 0, policy_version 830579 (0.0014) [2024-06-15 21:24:19,010][1651669] Updated weights for policy 0, policy_version 830608 (0.0051) [2024-06-15 21:24:20,132][1651669] Updated weights for policy 0, policy_version 830656 (0.0012) [2024-06-15 21:24:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50246.6, 300 sec: 48318.9). Total num frames: 1701183488. Throughput: 0: 12413.2. Samples: 425367552. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:20,767][1648981] Avg episode reward: [(0, '946.240')] [2024-06-15 21:24:24,387][1651669] Updated weights for policy 0, policy_version 830723 (0.0015) [2024-06-15 21:24:25,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49698.2, 300 sec: 48874.3). Total num frames: 1701445632. Throughput: 0: 12526.9. Samples: 425411072. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:25,767][1648981] Avg episode reward: [(0, '898.660')] [2024-06-15 21:24:26,264][1651669] Updated weights for policy 0, policy_version 830802 (0.0095) [2024-06-15 21:24:29,695][1651274] Signal inference workers to stop experience collection... (43550 times) [2024-06-15 21:24:29,754][1651669] InferenceWorker_p0-w0: stopping experience collection (43550 times) [2024-06-15 21:24:30,038][1651274] Signal inference workers to resume experience collection... (43550 times) [2024-06-15 21:24:30,039][1651669] InferenceWorker_p0-w0: resuming experience collection (43550 times) [2024-06-15 21:24:30,668][1651669] Updated weights for policy 0, policy_version 830890 (0.0113) [2024-06-15 21:24:30,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.2, 300 sec: 48652.1). Total num frames: 1701675008. Throughput: 0: 12595.2. Samples: 425482752. 
Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:30,767][1648981] Avg episode reward: [(0, '929.940')] [2024-06-15 21:24:34,247][1651669] Updated weights for policy 0, policy_version 830944 (0.0016) [2024-06-15 21:24:35,791][1648981] Fps is (10 sec: 45761.1, 60 sec: 49131.8, 300 sec: 48651.4). Total num frames: 1701904384. Throughput: 0: 12247.1. Samples: 425549824. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:35,795][1648981] Avg episode reward: [(0, '907.240')] [2024-06-15 21:24:36,189][1651669] Updated weights for policy 0, policy_version 831035 (0.0132) [2024-06-15 21:24:37,559][1651669] Updated weights for policy 0, policy_version 831088 (0.0014) [2024-06-15 21:24:40,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1702100992. Throughput: 0: 12027.1. Samples: 425579008. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:40,767][1648981] Avg episode reward: [(0, '866.100')] [2024-06-15 21:24:41,446][1651669] Updated weights for policy 0, policy_version 831124 (0.0019) [2024-06-15 21:24:45,088][1651669] Updated weights for policy 0, policy_version 831185 (0.0015) [2024-06-15 21:24:45,766][1648981] Fps is (10 sec: 42705.1, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1702330368. Throughput: 0: 12356.3. Samples: 425660416. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:45,767][1648981] Avg episode reward: [(0, '843.050')] [2024-06-15 21:24:46,193][1651669] Updated weights for policy 0, policy_version 831236 (0.0030) [2024-06-15 21:24:47,338][1651669] Updated weights for policy 0, policy_version 831296 (0.0012) [2024-06-15 21:24:48,786][1651669] Updated weights for policy 0, policy_version 831348 (0.0012) [2024-06-15 21:24:50,806][1648981] Fps is (10 sec: 52224.6, 60 sec: 50211.7, 300 sec: 48424.4). Total num frames: 1702625280. Throughput: 0: 12197.7. Samples: 425728000. 
Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:50,806][1648981] Avg episode reward: [(0, '816.610')] [2024-06-15 21:24:52,125][1651669] Updated weights for policy 0, policy_version 831363 (0.0011) [2024-06-15 21:24:55,547][1651669] Updated weights for policy 0, policy_version 831444 (0.0014) [2024-06-15 21:24:55,782][1648981] Fps is (10 sec: 49073.2, 60 sec: 46957.1, 300 sec: 48205.2). Total num frames: 1702821888. Throughput: 0: 12067.5. Samples: 425763840. Policy #0 lag: (min: 17.0, avg: 120.0, max: 273.0) [2024-06-15 21:24:55,783][1648981] Avg episode reward: [(0, '804.370')] [2024-06-15 21:24:56,088][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000831472_1702854656.pth... [2024-06-15 21:24:56,305][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000825792_1691222016.pth [2024-06-15 21:24:57,111][1651669] Updated weights for policy 0, policy_version 831507 (0.0013) [2024-06-15 21:24:58,631][1651669] Updated weights for policy 0, policy_version 831576 (0.0109) [2024-06-15 21:24:59,559][1651669] Updated weights for policy 0, policy_version 831616 (0.0124) [2024-06-15 21:25:00,767][1648981] Fps is (10 sec: 52634.0, 60 sec: 50244.1, 300 sec: 48429.9). Total num frames: 1703149568. Throughput: 0: 11844.2. Samples: 425827840. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:00,767][1648981] Avg episode reward: [(0, '835.880')] [2024-06-15 21:25:04,483][1651669] Updated weights for policy 0, policy_version 831670 (0.0011) [2024-06-15 21:25:05,794][1648981] Fps is (10 sec: 45821.5, 60 sec: 45854.0, 300 sec: 47981.2). Total num frames: 1703280640. Throughput: 0: 12155.4. Samples: 425914880. 
Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:05,795][1648981] Avg episode reward: [(0, '867.770')] [2024-06-15 21:25:06,253][1651669] Updated weights for policy 0, policy_version 831712 (0.0011) [2024-06-15 21:25:07,816][1651274] Signal inference workers to stop experience collection... (43600 times) [2024-06-15 21:25:07,852][1651669] InferenceWorker_p0-w0: stopping experience collection (43600 times) [2024-06-15 21:25:08,080][1651274] Signal inference workers to resume experience collection... (43600 times) [2024-06-15 21:25:08,081][1651669] InferenceWorker_p0-w0: resuming experience collection (43600 times) [2024-06-15 21:25:08,084][1651669] Updated weights for policy 0, policy_version 831792 (0.0115) [2024-06-15 21:25:10,366][1651669] Updated weights for policy 0, policy_version 831856 (0.0030) [2024-06-15 21:25:10,767][1648981] Fps is (10 sec: 52424.9, 60 sec: 50243.5, 300 sec: 48540.9). Total num frames: 1703673856. Throughput: 0: 11844.0. Samples: 425944064. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:10,768][1648981] Avg episode reward: [(0, '881.310')] [2024-06-15 21:25:15,220][1651669] Updated weights for policy 0, policy_version 831920 (0.0011) [2024-06-15 21:25:15,802][1648981] Fps is (10 sec: 52385.7, 60 sec: 46939.3, 300 sec: 47985.6). Total num frames: 1703804928. Throughput: 0: 12141.8. Samples: 426029568. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:15,803][1648981] Avg episode reward: [(0, '929.640')] [2024-06-15 21:25:16,737][1651669] Updated weights for policy 0, policy_version 831955 (0.0030) [2024-06-15 21:25:18,960][1651669] Updated weights for policy 0, policy_version 832061 (0.0011) [2024-06-15 21:25:20,767][1648981] Fps is (10 sec: 45877.6, 60 sec: 49151.7, 300 sec: 48541.0). Total num frames: 1704132608. Throughput: 0: 12010.1. Samples: 426089984. 
Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:20,768][1648981] Avg episode reward: [(0, '943.700')] [2024-06-15 21:25:20,779][1651669] Updated weights for policy 0, policy_version 832112 (0.0026) [2024-06-15 21:25:25,766][1648981] Fps is (10 sec: 42752.1, 60 sec: 46421.3, 300 sec: 47764.2). Total num frames: 1704230912. Throughput: 0: 12390.4. Samples: 426136576. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:25,767][1648981] Avg episode reward: [(0, '952.710')] [2024-06-15 21:25:26,820][1651669] Updated weights for policy 0, policy_version 832192 (0.0154) [2024-06-15 21:25:28,512][1651669] Updated weights for policy 0, policy_version 832257 (0.0012) [2024-06-15 21:25:30,027][1651669] Updated weights for policy 0, policy_version 832320 (0.0012) [2024-06-15 21:25:30,766][1648981] Fps is (10 sec: 49153.8, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 1704624128. Throughput: 0: 12003.5. Samples: 426200576. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:30,767][1648981] Avg episode reward: [(0, '1000.030')] [2024-06-15 21:25:31,290][1651669] Updated weights for policy 0, policy_version 832382 (0.0013) [2024-06-15 21:25:35,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 46987.0, 300 sec: 47874.6). Total num frames: 1704722432. Throughput: 0: 12355.6. Samples: 426283520. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:35,767][1648981] Avg episode reward: [(0, '1064.660')] [2024-06-15 21:25:37,021][1651669] Updated weights for policy 0, policy_version 832446 (0.0087) [2024-06-15 21:25:38,805][1651669] Updated weights for policy 0, policy_version 832505 (0.0012) [2024-06-15 21:25:40,550][1651669] Updated weights for policy 0, policy_version 832560 (0.0012) [2024-06-15 21:25:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49698.2, 300 sec: 48765.2). Total num frames: 1705082880. Throughput: 0: 12235.5. Samples: 426314240. 
Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:40,767][1648981] Avg episode reward: [(0, '1068.960')] [2024-06-15 21:25:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48605.8, 300 sec: 48096.8). Total num frames: 1705246720. Throughput: 0: 12561.1. Samples: 426393088. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:45,767][1648981] Avg episode reward: [(0, '1070.530')] [2024-06-15 21:25:46,617][1651669] Updated weights for policy 0, policy_version 832648 (0.0012) [2024-06-15 21:25:47,669][1651669] Updated weights for policy 0, policy_version 832703 (0.0013) [2024-06-15 21:25:49,319][1651669] Updated weights for policy 0, policy_version 832762 (0.0022) [2024-06-15 21:25:50,210][1651274] Signal inference workers to stop experience collection... (43650 times) [2024-06-15 21:25:50,278][1651669] InferenceWorker_p0-w0: stopping experience collection (43650 times) [2024-06-15 21:25:50,469][1651274] Signal inference workers to resume experience collection... (43650 times) [2024-06-15 21:25:50,470][1651669] InferenceWorker_p0-w0: resuming experience collection (43650 times) [2024-06-15 21:25:50,778][1648981] Fps is (10 sec: 45820.3, 60 sec: 48627.9, 300 sec: 48539.1). Total num frames: 1705541632. Throughput: 0: 12178.5. Samples: 426462720. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:50,779][1648981] Avg episode reward: [(0, '1066.620')] [2024-06-15 21:25:52,007][1651669] Updated weights for policy 0, policy_version 832848 (0.0087) [2024-06-15 21:25:53,088][1651669] Updated weights for policy 0, policy_version 832895 (0.0010) [2024-06-15 21:25:55,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49165.1, 300 sec: 48207.9). Total num frames: 1705771008. Throughput: 0: 12140.3. Samples: 426490368. 
Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:25:55,767][1648981] Avg episode reward: [(0, '1043.330')] [2024-06-15 21:25:57,971][1651669] Updated weights for policy 0, policy_version 832949 (0.0036) [2024-06-15 21:26:00,571][1651669] Updated weights for policy 0, policy_version 832996 (0.0012) [2024-06-15 21:26:00,766][1648981] Fps is (10 sec: 45930.1, 60 sec: 47513.7, 300 sec: 48318.9). Total num frames: 1706000384. Throughput: 0: 12070.1. Samples: 426572288. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:00,767][1648981] Avg episode reward: [(0, '1061.350')] [2024-06-15 21:26:01,630][1651669] Updated weights for policy 0, policy_version 833040 (0.0075) [2024-06-15 21:26:03,120][1651669] Updated weights for policy 0, policy_version 833092 (0.0104) [2024-06-15 21:26:04,483][1651669] Updated weights for policy 0, policy_version 833152 (0.0024) [2024-06-15 21:26:05,770][1648981] Fps is (10 sec: 52408.7, 60 sec: 50264.3, 300 sec: 48540.5). Total num frames: 1706295296. Throughput: 0: 12139.2. Samples: 426636288. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:05,771][1648981] Avg episode reward: [(0, '1091.220')] [2024-06-15 21:26:09,009][1651669] Updated weights for policy 0, policy_version 833210 (0.0018) [2024-06-15 21:26:10,767][1648981] Fps is (10 sec: 42597.3, 60 sec: 45875.7, 300 sec: 47992.1). Total num frames: 1706426368. Throughput: 0: 11969.4. Samples: 426675200. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:10,768][1648981] Avg episode reward: [(0, '1061.510')] [2024-06-15 21:26:12,429][1651669] Updated weights for policy 0, policy_version 833264 (0.0156) [2024-06-15 21:26:14,147][1651669] Updated weights for policy 0, policy_version 833329 (0.0105) [2024-06-15 21:26:15,766][1648981] Fps is (10 sec: 45892.7, 60 sec: 49181.5, 300 sec: 48652.1). Total num frames: 1706754048. Throughput: 0: 12083.2. Samples: 426744320. 
Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:15,767][1648981] Avg episode reward: [(0, '1061.870')] [2024-06-15 21:26:15,920][1651669] Updated weights for policy 0, policy_version 833396 (0.0082) [2024-06-15 21:26:19,241][1651669] Updated weights for policy 0, policy_version 833430 (0.0022) [2024-06-15 21:26:20,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 46967.7, 300 sec: 48096.8). Total num frames: 1706950656. Throughput: 0: 11810.1. Samples: 426814976. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:20,767][1648981] Avg episode reward: [(0, '1077.400')] [2024-06-15 21:26:22,910][1651669] Updated weights for policy 0, policy_version 833488 (0.0011) [2024-06-15 21:26:24,397][1651669] Updated weights for policy 0, policy_version 833552 (0.0011) [2024-06-15 21:26:25,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 49698.2, 300 sec: 48763.2). Total num frames: 1707212800. Throughput: 0: 12071.8. Samples: 426857472. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:25,767][1648981] Avg episode reward: [(0, '1081.860')] [2024-06-15 21:26:25,778][1651669] Updated weights for policy 0, policy_version 833616 (0.0012) [2024-06-15 21:26:29,874][1651669] Updated weights for policy 0, policy_version 833667 (0.0014) [2024-06-15 21:26:30,663][1651274] Signal inference workers to stop experience collection... (43700 times) [2024-06-15 21:26:30,701][1651669] InferenceWorker_p0-w0: stopping experience collection (43700 times) [2024-06-15 21:26:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46421.3, 300 sec: 48207.8). Total num frames: 1707409408. Throughput: 0: 11764.6. Samples: 426922496. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:30,767][1648981] Avg episode reward: [(0, '1070.590')] [2024-06-15 21:26:30,845][1651274] Signal inference workers to resume experience collection... 
(43700 times) [2024-06-15 21:26:30,846][1651669] InferenceWorker_p0-w0: resuming experience collection (43700 times) [2024-06-15 21:26:31,090][1651669] Updated weights for policy 0, policy_version 833728 (0.0181) [2024-06-15 21:26:35,456][1651669] Updated weights for policy 0, policy_version 833792 (0.0044) [2024-06-15 21:26:35,770][1648981] Fps is (10 sec: 42581.0, 60 sec: 48602.6, 300 sec: 48540.4). Total num frames: 1707638784. Throughput: 0: 11778.1. Samples: 426992640. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:35,771][1648981] Avg episode reward: [(0, '1043.050')] [2024-06-15 21:26:37,495][1651669] Updated weights for policy 0, policy_version 833872 (0.0011) [2024-06-15 21:26:40,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46421.3, 300 sec: 48207.8). Total num frames: 1707868160. Throughput: 0: 11764.6. Samples: 427019776. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:40,767][1648981] Avg episode reward: [(0, '1020.790')] [2024-06-15 21:26:42,046][1651669] Updated weights for policy 0, policy_version 833922 (0.0016) [2024-06-15 21:26:43,191][1651669] Updated weights for policy 0, policy_version 833976 (0.0013) [2024-06-15 21:26:45,767][1648981] Fps is (10 sec: 39336.6, 60 sec: 46421.2, 300 sec: 48100.6). Total num frames: 1708032000. Throughput: 0: 11684.9. Samples: 427098112. Policy #0 lag: (min: 72.0, avg: 228.3, max: 367.0) [2024-06-15 21:26:45,767][1648981] Avg episode reward: [(0, '1017.980')] [2024-06-15 21:26:46,379][1651669] Updated weights for policy 0, policy_version 834033 (0.0011) [2024-06-15 21:26:47,985][1651669] Updated weights for policy 0, policy_version 834096 (0.0059) [2024-06-15 21:26:49,580][1651669] Updated weights for policy 0, policy_version 834175 (0.0013) [2024-06-15 21:26:50,768][1648981] Fps is (10 sec: 52418.0, 60 sec: 47521.5, 300 sec: 48651.8). Total num frames: 1708392448. Throughput: 0: 11594.4. Samples: 427158016. 
Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:26:50,769][1648981] Avg episode reward: [(0, '1038.230')] [2024-06-15 21:26:54,416][1651669] Updated weights for policy 0, policy_version 834224 (0.0011) [2024-06-15 21:26:55,767][1648981] Fps is (10 sec: 49151.8, 60 sec: 45875.0, 300 sec: 47985.6). Total num frames: 1708523520. Throughput: 0: 11673.6. Samples: 427200512. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:26:55,767][1648981] Avg episode reward: [(0, '1013.820')] [2024-06-15 21:26:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000834240_1708523520.pth... [2024-06-15 21:26:55,828][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000828608_1696989184.pth [2024-06-15 21:26:56,849][1651669] Updated weights for policy 0, policy_version 834258 (0.0013) [2024-06-15 21:26:58,548][1651669] Updated weights for policy 0, policy_version 834336 (0.0014) [2024-06-15 21:26:59,985][1651669] Updated weights for policy 0, policy_version 834391 (0.0013) [2024-06-15 21:27:00,779][1648981] Fps is (10 sec: 49101.2, 60 sec: 48049.8, 300 sec: 48761.2). Total num frames: 1708883968. Throughput: 0: 11499.8. Samples: 427261952. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:00,779][1648981] Avg episode reward: [(0, '989.400')] [2024-06-15 21:27:00,893][1651669] Updated weights for policy 0, policy_version 834432 (0.0034) [2024-06-15 21:27:05,650][1651669] Updated weights for policy 0, policy_version 834495 (0.0013) [2024-06-15 21:27:05,770][1648981] Fps is (10 sec: 52410.3, 60 sec: 45875.2, 300 sec: 48207.2). Total num frames: 1709047808. Throughput: 0: 11718.1. Samples: 427342336. 
Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:05,771][1648981] Avg episode reward: [(0, '984.970')] [2024-06-15 21:27:08,160][1651669] Updated weights for policy 0, policy_version 834535 (0.0013) [2024-06-15 21:27:09,552][1651669] Updated weights for policy 0, policy_version 834600 (0.0013) [2024-06-15 21:27:09,787][1651274] Signal inference workers to stop experience collection... (43750 times) [2024-06-15 21:27:09,862][1651669] InferenceWorker_p0-w0: stopping experience collection (43750 times) [2024-06-15 21:27:10,021][1651274] Signal inference workers to resume experience collection... (43750 times) [2024-06-15 21:27:10,022][1651669] InferenceWorker_p0-w0: resuming experience collection (43750 times) [2024-06-15 21:27:10,767][1648981] Fps is (10 sec: 45931.5, 60 sec: 48606.0, 300 sec: 48541.1). Total num frames: 1709342720. Throughput: 0: 11559.8. Samples: 427377664. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:10,767][1648981] Avg episode reward: [(0, '941.740')] [2024-06-15 21:27:10,923][1651669] Updated weights for policy 0, policy_version 834656 (0.0010) [2024-06-15 21:27:15,273][1651669] Updated weights for policy 0, policy_version 834690 (0.0011) [2024-06-15 21:27:15,766][1648981] Fps is (10 sec: 42614.7, 60 sec: 45329.1, 300 sec: 48207.8). Total num frames: 1709473792. Throughput: 0: 11696.4. Samples: 427448832. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:15,767][1648981] Avg episode reward: [(0, '895.120')] [2024-06-15 21:27:16,678][1651669] Updated weights for policy 0, policy_version 834747 (0.0012) [2024-06-15 21:27:19,099][1651669] Updated weights for policy 0, policy_version 834807 (0.0025) [2024-06-15 21:27:20,661][1651669] Updated weights for policy 0, policy_version 834864 (0.0011) [2024-06-15 21:27:20,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 47513.6, 300 sec: 48652.2). Total num frames: 1709801472. Throughput: 0: 11560.9. Samples: 427512832. 
Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:20,767][1648981] Avg episode reward: [(0, '855.150')] [2024-06-15 21:27:21,553][1651669] Updated weights for policy 0, policy_version 834897 (0.0017) [2024-06-15 21:27:25,777][1648981] Fps is (10 sec: 49101.5, 60 sec: 45867.3, 300 sec: 48098.3). Total num frames: 1709965312. Throughput: 0: 11818.8. Samples: 427551744. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:25,777][1648981] Avg episode reward: [(0, '840.600')] [2024-06-15 21:27:26,335][1651669] Updated weights for policy 0, policy_version 834946 (0.0011) [2024-06-15 21:27:27,561][1651669] Updated weights for policy 0, policy_version 835002 (0.0153) [2024-06-15 21:27:30,644][1651669] Updated weights for policy 0, policy_version 835088 (0.0012) [2024-06-15 21:27:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47513.6, 300 sec: 48541.1). Total num frames: 1710260224. Throughput: 0: 11753.3. Samples: 427627008. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:30,767][1648981] Avg episode reward: [(0, '827.640')] [2024-06-15 21:27:31,521][1651669] Updated weights for policy 0, policy_version 835134 (0.0158) [2024-06-15 21:27:32,471][1651669] Updated weights for policy 0, policy_version 835171 (0.0013) [2024-06-15 21:27:35,778][1648981] Fps is (10 sec: 52420.9, 60 sec: 47507.5, 300 sec: 48317.0). Total num frames: 1710489600. Throughput: 0: 12137.5. Samples: 427704320. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:35,779][1648981] Avg episode reward: [(0, '831.090')] [2024-06-15 21:27:37,278][1651669] Updated weights for policy 0, policy_version 835217 (0.0021) [2024-06-15 21:27:38,927][1651669] Updated weights for policy 0, policy_version 835266 (0.0014) [2024-06-15 21:27:40,765][1651669] Updated weights for policy 0, policy_version 835331 (0.0033) [2024-06-15 21:27:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1710751744. 
Throughput: 0: 12140.2. Samples: 427746816. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:40,767][1648981] Avg episode reward: [(0, '865.970')] [2024-06-15 21:27:42,619][1651669] Updated weights for policy 0, policy_version 835408 (0.0012) [2024-06-15 21:27:43,639][1651669] Updated weights for policy 0, policy_version 835455 (0.0012) [2024-06-15 21:27:45,766][1648981] Fps is (10 sec: 52490.7, 60 sec: 49698.4, 300 sec: 48430.6). Total num frames: 1711013888. Throughput: 0: 12109.3. Samples: 427806720. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:45,767][1648981] Avg episode reward: [(0, '826.950')] [2024-06-15 21:27:49,344][1651669] Updated weights for policy 0, policy_version 835517 (0.0015) [2024-06-15 21:27:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46969.1, 300 sec: 48207.9). Total num frames: 1711210496. Throughput: 0: 12107.0. Samples: 427887104. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:50,767][1648981] Avg episode reward: [(0, '888.530')] [2024-06-15 21:27:51,261][1651669] Updated weights for policy 0, policy_version 835575 (0.0027) [2024-06-15 21:27:52,350][1651274] Signal inference workers to stop experience collection... (43800 times) [2024-06-15 21:27:52,456][1651669] InferenceWorker_p0-w0: stopping experience collection (43800 times) [2024-06-15 21:27:52,662][1651274] Signal inference workers to resume experience collection... (43800 times) [2024-06-15 21:27:52,662][1651669] InferenceWorker_p0-w0: resuming experience collection (43800 times) [2024-06-15 21:27:52,664][1651669] Updated weights for policy 0, policy_version 835616 (0.0010) [2024-06-15 21:27:53,637][1651669] Updated weights for policy 0, policy_version 835658 (0.0018) [2024-06-15 21:27:55,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 50244.4, 300 sec: 48430.0). Total num frames: 1711538176. Throughput: 0: 12003.6. Samples: 427917824. 
Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:27:55,767][1648981] Avg episode reward: [(0, '890.290')] [2024-06-15 21:27:58,858][1651669] Updated weights for policy 0, policy_version 835717 (0.0012) [2024-06-15 21:28:00,619][1651669] Updated weights for policy 0, policy_version 835792 (0.0096) [2024-06-15 21:28:00,767][1648981] Fps is (10 sec: 49151.5, 60 sec: 46977.1, 300 sec: 48096.8). Total num frames: 1711702016. Throughput: 0: 12197.0. Samples: 427997696. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:00,767][1648981] Avg episode reward: [(0, '877.210')] [2024-06-15 21:28:01,886][1651669] Updated weights for policy 0, policy_version 835840 (0.0021) [2024-06-15 21:28:03,655][1651669] Updated weights for policy 0, policy_version 835890 (0.0012) [2024-06-15 21:28:05,146][1651669] Updated weights for policy 0, policy_version 835961 (0.0011) [2024-06-15 21:28:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50247.4, 300 sec: 48652.2). Total num frames: 1712062464. Throughput: 0: 12231.1. Samples: 428063232. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:05,767][1648981] Avg episode reward: [(0, '859.220')] [2024-06-15 21:28:10,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46421.4, 300 sec: 47763.5). Total num frames: 1712128000. Throughput: 0: 12290.8. Samples: 428104704. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:10,767][1648981] Avg episode reward: [(0, '869.520')] [2024-06-15 21:28:11,285][1651669] Updated weights for policy 0, policy_version 836026 (0.0013) [2024-06-15 21:28:12,857][1651669] Updated weights for policy 0, policy_version 836091 (0.0013) [2024-06-15 21:28:14,088][1651669] Updated weights for policy 0, policy_version 836130 (0.0013) [2024-06-15 21:28:15,359][1651669] Updated weights for policy 0, policy_version 836192 (0.0010) [2024-06-15 21:28:15,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 50790.4, 300 sec: 48652.6). Total num frames: 1712521216. 
Throughput: 0: 12162.9. Samples: 428174336. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:15,767][1648981] Avg episode reward: [(0, '917.850')] [2024-06-15 21:28:20,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 46421.3, 300 sec: 47874.6). Total num frames: 1712586752. Throughput: 0: 12291.2. Samples: 428257280. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:20,767][1648981] Avg episode reward: [(0, '887.530')] [2024-06-15 21:28:21,847][1651669] Updated weights for policy 0, policy_version 836256 (0.0014) [2024-06-15 21:28:23,880][1651669] Updated weights for policy 0, policy_version 836336 (0.0077) [2024-06-15 21:28:25,405][1651669] Updated weights for policy 0, policy_version 836401 (0.0011) [2024-06-15 21:28:25,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 50252.6, 300 sec: 48430.0). Total num frames: 1712979968. Throughput: 0: 11923.8. Samples: 428283392. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:25,767][1648981] Avg episode reward: [(0, '862.870')] [2024-06-15 21:28:26,472][1651669] Updated weights for policy 0, policy_version 836452 (0.0013) [2024-06-15 21:28:30,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1713111040. Throughput: 0: 12276.6. Samples: 428359168. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:30,767][1648981] Avg episode reward: [(0, '861.050')] [2024-06-15 21:28:33,215][1651274] Signal inference workers to stop experience collection... (43850 times) [2024-06-15 21:28:33,254][1651669] InferenceWorker_p0-w0: stopping experience collection (43850 times) [2024-06-15 21:28:33,274][1651669] Updated weights for policy 0, policy_version 836517 (0.0013) [2024-06-15 21:28:33,464][1651274] Signal inference workers to resume experience collection... 
(43850 times) [2024-06-15 21:28:33,466][1651669] InferenceWorker_p0-w0: resuming experience collection (43850 times) [2024-06-15 21:28:35,519][1651669] Updated weights for policy 0, policy_version 836613 (0.0111) [2024-06-15 21:28:35,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 48615.4, 300 sec: 48318.9). Total num frames: 1713405952. Throughput: 0: 11867.0. Samples: 428421120. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:35,767][1648981] Avg episode reward: [(0, '919.300')] [2024-06-15 21:28:37,023][1651669] Updated weights for policy 0, policy_version 836688 (0.0015) [2024-06-15 21:28:37,988][1651669] Updated weights for policy 0, policy_version 836736 (0.0010) [2024-06-15 21:28:40,768][1648981] Fps is (10 sec: 52418.2, 60 sec: 48058.1, 300 sec: 47985.3). Total num frames: 1713635328. Throughput: 0: 12003.0. Samples: 428457984. Policy #0 lag: (min: 125.0, avg: 226.3, max: 399.0) [2024-06-15 21:28:40,769][1648981] Avg episode reward: [(0, '939.060')] [2024-06-15 21:28:44,181][1651669] Updated weights for policy 0, policy_version 836800 (0.0011) [2024-06-15 21:28:45,814][1648981] Fps is (10 sec: 45656.6, 60 sec: 47475.7, 300 sec: 48311.1). Total num frames: 1713864704. Throughput: 0: 12115.8. Samples: 428543488. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:28:45,815][1648981] Avg episode reward: [(0, '953.080')] [2024-06-15 21:28:46,104][1651669] Updated weights for policy 0, policy_version 836864 (0.0114) [2024-06-15 21:28:47,168][1651669] Updated weights for policy 0, policy_version 836914 (0.0012) [2024-06-15 21:28:48,418][1651669] Updated weights for policy 0, policy_version 836976 (0.0012) [2024-06-15 21:28:50,769][1648981] Fps is (10 sec: 52425.2, 60 sec: 49149.7, 300 sec: 47985.7). Total num frames: 1714159616. Throughput: 0: 12241.8. Samples: 428614144. 
Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:28:50,770][1648981] Avg episode reward: [(0, '935.790')] [2024-06-15 21:28:54,382][1651669] Updated weights for policy 0, policy_version 837044 (0.0103) [2024-06-15 21:28:55,542][1651669] Updated weights for policy 0, policy_version 837092 (0.0013) [2024-06-15 21:28:55,766][1648981] Fps is (10 sec: 52681.2, 60 sec: 47513.7, 300 sec: 48318.9). Total num frames: 1714388992. Throughput: 0: 12379.0. Samples: 428661760. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:28:55,767][1648981] Avg episode reward: [(0, '907.890')] [2024-06-15 21:28:56,085][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000837120_1714421760.pth... [2024-06-15 21:28:56,256][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000831472_1702854656.pth [2024-06-15 21:28:57,099][1651669] Updated weights for policy 0, policy_version 837158 (0.0011) [2024-06-15 21:28:58,772][1651669] Updated weights for policy 0, policy_version 837244 (0.0011) [2024-06-15 21:29:00,766][1648981] Fps is (10 sec: 52443.2, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 1714683904. Throughput: 0: 12105.9. Samples: 428719104. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:00,767][1648981] Avg episode reward: [(0, '930.770')] [2024-06-15 21:29:04,978][1651669] Updated weights for policy 0, policy_version 837297 (0.0010) [2024-06-15 21:29:05,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 1714847744. Throughput: 0: 12265.3. Samples: 428809216. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:05,767][1648981] Avg episode reward: [(0, '947.570')] [2024-06-15 21:29:06,612][1651669] Updated weights for policy 0, policy_version 837371 (0.0011) [2024-06-15 21:29:07,271][1651274] Signal inference workers to stop experience collection... 
(43900 times) [2024-06-15 21:29:07,321][1651669] InferenceWorker_p0-w0: stopping experience collection (43900 times) [2024-06-15 21:29:07,452][1651274] Signal inference workers to resume experience collection... (43900 times) [2024-06-15 21:29:07,454][1651669] InferenceWorker_p0-w0: resuming experience collection (43900 times) [2024-06-15 21:29:08,149][1651669] Updated weights for policy 0, policy_version 837440 (0.0212) [2024-06-15 21:29:09,403][1651669] Updated weights for policy 0, policy_version 837494 (0.0106) [2024-06-15 21:29:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 51336.5, 300 sec: 48207.8). Total num frames: 1715208192. Throughput: 0: 12197.0. Samples: 428832256. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:10,767][1648981] Avg episode reward: [(0, '935.580')] [2024-06-15 21:29:15,471][1651669] Updated weights for policy 0, policy_version 837536 (0.0026) [2024-06-15 21:29:15,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 45875.1, 300 sec: 47763.5). Total num frames: 1715273728. Throughput: 0: 12276.6. Samples: 428911616. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:15,767][1648981] Avg episode reward: [(0, '938.400')] [2024-06-15 21:29:17,074][1651669] Updated weights for policy 0, policy_version 837602 (0.0011) [2024-06-15 21:29:18,090][1651669] Updated weights for policy 0, policy_version 837633 (0.0010) [2024-06-15 21:29:19,856][1651669] Updated weights for policy 0, policy_version 837712 (0.0011) [2024-06-15 21:29:20,703][1651669] Updated weights for policy 0, policy_version 837753 (0.0011) [2024-06-15 21:29:20,768][1648981] Fps is (10 sec: 49149.6, 60 sec: 51882.2, 300 sec: 48318.8). Total num frames: 1715699712. Throughput: 0: 12401.6. Samples: 428979200. 
Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:20,771][1648981] Avg episode reward: [(0, '965.360')] [2024-06-15 21:29:25,350][1651669] Updated weights for policy 0, policy_version 837784 (0.0011) [2024-06-15 21:29:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46967.6, 300 sec: 47874.6). Total num frames: 1715798016. Throughput: 0: 12504.7. Samples: 429020672. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:25,767][1648981] Avg episode reward: [(0, '927.550')] [2024-06-15 21:29:27,749][1651669] Updated weights for policy 0, policy_version 837884 (0.0013) [2024-06-15 21:29:30,155][1651669] Updated weights for policy 0, policy_version 837936 (0.0010) [2024-06-15 21:29:30,766][1648981] Fps is (10 sec: 42600.8, 60 sec: 50244.3, 300 sec: 48211.9). Total num frames: 1716125696. Throughput: 0: 12175.8. Samples: 429090816. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:30,767][1648981] Avg episode reward: [(0, '914.110')] [2024-06-15 21:29:31,738][1651669] Updated weights for policy 0, policy_version 838000 (0.0146) [2024-06-15 21:29:35,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 47513.4, 300 sec: 47985.6). Total num frames: 1716256768. Throughput: 0: 12209.0. Samples: 429163520. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:35,767][1648981] Avg episode reward: [(0, '883.910')] [2024-06-15 21:29:36,814][1651669] Updated weights for policy 0, policy_version 838064 (0.0011) [2024-06-15 21:29:38,641][1651669] Updated weights for policy 0, policy_version 838136 (0.0014) [2024-06-15 21:29:40,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48607.5, 300 sec: 48207.8). Total num frames: 1716551680. Throughput: 0: 11707.7. Samples: 429188608. 
Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:40,767][1648981] Avg episode reward: [(0, '962.550')] [2024-06-15 21:29:41,773][1651669] Updated weights for policy 0, policy_version 838208 (0.0012) [2024-06-15 21:29:43,022][1651669] Updated weights for policy 0, policy_version 838272 (0.0021) [2024-06-15 21:29:45,774][1648981] Fps is (10 sec: 52389.2, 60 sec: 48638.3, 300 sec: 47990.8). Total num frames: 1716781056. Throughput: 0: 12206.2. Samples: 429268480. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:45,775][1648981] Avg episode reward: [(0, '988.070')] [2024-06-15 21:29:47,449][1651274] Signal inference workers to stop experience collection... (43950 times) [2024-06-15 21:29:47,512][1651669] InferenceWorker_p0-w0: stopping experience collection (43950 times) [2024-06-15 21:29:47,761][1651274] Signal inference workers to resume experience collection... (43950 times) [2024-06-15 21:29:47,762][1651669] InferenceWorker_p0-w0: resuming experience collection (43950 times) [2024-06-15 21:29:48,463][1651669] Updated weights for policy 0, policy_version 838340 (0.0014) [2024-06-15 21:29:50,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48061.9, 300 sec: 48210.5). Total num frames: 1717043200. Throughput: 0: 11639.5. Samples: 429332992. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:50,767][1648981] Avg episode reward: [(0, '1025.150')] [2024-06-15 21:29:51,632][1651669] Updated weights for policy 0, policy_version 838403 (0.0190) [2024-06-15 21:29:53,043][1651669] Updated weights for policy 0, policy_version 838480 (0.0012) [2024-06-15 21:29:55,766][1648981] Fps is (10 sec: 52470.2, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1717305344. Throughput: 0: 11992.2. Samples: 429371904. 
Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:29:55,767][1648981] Avg episode reward: [(0, '1005.670')] [2024-06-15 21:29:57,340][1651669] Updated weights for policy 0, policy_version 838533 (0.0012) [2024-06-15 21:29:58,760][1651669] Updated weights for policy 0, policy_version 838592 (0.0107) [2024-06-15 21:30:00,343][1651669] Updated weights for policy 0, policy_version 838649 (0.0013) [2024-06-15 21:30:00,774][1648981] Fps is (10 sec: 52388.1, 60 sec: 48053.5, 300 sec: 48433.3). Total num frames: 1717567488. Throughput: 0: 11921.9. Samples: 429448192. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:30:00,775][1648981] Avg episode reward: [(0, '1055.160')] [2024-06-15 21:30:04,083][1651669] Updated weights for policy 0, policy_version 838720 (0.0119) [2024-06-15 21:30:05,576][1651669] Updated weights for policy 0, policy_version 838777 (0.0160) [2024-06-15 21:30:05,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49698.1, 300 sec: 47985.8). Total num frames: 1717829632. Throughput: 0: 11855.8. Samples: 429512704. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:30:05,767][1648981] Avg episode reward: [(0, '1060.030')] [2024-06-15 21:30:09,497][1651669] Updated weights for policy 0, policy_version 838849 (0.0010) [2024-06-15 21:30:10,753][1651669] Updated weights for policy 0, policy_version 838901 (0.0011) [2024-06-15 21:30:10,779][1648981] Fps is (10 sec: 49130.1, 60 sec: 47504.0, 300 sec: 48322.8). Total num frames: 1718059008. Throughput: 0: 11875.2. Samples: 429555200. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:30:10,779][1648981] Avg episode reward: [(0, '1074.440')] [2024-06-15 21:30:14,162][1651669] Updated weights for policy 0, policy_version 838944 (0.0039) [2024-06-15 21:30:15,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 49698.3, 300 sec: 47874.7). Total num frames: 1718255616. Throughput: 0: 11867.0. Samples: 429624832. 
Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:30:15,767][1648981] Avg episode reward: [(0, '1015.230')] [2024-06-15 21:30:15,811][1651669] Updated weights for policy 0, policy_version 838993 (0.0013) [2024-06-15 21:30:16,711][1651669] Updated weights for policy 0, policy_version 839040 (0.0017) [2024-06-15 21:30:19,630][1651669] Updated weights for policy 0, policy_version 839093 (0.0034) [2024-06-15 21:30:20,766][1648981] Fps is (10 sec: 45931.4, 60 sec: 46967.9, 300 sec: 48430.0). Total num frames: 1718517760. Throughput: 0: 11889.9. Samples: 429698560. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:30:20,767][1648981] Avg episode reward: [(0, '1030.130')] [2024-06-15 21:30:21,052][1651669] Updated weights for policy 0, policy_version 839140 (0.0106) [2024-06-15 21:30:25,267][1651669] Updated weights for policy 0, policy_version 839200 (0.0132) [2024-06-15 21:30:25,767][1648981] Fps is (10 sec: 45874.2, 60 sec: 48605.8, 300 sec: 47763.5). Total num frames: 1718714368. Throughput: 0: 12196.9. Samples: 429737472. Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:30:25,768][1648981] Avg episode reward: [(0, '1010.340')] [2024-06-15 21:30:26,716][1651669] Updated weights for policy 0, policy_version 839248 (0.0013) [2024-06-15 21:30:29,638][1651274] Signal inference workers to stop experience collection... (44000 times) [2024-06-15 21:30:29,666][1651669] InferenceWorker_p0-w0: stopping experience collection (44000 times) [2024-06-15 21:30:29,878][1651274] Signal inference workers to resume experience collection... (44000 times) [2024-06-15 21:30:29,879][1651669] InferenceWorker_p0-w0: resuming experience collection (44000 times) [2024-06-15 21:30:30,285][1651669] Updated weights for policy 0, policy_version 839328 (0.0011) [2024-06-15 21:30:30,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1718976512. Throughput: 0: 11982.9. Samples: 429807616. 
Policy #0 lag: (min: 10.0, avg: 65.1, max: 266.0) [2024-06-15 21:30:30,767][1648981] Avg episode reward: [(0, '1006.680')] [2024-06-15 21:30:31,695][1651669] Updated weights for policy 0, policy_version 839379 (0.0012) [2024-06-15 21:30:32,517][1651669] Updated weights for policy 0, policy_version 839417 (0.0010) [2024-06-15 21:30:35,770][1648981] Fps is (10 sec: 45858.9, 60 sec: 48603.1, 300 sec: 47762.9). Total num frames: 1719173120. Throughput: 0: 12184.6. Samples: 429881344. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:30:35,771][1648981] Avg episode reward: [(0, '1007.420')] [2024-06-15 21:30:36,219][1651669] Updated weights for policy 0, policy_version 839459 (0.0015) [2024-06-15 21:30:37,086][1651669] Updated weights for policy 0, policy_version 839493 (0.0012) [2024-06-15 21:30:40,752][1651669] Updated weights for policy 0, policy_version 839554 (0.0012) [2024-06-15 21:30:40,767][1648981] Fps is (10 sec: 42597.8, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 1719402496. Throughput: 0: 11992.2. Samples: 429911552. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:30:40,769][1648981] Avg episode reward: [(0, '1058.810')] [2024-06-15 21:30:42,575][1651669] Updated weights for policy 0, policy_version 839632 (0.0119) [2024-06-15 21:30:45,766][1648981] Fps is (10 sec: 49170.8, 60 sec: 48066.1, 300 sec: 47876.6). Total num frames: 1719664640. Throughput: 0: 11869.1. Samples: 429982208. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:30:45,767][1648981] Avg episode reward: [(0, '1065.510')] [2024-06-15 21:30:47,266][1651669] Updated weights for policy 0, policy_version 839696 (0.0013) [2024-06-15 21:30:48,836][1651669] Updated weights for policy 0, policy_version 839760 (0.0124) [2024-06-15 21:30:50,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1719926784. Throughput: 0: 12049.1. Samples: 430054912. 
Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:30:50,767][1648981] Avg episode reward: [(0, '1037.000')] [2024-06-15 21:30:52,250][1651669] Updated weights for policy 0, policy_version 839810 (0.0067) [2024-06-15 21:30:54,400][1651669] Updated weights for policy 0, policy_version 839893 (0.0108) [2024-06-15 21:30:55,767][1648981] Fps is (10 sec: 52422.5, 60 sec: 48058.8, 300 sec: 48096.6). Total num frames: 1720188928. Throughput: 0: 11756.1. Samples: 430084096. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:30:55,768][1648981] Avg episode reward: [(0, '1028.550')] [2024-06-15 21:30:55,781][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000839936_1720188928.pth... [2024-06-15 21:30:55,840][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000834240_1708523520.pth [2024-06-15 21:30:55,844][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000839936_1720188928.pth [2024-06-15 21:30:58,993][1651669] Updated weights for policy 0, policy_version 839952 (0.0012) [2024-06-15 21:31:00,767][1648981] Fps is (10 sec: 42596.3, 60 sec: 46427.0, 300 sec: 47653.0). Total num frames: 1720352768. Throughput: 0: 11821.4. Samples: 430156800. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:00,768][1648981] Avg episode reward: [(0, '1023.840')] [2024-06-15 21:31:00,905][1651669] Updated weights for policy 0, policy_version 840022 (0.0013) [2024-06-15 21:31:03,723][1651669] Updated weights for policy 0, policy_version 840066 (0.0013) [2024-06-15 21:31:05,738][1651669] Updated weights for policy 0, policy_version 840147 (0.0012) [2024-06-15 21:31:05,766][1648981] Fps is (10 sec: 42603.1, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 1720614912. Throughput: 0: 11605.3. Samples: 430220800. 
Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:05,767][1648981] Avg episode reward: [(0, '1045.970')] [2024-06-15 21:31:10,236][1651669] Updated weights for policy 0, policy_version 840214 (0.0015) [2024-06-15 21:31:10,766][1648981] Fps is (10 sec: 42600.4, 60 sec: 45338.3, 300 sec: 47541.4). Total num frames: 1720778752. Throughput: 0: 11673.7. Samples: 430262784. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:10,767][1648981] Avg episode reward: [(0, '1063.730')] [2024-06-15 21:31:10,887][1651274] Signal inference workers to stop experience collection... (44050 times) [2024-06-15 21:31:10,962][1651669] InferenceWorker_p0-w0: stopping experience collection (44050 times) [2024-06-15 21:31:11,126][1651274] Signal inference workers to resume experience collection... (44050 times) [2024-06-15 21:31:11,138][1651669] InferenceWorker_p0-w0: resuming experience collection (44050 times) [2024-06-15 21:31:11,286][1651669] Updated weights for policy 0, policy_version 840257 (0.0012) [2024-06-15 21:31:12,709][1651669] Updated weights for policy 0, policy_version 840310 (0.0050) [2024-06-15 21:31:15,187][1651669] Updated weights for policy 0, policy_version 840355 (0.0028) [2024-06-15 21:31:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 1721073664. Throughput: 0: 11719.1. Samples: 430334976. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:15,767][1648981] Avg episode reward: [(0, '1062.050')] [2024-06-15 21:31:17,152][1651669] Updated weights for policy 0, policy_version 840432 (0.0131) [2024-06-15 21:31:20,773][1648981] Fps is (10 sec: 45845.1, 60 sec: 45324.1, 300 sec: 47540.3). Total num frames: 1721237504. Throughput: 0: 11741.1. Samples: 430409728. 
Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:20,773][1648981] Avg episode reward: [(0, '1055.030')] [2024-06-15 21:31:22,308][1651669] Updated weights for policy 0, policy_version 840512 (0.0012) [2024-06-15 21:31:23,864][1651669] Updated weights for policy 0, policy_version 840569 (0.0010) [2024-06-15 21:31:25,770][1648981] Fps is (10 sec: 45857.7, 60 sec: 46964.6, 300 sec: 47874.0). Total num frames: 1721532416. Throughput: 0: 11706.8. Samples: 430438400. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:25,771][1648981] Avg episode reward: [(0, '1021.490')] [2024-06-15 21:31:26,140][1651669] Updated weights for policy 0, policy_version 840624 (0.0012) [2024-06-15 21:31:27,272][1651669] Updated weights for policy 0, policy_version 840659 (0.0012) [2024-06-15 21:31:30,766][1648981] Fps is (10 sec: 52463.1, 60 sec: 46421.3, 300 sec: 47875.3). Total num frames: 1721761792. Throughput: 0: 11616.7. Samples: 430504960. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:30,767][1648981] Avg episode reward: [(0, '1004.540')] [2024-06-15 21:31:32,377][1651669] Updated weights for policy 0, policy_version 840720 (0.0026) [2024-06-15 21:31:34,050][1651669] Updated weights for policy 0, policy_version 840786 (0.0012) [2024-06-15 21:31:35,203][1651669] Updated weights for policy 0, policy_version 840832 (0.0013) [2024-06-15 21:31:35,788][1648981] Fps is (10 sec: 49064.0, 60 sec: 47499.4, 300 sec: 47982.1). Total num frames: 1722023936. Throughput: 0: 11679.3. Samples: 430580736. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:35,789][1648981] Avg episode reward: [(0, '976.230')] [2024-06-15 21:31:37,285][1651669] Updated weights for policy 0, policy_version 840896 (0.0014) [2024-06-15 21:31:39,291][1651669] Updated weights for policy 0, policy_version 840960 (0.0014) [2024-06-15 21:31:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1722286080. 
Throughput: 0: 11730.8. Samples: 430611968. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:40,767][1648981] Avg episode reward: [(0, '976.230')] [2024-06-15 21:31:44,230][1651669] Updated weights for policy 0, policy_version 841017 (0.0013) [2024-06-15 21:31:45,231][1651669] Updated weights for policy 0, policy_version 841061 (0.0016) [2024-06-15 21:31:45,766][1648981] Fps is (10 sec: 49259.0, 60 sec: 47513.5, 300 sec: 47874.9). Total num frames: 1722515456. Throughput: 0: 11901.3. Samples: 430692352. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:45,767][1648981] Avg episode reward: [(0, '994.330')] [2024-06-15 21:31:47,621][1651669] Updated weights for policy 0, policy_version 841136 (0.0012) [2024-06-15 21:31:48,859][1651669] Updated weights for policy 0, policy_version 841168 (0.0011) [2024-06-15 21:31:50,061][1651669] Updated weights for policy 0, policy_version 841213 (0.0010) [2024-06-15 21:31:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1722810368. Throughput: 0: 11912.5. Samples: 430756864. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:50,767][1648981] Avg episode reward: [(0, '953.120')] [2024-06-15 21:31:53,935][1651274] Signal inference workers to stop experience collection... (44100 times) [2024-06-15 21:31:53,962][1651669] InferenceWorker_p0-w0: stopping experience collection (44100 times) [2024-06-15 21:31:54,184][1651274] Signal inference workers to resume experience collection... (44100 times) [2024-06-15 21:31:54,185][1651669] InferenceWorker_p0-w0: resuming experience collection (44100 times) [2024-06-15 21:31:55,591][1651669] Updated weights for policy 0, policy_version 841296 (0.0015) [2024-06-15 21:31:55,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46422.2, 300 sec: 47765.5). Total num frames: 1722974208. Throughput: 0: 11889.8. Samples: 430797824. 
Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:31:55,767][1648981] Avg episode reward: [(0, '959.800')] [2024-06-15 21:31:57,211][1651669] Updated weights for policy 0, policy_version 841363 (0.0012) [2024-06-15 21:31:58,314][1651669] Updated weights for policy 0, policy_version 841406 (0.0013) [2024-06-15 21:32:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48606.2, 300 sec: 48208.5). Total num frames: 1723269120. Throughput: 0: 11992.2. Samples: 430874624. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:32:00,767][1648981] Avg episode reward: [(0, '953.280')] [2024-06-15 21:32:00,974][1651669] Updated weights for policy 0, policy_version 841461 (0.0011) [2024-06-15 21:32:05,505][1651669] Updated weights for policy 0, policy_version 841504 (0.0027) [2024-06-15 21:32:05,793][1648981] Fps is (10 sec: 42486.0, 60 sec: 46400.9, 300 sec: 47648.2). Total num frames: 1723400192. Throughput: 0: 11952.8. Samples: 430947840. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:32:05,793][1648981] Avg episode reward: [(0, '980.220')] [2024-06-15 21:32:06,935][1651669] Updated weights for policy 0, policy_version 841570 (0.0011) [2024-06-15 21:32:09,018][1651669] Updated weights for policy 0, policy_version 841650 (0.0011) [2024-06-15 21:32:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 48430.0). Total num frames: 1723760640. Throughput: 0: 11879.4. Samples: 430972928. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:32:10,767][1648981] Avg episode reward: [(0, '1001.690')] [2024-06-15 21:32:11,385][1651669] Updated weights for policy 0, policy_version 841697 (0.0011) [2024-06-15 21:32:15,766][1648981] Fps is (10 sec: 45996.7, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1723858944. Throughput: 0: 12208.4. Samples: 431054336. 
Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:32:15,767][1648981] Avg episode reward: [(0, '992.870')] [2024-06-15 21:32:17,005][1651669] Updated weights for policy 0, policy_version 841744 (0.0012) [2024-06-15 21:32:18,548][1651669] Updated weights for policy 0, policy_version 841808 (0.0011) [2024-06-15 21:32:20,284][1651669] Updated weights for policy 0, policy_version 841877 (0.0012) [2024-06-15 21:32:20,770][1648981] Fps is (10 sec: 42582.5, 60 sec: 49154.3, 300 sec: 48208.9). Total num frames: 1724186624. Throughput: 0: 11985.6. Samples: 431119872. Policy #0 lag: (min: 70.0, avg: 203.8, max: 319.0) [2024-06-15 21:32:20,771][1648981] Avg episode reward: [(0, '969.370')] [2024-06-15 21:32:21,992][1651669] Updated weights for policy 0, policy_version 841938 (0.0010) [2024-06-15 21:32:23,110][1651669] Updated weights for policy 0, policy_version 841984 (0.0018) [2024-06-15 21:32:25,768][1648981] Fps is (10 sec: 52419.5, 60 sec: 47515.2, 300 sec: 47874.3). Total num frames: 1724383232. Throughput: 0: 11980.3. Samples: 431151104. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:32:25,769][1648981] Avg episode reward: [(0, '975.460')] [2024-06-15 21:32:29,598][1651669] Updated weights for policy 0, policy_version 842048 (0.0128) [2024-06-15 21:32:30,766][1648981] Fps is (10 sec: 42614.4, 60 sec: 47513.6, 300 sec: 47876.5). Total num frames: 1724612608. Throughput: 0: 11992.2. Samples: 431232000. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:32:30,767][1648981] Avg episode reward: [(0, '978.600')] [2024-06-15 21:32:31,074][1651669] Updated weights for policy 0, policy_version 842112 (0.0013) [2024-06-15 21:32:31,519][1651274] Signal inference workers to stop experience collection... (44150 times) [2024-06-15 21:32:31,562][1651669] InferenceWorker_p0-w0: stopping experience collection (44150 times) [2024-06-15 21:32:31,807][1651274] Signal inference workers to resume experience collection... 
(44150 times) [2024-06-15 21:32:31,808][1651669] InferenceWorker_p0-w0: resuming experience collection (44150 times) [2024-06-15 21:32:32,639][1651669] Updated weights for policy 0, policy_version 842176 (0.0011) [2024-06-15 21:32:35,766][1648981] Fps is (10 sec: 52438.0, 60 sec: 48077.1, 300 sec: 47985.7). Total num frames: 1724907520. Throughput: 0: 11889.8. Samples: 431291904. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:32:35,767][1648981] Avg episode reward: [(0, '910.020')] [2024-06-15 21:32:40,088][1651669] Updated weights for policy 0, policy_version 842242 (0.0012) [2024-06-15 21:32:40,767][1648981] Fps is (10 sec: 36043.8, 60 sec: 44782.8, 300 sec: 47319.2). Total num frames: 1724973056. Throughput: 0: 11878.3. Samples: 431332352. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:32:40,767][1648981] Avg episode reward: [(0, '919.140')] [2024-06-15 21:32:41,526][1651669] Updated weights for policy 0, policy_version 842304 (0.0095) [2024-06-15 21:32:43,331][1651669] Updated weights for policy 0, policy_version 842385 (0.0012) [2024-06-15 21:32:44,924][1651669] Updated weights for policy 0, policy_version 842464 (0.0112) [2024-06-15 21:32:45,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 48606.1, 300 sec: 48207.9). Total num frames: 1725431808. Throughput: 0: 11559.9. Samples: 431394816. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:32:45,766][1648981] Avg episode reward: [(0, '965.270')] [2024-06-15 21:32:45,773][1651669] Updated weights for policy 0, policy_version 842496 (0.0012) [2024-06-15 21:32:50,786][1648981] Fps is (10 sec: 45785.3, 60 sec: 43676.2, 300 sec: 47093.9). Total num frames: 1725431808. Throughput: 0: 11834.6. Samples: 431480320. 
Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:32:50,787][1648981] Avg episode reward: [(0, '926.460')] [2024-06-15 21:32:52,185][1651669] Updated weights for policy 0, policy_version 842567 (0.0039) [2024-06-15 21:32:53,586][1651669] Updated weights for policy 0, policy_version 842624 (0.0049) [2024-06-15 21:32:54,731][1651669] Updated weights for policy 0, policy_version 842673 (0.0027) [2024-06-15 21:32:55,767][1648981] Fps is (10 sec: 45869.9, 60 sec: 48605.1, 300 sec: 48096.6). Total num frames: 1725890560. Throughput: 0: 11775.8. Samples: 431502848. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:32:55,768][1648981] Avg episode reward: [(0, '916.330')] [2024-06-15 21:32:56,166][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000842736_1725923328.pth... [2024-06-15 21:32:56,220][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000837120_1714421760.pth [2024-06-15 21:32:56,311][1651669] Updated weights for policy 0, policy_version 842737 (0.0012) [2024-06-15 21:33:00,752][1651669] Updated weights for policy 0, policy_version 842755 (0.0014) [2024-06-15 21:33:00,782][1648981] Fps is (10 sec: 52449.8, 60 sec: 44771.1, 300 sec: 47094.5). Total num frames: 1725956096. Throughput: 0: 12022.1. Samples: 431595520. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:00,783][1648981] Avg episode reward: [(0, '918.780')] [2024-06-15 21:33:02,626][1651669] Updated weights for policy 0, policy_version 842832 (0.0012) [2024-06-15 21:33:04,211][1651669] Updated weights for policy 0, policy_version 842900 (0.0013) [2024-06-15 21:33:05,665][1651669] Updated weights for policy 0, policy_version 842963 (0.0136) [2024-06-15 21:33:05,766][1648981] Fps is (10 sec: 49156.3, 60 sec: 49720.0, 300 sec: 48318.9). Total num frames: 1726382080. Throughput: 0: 11811.1. Samples: 431651328. 
Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:05,767][1648981] Avg episode reward: [(0, '897.060')] [2024-06-15 21:33:05,982][1651274] Signal inference workers to stop experience collection... (44200 times) [2024-06-15 21:33:06,041][1651669] InferenceWorker_p0-w0: stopping experience collection (44200 times) [2024-06-15 21:33:06,300][1651274] Signal inference workers to resume experience collection... (44200 times) [2024-06-15 21:33:06,301][1651669] InferenceWorker_p0-w0: resuming experience collection (44200 times) [2024-06-15 21:33:10,767][1648981] Fps is (10 sec: 52511.8, 60 sec: 45329.0, 300 sec: 47319.2). Total num frames: 1726480384. Throughput: 0: 12004.0. Samples: 431691264. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:10,767][1648981] Avg episode reward: [(0, '877.390')] [2024-06-15 21:33:11,770][1651669] Updated weights for policy 0, policy_version 843024 (0.0011) [2024-06-15 21:33:13,170][1651669] Updated weights for policy 0, policy_version 843074 (0.0016) [2024-06-15 21:33:14,627][1651669] Updated weights for policy 0, policy_version 843139 (0.0011) [2024-06-15 21:33:15,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49698.2, 300 sec: 48318.9). Total num frames: 1726840832. Throughput: 0: 11935.3. Samples: 431769088. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:15,767][1648981] Avg episode reward: [(0, '904.480')] [2024-06-15 21:33:17,064][1651669] Updated weights for policy 0, policy_version 843248 (0.0015) [2024-06-15 21:33:20,770][1648981] Fps is (10 sec: 52413.0, 60 sec: 46968.0, 300 sec: 47540.9). Total num frames: 1727004672. Throughput: 0: 12139.3. Samples: 431838208. 
Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:20,770][1648981] Avg episode reward: [(0, '866.520')] [2024-06-15 21:33:23,322][1651669] Updated weights for policy 0, policy_version 843296 (0.0011) [2024-06-15 21:33:25,380][1651669] Updated weights for policy 0, policy_version 843376 (0.0108) [2024-06-15 21:33:25,767][1648981] Fps is (10 sec: 39320.5, 60 sec: 47514.8, 300 sec: 47874.6). Total num frames: 1727234048. Throughput: 0: 12265.2. Samples: 431884288. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:25,767][1648981] Avg episode reward: [(0, '885.760')] [2024-06-15 21:33:26,653][1651669] Updated weights for policy 0, policy_version 843427 (0.0010) [2024-06-15 21:33:28,359][1651669] Updated weights for policy 0, policy_version 843515 (0.0012) [2024-06-15 21:33:30,766][1648981] Fps is (10 sec: 52444.9, 60 sec: 48605.8, 300 sec: 47874.6). Total num frames: 1727528960. Throughput: 0: 12185.5. Samples: 431943168. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:30,767][1648981] Avg episode reward: [(0, '907.980')] [2024-06-15 21:33:34,381][1651669] Updated weights for policy 0, policy_version 843553 (0.0013) [2024-06-15 21:33:35,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 46421.3, 300 sec: 47652.8). Total num frames: 1727692800. Throughput: 0: 12020.2. Samples: 432020992. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:35,767][1648981] Avg episode reward: [(0, '928.650')] [2024-06-15 21:33:36,077][1651669] Updated weights for policy 0, policy_version 843617 (0.0012) [2024-06-15 21:33:37,508][1651669] Updated weights for policy 0, policy_version 843680 (0.0011) [2024-06-15 21:33:39,263][1651669] Updated weights for policy 0, policy_version 843745 (0.0096) [2024-06-15 21:33:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 51336.7, 300 sec: 48104.6). Total num frames: 1728053248. Throughput: 0: 12185.8. Samples: 432051200. 
Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:40,767][1648981] Avg episode reward: [(0, '889.700')] [2024-06-15 21:33:45,162][1651669] Updated weights for policy 0, policy_version 843795 (0.0012) [2024-06-15 21:33:45,767][1648981] Fps is (10 sec: 42596.4, 60 sec: 44782.4, 300 sec: 47319.6). Total num frames: 1728118784. Throughput: 0: 12007.6. Samples: 432135680. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:45,767][1648981] Avg episode reward: [(0, '875.580')] [2024-06-15 21:33:46,360][1651274] Signal inference workers to stop experience collection... (44250 times) [2024-06-15 21:33:46,442][1651669] InferenceWorker_p0-w0: stopping experience collection (44250 times) [2024-06-15 21:33:46,444][1651669] Updated weights for policy 0, policy_version 843844 (0.0014) [2024-06-15 21:33:46,666][1651274] Signal inference workers to resume experience collection... (44250 times) [2024-06-15 21:33:46,667][1651669] InferenceWorker_p0-w0: resuming experience collection (44250 times) [2024-06-15 21:33:47,809][1651669] Updated weights for policy 0, policy_version 843892 (0.0012) [2024-06-15 21:33:49,481][1651669] Updated weights for policy 0, policy_version 843968 (0.0013) [2024-06-15 21:33:50,635][1651669] Updated weights for policy 0, policy_version 844020 (0.0013) [2024-06-15 21:33:50,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 51899.9, 300 sec: 47985.7). Total num frames: 1728544768. Throughput: 0: 12083.2. Samples: 432195072. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:50,767][1648981] Avg episode reward: [(0, '851.930')] [2024-06-15 21:33:55,346][1651669] Updated weights for policy 0, policy_version 844051 (0.0011) [2024-06-15 21:33:55,767][1648981] Fps is (10 sec: 52431.0, 60 sec: 45875.8, 300 sec: 47319.2). Total num frames: 1728643072. Throughput: 0: 12253.9. Samples: 432242688. 
Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:33:55,767][1648981] Avg episode reward: [(0, '831.110')] [2024-06-15 21:33:57,327][1651669] Updated weights for policy 0, policy_version 844128 (0.0011) [2024-06-15 21:33:58,962][1651669] Updated weights for policy 0, policy_version 844195 (0.0014) [2024-06-15 21:34:00,754][1651669] Updated weights for policy 0, policy_version 844281 (0.0016) [2024-06-15 21:34:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 51896.5, 300 sec: 48207.8). Total num frames: 1729069056. Throughput: 0: 11923.9. Samples: 432305664. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:34:00,767][1648981] Avg episode reward: [(0, '860.710')] [2024-06-15 21:34:05,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 45875.3, 300 sec: 47208.2). Total num frames: 1729134592. Throughput: 0: 12368.5. Samples: 432394752. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:34:05,767][1648981] Avg episode reward: [(0, '901.310')] [2024-06-15 21:34:06,284][1651669] Updated weights for policy 0, policy_version 844336 (0.0012) [2024-06-15 21:34:08,488][1651669] Updated weights for policy 0, policy_version 844400 (0.0012) [2024-06-15 21:34:10,291][1651669] Updated weights for policy 0, policy_version 844464 (0.0015) [2024-06-15 21:34:10,767][1648981] Fps is (10 sec: 39320.9, 60 sec: 49698.1, 300 sec: 48096.8). Total num frames: 1729462272. Throughput: 0: 11912.6. Samples: 432420352. Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:34:10,767][1648981] Avg episode reward: [(0, '940.280')] [2024-06-15 21:34:11,883][1651669] Updated weights for policy 0, policy_version 844515 (0.0014) [2024-06-15 21:34:15,814][1648981] Fps is (10 sec: 48918.2, 60 sec: 46384.4, 300 sec: 47200.6). Total num frames: 1729626112. Throughput: 0: 12093.1. Samples: 432487936. 
Policy #0 lag: (min: 79.0, avg: 222.0, max: 303.0) [2024-06-15 21:34:15,815][1648981] Avg episode reward: [(0, '857.760')] [2024-06-15 21:34:16,482][1651669] Updated weights for policy 0, policy_version 844561 (0.0016) [2024-06-15 21:34:17,519][1651669] Updated weights for policy 0, policy_version 844606 (0.0018) [2024-06-15 21:34:19,408][1651669] Updated weights for policy 0, policy_version 844660 (0.0013) [2024-06-15 21:34:20,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 49700.8, 300 sec: 48096.8). Total num frames: 1729986560. Throughput: 0: 11958.1. Samples: 432559104. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:20,767][1648981] Avg episode reward: [(0, '836.150')] [2024-06-15 21:34:20,841][1651274] Signal inference workers to stop experience collection... (44300 times) [2024-06-15 21:34:20,922][1651669] InferenceWorker_p0-w0: stopping experience collection (44300 times) [2024-06-15 21:34:21,110][1651274] Signal inference workers to resume experience collection... (44300 times) [2024-06-15 21:34:21,111][1651669] InferenceWorker_p0-w0: resuming experience collection (44300 times) [2024-06-15 21:34:21,310][1651669] Updated weights for policy 0, policy_version 844737 (0.0022) [2024-06-15 21:34:22,801][1651669] Updated weights for policy 0, policy_version 844789 (0.0025) [2024-06-15 21:34:25,773][1648981] Fps is (10 sec: 52647.7, 60 sec: 48601.1, 300 sec: 47540.4). Total num frames: 1730150400. Throughput: 0: 11979.2. Samples: 432590336. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:25,773][1648981] Avg episode reward: [(0, '844.020')] [2024-06-15 21:34:27,586][1651669] Updated weights for policy 0, policy_version 844833 (0.0010) [2024-06-15 21:34:28,259][1651669] Updated weights for policy 0, policy_version 844864 (0.0011) [2024-06-15 21:34:30,080][1651669] Updated weights for policy 0, policy_version 844928 (0.0039) [2024-06-15 21:34:30,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 48605.8, 300 sec: 48096.8). 
Total num frames: 1730445312. Throughput: 0: 12219.9. Samples: 432685568. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:30,767][1648981] Avg episode reward: [(0, '822.460')] [2024-06-15 21:34:32,341][1651669] Updated weights for policy 0, policy_version 845008 (0.0011) [2024-06-15 21:34:33,569][1651669] Updated weights for policy 0, policy_version 845056 (0.0013) [2024-06-15 21:34:35,767][1648981] Fps is (10 sec: 52460.4, 60 sec: 49698.1, 300 sec: 47874.6). Total num frames: 1730674688. Throughput: 0: 12299.3. Samples: 432748544. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:35,767][1648981] Avg episode reward: [(0, '866.220')] [2024-06-15 21:34:38,918][1651669] Updated weights for policy 0, policy_version 845120 (0.0011) [2024-06-15 21:34:40,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 47987.0). Total num frames: 1730936832. Throughput: 0: 12310.8. Samples: 432796672. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:40,767][1648981] Avg episode reward: [(0, '883.110')] [2024-06-15 21:34:40,968][1651669] Updated weights for policy 0, policy_version 845200 (0.0010) [2024-06-15 21:34:43,124][1651669] Updated weights for policy 0, policy_version 845280 (0.0011) [2024-06-15 21:34:45,786][1648981] Fps is (10 sec: 52325.4, 60 sec: 51320.0, 300 sec: 47982.4). Total num frames: 1731198976. Throughput: 0: 12180.2. Samples: 432854016. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:45,787][1648981] Avg episode reward: [(0, '879.790')] [2024-06-15 21:34:49,078][1651669] Updated weights for policy 0, policy_version 845334 (0.0012) [2024-06-15 21:34:49,800][1651669] Updated weights for policy 0, policy_version 845372 (0.0010) [2024-06-15 21:34:50,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 46967.3, 300 sec: 47652.4). Total num frames: 1731362816. Throughput: 0: 12049.0. Samples: 432936960. 
Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:50,767][1648981] Avg episode reward: [(0, '926.370')] [2024-06-15 21:34:51,552][1651669] Updated weights for policy 0, policy_version 845424 (0.0010) [2024-06-15 21:34:53,056][1651669] Updated weights for policy 0, policy_version 845473 (0.0009) [2024-06-15 21:34:54,872][1651669] Updated weights for policy 0, policy_version 845557 (0.0011) [2024-06-15 21:34:55,767][1648981] Fps is (10 sec: 52531.8, 60 sec: 51336.4, 300 sec: 47986.9). Total num frames: 1731723264. Throughput: 0: 11992.1. Samples: 432960000. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:34:55,768][1648981] Avg episode reward: [(0, '934.540')] [2024-06-15 21:34:55,792][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000845568_1731723264.pth... [2024-06-15 21:34:55,834][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000839936_1720188928.pth [2024-06-15 21:34:59,276][1651669] Updated weights for policy 0, policy_version 845584 (0.0030) [2024-06-15 21:34:59,726][1651274] Signal inference workers to stop experience collection... (44350 times) [2024-06-15 21:34:59,757][1651669] InferenceWorker_p0-w0: stopping experience collection (44350 times) [2024-06-15 21:34:59,934][1651274] Signal inference workers to resume experience collection... (44350 times) [2024-06-15 21:34:59,935][1651669] InferenceWorker_p0-w0: resuming experience collection (44350 times) [2024-06-15 21:35:00,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1731854336. Throughput: 0: 12221.3. Samples: 433037312. 
Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:00,767][1648981] Avg episode reward: [(0, '954.830')] [2024-06-15 21:35:01,931][1651669] Updated weights for policy 0, policy_version 845648 (0.0011) [2024-06-15 21:35:03,618][1651669] Updated weights for policy 0, policy_version 845715 (0.0011) [2024-06-15 21:35:05,099][1651669] Updated weights for policy 0, policy_version 845776 (0.0019) [2024-06-15 21:35:05,766][1648981] Fps is (10 sec: 45877.0, 60 sec: 50790.4, 300 sec: 47876.6). Total num frames: 1732182016. Throughput: 0: 12037.7. Samples: 433100800. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:05,767][1648981] Avg episode reward: [(0, '948.030')] [2024-06-15 21:35:10,697][1651669] Updated weights for policy 0, policy_version 845840 (0.0014) [2024-06-15 21:35:10,779][1648981] Fps is (10 sec: 42543.8, 60 sec: 46957.5, 300 sec: 47539.3). Total num frames: 1732280320. Throughput: 0: 12104.1. Samples: 433135104. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:10,780][1648981] Avg episode reward: [(0, '967.190')] [2024-06-15 21:35:11,675][1651669] Updated weights for policy 0, policy_version 845888 (0.0012) [2024-06-15 21:35:14,875][1651669] Updated weights for policy 0, policy_version 845953 (0.0014) [2024-06-15 21:35:15,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 49191.2, 300 sec: 47652.4). Total num frames: 1732575232. Throughput: 0: 11707.8. Samples: 433212416. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:15,767][1648981] Avg episode reward: [(0, '970.980')] [2024-06-15 21:35:17,190][1651669] Updated weights for policy 0, policy_version 846033 (0.0011) [2024-06-15 21:35:20,766][1648981] Fps is (10 sec: 49215.5, 60 sec: 46421.3, 300 sec: 47652.5). Total num frames: 1732771840. Throughput: 0: 11719.2. Samples: 433275904. 
Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:20,767][1648981] Avg episode reward: [(0, '1027.300')] [2024-06-15 21:35:22,794][1651669] Updated weights for policy 0, policy_version 846098 (0.0024) [2024-06-15 21:35:25,786][1648981] Fps is (10 sec: 35975.7, 60 sec: 46411.3, 300 sec: 47316.1). Total num frames: 1732935680. Throughput: 0: 11361.5. Samples: 433308160. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:25,789][1648981] Avg episode reward: [(0, '1045.670')] [2024-06-15 21:35:25,964][1651669] Updated weights for policy 0, policy_version 846176 (0.0117) [2024-06-15 21:35:27,676][1651669] Updated weights for policy 0, policy_version 846240 (0.0012) [2024-06-15 21:35:29,390][1651669] Updated weights for policy 0, policy_version 846304 (0.0024) [2024-06-15 21:35:30,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 47513.6, 300 sec: 47875.2). Total num frames: 1733296128. Throughput: 0: 11405.5. Samples: 433367040. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:30,767][1648981] Avg episode reward: [(0, '1073.150')] [2024-06-15 21:35:34,329][1651669] Updated weights for policy 0, policy_version 846338 (0.0011) [2024-06-15 21:35:35,627][1651669] Updated weights for policy 0, policy_version 846395 (0.0013) [2024-06-15 21:35:35,766][1648981] Fps is (10 sec: 45963.0, 60 sec: 45329.1, 300 sec: 47430.3). Total num frames: 1733394432. Throughput: 0: 11195.7. Samples: 433440768. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:35,767][1648981] Avg episode reward: [(0, '1036.710')] [2024-06-15 21:35:38,135][1651669] Updated weights for policy 0, policy_version 846448 (0.0012) [2024-06-15 21:35:40,025][1651274] Signal inference workers to stop experience collection... 
(44400 times) [2024-06-15 21:35:40,072][1651669] InferenceWorker_p0-w0: stopping experience collection (44400 times) [2024-06-15 21:35:40,092][1651669] Updated weights for policy 0, policy_version 846515 (0.0012) [2024-06-15 21:35:40,344][1651274] Signal inference workers to resume experience collection... (44400 times) [2024-06-15 21:35:40,354][1651669] InferenceWorker_p0-w0: resuming experience collection (44400 times) [2024-06-15 21:35:40,766][1648981] Fps is (10 sec: 42599.0, 60 sec: 46421.3, 300 sec: 47652.4). Total num frames: 1733722112. Throughput: 0: 11457.5. Samples: 433475584. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:40,767][1648981] Avg episode reward: [(0, '1074.580')] [2024-06-15 21:35:41,922][1651669] Updated weights for policy 0, policy_version 846586 (0.0030) [2024-06-15 21:35:45,767][1648981] Fps is (10 sec: 42597.0, 60 sec: 43704.9, 300 sec: 47097.0). Total num frames: 1733820416. Throughput: 0: 11150.1. Samples: 433539072. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:45,767][1648981] Avg episode reward: [(0, '1054.550')] [2024-06-15 21:35:46,852][1651669] Updated weights for policy 0, policy_version 846645 (0.0012) [2024-06-15 21:35:50,188][1651669] Updated weights for policy 0, policy_version 846720 (0.0011) [2024-06-15 21:35:50,767][1648981] Fps is (10 sec: 39320.9, 60 sec: 45875.2, 300 sec: 47208.3). Total num frames: 1734115328. Throughput: 0: 11207.1. Samples: 433605120. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:50,767][1648981] Avg episode reward: [(0, '1106.230')] [2024-06-15 21:35:51,442][1651669] Updated weights for policy 0, policy_version 846768 (0.0014) [2024-06-15 21:35:53,153][1651669] Updated weights for policy 0, policy_version 846838 (0.0013) [2024-06-15 21:35:55,766][1648981] Fps is (10 sec: 52431.2, 60 sec: 43690.9, 300 sec: 47430.4). Total num frames: 1734344704. Throughput: 0: 11051.0. Samples: 433632256. 
Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:35:55,767][1648981] Avg episode reward: [(0, '1116.530')] [2024-06-15 21:35:58,316][1651669] Updated weights for policy 0, policy_version 846896 (0.0155) [2024-06-15 21:36:00,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 44236.8, 300 sec: 47097.1). Total num frames: 1734508544. Throughput: 0: 11070.6. Samples: 433710592. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:36:00,767][1648981] Avg episode reward: [(0, '1113.240')] [2024-06-15 21:36:01,207][1651669] Updated weights for policy 0, policy_version 846948 (0.0012) [2024-06-15 21:36:02,596][1651669] Updated weights for policy 0, policy_version 847024 (0.0062) [2024-06-15 21:36:03,952][1651669] Updated weights for policy 0, policy_version 847091 (0.0012) [2024-06-15 21:36:05,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 44782.9, 300 sec: 47763.5). Total num frames: 1734868992. Throughput: 0: 11309.5. Samples: 433784832. Policy #0 lag: (min: 3.0, avg: 74.1, max: 259.0) [2024-06-15 21:36:05,767][1648981] Avg episode reward: [(0, '1106.970')] [2024-06-15 21:36:08,197][1651669] Updated weights for policy 0, policy_version 847163 (0.0023) [2024-06-15 21:36:10,767][1648981] Fps is (10 sec: 52427.9, 60 sec: 45884.9, 300 sec: 47319.2). Total num frames: 1735032832. Throughput: 0: 11485.0. Samples: 433824768. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:10,767][1648981] Avg episode reward: [(0, '1159.900')] [2024-06-15 21:36:11,835][1651669] Updated weights for policy 0, policy_version 847231 (0.0012) [2024-06-15 21:36:13,422][1651669] Updated weights for policy 0, policy_version 847280 (0.0159) [2024-06-15 21:36:14,873][1651669] Updated weights for policy 0, policy_version 847355 (0.0155) [2024-06-15 21:36:15,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46967.5, 300 sec: 47986.7). Total num frames: 1735393280. Throughput: 0: 11616.7. Samples: 433889792. 
Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:15,767][1648981] Avg episode reward: [(0, '1161.850')] [2024-06-15 21:36:18,184][1651669] Updated weights for policy 0, policy_version 847394 (0.0012) [2024-06-15 21:36:20,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 45875.2, 300 sec: 47430.9). Total num frames: 1735524352. Throughput: 0: 11992.2. Samples: 433980416. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:20,767][1648981] Avg episode reward: [(0, '1113.990')] [2024-06-15 21:36:20,907][1651669] Updated weights for policy 0, policy_version 847440 (0.0048) [2024-06-15 21:36:21,487][1651274] Signal inference workers to stop experience collection... (44450 times) [2024-06-15 21:36:21,560][1651669] InferenceWorker_p0-w0: stopping experience collection (44450 times) [2024-06-15 21:36:21,791][1651274] Signal inference workers to resume experience collection... (44450 times) [2024-06-15 21:36:21,804][1651669] InferenceWorker_p0-w0: resuming experience collection (44450 times) [2024-06-15 21:36:21,947][1651669] Updated weights for policy 0, policy_version 847479 (0.0011) [2024-06-15 21:36:22,745][1651669] Updated weights for policy 0, policy_version 847506 (0.0041) [2024-06-15 21:36:24,411][1651669] Updated weights for policy 0, policy_version 847595 (0.0136) [2024-06-15 21:36:25,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49714.1, 300 sec: 47985.7). Total num frames: 1735917568. Throughput: 0: 11946.7. Samples: 434013184. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:25,767][1648981] Avg episode reward: [(0, '1072.620')] [2024-06-15 21:36:27,974][1651669] Updated weights for policy 0, policy_version 847632 (0.0012) [2024-06-15 21:36:30,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 45875.2, 300 sec: 47544.9). Total num frames: 1736048640. Throughput: 0: 12151.6. Samples: 434085888. 
Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:30,767][1648981] Avg episode reward: [(0, '1059.380')] [2024-06-15 21:36:31,276][1651669] Updated weights for policy 0, policy_version 847682 (0.0015) [2024-06-15 21:36:32,660][1651669] Updated weights for policy 0, policy_version 847741 (0.0011) [2024-06-15 21:36:34,242][1651669] Updated weights for policy 0, policy_version 847794 (0.0115) [2024-06-15 21:36:35,767][1648981] Fps is (10 sec: 52423.2, 60 sec: 50789.6, 300 sec: 47985.5). Total num frames: 1736441856. Throughput: 0: 12230.9. Samples: 434155520. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:35,768][1648981] Avg episode reward: [(0, '1130.490')] [2024-06-15 21:36:38,088][1651669] Updated weights for policy 0, policy_version 847873 (0.0012) [2024-06-15 21:36:39,403][1651669] Updated weights for policy 0, policy_version 847936 (0.0010) [2024-06-15 21:36:40,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 1736572928. Throughput: 0: 12526.9. Samples: 434195968. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:40,767][1648981] Avg episode reward: [(0, '1151.670')] [2024-06-15 21:36:42,848][1651669] Updated weights for policy 0, policy_version 847995 (0.0013) [2024-06-15 21:36:44,931][1651669] Updated weights for policy 0, policy_version 848035 (0.0011) [2024-06-15 21:36:45,766][1648981] Fps is (10 sec: 39325.7, 60 sec: 50244.6, 300 sec: 47541.4). Total num frames: 1736835072. Throughput: 0: 12504.2. Samples: 434273280. 
Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:45,767][1648981] Avg episode reward: [(0, '1146.160')] [2024-06-15 21:36:46,571][1651669] Updated weights for policy 0, policy_version 848098 (0.0011) [2024-06-15 21:36:48,359][1651669] Updated weights for policy 0, policy_version 848129 (0.0012) [2024-06-15 21:36:49,594][1651669] Updated weights for policy 0, policy_version 848189 (0.0018) [2024-06-15 21:36:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49698.3, 300 sec: 47874.6). Total num frames: 1737097216. Throughput: 0: 12379.0. Samples: 434341888. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:50,767][1648981] Avg episode reward: [(0, '1125.020')] [2024-06-15 21:36:53,318][1651669] Updated weights for policy 0, policy_version 848249 (0.0084) [2024-06-15 21:36:55,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 47430.3). Total num frames: 1737261056. Throughput: 0: 12288.0. Samples: 434377728. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:36:55,767][1648981] Avg episode reward: [(0, '1064.990')] [2024-06-15 21:36:56,129][1651669] Updated weights for policy 0, policy_version 848292 (0.0012) [2024-06-15 21:36:56,341][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000848304_1737326592.pth... [2024-06-15 21:36:56,468][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000842736_1725923328.pth [2024-06-15 21:36:58,153][1651669] Updated weights for policy 0, policy_version 848372 (0.0012) [2024-06-15 21:37:00,556][1651274] Signal inference workers to stop experience collection... (44500 times) [2024-06-15 21:37:00,588][1651669] InferenceWorker_p0-w0: stopping experience collection (44500 times) [2024-06-15 21:37:00,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 50790.2, 300 sec: 47989.9). Total num frames: 1737555968. Throughput: 0: 12435.8. Samples: 434449408. 
Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:00,767][1648981] Avg episode reward: [(0, '1079.870')] [2024-06-15 21:37:00,816][1651274] Signal inference workers to resume experience collection... (44500 times) [2024-06-15 21:37:00,817][1651669] InferenceWorker_p0-w0: resuming experience collection (44500 times) [2024-06-15 21:37:00,945][1651669] Updated weights for policy 0, policy_version 848439 (0.0039) [2024-06-15 21:37:03,933][1651669] Updated weights for policy 0, policy_version 848496 (0.0056) [2024-06-15 21:37:05,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.7, 300 sec: 47430.3). Total num frames: 1737752576. Throughput: 0: 12083.2. Samples: 434524160. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:05,767][1648981] Avg episode reward: [(0, '1078.440')] [2024-06-15 21:37:07,136][1651669] Updated weights for policy 0, policy_version 848544 (0.0014) [2024-06-15 21:37:08,579][1651669] Updated weights for policy 0, policy_version 848609 (0.0078) [2024-06-15 21:37:09,938][1651669] Updated weights for policy 0, policy_version 848641 (0.0013) [2024-06-15 21:37:10,766][1648981] Fps is (10 sec: 49153.7, 60 sec: 50244.5, 300 sec: 48096.8). Total num frames: 1738047488. Throughput: 0: 12140.1. Samples: 434559488. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:10,767][1648981] Avg episode reward: [(0, '1086.150')] [2024-06-15 21:37:11,468][1651669] Updated weights for policy 0, policy_version 848701 (0.0012) [2024-06-15 21:37:14,356][1651669] Updated weights for policy 0, policy_version 848765 (0.0013) [2024-06-15 21:37:15,770][1648981] Fps is (10 sec: 52409.3, 60 sec: 48056.7, 300 sec: 47763.5). Total num frames: 1738276864. Throughput: 0: 12070.8. Samples: 434629120. 
Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:15,771][1648981] Avg episode reward: [(0, '1069.910')] [2024-06-15 21:37:18,331][1651669] Updated weights for policy 0, policy_version 848832 (0.0118) [2024-06-15 21:37:20,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 50244.2, 300 sec: 47986.0). Total num frames: 1738539008. Throughput: 0: 12220.0. Samples: 434705408. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:20,767][1648981] Avg episode reward: [(0, '1118.530')] [2024-06-15 21:37:21,164][1651669] Updated weights for policy 0, policy_version 848898 (0.0016) [2024-06-15 21:37:22,660][1651669] Updated weights for policy 0, policy_version 848955 (0.0012) [2024-06-15 21:37:24,428][1651669] Updated weights for policy 0, policy_version 849000 (0.0011) [2024-06-15 21:37:25,773][1648981] Fps is (10 sec: 52416.0, 60 sec: 48054.7, 300 sec: 48095.7). Total num frames: 1738801152. Throughput: 0: 12172.6. Samples: 434743808. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:25,773][1648981] Avg episode reward: [(0, '1096.820')] [2024-06-15 21:37:28,187][1651669] Updated weights for policy 0, policy_version 849061 (0.0012) [2024-06-15 21:37:29,103][1651669] Updated weights for policy 0, policy_version 849108 (0.0011) [2024-06-15 21:37:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.4, 300 sec: 47985.7). Total num frames: 1739063296. Throughput: 0: 12208.4. Samples: 434822656. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:30,767][1648981] Avg episode reward: [(0, '1066.740')] [2024-06-15 21:37:31,429][1651669] Updated weights for policy 0, policy_version 849168 (0.0012) [2024-06-15 21:37:32,584][1651669] Updated weights for policy 0, policy_version 849215 (0.0036) [2024-06-15 21:37:35,770][1648981] Fps is (10 sec: 52441.0, 60 sec: 48057.5, 300 sec: 48651.6). Total num frames: 1739325440. Throughput: 0: 12252.8. Samples: 434893312. 
Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:35,771][1648981] Avg episode reward: [(0, '1088.450')] [2024-06-15 21:37:38,183][1651669] Updated weights for policy 0, policy_version 849283 (0.0012) [2024-06-15 21:37:39,598][1651669] Updated weights for policy 0, policy_version 849344 (0.0012) [2024-06-15 21:37:40,786][1648981] Fps is (10 sec: 45785.6, 60 sec: 49136.0, 300 sec: 47760.3). Total num frames: 1739522048. Throughput: 0: 12282.7. Samples: 434930688. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:40,786][1648981] Avg episode reward: [(0, '1130.370')] [2024-06-15 21:37:42,477][1651669] Updated weights for policy 0, policy_version 849412 (0.0023) [2024-06-15 21:37:43,604][1651669] Updated weights for policy 0, policy_version 849466 (0.0011) [2024-06-15 21:37:44,666][1651274] Signal inference workers to stop experience collection... (44550 times) [2024-06-15 21:37:44,718][1651669] InferenceWorker_p0-w0: stopping experience collection (44550 times) [2024-06-15 21:37:44,955][1651274] Signal inference workers to resume experience collection... (44550 times) [2024-06-15 21:37:44,956][1651669] InferenceWorker_p0-w0: resuming experience collection (44550 times) [2024-06-15 21:37:45,734][1651669] Updated weights for policy 0, policy_version 849525 (0.0083) [2024-06-15 21:37:45,767][1648981] Fps is (10 sec: 49169.5, 60 sec: 49697.9, 300 sec: 48766.5). Total num frames: 1739816960. Throughput: 0: 12310.8. Samples: 435003392. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:45,767][1648981] Avg episode reward: [(0, '1117.090')] [2024-06-15 21:37:49,882][1651669] Updated weights for policy 0, policy_version 849568 (0.0012) [2024-06-15 21:37:50,766][1648981] Fps is (10 sec: 45965.3, 60 sec: 48059.8, 300 sec: 47763.7). Total num frames: 1739980800. Throughput: 0: 12322.2. Samples: 435078656. 
Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:50,767][1648981] Avg episode reward: [(0, '1113.670')] [2024-06-15 21:37:51,096][1651669] Updated weights for policy 0, policy_version 849616 (0.0084) [2024-06-15 21:37:52,068][1651669] Updated weights for policy 0, policy_version 849658 (0.0013) [2024-06-15 21:37:54,188][1651669] Updated weights for policy 0, policy_version 849697 (0.0021) [2024-06-15 21:37:55,189][1651669] Updated weights for policy 0, policy_version 849749 (0.0011) [2024-06-15 21:37:55,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 51336.5, 300 sec: 48765.8). Total num frames: 1740341248. Throughput: 0: 12299.3. Samples: 435112960. Policy #0 lag: (min: 8.0, avg: 95.5, max: 264.0) [2024-06-15 21:37:55,767][1648981] Avg episode reward: [(0, '1141.830')] [2024-06-15 21:37:59,861][1651669] Updated weights for policy 0, policy_version 849801 (0.0023) [2024-06-15 21:38:00,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48606.1, 300 sec: 47763.5). Total num frames: 1740472320. Throughput: 0: 12550.7. Samples: 435193856. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:00,767][1648981] Avg episode reward: [(0, '1111.990')] [2024-06-15 21:38:01,072][1651669] Updated weights for policy 0, policy_version 849856 (0.0013) [2024-06-15 21:38:02,536][1651669] Updated weights for policy 0, policy_version 849910 (0.0014) [2024-06-15 21:38:04,354][1651669] Updated weights for policy 0, policy_version 849952 (0.0012) [2024-06-15 21:38:05,736][1651669] Updated weights for policy 0, policy_version 850001 (0.0012) [2024-06-15 21:38:05,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 50790.5, 300 sec: 48541.1). Total num frames: 1740800000. Throughput: 0: 12322.2. Samples: 435259904. 
Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:05,767][1648981] Avg episode reward: [(0, '1108.740')] [2024-06-15 21:38:10,684][1651669] Updated weights for policy 0, policy_version 850066 (0.0012) [2024-06-15 21:38:10,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48059.7, 300 sec: 47763.5). Total num frames: 1740931072. Throughput: 0: 12528.7. Samples: 435307520. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:10,767][1648981] Avg episode reward: [(0, '1111.240')] [2024-06-15 21:38:12,193][1651669] Updated weights for policy 0, policy_version 850131 (0.0124) [2024-06-15 21:38:13,701][1651669] Updated weights for policy 0, policy_version 850180 (0.0012) [2024-06-15 21:38:15,563][1651669] Updated weights for policy 0, policy_version 850256 (0.0116) [2024-06-15 21:38:15,794][1648981] Fps is (10 sec: 52282.5, 60 sec: 50769.9, 300 sec: 48537.0). Total num frames: 1741324288. Throughput: 0: 12246.3. Samples: 435374080. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:15,795][1648981] Avg episode reward: [(0, '1082.950')] [2024-06-15 21:38:16,500][1651669] Updated weights for policy 0, policy_version 850304 (0.0012) [2024-06-15 21:38:20,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 1741422592. Throughput: 0: 12539.4. Samples: 435457536. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:20,767][1648981] Avg episode reward: [(0, '1050.790')] [2024-06-15 21:38:22,576][1651669] Updated weights for policy 0, policy_version 850352 (0.0011) [2024-06-15 21:38:24,262][1651274] Signal inference workers to stop experience collection... (44600 times) [2024-06-15 21:38:24,282][1651669] Updated weights for policy 0, policy_version 850417 (0.0014) [2024-06-15 21:38:24,319][1651669] InferenceWorker_p0-w0: stopping experience collection (44600 times) [2024-06-15 21:38:24,449][1651274] Signal inference workers to resume experience collection... 
(44600 times) [2024-06-15 21:38:24,450][1651669] InferenceWorker_p0-w0: resuming experience collection (44600 times) [2024-06-15 21:38:25,456][1651669] Updated weights for policy 0, policy_version 850480 (0.0012) [2024-06-15 21:38:25,770][1648981] Fps is (10 sec: 49270.8, 60 sec: 50246.2, 300 sec: 48429.4). Total num frames: 1741815808. Throughput: 0: 12349.2. Samples: 435486208. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:25,771][1648981] Avg episode reward: [(0, '1005.080')] [2024-06-15 21:38:27,501][1651669] Updated weights for policy 0, policy_version 850545 (0.0014) [2024-06-15 21:38:30,770][1648981] Fps is (10 sec: 52410.9, 60 sec: 48056.9, 300 sec: 48318.4). Total num frames: 1741946880. Throughput: 0: 12298.5. Samples: 435556864. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:30,770][1648981] Avg episode reward: [(0, '1004.580')] [2024-06-15 21:38:32,144][1651669] Updated weights for policy 0, policy_version 850563 (0.0012) [2024-06-15 21:38:33,551][1651669] Updated weights for policy 0, policy_version 850618 (0.0010) [2024-06-15 21:38:35,136][1651669] Updated weights for policy 0, policy_version 850688 (0.0011) [2024-06-15 21:38:35,773][1648981] Fps is (10 sec: 42588.3, 60 sec: 48604.0, 300 sec: 48095.8). Total num frames: 1742241792. Throughput: 0: 12320.4. Samples: 435633152. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:35,773][1648981] Avg episode reward: [(0, '1038.310')] [2024-06-15 21:38:36,409][1651669] Updated weights for policy 0, policy_version 850752 (0.0017) [2024-06-15 21:38:38,045][1651669] Updated weights for policy 0, policy_version 850808 (0.0012) [2024-06-15 21:38:40,766][1648981] Fps is (10 sec: 52446.8, 60 sec: 49168.0, 300 sec: 48652.2). Total num frames: 1742471168. Throughput: 0: 12265.3. Samples: 435664896. 
Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:40,767][1648981] Avg episode reward: [(0, '1002.980')] [2024-06-15 21:38:44,537][1651669] Updated weights for policy 0, policy_version 850868 (0.0016) [2024-06-15 21:38:45,766][1648981] Fps is (10 sec: 42624.8, 60 sec: 47513.8, 300 sec: 47874.6). Total num frames: 1742667776. Throughput: 0: 12231.1. Samples: 435744256. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:45,767][1648981] Avg episode reward: [(0, '1000.770')] [2024-06-15 21:38:45,897][1651669] Updated weights for policy 0, policy_version 850913 (0.0013) [2024-06-15 21:38:47,216][1651669] Updated weights for policy 0, policy_version 850981 (0.0018) [2024-06-15 21:38:49,652][1651669] Updated weights for policy 0, policy_version 851060 (0.0012) [2024-06-15 21:38:50,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 48652.2). Total num frames: 1742995456. Throughput: 0: 12174.2. Samples: 435807744. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:50,767][1648981] Avg episode reward: [(0, '990.260')] [2024-06-15 21:38:55,212][1651669] Updated weights for policy 0, policy_version 851091 (0.0027) [2024-06-15 21:38:55,794][1648981] Fps is (10 sec: 39212.3, 60 sec: 45308.1, 300 sec: 47425.8). Total num frames: 1743060992. Throughput: 0: 12087.1. Samples: 435851776. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:38:55,795][1648981] Avg episode reward: [(0, '901.110')] [2024-06-15 21:38:56,387][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000851136_1743126528.pth... 
[2024-06-15 21:38:56,489][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000845568_1731723264.pth [2024-06-15 21:38:57,275][1651669] Updated weights for policy 0, policy_version 851168 (0.0113) [2024-06-15 21:38:58,469][1651669] Updated weights for policy 0, policy_version 851220 (0.0010) [2024-06-15 21:38:59,641][1651669] Updated weights for policy 0, policy_version 851265 (0.0060) [2024-06-15 21:39:00,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.4, 300 sec: 48652.2). Total num frames: 1743486976. Throughput: 0: 12022.4. Samples: 435914752. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:00,767][1648981] Avg episode reward: [(0, '910.970')] [2024-06-15 21:39:00,900][1651669] Updated weights for policy 0, policy_version 851318 (0.0016) [2024-06-15 21:39:05,767][1648981] Fps is (10 sec: 46002.8, 60 sec: 45328.9, 300 sec: 47652.4). Total num frames: 1743519744. Throughput: 0: 11912.5. Samples: 435993600. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:05,767][1648981] Avg episode reward: [(0, '905.920')] [2024-06-15 21:39:06,158][1651274] Signal inference workers to stop experience collection... (44650 times) [2024-06-15 21:39:06,222][1651669] InferenceWorker_p0-w0: stopping experience collection (44650 times) [2024-06-15 21:39:06,458][1651274] Signal inference workers to resume experience collection... (44650 times) [2024-06-15 21:39:06,460][1651669] InferenceWorker_p0-w0: resuming experience collection (44650 times) [2024-06-15 21:39:06,700][1651669] Updated weights for policy 0, policy_version 851365 (0.0013) [2024-06-15 21:39:08,313][1651669] Updated weights for policy 0, policy_version 851440 (0.0015) [2024-06-15 21:39:09,688][1651669] Updated weights for policy 0, policy_version 851496 (0.0024) [2024-06-15 21:39:10,774][1648981] Fps is (10 sec: 45839.0, 60 sec: 50237.7, 300 sec: 48547.6). Total num frames: 1743945728. Throughput: 0: 11888.7. Samples: 436021248. 
Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:10,775][1648981] Avg episode reward: [(0, '932.500')] [2024-06-15 21:39:11,000][1651669] Updated weights for policy 0, policy_version 851552 (0.0011) [2024-06-15 21:39:15,768][1648981] Fps is (10 sec: 52419.9, 60 sec: 45348.8, 300 sec: 47652.1). Total num frames: 1744044032. Throughput: 0: 11935.7. Samples: 436093952. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:15,769][1648981] Avg episode reward: [(0, '899.700')] [2024-06-15 21:39:17,731][1651669] Updated weights for policy 0, policy_version 851619 (0.0012) [2024-06-15 21:39:19,720][1651669] Updated weights for policy 0, policy_version 851697 (0.0011) [2024-06-15 21:39:20,766][1648981] Fps is (10 sec: 39352.6, 60 sec: 48606.0, 300 sec: 48097.8). Total num frames: 1744338944. Throughput: 0: 11629.7. Samples: 436156416. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:20,767][1648981] Avg episode reward: [(0, '871.050')] [2024-06-15 21:39:21,449][1651669] Updated weights for policy 0, policy_version 851776 (0.0104) [2024-06-15 21:39:23,518][1651669] Updated weights for policy 0, policy_version 851838 (0.0019) [2024-06-15 21:39:25,766][1648981] Fps is (10 sec: 52438.4, 60 sec: 45878.1, 300 sec: 47874.6). Total num frames: 1744568320. Throughput: 0: 11639.5. Samples: 436188672. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:25,767][1648981] Avg episode reward: [(0, '862.980')] [2024-06-15 21:39:29,712][1651669] Updated weights for policy 0, policy_version 851873 (0.0011) [2024-06-15 21:39:30,782][1648981] Fps is (10 sec: 39259.1, 60 sec: 46411.8, 300 sec: 47649.9). Total num frames: 1744732160. Throughput: 0: 11760.5. Samples: 436273664. 
Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:30,783][1648981] Avg episode reward: [(0, '856.420')] [2024-06-15 21:39:31,543][1651669] Updated weights for policy 0, policy_version 851968 (0.0033) [2024-06-15 21:39:33,416][1651669] Updated weights for policy 0, policy_version 852034 (0.0113) [2024-06-15 21:39:34,539][1651669] Updated weights for policy 0, policy_version 852093 (0.0011) [2024-06-15 21:39:35,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 47518.6, 300 sec: 47985.7). Total num frames: 1745092608. Throughput: 0: 11810.1. Samples: 436339200. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:35,767][1648981] Avg episode reward: [(0, '881.700')] [2024-06-15 21:39:40,766][1648981] Fps is (10 sec: 49230.3, 60 sec: 45875.3, 300 sec: 47544.6). Total num frames: 1745223680. Throughput: 0: 11942.7. Samples: 436388864. Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:40,767][1648981] Avg episode reward: [(0, '982.390')] [2024-06-15 21:39:40,774][1651669] Updated weights for policy 0, policy_version 852176 (0.0153) [2024-06-15 21:39:41,844][1651669] Updated weights for policy 0, policy_version 852224 (0.0013) [2024-06-15 21:39:42,004][1651274] Signal inference workers to stop experience collection... (44700 times) [2024-06-15 21:39:42,080][1651669] InferenceWorker_p0-w0: stopping experience collection (44700 times) [2024-06-15 21:39:42,279][1651274] Signal inference workers to resume experience collection... (44700 times) [2024-06-15 21:39:42,280][1651669] InferenceWorker_p0-w0: resuming experience collection (44700 times) [2024-06-15 21:39:44,172][1651669] Updated weights for policy 0, policy_version 852304 (0.0011) [2024-06-15 21:39:45,308][1651669] Updated weights for policy 0, policy_version 852347 (0.0023) [2024-06-15 21:39:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.1, 300 sec: 48319.0). Total num frames: 1745616896. Throughput: 0: 11810.1. Samples: 436446208. 
Policy #0 lag: (min: 15.0, avg: 92.7, max: 271.0) [2024-06-15 21:39:45,767][1648981] Avg episode reward: [(0, '985.070')] [2024-06-15 21:39:50,506][1651669] Updated weights for policy 0, policy_version 852402 (0.0014) [2024-06-15 21:39:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1745747968. Throughput: 0: 11980.9. Samples: 436532736. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:39:50,767][1648981] Avg episode reward: [(0, '1078.210')] [2024-06-15 21:39:51,509][1651669] Updated weights for policy 0, policy_version 852449 (0.0012) [2024-06-15 21:39:53,151][1651669] Updated weights for policy 0, policy_version 852515 (0.0012) [2024-06-15 21:39:54,937][1651669] Updated weights for policy 0, policy_version 852592 (0.0011) [2024-06-15 21:39:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 51360.5, 300 sec: 48430.0). Total num frames: 1746141184. Throughput: 0: 11994.3. Samples: 436560896. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:39:55,767][1648981] Avg episode reward: [(0, '1061.410')] [2024-06-15 21:39:59,573][1651669] Updated weights for policy 0, policy_version 852628 (0.0010) [2024-06-15 21:40:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 46421.3, 300 sec: 47763.5). Total num frames: 1746272256. Throughput: 0: 12334.0. Samples: 436648960. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:00,767][1648981] Avg episode reward: [(0, '1031.540')] [2024-06-15 21:40:01,521][1651669] Updated weights for policy 0, policy_version 852688 (0.0015) [2024-06-15 21:40:02,842][1651669] Updated weights for policy 0, policy_version 852752 (0.0013) [2024-06-15 21:40:04,107][1651669] Updated weights for policy 0, policy_version 852816 (0.0012) [2024-06-15 21:40:05,240][1651669] Updated weights for policy 0, policy_version 852861 (0.0014) [2024-06-15 21:40:05,790][1648981] Fps is (10 sec: 52303.2, 60 sec: 52408.1, 300 sec: 48761.4). Total num frames: 1746665472. 
Throughput: 0: 12270.1. Samples: 436708864. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:05,791][1648981] Avg episode reward: [(0, '1026.580')] [2024-06-15 21:40:10,724][1651669] Updated weights for policy 0, policy_version 852927 (0.0012) [2024-06-15 21:40:10,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 47519.9, 300 sec: 48207.9). Total num frames: 1746796544. Throughput: 0: 12583.9. Samples: 436754944. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:10,766][1648981] Avg episode reward: [(0, '1081.580')] [2024-06-15 21:40:12,567][1651669] Updated weights for policy 0, policy_version 852981 (0.0011) [2024-06-15 21:40:14,109][1651669] Updated weights for policy 0, policy_version 853056 (0.0014) [2024-06-15 21:40:15,446][1651669] Updated weights for policy 0, policy_version 853115 (0.0011) [2024-06-15 21:40:15,767][1648981] Fps is (10 sec: 52553.3, 60 sec: 52430.2, 300 sec: 48874.3). Total num frames: 1747189760. Throughput: 0: 12360.6. Samples: 436829696. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:15,768][1648981] Avg episode reward: [(0, '1060.730')] [2024-06-15 21:40:20,109][1651274] Signal inference workers to stop experience collection... (44750 times) [2024-06-15 21:40:20,146][1651669] InferenceWorker_p0-w0: stopping experience collection (44750 times) [2024-06-15 21:40:20,363][1651274] Signal inference workers to resume experience collection... (44750 times) [2024-06-15 21:40:20,364][1651669] InferenceWorker_p0-w0: resuming experience collection (44750 times) [2024-06-15 21:40:20,766][1648981] Fps is (10 sec: 42597.7, 60 sec: 48059.7, 300 sec: 48433.2). Total num frames: 1747222528. Throughput: 0: 12720.3. Samples: 436911616. 
Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:20,767][1648981] Avg episode reward: [(0, '1068.620')] [2024-06-15 21:40:21,267][1651669] Updated weights for policy 0, policy_version 853172 (0.0097) [2024-06-15 21:40:22,833][1651669] Updated weights for policy 0, policy_version 853232 (0.0014) [2024-06-15 21:40:24,371][1651669] Updated weights for policy 0, policy_version 853296 (0.0156) [2024-06-15 21:40:25,766][1648981] Fps is (10 sec: 45876.9, 60 sec: 51336.7, 300 sec: 48652.2). Total num frames: 1747648512. Throughput: 0: 12367.7. Samples: 436945408. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:25,767][1648981] Avg episode reward: [(0, '1060.640')] [2024-06-15 21:40:25,785][1651669] Updated weights for policy 0, policy_version 853360 (0.0012) [2024-06-15 21:40:30,769][1648981] Fps is (10 sec: 49136.8, 60 sec: 49708.7, 300 sec: 48540.6). Total num frames: 1747714048. Throughput: 0: 12685.3. Samples: 437017088. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:30,770][1648981] Avg episode reward: [(0, '1076.940')] [2024-06-15 21:40:31,893][1651669] Updated weights for policy 0, policy_version 853392 (0.0011) [2024-06-15 21:40:33,175][1651669] Updated weights for policy 0, policy_version 853456 (0.0025) [2024-06-15 21:40:34,475][1651669] Updated weights for policy 0, policy_version 853520 (0.0011) [2024-06-15 21:40:35,763][1651669] Updated weights for policy 0, policy_version 853569 (0.0012) [2024-06-15 21:40:35,770][1648981] Fps is (10 sec: 45856.8, 60 sec: 50241.0, 300 sec: 48762.6). Total num frames: 1748107264. Throughput: 0: 12377.9. Samples: 437089792. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:35,771][1648981] Avg episode reward: [(0, '1111.080')] [2024-06-15 21:40:40,766][1648981] Fps is (10 sec: 52445.2, 60 sec: 50244.3, 300 sec: 48874.4). Total num frames: 1748238336. Throughput: 0: 12583.8. Samples: 437127168. 
Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:40,767][1648981] Avg episode reward: [(0, '1068.010')] [2024-06-15 21:40:42,080][1651669] Updated weights for policy 0, policy_version 853635 (0.0015) [2024-06-15 21:40:43,718][1651669] Updated weights for policy 0, policy_version 853700 (0.0011) [2024-06-15 21:40:45,276][1651669] Updated weights for policy 0, policy_version 853776 (0.0012) [2024-06-15 21:40:45,766][1648981] Fps is (10 sec: 45893.6, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 1748566016. Throughput: 0: 12231.1. Samples: 437199360. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:45,767][1648981] Avg episode reward: [(0, '1041.540')] [2024-06-15 21:40:46,666][1651669] Updated weights for policy 0, policy_version 853840 (0.0147) [2024-06-15 21:40:47,647][1651669] Updated weights for policy 0, policy_version 853885 (0.0034) [2024-06-15 21:40:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1748762624. Throughput: 0: 12636.1. Samples: 437277184. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:50,767][1648981] Avg episode reward: [(0, '1050.980')] [2024-06-15 21:40:53,092][1651669] Updated weights for policy 0, policy_version 853936 (0.0012) [2024-06-15 21:40:55,118][1651274] Signal inference workers to stop experience collection... (44800 times) [2024-06-15 21:40:55,172][1651669] InferenceWorker_p0-w0: stopping experience collection (44800 times) [2024-06-15 21:40:55,292][1651274] Signal inference workers to resume experience collection... (44800 times) [2024-06-15 21:40:55,293][1651669] InferenceWorker_p0-w0: resuming experience collection (44800 times) [2024-06-15 21:40:55,295][1651669] Updated weights for policy 0, policy_version 854016 (0.0012) [2024-06-15 21:40:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48605.9, 300 sec: 49318.6). Total num frames: 1749057536. Throughput: 0: 12572.4. Samples: 437320704. 
Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:40:55,767][1648981] Avg episode reward: [(0, '1034.850')] [2024-06-15 21:40:56,007][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000854048_1749090304.pth... [2024-06-15 21:40:56,129][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000848304_1737326592.pth [2024-06-15 21:40:57,089][1651669] Updated weights for policy 0, policy_version 854096 (0.0091) [2024-06-15 21:40:57,939][1651669] Updated weights for policy 0, policy_version 854143 (0.0040) [2024-06-15 21:41:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1749286912. Throughput: 0: 12299.5. Samples: 437383168. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:00,767][1648981] Avg episode reward: [(0, '1040.470')] [2024-06-15 21:41:04,590][1651669] Updated weights for policy 0, policy_version 854208 (0.0024) [2024-06-15 21:41:05,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46986.3, 300 sec: 48985.4). Total num frames: 1749483520. Throughput: 0: 12310.8. Samples: 437465600. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:05,767][1648981] Avg episode reward: [(0, '978.490')] [2024-06-15 21:41:06,598][1651669] Updated weights for policy 0, policy_version 854281 (0.0011) [2024-06-15 21:41:07,989][1651669] Updated weights for policy 0, policy_version 854339 (0.0012) [2024-06-15 21:41:09,198][1651669] Updated weights for policy 0, policy_version 854397 (0.0052) [2024-06-15 21:41:10,774][1648981] Fps is (10 sec: 52386.7, 60 sec: 50237.4, 300 sec: 48873.0). Total num frames: 1749811200. Throughput: 0: 12217.5. Samples: 437495296. 
Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:10,775][1648981] Avg episode reward: [(0, '945.930')] [2024-06-15 21:41:15,360][1651669] Updated weights for policy 0, policy_version 854455 (0.0011) [2024-06-15 21:41:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45875.4, 300 sec: 48874.3). Total num frames: 1749942272. Throughput: 0: 12402.6. Samples: 437575168. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:15,767][1648981] Avg episode reward: [(0, '918.380')] [2024-06-15 21:41:16,418][1651669] Updated weights for policy 0, policy_version 854496 (0.0011) [2024-06-15 21:41:17,885][1651669] Updated weights for policy 0, policy_version 854549 (0.0011) [2024-06-15 21:41:19,171][1651669] Updated weights for policy 0, policy_version 854624 (0.0014) [2024-06-15 21:41:20,767][1648981] Fps is (10 sec: 52466.0, 60 sec: 51881.9, 300 sec: 48874.2). Total num frames: 1750335488. Throughput: 0: 12152.3. Samples: 437636608. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:20,768][1648981] Avg episode reward: [(0, '922.830')] [2024-06-15 21:41:24,598][1651669] Updated weights for policy 0, policy_version 854657 (0.0011) [2024-06-15 21:41:25,611][1651669] Updated weights for policy 0, policy_version 854712 (0.0012) [2024-06-15 21:41:25,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46967.4, 300 sec: 48874.3). Total num frames: 1750466560. Throughput: 0: 12288.0. Samples: 437680128. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:25,767][1648981] Avg episode reward: [(0, '907.500')] [2024-06-15 21:41:27,434][1651669] Updated weights for policy 0, policy_version 854752 (0.0094) [2024-06-15 21:41:29,407][1651669] Updated weights for policy 0, policy_version 854832 (0.0010) [2024-06-15 21:41:30,700][1651274] Signal inference workers to stop experience collection... 
(44850 times) [2024-06-15 21:41:30,736][1651669] InferenceWorker_p0-w0: stopping experience collection (44850 times) [2024-06-15 21:41:30,766][1648981] Fps is (10 sec: 45879.9, 60 sec: 51339.3, 300 sec: 48652.3). Total num frames: 1750794240. Throughput: 0: 12094.6. Samples: 437743616. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:30,767][1648981] Avg episode reward: [(0, '842.010')] [2024-06-15 21:41:30,869][1651274] Signal inference workers to resume experience collection... (44850 times) [2024-06-15 21:41:30,869][1651669] InferenceWorker_p0-w0: resuming experience collection (44850 times) [2024-06-15 21:41:30,955][1651669] Updated weights for policy 0, policy_version 854899 (0.0109) [2024-06-15 21:41:35,782][1648981] Fps is (10 sec: 39260.2, 60 sec: 45866.3, 300 sec: 48427.4). Total num frames: 1750859776. Throughput: 0: 12022.1. Samples: 437818368. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:35,783][1648981] Avg episode reward: [(0, '856.400')] [2024-06-15 21:41:37,524][1651669] Updated weights for policy 0, policy_version 854963 (0.0027) [2024-06-15 21:41:38,584][1651669] Updated weights for policy 0, policy_version 854978 (0.0027) [2024-06-15 21:41:40,160][1651669] Updated weights for policy 0, policy_version 855043 (0.0011) [2024-06-15 21:41:40,773][1648981] Fps is (10 sec: 39295.8, 60 sec: 49146.7, 300 sec: 48651.1). Total num frames: 1751187456. Throughput: 0: 11819.8. Samples: 437852672. Policy #0 lag: (min: 47.0, avg: 106.7, max: 303.0) [2024-06-15 21:41:40,773][1648981] Avg episode reward: [(0, '868.530')] [2024-06-15 21:41:41,492][1651669] Updated weights for policy 0, policy_version 855106 (0.0011) [2024-06-15 21:41:42,664][1651669] Updated weights for policy 0, policy_version 855168 (0.0012) [2024-06-15 21:41:45,766][1648981] Fps is (10 sec: 52510.1, 60 sec: 46967.3, 300 sec: 48430.0). Total num frames: 1751384064. Throughput: 0: 11935.2. Samples: 437920256. 
Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:41:45,767][1648981] Avg episode reward: [(0, '869.290')] [2024-06-15 21:41:49,047][1651669] Updated weights for policy 0, policy_version 855227 (0.0145) [2024-06-15 21:41:50,289][1651669] Updated weights for policy 0, policy_version 855257 (0.0044) [2024-06-15 21:41:50,766][1648981] Fps is (10 sec: 42626.2, 60 sec: 47513.6, 300 sec: 48652.2). Total num frames: 1751613440. Throughput: 0: 11639.5. Samples: 437989376. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:41:50,767][1648981] Avg episode reward: [(0, '919.590')] [2024-06-15 21:41:52,114][1651669] Updated weights for policy 0, policy_version 855331 (0.0011) [2024-06-15 21:41:53,526][1651669] Updated weights for policy 0, policy_version 855394 (0.0125) [2024-06-15 21:41:55,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 47513.5, 300 sec: 48652.2). Total num frames: 1751908352. Throughput: 0: 11607.4. Samples: 438017536. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:41:55,767][1648981] Avg episode reward: [(0, '913.540')] [2024-06-15 21:41:59,082][1651669] Updated weights for policy 0, policy_version 855447 (0.0011) [2024-06-15 21:42:00,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1752039424. Throughput: 0: 11628.1. Samples: 438098432. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:00,767][1648981] Avg episode reward: [(0, '887.890')] [2024-06-15 21:42:01,002][1651669] Updated weights for policy 0, policy_version 855505 (0.0013) [2024-06-15 21:42:02,432][1651669] Updated weights for policy 0, policy_version 855568 (0.0115) [2024-06-15 21:42:04,018][1651669] Updated weights for policy 0, policy_version 855635 (0.0012) [2024-06-15 21:42:04,834][1651669] Updated weights for policy 0, policy_version 855673 (0.0024) [2024-06-15 21:42:05,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49151.9, 300 sec: 48763.2). Total num frames: 1752432640. 
Throughput: 0: 11616.9. Samples: 438159360. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:05,767][1648981] Avg episode reward: [(0, '869.390')] [2024-06-15 21:42:10,520][1651669] Updated weights for policy 0, policy_version 855714 (0.0014) [2024-06-15 21:42:10,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 44788.9, 300 sec: 48208.5). Total num frames: 1752498176. Throughput: 0: 11650.8. Samples: 438204416. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:10,767][1648981] Avg episode reward: [(0, '880.960')] [2024-06-15 21:42:11,865][1651669] Updated weights for policy 0, policy_version 855760 (0.0013) [2024-06-15 21:42:12,928][1651274] Signal inference workers to stop experience collection... (44900 times) [2024-06-15 21:42:13,023][1651669] InferenceWorker_p0-w0: stopping experience collection (44900 times) [2024-06-15 21:42:13,174][1651274] Signal inference workers to resume experience collection... (44900 times) [2024-06-15 21:42:13,175][1651669] InferenceWorker_p0-w0: resuming experience collection (44900 times) [2024-06-15 21:42:13,374][1651669] Updated weights for policy 0, policy_version 855825 (0.0014) [2024-06-15 21:42:15,142][1651669] Updated weights for policy 0, policy_version 855904 (0.0012) [2024-06-15 21:42:15,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1752924160. Throughput: 0: 11685.0. Samples: 438269440. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:15,767][1648981] Avg episode reward: [(0, '885.200')] [2024-06-15 21:42:20,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 43691.3, 300 sec: 47986.7). Total num frames: 1752956928. Throughput: 0: 11871.1. Samples: 438352384. 
Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:20,767][1648981] Avg episode reward: [(0, '965.970')] [2024-06-15 21:42:21,215][1651669] Updated weights for policy 0, policy_version 855968 (0.0011) [2024-06-15 21:42:22,351][1651669] Updated weights for policy 0, policy_version 856004 (0.0013) [2024-06-15 21:42:23,458][1651669] Updated weights for policy 0, policy_version 856057 (0.0010) [2024-06-15 21:42:24,750][1651669] Updated weights for policy 0, policy_version 856116 (0.0014) [2024-06-15 21:42:25,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1753415680. Throughput: 0: 11800.5. Samples: 438383616. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:25,767][1648981] Avg episode reward: [(0, '951.910')] [2024-06-15 21:42:26,372][1651669] Updated weights for policy 0, policy_version 856189 (0.0107) [2024-06-15 21:42:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 44782.9, 300 sec: 47986.3). Total num frames: 1753481216. Throughput: 0: 12049.1. Samples: 438462464. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:30,767][1648981] Avg episode reward: [(0, '982.690')] [2024-06-15 21:42:32,657][1651669] Updated weights for policy 0, policy_version 856253 (0.0011) [2024-06-15 21:42:34,029][1651669] Updated weights for policy 0, policy_version 856314 (0.0012) [2024-06-15 21:42:34,775][1651669] Updated weights for policy 0, policy_version 856341 (0.0011) [2024-06-15 21:42:35,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 50257.3, 300 sec: 48655.4). Total num frames: 1753874432. Throughput: 0: 11969.4. Samples: 438528000. 
Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:35,767][1648981] Avg episode reward: [(0, '1010.340')] [2024-06-15 21:42:36,279][1651669] Updated weights for policy 0, policy_version 856416 (0.0011) [2024-06-15 21:42:36,971][1651669] Updated weights for policy 0, policy_version 856447 (0.0011) [2024-06-15 21:42:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46972.6, 300 sec: 48096.8). Total num frames: 1754005504. Throughput: 0: 12276.6. Samples: 438569984. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:40,767][1648981] Avg episode reward: [(0, '973.780')] [2024-06-15 21:42:41,926][1651669] Updated weights for policy 0, policy_version 856496 (0.0012) [2024-06-15 21:42:44,426][1651669] Updated weights for policy 0, policy_version 856582 (0.0012) [2024-06-15 21:42:45,555][1651669] Updated weights for policy 0, policy_version 856631 (0.0031) [2024-06-15 21:42:45,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1754398720. Throughput: 0: 12288.0. Samples: 438651392. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:45,767][1648981] Avg episode reward: [(0, '1000.120')] [2024-06-15 21:42:47,056][1651669] Updated weights for policy 0, policy_version 856701 (0.0012) [2024-06-15 21:42:50,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 48096.8). Total num frames: 1754529792. Throughput: 0: 12538.3. Samples: 438723584. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:50,767][1648981] Avg episode reward: [(0, '997.260')] [2024-06-15 21:42:51,538][1651274] Signal inference workers to stop experience collection... (44950 times) [2024-06-15 21:42:51,574][1651669] InferenceWorker_p0-w0: stopping experience collection (44950 times) [2024-06-15 21:42:51,801][1651274] Signal inference workers to resume experience collection... 
(44950 times) [2024-06-15 21:42:51,802][1651669] InferenceWorker_p0-w0: resuming experience collection (44950 times) [2024-06-15 21:42:53,036][1651669] Updated weights for policy 0, policy_version 856756 (0.0131) [2024-06-15 21:42:54,386][1651669] Updated weights for policy 0, policy_version 856820 (0.0120) [2024-06-15 21:42:55,776][1648981] Fps is (10 sec: 39283.8, 60 sec: 48052.0, 300 sec: 48539.5). Total num frames: 1754791936. Throughput: 0: 12262.6. Samples: 438756352. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:42:55,777][1648981] Avg episode reward: [(0, '983.580')] [2024-06-15 21:42:56,335][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000856864_1754857472.pth... [2024-06-15 21:42:56,493][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000851136_1743126528.pth [2024-06-15 21:42:57,000][1651669] Updated weights for policy 0, policy_version 856890 (0.0054) [2024-06-15 21:42:58,530][1651669] Updated weights for policy 0, policy_version 856951 (0.0013) [2024-06-15 21:43:00,793][1648981] Fps is (10 sec: 52288.9, 60 sec: 50221.9, 300 sec: 48314.5). Total num frames: 1755054080. Throughput: 0: 12269.3. Samples: 438821888. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:00,793][1648981] Avg episode reward: [(0, '979.610')] [2024-06-15 21:43:03,120][1651669] Updated weights for policy 0, policy_version 856992 (0.0013) [2024-06-15 21:43:03,981][1651669] Updated weights for policy 0, policy_version 857021 (0.0011) [2024-06-15 21:43:05,487][1651669] Updated weights for policy 0, policy_version 857087 (0.0014) [2024-06-15 21:43:05,766][1648981] Fps is (10 sec: 52479.7, 60 sec: 48059.8, 300 sec: 48763.2). Total num frames: 1755316224. Throughput: 0: 12162.8. Samples: 438899712. 
Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:05,767][1648981] Avg episode reward: [(0, '933.300')] [2024-06-15 21:43:07,342][1651669] Updated weights for policy 0, policy_version 857139 (0.0012) [2024-06-15 21:43:08,866][1651669] Updated weights for policy 0, policy_version 857212 (0.0022) [2024-06-15 21:43:10,766][1648981] Fps is (10 sec: 52568.5, 60 sec: 51336.5, 300 sec: 48323.5). Total num frames: 1755578368. Throughput: 0: 12162.8. Samples: 438930944. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:10,767][1648981] Avg episode reward: [(0, '945.870')] [2024-06-15 21:43:14,703][1651669] Updated weights for policy 0, policy_version 857274 (0.0107) [2024-06-15 21:43:15,774][1648981] Fps is (10 sec: 42565.1, 60 sec: 46961.3, 300 sec: 48539.8). Total num frames: 1755742208. Throughput: 0: 12115.2. Samples: 439007744. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:15,775][1648981] Avg episode reward: [(0, '941.560')] [2024-06-15 21:43:16,456][1651669] Updated weights for policy 0, policy_version 857314 (0.0036) [2024-06-15 21:43:17,697][1651669] Updated weights for policy 0, policy_version 857360 (0.0013) [2024-06-15 21:43:19,275][1651669] Updated weights for policy 0, policy_version 857424 (0.0019) [2024-06-15 21:43:20,486][1651669] Updated weights for policy 0, policy_version 857472 (0.0013) [2024-06-15 21:43:20,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 52428.8, 300 sec: 48430.6). Total num frames: 1756102656. Throughput: 0: 12094.6. Samples: 439072256. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:20,767][1648981] Avg episode reward: [(0, '995.310')] [2024-06-15 21:43:25,715][1651669] Updated weights for policy 0, policy_version 857523 (0.0014) [2024-06-15 21:43:25,767][1648981] Fps is (10 sec: 45910.2, 60 sec: 46421.1, 300 sec: 48319.5). Total num frames: 1756200960. Throughput: 0: 12196.9. Samples: 439118848. 
Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:25,767][1648981] Avg episode reward: [(0, '996.690')] [2024-06-15 21:43:27,360][1651669] Updated weights for policy 0, policy_version 857593 (0.0012) [2024-06-15 21:43:29,203][1651669] Updated weights for policy 0, policy_version 857632 (0.0013) [2024-06-15 21:43:29,329][1651274] Signal inference workers to stop experience collection... (45000 times) [2024-06-15 21:43:29,367][1651669] InferenceWorker_p0-w0: stopping experience collection (45000 times) [2024-06-15 21:43:29,561][1651274] Signal inference workers to resume experience collection... (45000 times) [2024-06-15 21:43:29,562][1651669] InferenceWorker_p0-w0: resuming experience collection (45000 times) [2024-06-15 21:43:30,767][1648981] Fps is (10 sec: 42595.4, 60 sec: 50789.8, 300 sec: 48430.9). Total num frames: 1756528640. Throughput: 0: 11832.7. Samples: 439183872. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:30,768][1648981] Avg episode reward: [(0, '1027.340')] [2024-06-15 21:43:31,021][1651669] Updated weights for policy 0, policy_version 857699 (0.0013) [2024-06-15 21:43:35,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1756626944. Throughput: 0: 12026.3. Samples: 439264768. Policy #0 lag: (min: 111.0, avg: 196.5, max: 367.0) [2024-06-15 21:43:35,767][1648981] Avg episode reward: [(0, '1044.640')] [2024-06-15 21:43:36,704][1651669] Updated weights for policy 0, policy_version 857760 (0.0013) [2024-06-15 21:43:38,296][1651669] Updated weights for policy 0, policy_version 857824 (0.0011) [2024-06-15 21:43:40,180][1651669] Updated weights for policy 0, policy_version 857875 (0.0036) [2024-06-15 21:43:40,766][1648981] Fps is (10 sec: 42601.3, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1756954624. Throughput: 0: 11892.4. Samples: 439291392. 
Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:43:40,767][1648981] Avg episode reward: [(0, '1065.370')] [2024-06-15 21:43:41,924][1651669] Updated weights for policy 0, policy_version 857952 (0.0016) [2024-06-15 21:43:45,767][1648981] Fps is (10 sec: 52427.4, 60 sec: 45875.1, 300 sec: 47985.6). Total num frames: 1757151232. Throughput: 0: 12056.2. Samples: 439364096. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:43:45,767][1648981] Avg episode reward: [(0, '1057.940')] [2024-06-15 21:43:46,835][1651669] Updated weights for policy 0, policy_version 858000 (0.0039) [2024-06-15 21:43:49,336][1651669] Updated weights for policy 0, policy_version 858083 (0.0012) [2024-06-15 21:43:50,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.7, 300 sec: 48656.8). Total num frames: 1757413376. Throughput: 0: 11798.7. Samples: 439430656. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:43:50,767][1648981] Avg episode reward: [(0, '1050.470')] [2024-06-15 21:43:51,283][1651669] Updated weights for policy 0, policy_version 858144 (0.0048) [2024-06-15 21:43:53,248][1651669] Updated weights for policy 0, policy_version 858208 (0.0013) [2024-06-15 21:43:55,766][1648981] Fps is (10 sec: 52429.9, 60 sec: 48067.5, 300 sec: 48096.7). Total num frames: 1757675520. Throughput: 0: 11810.2. Samples: 439462400. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:43:55,767][1648981] Avg episode reward: [(0, '1019.010')] [2024-06-15 21:43:57,648][1651669] Updated weights for policy 0, policy_version 858260 (0.0012) [2024-06-15 21:43:58,720][1651669] Updated weights for policy 0, policy_version 858306 (0.0011) [2024-06-15 21:44:00,079][1651669] Updated weights for policy 0, policy_version 858368 (0.0012) [2024-06-15 21:44:00,798][1648981] Fps is (10 sec: 52261.9, 60 sec: 48055.5, 300 sec: 48869.0). Total num frames: 1757937664. Throughput: 0: 11906.1. Samples: 439543808. 
Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:00,799][1648981] Avg episode reward: [(0, '1034.510')] [2024-06-15 21:44:02,385][1651669] Updated weights for policy 0, policy_version 858430 (0.0011) [2024-06-15 21:44:04,891][1651669] Updated weights for policy 0, policy_version 858492 (0.0011) [2024-06-15 21:44:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48059.7, 300 sec: 48320.2). Total num frames: 1758199808. Throughput: 0: 11992.2. Samples: 439611904. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:05,767][1648981] Avg episode reward: [(0, '1055.770')] [2024-06-15 21:44:09,329][1651669] Updated weights for policy 0, policy_version 858562 (0.0013) [2024-06-15 21:44:09,757][1651274] Signal inference workers to stop experience collection... (45050 times) [2024-06-15 21:44:09,798][1651669] InferenceWorker_p0-w0: stopping experience collection (45050 times) [2024-06-15 21:44:10,042][1651274] Signal inference workers to resume experience collection... (45050 times) [2024-06-15 21:44:10,043][1651669] InferenceWorker_p0-w0: resuming experience collection (45050 times) [2024-06-15 21:44:10,686][1651669] Updated weights for policy 0, policy_version 858617 (0.0016) [2024-06-15 21:44:10,766][1648981] Fps is (10 sec: 49309.6, 60 sec: 47513.7, 300 sec: 48763.5). Total num frames: 1758429184. Throughput: 0: 12003.6. Samples: 439659008. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:10,767][1648981] Avg episode reward: [(0, '1078.050')] [2024-06-15 21:44:12,818][1651669] Updated weights for policy 0, policy_version 858658 (0.0013) [2024-06-15 21:44:14,774][1651669] Updated weights for policy 0, policy_version 858705 (0.0014) [2024-06-15 21:44:15,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49704.6, 300 sec: 48763.2). Total num frames: 1758724096. Throughput: 0: 12128.9. Samples: 439729664. 
Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:15,767][1648981] Avg episode reward: [(0, '1109.620')] [2024-06-15 21:44:19,385][1651669] Updated weights for policy 0, policy_version 858784 (0.0072) [2024-06-15 21:44:20,701][1651669] Updated weights for policy 0, policy_version 858848 (0.0012) [2024-06-15 21:44:20,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 46967.5, 300 sec: 48652.2). Total num frames: 1758920704. Throughput: 0: 11889.8. Samples: 439799808. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:20,767][1648981] Avg episode reward: [(0, '1120.830')] [2024-06-15 21:44:21,457][1651669] Updated weights for policy 0, policy_version 858880 (0.0014) [2024-06-15 21:44:23,738][1651669] Updated weights for policy 0, policy_version 858928 (0.0014) [2024-06-15 21:44:25,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 49152.2, 300 sec: 48876.9). Total num frames: 1759150080. Throughput: 0: 12197.0. Samples: 439840256. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:25,767][1648981] Avg episode reward: [(0, '1129.400')] [2024-06-15 21:44:25,898][1651669] Updated weights for policy 0, policy_version 858976 (0.0016) [2024-06-15 21:44:26,750][1651669] Updated weights for policy 0, policy_version 859008 (0.0011) [2024-06-15 21:44:30,208][1651669] Updated weights for policy 0, policy_version 859061 (0.0012) [2024-06-15 21:44:30,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 47514.1, 300 sec: 48430.0). Total num frames: 1759379456. Throughput: 0: 12242.5. Samples: 439915008. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:30,767][1648981] Avg episode reward: [(0, '1148.680')] [2024-06-15 21:44:31,961][1651669] Updated weights for policy 0, policy_version 859131 (0.0012) [2024-06-15 21:44:34,376][1651669] Updated weights for policy 0, policy_version 859184 (0.0027) [2024-06-15 21:44:35,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1759641600. 
Throughput: 0: 12310.8. Samples: 439984640. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:35,767][1648981] Avg episode reward: [(0, '1166.650')] [2024-06-15 21:44:36,872][1651669] Updated weights for policy 0, policy_version 859216 (0.0012) [2024-06-15 21:44:38,171][1651669] Updated weights for policy 0, policy_version 859261 (0.0014) [2024-06-15 21:44:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1759838208. Throughput: 0: 12310.8. Samples: 440016384. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:40,767][1648981] Avg episode reward: [(0, '1182.420')] [2024-06-15 21:44:41,249][1651669] Updated weights for policy 0, policy_version 859320 (0.0128) [2024-06-15 21:44:42,733][1651669] Updated weights for policy 0, policy_version 859376 (0.0021) [2024-06-15 21:44:44,779][1651669] Updated weights for policy 0, policy_version 859412 (0.0012) [2024-06-15 21:44:45,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49698.4, 300 sec: 48763.2). Total num frames: 1760133120. Throughput: 0: 12126.0. Samples: 440089088. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:45,767][1648981] Avg episode reward: [(0, '1100.050')] [2024-06-15 21:44:47,654][1651669] Updated weights for policy 0, policy_version 859458 (0.0013) [2024-06-15 21:44:50,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1760296960. Throughput: 0: 12288.0. Samples: 440164864. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:50,767][1648981] Avg episode reward: [(0, '1061.760')] [2024-06-15 21:44:50,932][1651669] Updated weights for policy 0, policy_version 859522 (0.0013) [2024-06-15 21:44:52,111][1651274] Signal inference workers to stop experience collection... 
(45100 times) [2024-06-15 21:44:52,165][1651669] InferenceWorker_p0-w0: stopping experience collection (45100 times) [2024-06-15 21:44:52,175][1651669] Updated weights for policy 0, policy_version 859583 (0.0016) [2024-06-15 21:44:52,185][1651274] Signal inference workers to resume experience collection... (45100 times) [2024-06-15 21:44:52,188][1651669] InferenceWorker_p0-w0: resuming experience collection (45100 times) [2024-06-15 21:44:53,595][1651669] Updated weights for policy 0, policy_version 859639 (0.0023) [2024-06-15 21:44:55,759][1651669] Updated weights for policy 0, policy_version 859682 (0.0012) [2024-06-15 21:44:55,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 1760624640. Throughput: 0: 11878.4. Samples: 440193536. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:44:55,767][1648981] Avg episode reward: [(0, '1090.650')] [2024-06-15 21:44:56,277][1651669] Updated weights for policy 0, policy_version 859708 (0.0014) [2024-06-15 21:44:56,347][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000859712_1760690176.pth... [2024-06-15 21:44:56,395][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000854048_1749090304.pth [2024-06-15 21:44:59,550][1651669] Updated weights for policy 0, policy_version 859747 (0.0010) [2024-06-15 21:45:00,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 48085.2, 300 sec: 47989.5). Total num frames: 1760821248. Throughput: 0: 12117.3. Samples: 440274944. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:45:00,767][1648981] Avg episode reward: [(0, '1030.280')] [2024-06-15 21:45:01,772][1651669] Updated weights for policy 0, policy_version 859808 (0.0011) [2024-06-15 21:45:02,974][1651669] Updated weights for policy 0, policy_version 859856 (0.0108) [2024-06-15 21:45:05,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1761083392. Throughput: 0: 12094.6. 
Samples: 440344064. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:45:05,767][1648981] Avg episode reward: [(0, '1044.150')] [2024-06-15 21:45:06,178][1651669] Updated weights for policy 0, policy_version 859920 (0.0014) [2024-06-15 21:45:07,307][1651669] Updated weights for policy 0, policy_version 859968 (0.0015) [2024-06-15 21:45:10,741][1651669] Updated weights for policy 0, policy_version 860022 (0.0013) [2024-06-15 21:45:10,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 48059.7, 300 sec: 47874.7). Total num frames: 1761312768. Throughput: 0: 12037.7. Samples: 440381952. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:45:10,767][1648981] Avg episode reward: [(0, '997.180')] [2024-06-15 21:45:13,173][1651669] Updated weights for policy 0, policy_version 860080 (0.0149) [2024-06-15 21:45:14,885][1651669] Updated weights for policy 0, policy_version 860160 (0.0011) [2024-06-15 21:45:15,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 48059.6, 300 sec: 48763.2). Total num frames: 1761607680. Throughput: 0: 11912.5. Samples: 440451072. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:45:15,767][1648981] Avg episode reward: [(0, '966.010')] [2024-06-15 21:45:17,840][1651669] Updated weights for policy 0, policy_version 860224 (0.0025) [2024-06-15 21:45:20,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1761738752. Throughput: 0: 12003.6. Samples: 440524800. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:45:20,767][1648981] Avg episode reward: [(0, '990.940')] [2024-06-15 21:45:22,382][1651669] Updated weights for policy 0, policy_version 860286 (0.0016) [2024-06-15 21:45:24,348][1651669] Updated weights for policy 0, policy_version 860352 (0.0012) [2024-06-15 21:45:25,527][1651669] Updated weights for policy 0, policy_version 860413 (0.0012) [2024-06-15 21:45:25,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 49697.9, 300 sec: 48874.8). 
Total num frames: 1762131968. Throughput: 0: 12162.8. Samples: 440563712. Policy #0 lag: (min: 3.0, avg: 77.8, max: 259.0) [2024-06-15 21:45:25,767][1648981] Avg episode reward: [(0, '989.530')] [2024-06-15 21:45:28,806][1651669] Updated weights for policy 0, policy_version 860448 (0.0043) [2024-06-15 21:45:30,770][1648981] Fps is (10 sec: 52408.8, 60 sec: 48056.7, 300 sec: 47985.7). Total num frames: 1762263040. Throughput: 0: 12082.2. Samples: 440632832. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:45:30,771][1648981] Avg episode reward: [(0, '991.820')] [2024-06-15 21:45:31,581][1651669] Updated weights for policy 0, policy_version 860485 (0.0012) [2024-06-15 21:45:32,955][1651669] Updated weights for policy 0, policy_version 860538 (0.0011) [2024-06-15 21:45:34,168][1651274] Signal inference workers to stop experience collection... (45150 times) [2024-06-15 21:45:34,209][1651669] InferenceWorker_p0-w0: stopping experience collection (45150 times) [2024-06-15 21:45:34,435][1651274] Signal inference workers to resume experience collection... (45150 times) [2024-06-15 21:45:34,436][1651669] InferenceWorker_p0-w0: resuming experience collection (45150 times) [2024-06-15 21:45:35,033][1651669] Updated weights for policy 0, policy_version 860596 (0.0022) [2024-06-15 21:45:35,766][1648981] Fps is (10 sec: 42600.0, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1762557952. Throughput: 0: 12026.3. Samples: 440706048. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:45:35,767][1648981] Avg episode reward: [(0, '991.930')] [2024-06-15 21:45:36,551][1651669] Updated weights for policy 0, policy_version 860667 (0.0224) [2024-06-15 21:45:40,353][1651669] Updated weights for policy 0, policy_version 860736 (0.0012) [2024-06-15 21:45:40,766][1648981] Fps is (10 sec: 52448.9, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1762787328. Throughput: 0: 12208.4. Samples: 440742912. 
Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:45:40,767][1648981] Avg episode reward: [(0, '946.880')] [2024-06-15 21:45:43,144][1651669] Updated weights for policy 0, policy_version 860794 (0.0015) [2024-06-15 21:45:45,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 1762951168. Throughput: 0: 11992.2. Samples: 440814592. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:45:45,767][1648981] Avg episode reward: [(0, '942.530')] [2024-06-15 21:45:46,577][1651669] Updated weights for policy 0, policy_version 860849 (0.0013) [2024-06-15 21:45:48,098][1651669] Updated weights for policy 0, policy_version 860922 (0.0016) [2024-06-15 21:45:50,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1763213312. Throughput: 0: 11992.2. Samples: 440883712. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:45:50,767][1648981] Avg episode reward: [(0, '925.020')] [2024-06-15 21:45:51,257][1651669] Updated weights for policy 0, policy_version 860960 (0.0068) [2024-06-15 21:45:52,034][1651669] Updated weights for policy 0, policy_version 860992 (0.0011) [2024-06-15 21:45:54,582][1651669] Updated weights for policy 0, policy_version 861052 (0.0012) [2024-06-15 21:45:55,782][1648981] Fps is (10 sec: 49074.8, 60 sec: 46955.2, 300 sec: 47983.1). Total num frames: 1763442688. Throughput: 0: 11885.6. Samples: 440916992. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:45:55,783][1648981] Avg episode reward: [(0, '941.730')] [2024-06-15 21:45:57,494][1651669] Updated weights for policy 0, policy_version 861110 (0.0013) [2024-06-15 21:45:59,215][1651669] Updated weights for policy 0, policy_version 861168 (0.0014) [2024-06-15 21:46:00,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 1763704832. Throughput: 0: 11878.4. Samples: 440985600. 
Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:00,767][1648981] Avg episode reward: [(0, '932.890')] [2024-06-15 21:46:02,353][1651669] Updated weights for policy 0, policy_version 861216 (0.0011) [2024-06-15 21:46:03,143][1651669] Updated weights for policy 0, policy_version 861248 (0.0011) [2024-06-15 21:46:05,051][1651669] Updated weights for policy 0, policy_version 861306 (0.0012) [2024-06-15 21:46:05,766][1648981] Fps is (10 sec: 52510.4, 60 sec: 48059.6, 300 sec: 47987.0). Total num frames: 1763966976. Throughput: 0: 11821.5. Samples: 441056768. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:05,767][1648981] Avg episode reward: [(0, '897.020')] [2024-06-15 21:46:08,383][1651669] Updated weights for policy 0, policy_version 861347 (0.0106) [2024-06-15 21:46:09,813][1651669] Updated weights for policy 0, policy_version 861395 (0.0016) [2024-06-15 21:46:10,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48059.7, 300 sec: 48318.9). Total num frames: 1764196352. Throughput: 0: 11798.8. Samples: 441094656. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:10,767][1648981] Avg episode reward: [(0, '886.920')] [2024-06-15 21:46:13,224][1651669] Updated weights for policy 0, policy_version 861443 (0.0013) [2024-06-15 21:46:14,648][1651669] Updated weights for policy 0, policy_version 861494 (0.0011) [2024-06-15 21:46:15,766][1648981] Fps is (10 sec: 42599.2, 60 sec: 46421.5, 300 sec: 47652.6). Total num frames: 1764392960. Throughput: 0: 11754.3. Samples: 441161728. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:15,767][1648981] Avg episode reward: [(0, '891.930')] [2024-06-15 21:46:16,212][1651669] Updated weights for policy 0, policy_version 861538 (0.0014) [2024-06-15 21:46:18,778][1651274] Signal inference workers to stop experience collection... 
(45200 times) [2024-06-15 21:46:18,835][1651669] InferenceWorker_p0-w0: stopping experience collection (45200 times) [2024-06-15 21:46:19,050][1651274] Signal inference workers to resume experience collection... (45200 times) [2024-06-15 21:46:19,051][1651669] InferenceWorker_p0-w0: resuming experience collection (45200 times) [2024-06-15 21:46:19,170][1651669] Updated weights for policy 0, policy_version 861586 (0.0011) [2024-06-15 21:46:20,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48605.8, 300 sec: 48096.7). Total num frames: 1764655104. Throughput: 0: 11696.3. Samples: 441232384. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:20,767][1648981] Avg episode reward: [(0, '882.150')] [2024-06-15 21:46:21,533][1651669] Updated weights for policy 0, policy_version 861680 (0.0124) [2024-06-15 21:46:25,169][1651669] Updated weights for policy 0, policy_version 861733 (0.0140) [2024-06-15 21:46:25,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 45329.3, 300 sec: 47652.4). Total num frames: 1764851712. Throughput: 0: 11719.1. Samples: 441270272. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:25,767][1648981] Avg episode reward: [(0, '890.350')] [2024-06-15 21:46:25,868][1651669] Updated weights for policy 0, policy_version 861760 (0.0029) [2024-06-15 21:46:27,475][1651669] Updated weights for policy 0, policy_version 861818 (0.0105) [2024-06-15 21:46:30,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 46970.5, 300 sec: 48210.4). Total num frames: 1765081088. Throughput: 0: 11776.0. Samples: 441344512. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:30,767][1648981] Avg episode reward: [(0, '873.680')] [2024-06-15 21:46:30,919][1651669] Updated weights for policy 0, policy_version 861872 (0.0024) [2024-06-15 21:46:32,422][1651669] Updated weights for policy 0, policy_version 861942 (0.0012) [2024-06-15 21:46:35,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 45875.1, 300 sec: 47875.6). 
Total num frames: 1765310464. Throughput: 0: 11844.2. Samples: 441416704. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:35,767][1648981] Avg episode reward: [(0, '857.370')] [2024-06-15 21:46:35,879][1651669] Updated weights for policy 0, policy_version 861970 (0.0037) [2024-06-15 21:46:36,916][1651669] Updated weights for policy 0, policy_version 862015 (0.0012) [2024-06-15 21:46:38,161][1651669] Updated weights for policy 0, policy_version 862070 (0.0018) [2024-06-15 21:46:40,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 46421.3, 300 sec: 48096.8). Total num frames: 1765572608. Throughput: 0: 11859.8. Samples: 441450496. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:40,767][1648981] Avg episode reward: [(0, '882.440')] [2024-06-15 21:46:40,845][1651669] Updated weights for policy 0, policy_version 862112 (0.0012) [2024-06-15 21:46:41,843][1651669] Updated weights for policy 0, policy_version 862162 (0.0169) [2024-06-15 21:46:45,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 1765801984. Throughput: 0: 11980.8. Samples: 441524736. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:45,767][1648981] Avg episode reward: [(0, '884.750')] [2024-06-15 21:46:46,977][1651669] Updated weights for policy 0, policy_version 862224 (0.0012) [2024-06-15 21:46:48,265][1651669] Updated weights for policy 0, policy_version 862268 (0.0027) [2024-06-15 21:46:49,975][1651669] Updated weights for policy 0, policy_version 862320 (0.0015) [2024-06-15 21:46:50,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 1766064128. Throughput: 0: 11901.2. Samples: 441592320. 
Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:50,767][1648981] Avg episode reward: [(0, '847.120')] [2024-06-15 21:46:51,036][1651669] Updated weights for policy 0, policy_version 862352 (0.0012) [2024-06-15 21:46:53,012][1651669] Updated weights for policy 0, policy_version 862407 (0.0012) [2024-06-15 21:46:53,834][1651669] Updated weights for policy 0, policy_version 862453 (0.0010) [2024-06-15 21:46:55,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 48072.2, 300 sec: 48430.0). Total num frames: 1766326272. Throughput: 0: 11867.0. Samples: 441628672. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:46:55,767][1648981] Avg episode reward: [(0, '846.380')] [2024-06-15 21:46:55,774][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000862464_1766326272.pth... [2024-06-15 21:46:55,815][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000856864_1754857472.pth [2024-06-15 21:46:58,600][1651669] Updated weights for policy 0, policy_version 862517 (0.0012) [2024-06-15 21:46:59,686][1651274] Signal inference workers to stop experience collection... (45250 times) [2024-06-15 21:46:59,734][1651669] InferenceWorker_p0-w0: stopping experience collection (45250 times) [2024-06-15 21:46:59,936][1651274] Signal inference workers to resume experience collection... (45250 times) [2024-06-15 21:46:59,937][1651669] InferenceWorker_p0-w0: resuming experience collection (45250 times) [2024-06-15 21:47:00,125][1651669] Updated weights for policy 0, policy_version 862564 (0.0013) [2024-06-15 21:47:00,685][1651669] Updated weights for policy 0, policy_version 862592 (0.0012) [2024-06-15 21:47:00,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1766588416. Throughput: 0: 12094.6. Samples: 441705984. 
Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:47:00,767][1648981] Avg episode reward: [(0, '871.220')] [2024-06-15 21:47:02,360][1651669] Updated weights for policy 0, policy_version 862651 (0.0010) [2024-06-15 21:47:05,006][1651669] Updated weights for policy 0, policy_version 862708 (0.0011) [2024-06-15 21:47:05,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.9, 300 sec: 48652.2). Total num frames: 1766850560. Throughput: 0: 12083.2. Samples: 441776128. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:47:05,767][1648981] Avg episode reward: [(0, '863.930')] [2024-06-15 21:47:08,223][1651669] Updated weights for policy 0, policy_version 862736 (0.0015) [2024-06-15 21:47:09,948][1651669] Updated weights for policy 0, policy_version 862785 (0.0012) [2024-06-15 21:47:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 1767047168. Throughput: 0: 12242.5. Samples: 441821184. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:47:10,767][1648981] Avg episode reward: [(0, '849.600')] [2024-06-15 21:47:11,588][1651669] Updated weights for policy 0, policy_version 862849 (0.0012) [2024-06-15 21:47:12,624][1651669] Updated weights for policy 0, policy_version 862898 (0.0010) [2024-06-15 21:47:14,413][1651669] Updated weights for policy 0, policy_version 862932 (0.0014) [2024-06-15 21:47:15,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49698.1, 300 sec: 48874.3). Total num frames: 1767374848. Throughput: 0: 12106.0. Samples: 441889280. Policy #0 lag: (min: 15.0, avg: 135.2, max: 271.0) [2024-06-15 21:47:15,767][1648981] Avg episode reward: [(0, '870.650')] [2024-06-15 21:47:17,738][1651669] Updated weights for policy 0, policy_version 862992 (0.0012) [2024-06-15 21:47:20,155][1651669] Updated weights for policy 0, policy_version 863056 (0.0028) [2024-06-15 21:47:20,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1767571456. 
Throughput: 0: 12435.9. Samples: 441976320. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:20,767][1648981] Avg episode reward: [(0, '860.690')] [2024-06-15 21:47:21,301][1651669] Updated weights for policy 0, policy_version 863100 (0.0012) [2024-06-15 21:47:22,879][1651669] Updated weights for policy 0, policy_version 863162 (0.0021) [2024-06-15 21:47:24,917][1651669] Updated weights for policy 0, policy_version 863216 (0.0011) [2024-06-15 21:47:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50790.4, 300 sec: 48874.3). Total num frames: 1767899136. Throughput: 0: 12447.3. Samples: 442010624. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:25,767][1648981] Avg episode reward: [(0, '841.620')] [2024-06-15 21:47:28,473][1651669] Updated weights for policy 0, policy_version 863264 (0.0011) [2024-06-15 21:47:30,320][1651669] Updated weights for policy 0, policy_version 863300 (0.0024) [2024-06-15 21:47:30,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 49698.1, 300 sec: 48096.8). Total num frames: 1768062976. Throughput: 0: 12538.3. Samples: 442088960. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:30,767][1648981] Avg episode reward: [(0, '854.130')] [2024-06-15 21:47:31,420][1651669] Updated weights for policy 0, policy_version 863358 (0.0016) [2024-06-15 21:47:33,894][1651669] Updated weights for policy 0, policy_version 863440 (0.0012) [2024-06-15 21:47:34,918][1651669] Updated weights for policy 0, policy_version 863479 (0.0012) [2024-06-15 21:47:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 51882.7, 300 sec: 48874.3). Total num frames: 1768423424. Throughput: 0: 12640.7. Samples: 442161152. 
Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:35,767][1648981] Avg episode reward: [(0, '852.630')] [2024-06-15 21:47:38,629][1651669] Updated weights for policy 0, policy_version 863505 (0.0036) [2024-06-15 21:47:40,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 1768554496. Throughput: 0: 12800.0. Samples: 442204672. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:40,767][1648981] Avg episode reward: [(0, '844.150')] [2024-06-15 21:47:41,147][1651669] Updated weights for policy 0, policy_version 863554 (0.0015) [2024-06-15 21:47:41,460][1651274] Signal inference workers to stop experience collection... (45300 times) [2024-06-15 21:47:41,507][1651669] InferenceWorker_p0-w0: stopping experience collection (45300 times) [2024-06-15 21:47:41,769][1651274] Signal inference workers to resume experience collection... (45300 times) [2024-06-15 21:47:41,770][1651669] InferenceWorker_p0-w0: resuming experience collection (45300 times) [2024-06-15 21:47:42,226][1651669] Updated weights for policy 0, policy_version 863608 (0.0012) [2024-06-15 21:47:43,690][1651669] Updated weights for policy 0, policy_version 863677 (0.0014) [2024-06-15 21:47:45,168][1651669] Updated weights for policy 0, policy_version 863739 (0.0161) [2024-06-15 21:47:45,770][1648981] Fps is (10 sec: 52407.8, 60 sec: 52425.3, 300 sec: 48873.6). Total num frames: 1768947712. Throughput: 0: 12559.9. Samples: 442271232. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:45,771][1648981] Avg episode reward: [(0, '846.280')] [2024-06-15 21:47:50,085][1651669] Updated weights for policy 0, policy_version 863779 (0.0013) [2024-06-15 21:47:50,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 50244.0, 300 sec: 48431.6). Total num frames: 1769078784. Throughput: 0: 12652.0. Samples: 442345472. 
Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:50,767][1648981] Avg episode reward: [(0, '867.600')] [2024-06-15 21:47:51,962][1651669] Updated weights for policy 0, policy_version 863811 (0.0011) [2024-06-15 21:47:53,113][1651669] Updated weights for policy 0, policy_version 863872 (0.0143) [2024-06-15 21:47:55,505][1651669] Updated weights for policy 0, policy_version 863959 (0.0012) [2024-06-15 21:47:55,766][1648981] Fps is (10 sec: 45893.6, 60 sec: 51336.6, 300 sec: 48656.6). Total num frames: 1769406464. Throughput: 0: 12583.8. Samples: 442387456. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:47:55,767][1648981] Avg episode reward: [(0, '852.030')] [2024-06-15 21:48:00,070][1651669] Updated weights for policy 0, policy_version 864001 (0.0012) [2024-06-15 21:48:00,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1769537536. Throughput: 0: 12674.8. Samples: 442459648. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:00,767][1648981] Avg episode reward: [(0, '845.290')] [2024-06-15 21:48:01,317][1651669] Updated weights for policy 0, policy_version 864055 (0.0047) [2024-06-15 21:48:02,876][1651669] Updated weights for policy 0, policy_version 864097 (0.0010) [2024-06-15 21:48:04,812][1651669] Updated weights for policy 0, policy_version 864160 (0.0011) [2024-06-15 21:48:05,770][1648981] Fps is (10 sec: 49133.1, 60 sec: 50787.1, 300 sec: 48540.5). Total num frames: 1769897984. Throughput: 0: 12309.7. Samples: 442530304. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:05,771][1648981] Avg episode reward: [(0, '832.670')] [2024-06-15 21:48:06,108][1651669] Updated weights for policy 0, policy_version 864224 (0.0010) [2024-06-15 21:48:10,356][1651669] Updated weights for policy 0, policy_version 864260 (0.0036) [2024-06-15 21:48:10,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 49698.0, 300 sec: 48431.3). Total num frames: 1770029056. 
Throughput: 0: 12413.1. Samples: 442569216. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:10,767][1648981] Avg episode reward: [(0, '872.510')] [2024-06-15 21:48:11,655][1651669] Updated weights for policy 0, policy_version 864318 (0.0017) [2024-06-15 21:48:13,455][1651669] Updated weights for policy 0, policy_version 864376 (0.0013) [2024-06-15 21:48:15,097][1651669] Updated weights for policy 0, policy_version 864418 (0.0095) [2024-06-15 21:48:15,766][1648981] Fps is (10 sec: 45893.0, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1770356736. Throughput: 0: 12561.1. Samples: 442654208. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:15,767][1648981] Avg episode reward: [(0, '892.210')] [2024-06-15 21:48:16,652][1651669] Updated weights for policy 0, policy_version 864482 (0.0012) [2024-06-15 21:48:20,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 49152.1, 300 sec: 48541.1). Total num frames: 1770520576. Throughput: 0: 12561.1. Samples: 442726400. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:20,767][1648981] Avg episode reward: [(0, '841.880')] [2024-06-15 21:48:21,761][1651669] Updated weights for policy 0, policy_version 864544 (0.0012) [2024-06-15 21:48:21,859][1651274] Signal inference workers to stop experience collection... (45350 times) [2024-06-15 21:48:21,895][1651669] InferenceWorker_p0-w0: stopping experience collection (45350 times) [2024-06-15 21:48:22,125][1651274] Signal inference workers to resume experience collection... (45350 times) [2024-06-15 21:48:22,126][1651669] InferenceWorker_p0-w0: resuming experience collection (45350 times) [2024-06-15 21:48:24,065][1651669] Updated weights for policy 0, policy_version 864630 (0.0011) [2024-06-15 21:48:25,766][1648981] Fps is (10 sec: 42597.8, 60 sec: 48059.6, 300 sec: 48319.0). Total num frames: 1770782720. Throughput: 0: 12162.8. Samples: 442752000. 
Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:25,767][1648981] Avg episode reward: [(0, '850.780')] [2024-06-15 21:48:26,069][1651669] Updated weights for policy 0, policy_version 864659 (0.0013) [2024-06-15 21:48:27,454][1651669] Updated weights for policy 0, policy_version 864720 (0.0015) [2024-06-15 21:48:28,563][1651669] Updated weights for policy 0, policy_version 864766 (0.0012) [2024-06-15 21:48:30,767][1648981] Fps is (10 sec: 52426.4, 60 sec: 49697.7, 300 sec: 48874.2). Total num frames: 1771044864. Throughput: 0: 12311.7. Samples: 442825216. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:30,768][1648981] Avg episode reward: [(0, '873.640')] [2024-06-15 21:48:33,772][1651669] Updated weights for policy 0, policy_version 864818 (0.0036) [2024-06-15 21:48:35,357][1651669] Updated weights for policy 0, policy_version 864887 (0.0010) [2024-06-15 21:48:35,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 48059.8, 300 sec: 48652.2). Total num frames: 1771307008. Throughput: 0: 12174.3. Samples: 442893312. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:35,767][1648981] Avg episode reward: [(0, '874.810')] [2024-06-15 21:48:37,257][1651669] Updated weights for policy 0, policy_version 864928 (0.0012) [2024-06-15 21:48:39,069][1651669] Updated weights for policy 0, policy_version 865010 (0.0102) [2024-06-15 21:48:40,766][1648981] Fps is (10 sec: 52431.0, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 1771569152. Throughput: 0: 11980.8. Samples: 442926592. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:40,767][1648981] Avg episode reward: [(0, '868.850')] [2024-06-15 21:48:44,898][1651669] Updated weights for policy 0, policy_version 865045 (0.0025) [2024-06-15 21:48:45,766][1648981] Fps is (10 sec: 36044.6, 60 sec: 45332.1, 300 sec: 48318.9). Total num frames: 1771667456. Throughput: 0: 12105.9. Samples: 443004416. 
Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:45,767][1648981] Avg episode reward: [(0, '893.530')] [2024-06-15 21:48:46,379][1651669] Updated weights for policy 0, policy_version 865109 (0.0011) [2024-06-15 21:48:48,651][1651669] Updated weights for policy 0, policy_version 865191 (0.0031) [2024-06-15 21:48:50,523][1651669] Updated weights for policy 0, policy_version 865270 (0.0014) [2024-06-15 21:48:50,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50244.5, 300 sec: 48874.3). Total num frames: 1772093440. Throughput: 0: 11856.7. Samples: 443063808. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:50,767][1648981] Avg episode reward: [(0, '871.300')] [2024-06-15 21:48:55,767][1648981] Fps is (10 sec: 42597.2, 60 sec: 44782.7, 300 sec: 47990.8). Total num frames: 1772093440. Throughput: 0: 11946.6. Samples: 443106816. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:48:55,768][1648981] Avg episode reward: [(0, '897.750')] [2024-06-15 21:48:55,792][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000865280_1772093440.pth... [2024-06-15 21:48:56,049][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000859712_1760690176.pth [2024-06-15 21:48:57,089][1651669] Updated weights for policy 0, policy_version 865328 (0.0011) [2024-06-15 21:48:59,137][1651669] Updated weights for policy 0, policy_version 865424 (0.0016) [2024-06-15 21:49:00,144][1651274] Signal inference workers to stop experience collection... (45400 times) [2024-06-15 21:49:00,256][1651669] InferenceWorker_p0-w0: stopping experience collection (45400 times) [2024-06-15 21:49:00,395][1651274] Signal inference workers to resume experience collection... (45400 times) [2024-06-15 21:49:00,395][1651669] InferenceWorker_p0-w0: resuming experience collection (45400 times) [2024-06-15 21:49:00,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 49152.0, 300 sec: 48430.0). 
Total num frames: 1772486656. Throughput: 0: 11514.3. Samples: 443172352. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:49:00,767][1648981] Avg episode reward: [(0, '904.910')] [2024-06-15 21:49:00,953][1651669] Updated weights for policy 0, policy_version 865489 (0.0013) [2024-06-15 21:49:01,907][1651669] Updated weights for policy 0, policy_version 865533 (0.0023) [2024-06-15 21:49:05,766][1648981] Fps is (10 sec: 52430.3, 60 sec: 45332.0, 300 sec: 48096.8). Total num frames: 1772617728. Throughput: 0: 11502.9. Samples: 443244032. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:49:05,767][1648981] Avg episode reward: [(0, '957.430')] [2024-06-15 21:49:08,709][1651669] Updated weights for policy 0, policy_version 865594 (0.0122) [2024-06-15 21:49:09,834][1651669] Updated weights for policy 0, policy_version 865639 (0.0011) [2024-06-15 21:49:10,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 1772912640. Throughput: 0: 11776.0. Samples: 443281920. Policy #0 lag: (min: 15.0, avg: 114.5, max: 271.0) [2024-06-15 21:49:10,767][1648981] Avg episode reward: [(0, '895.830')] [2024-06-15 21:49:11,133][1651669] Updated weights for policy 0, policy_version 865697 (0.0092) [2024-06-15 21:49:12,842][1651669] Updated weights for policy 0, policy_version 865776 (0.0013) [2024-06-15 21:49:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 48207.8). Total num frames: 1773142016. Throughput: 0: 11616.8. Samples: 443347968. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:15,767][1648981] Avg episode reward: [(0, '873.640')] [2024-06-15 21:49:19,695][1651669] Updated weights for policy 0, policy_version 865856 (0.0013) [2024-06-15 21:49:20,738][1651669] Updated weights for policy 0, policy_version 865905 (0.0012) [2024-06-15 21:49:20,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 1773371392. Throughput: 0: 11650.8. 
Samples: 443417600. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:20,767][1648981] Avg episode reward: [(0, '904.800')] [2024-06-15 21:49:22,004][1651669] Updated weights for policy 0, policy_version 865957 (0.0019) [2024-06-15 21:49:24,004][1651669] Updated weights for policy 0, policy_version 866038 (0.0013) [2024-06-15 21:49:25,771][1648981] Fps is (10 sec: 52402.4, 60 sec: 48055.8, 300 sec: 48429.2). Total num frames: 1773666304. Throughput: 0: 11513.0. Samples: 443444736. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:25,772][1648981] Avg episode reward: [(0, '898.100')] [2024-06-15 21:49:29,380][1651669] Updated weights for policy 0, policy_version 866068 (0.0012) [2024-06-15 21:49:30,111][1651669] Updated weights for policy 0, policy_version 866109 (0.0015) [2024-06-15 21:49:30,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46421.7, 300 sec: 48096.8). Total num frames: 1773830144. Throughput: 0: 11832.9. Samples: 443536896. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:30,767][1648981] Avg episode reward: [(0, '833.630')] [2024-06-15 21:49:31,749][1651669] Updated weights for policy 0, policy_version 866178 (0.0013) [2024-06-15 21:49:32,881][1651669] Updated weights for policy 0, policy_version 866240 (0.0090) [2024-06-15 21:49:34,082][1651669] Updated weights for policy 0, policy_version 866295 (0.0016) [2024-06-15 21:49:35,766][1648981] Fps is (10 sec: 52455.3, 60 sec: 48059.7, 300 sec: 48652.2). Total num frames: 1774190592. Throughput: 0: 12162.8. Samples: 443611136. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:35,767][1648981] Avg episode reward: [(0, '855.970')] [2024-06-15 21:49:39,063][1651669] Updated weights for policy 0, policy_version 866336 (0.0015) [2024-06-15 21:49:39,719][1651274] Signal inference workers to stop experience collection... 
(45450 times) [2024-06-15 21:49:39,822][1651669] InferenceWorker_p0-w0: stopping experience collection (45450 times) [2024-06-15 21:49:40,048][1651274] Signal inference workers to resume experience collection... (45450 times) [2024-06-15 21:49:40,049][1651669] InferenceWorker_p0-w0: resuming experience collection (45450 times) [2024-06-15 21:49:40,754][1651669] Updated weights for policy 0, policy_version 866386 (0.0011) [2024-06-15 21:49:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46421.4, 300 sec: 48207.8). Total num frames: 1774354432. Throughput: 0: 12322.2. Samples: 443661312. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:40,767][1648981] Avg episode reward: [(0, '888.220')] [2024-06-15 21:49:42,195][1651669] Updated weights for policy 0, policy_version 866438 (0.0011) [2024-06-15 21:49:43,565][1651669] Updated weights for policy 0, policy_version 866512 (0.0016) [2024-06-15 21:49:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50790.4, 300 sec: 48874.3). Total num frames: 1774714880. Throughput: 0: 12197.0. Samples: 443721216. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:45,767][1648981] Avg episode reward: [(0, '879.480')] [2024-06-15 21:49:48,863][1651669] Updated weights for policy 0, policy_version 866576 (0.0012) [2024-06-15 21:49:50,081][1651669] Updated weights for policy 0, policy_version 866622 (0.0013) [2024-06-15 21:49:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 46421.3, 300 sec: 48318.9). Total num frames: 1774878720. Throughput: 0: 12549.7. Samples: 443808768. 
Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:50,767][1648981] Avg episode reward: [(0, '921.660')] [2024-06-15 21:49:51,641][1651669] Updated weights for policy 0, policy_version 866672 (0.0014) [2024-06-15 21:49:52,955][1651669] Updated weights for policy 0, policy_version 866720 (0.0013) [2024-06-15 21:49:54,253][1651669] Updated weights for policy 0, policy_version 866784 (0.0013) [2024-06-15 21:49:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 52429.1, 300 sec: 48874.3). Total num frames: 1775239168. Throughput: 0: 12265.3. Samples: 443833856. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:49:55,767][1648981] Avg episode reward: [(0, '924.860')] [2024-06-15 21:49:59,792][1651669] Updated weights for policy 0, policy_version 866817 (0.0013) [2024-06-15 21:50:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 47513.5, 300 sec: 48318.9). Total num frames: 1775337472. Throughput: 0: 12697.6. Samples: 443919360. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:00,767][1648981] Avg episode reward: [(0, '937.520')] [2024-06-15 21:50:00,876][1651669] Updated weights for policy 0, policy_version 866866 (0.0011) [2024-06-15 21:50:02,764][1651669] Updated weights for policy 0, policy_version 866944 (0.0012) [2024-06-15 21:50:05,138][1651669] Updated weights for policy 0, policy_version 867040 (0.0012) [2024-06-15 21:50:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 52428.9, 300 sec: 48985.4). Total num frames: 1775763456. Throughput: 0: 12333.5. Samples: 443972608. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:05,767][1648981] Avg episode reward: [(0, '937.220')] [2024-06-15 21:50:10,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1775763456. Throughput: 0: 12653.5. Samples: 444014080. 
Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:10,767][1648981] Avg episode reward: [(0, '956.960')] [2024-06-15 21:50:11,529][1651669] Updated weights for policy 0, policy_version 867073 (0.0061) [2024-06-15 21:50:12,838][1651669] Updated weights for policy 0, policy_version 867136 (0.0099) [2024-06-15 21:50:14,244][1651669] Updated weights for policy 0, policy_version 867200 (0.0010) [2024-06-15 21:50:15,497][1651274] Signal inference workers to stop experience collection... (45500 times) [2024-06-15 21:50:15,551][1651669] InferenceWorker_p0-w0: stopping experience collection (45500 times) [2024-06-15 21:50:15,715][1651274] Signal inference workers to resume experience collection... (45500 times) [2024-06-15 21:50:15,716][1651669] InferenceWorker_p0-w0: resuming experience collection (45500 times) [2024-06-15 21:50:15,774][1648981] Fps is (10 sec: 39290.4, 60 sec: 50237.7, 300 sec: 48873.0). Total num frames: 1776156672. Throughput: 0: 12240.4. Samples: 444087808. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:15,775][1648981] Avg episode reward: [(0, '953.670')] [2024-06-15 21:50:16,692][1651669] Updated weights for policy 0, policy_version 867312 (0.0015) [2024-06-15 21:50:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48605.9, 300 sec: 47985.7). Total num frames: 1776287744. Throughput: 0: 12026.3. Samples: 444152320. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:20,767][1648981] Avg episode reward: [(0, '954.090')] [2024-06-15 21:50:23,731][1651669] Updated weights for policy 0, policy_version 867363 (0.0011) [2024-06-15 21:50:25,179][1651669] Updated weights for policy 0, policy_version 867440 (0.0011) [2024-06-15 21:50:25,766][1648981] Fps is (10 sec: 39352.5, 60 sec: 48063.8, 300 sec: 48430.6). Total num frames: 1776549888. Throughput: 0: 11810.1. Samples: 444192768. 
Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:25,767][1648981] Avg episode reward: [(0, '945.470')] [2024-06-15 21:50:26,825][1651669] Updated weights for policy 0, policy_version 867511 (0.0013) [2024-06-15 21:50:28,348][1651669] Updated weights for policy 0, policy_version 867568 (0.0011) [2024-06-15 21:50:30,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1776812032. Throughput: 0: 11810.1. Samples: 444252672. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:30,767][1648981] Avg episode reward: [(0, '959.200')] [2024-06-15 21:50:35,241][1651669] Updated weights for policy 0, policy_version 867632 (0.0012) [2024-06-15 21:50:35,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 45875.1, 300 sec: 47985.7). Total num frames: 1776943104. Throughput: 0: 11685.0. Samples: 444334592. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:35,767][1648981] Avg episode reward: [(0, '972.040')] [2024-06-15 21:50:36,704][1651669] Updated weights for policy 0, policy_version 867696 (0.0013) [2024-06-15 21:50:38,582][1651669] Updated weights for policy 0, policy_version 867765 (0.0011) [2024-06-15 21:50:39,843][1651669] Updated weights for policy 0, policy_version 867812 (0.0020) [2024-06-15 21:50:40,774][1648981] Fps is (10 sec: 52391.5, 60 sec: 49692.2, 300 sec: 48762.0). Total num frames: 1777336320. Throughput: 0: 11740.0. Samples: 444362240. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:40,774][1648981] Avg episode reward: [(0, '970.360')] [2024-06-15 21:50:45,244][1651669] Updated weights for policy 0, policy_version 867861 (0.0012) [2024-06-15 21:50:45,772][1648981] Fps is (10 sec: 49122.9, 60 sec: 45324.5, 300 sec: 48206.9). Total num frames: 1777434624. Throughput: 0: 11637.9. Samples: 444443136. 
Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:45,773][1648981] Avg episode reward: [(0, '956.180')] [2024-06-15 21:50:47,235][1651669] Updated weights for policy 0, policy_version 867959 (0.0011) [2024-06-15 21:50:48,664][1651669] Updated weights for policy 0, policy_version 868002 (0.0012) [2024-06-15 21:50:50,147][1651669] Updated weights for policy 0, policy_version 868052 (0.0052) [2024-06-15 21:50:50,766][1648981] Fps is (10 sec: 49188.1, 60 sec: 49152.1, 300 sec: 48765.8). Total num frames: 1777827840. Throughput: 0: 11696.4. Samples: 444498944. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:50,767][1648981] Avg episode reward: [(0, '931.360')] [2024-06-15 21:50:51,168][1651669] Updated weights for policy 0, policy_version 868096 (0.0011) [2024-06-15 21:50:55,767][1648981] Fps is (10 sec: 45901.6, 60 sec: 44236.6, 300 sec: 48096.7). Total num frames: 1777893376. Throughput: 0: 11923.8. Samples: 444550656. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:50:55,767][1648981] Avg episode reward: [(0, '951.490')] [2024-06-15 21:50:56,249][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000868144_1777958912.pth... [2024-06-15 21:50:56,383][1651274] Signal inference workers to stop experience collection... (45550 times) [2024-06-15 21:50:56,392][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000862464_1766326272.pth [2024-06-15 21:50:56,396][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000868144_1777958912.pth [2024-06-15 21:50:56,422][1651669] InferenceWorker_p0-w0: stopping experience collection (45550 times) [2024-06-15 21:50:56,684][1651274] Signal inference workers to resume experience collection... 
(45550 times) [2024-06-15 21:50:56,685][1651669] InferenceWorker_p0-w0: resuming experience collection (45550 times) [2024-06-15 21:50:57,134][1651669] Updated weights for policy 0, policy_version 868177 (0.0012) [2024-06-15 21:50:59,252][1651669] Updated weights for policy 0, policy_version 868272 (0.0021) [2024-06-15 21:51:00,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 49698.2, 300 sec: 48652.2). Total num frames: 1778319360. Throughput: 0: 11846.3. Samples: 444620800. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:51:00,767][1648981] Avg episode reward: [(0, '949.430')] [2024-06-15 21:51:01,043][1651669] Updated weights for policy 0, policy_version 868341 (0.0016) [2024-06-15 21:51:05,767][1648981] Fps is (10 sec: 49152.3, 60 sec: 43690.5, 300 sec: 48096.7). Total num frames: 1778384896. Throughput: 0: 12242.4. Samples: 444703232. Policy #0 lag: (min: 99.0, avg: 187.2, max: 323.0) [2024-06-15 21:51:05,768][1648981] Avg episode reward: [(0, '970.070')] [2024-06-15 21:51:07,119][1651669] Updated weights for policy 0, policy_version 868400 (0.0086) [2024-06-15 21:51:08,577][1651669] Updated weights for policy 0, policy_version 868465 (0.0012) [2024-06-15 21:51:10,146][1651669] Updated weights for policy 0, policy_version 868532 (0.0015) [2024-06-15 21:51:10,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50790.4, 300 sec: 48874.3). Total num frames: 1778810880. Throughput: 0: 12106.0. Samples: 444737536. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:10,767][1648981] Avg episode reward: [(0, '957.310')] [2024-06-15 21:51:11,541][1651669] Updated weights for policy 0, policy_version 868608 (0.0012) [2024-06-15 21:51:15,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 45881.2, 300 sec: 48318.9). Total num frames: 1778909184. Throughput: 0: 12367.7. Samples: 444809216. 
Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:15,767][1648981] Avg episode reward: [(0, '926.420')] [2024-06-15 21:51:17,892][1651669] Updated weights for policy 0, policy_version 868672 (0.0010) [2024-06-15 21:51:19,270][1651669] Updated weights for policy 0, policy_version 868736 (0.0012) [2024-06-15 21:51:20,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1779302400. Throughput: 0: 12231.1. Samples: 444884992. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:20,767][1648981] Avg episode reward: [(0, '938.720')] [2024-06-15 21:51:21,584][1651669] Updated weights for policy 0, policy_version 868848 (0.0029) [2024-06-15 21:51:25,767][1648981] Fps is (10 sec: 52425.2, 60 sec: 48059.2, 300 sec: 48652.0). Total num frames: 1779433472. Throughput: 0: 12403.6. Samples: 444920320. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:25,768][1648981] Avg episode reward: [(0, '926.510')] [2024-06-15 21:51:28,466][1651669] Updated weights for policy 0, policy_version 868914 (0.0012) [2024-06-15 21:51:30,368][1651669] Updated weights for policy 0, policy_version 868997 (0.0011) [2024-06-15 21:51:30,626][1651274] Signal inference workers to stop experience collection... (45600 times) [2024-06-15 21:51:30,676][1651669] InferenceWorker_p0-w0: stopping experience collection (45600 times) [2024-06-15 21:51:30,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48606.0, 300 sec: 48874.3). Total num frames: 1779728384. Throughput: 0: 12460.4. Samples: 445003776. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:30,767][1648981] Avg episode reward: [(0, '950.790')] [2024-06-15 21:51:30,791][1651274] Signal inference workers to resume experience collection... 
(45600 times) [2024-06-15 21:51:30,792][1651669] InferenceWorker_p0-w0: resuming experience collection (45600 times) [2024-06-15 21:51:31,774][1651669] Updated weights for policy 0, policy_version 869076 (0.0014) [2024-06-15 21:51:35,767][1648981] Fps is (10 sec: 52431.0, 60 sec: 50244.1, 300 sec: 48763.2). Total num frames: 1779957760. Throughput: 0: 12868.2. Samples: 445078016. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:35,767][1648981] Avg episode reward: [(0, '968.650')] [2024-06-15 21:51:37,575][1651669] Updated weights for policy 0, policy_version 869123 (0.0012) [2024-06-15 21:51:39,511][1651669] Updated weights for policy 0, policy_version 869200 (0.0011) [2024-06-15 21:51:40,766][1648981] Fps is (10 sec: 49151.2, 60 sec: 48065.5, 300 sec: 48874.3). Total num frames: 1780219904. Throughput: 0: 12709.0. Samples: 445122560. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:40,767][1648981] Avg episode reward: [(0, '935.750')] [2024-06-15 21:51:41,112][1651669] Updated weights for policy 0, policy_version 869264 (0.0013) [2024-06-15 21:51:42,635][1651669] Updated weights for policy 0, policy_version 869328 (0.0011) [2024-06-15 21:51:43,545][1651669] Updated weights for policy 0, policy_version 869374 (0.0010) [2024-06-15 21:51:45,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 50795.5, 300 sec: 48874.3). Total num frames: 1780482048. Throughput: 0: 12470.0. Samples: 445181952. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:45,767][1648981] Avg episode reward: [(0, '915.470')] [2024-06-15 21:51:50,255][1651669] Updated weights for policy 0, policy_version 869440 (0.0020) [2024-06-15 21:51:50,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 46967.3, 300 sec: 48541.1). Total num frames: 1780645888. Throughput: 0: 12481.4. Samples: 445264896. 
Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:50,767][1648981] Avg episode reward: [(0, '915.120')] [2024-06-15 21:51:51,508][1651669] Updated weights for policy 0, policy_version 869490 (0.0012) [2024-06-15 21:51:53,197][1651669] Updated weights for policy 0, policy_version 869568 (0.0165) [2024-06-15 21:51:55,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 51882.7, 300 sec: 48874.3). Total num frames: 1781006336. Throughput: 0: 12367.6. Samples: 445294080. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:51:55,767][1648981] Avg episode reward: [(0, '963.430')] [2024-06-15 21:51:59,760][1651669] Updated weights for policy 0, policy_version 869633 (0.0012) [2024-06-15 21:52:00,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 46421.3, 300 sec: 48318.9). Total num frames: 1781104640. Throughput: 0: 12538.3. Samples: 445373440. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:00,767][1648981] Avg episode reward: [(0, '950.480')] [2024-06-15 21:52:00,853][1651669] Updated weights for policy 0, policy_version 869681 (0.0011) [2024-06-15 21:52:02,512][1651669] Updated weights for policy 0, policy_version 869760 (0.0011) [2024-06-15 21:52:05,132][1651274] Signal inference workers to stop experience collection... (45650 times) [2024-06-15 21:52:05,189][1651669] InferenceWorker_p0-w0: stopping experience collection (45650 times) [2024-06-15 21:52:05,191][1651669] Updated weights for policy 0, policy_version 869858 (0.0013) [2024-06-15 21:52:05,412][1651274] Signal inference workers to resume experience collection... (45650 times) [2024-06-15 21:52:05,413][1651669] InferenceWorker_p0-w0: resuming experience collection (45650 times) [2024-06-15 21:52:05,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 51882.8, 300 sec: 48985.4). Total num frames: 1781497856. Throughput: 0: 12037.7. Samples: 445426688. 
Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:05,767][1648981] Avg episode reward: [(0, '970.960')] [2024-06-15 21:52:10,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 45329.0, 300 sec: 47985.7). Total num frames: 1781530624. Throughput: 0: 12265.4. Samples: 445472256. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:10,767][1648981] Avg episode reward: [(0, '988.390')] [2024-06-15 21:52:11,420][1651669] Updated weights for policy 0, policy_version 869920 (0.0148) [2024-06-15 21:52:12,736][1651669] Updated weights for policy 0, policy_version 869986 (0.0140) [2024-06-15 21:52:14,245][1651669] Updated weights for policy 0, policy_version 870049 (0.0013) [2024-06-15 21:52:15,767][1648981] Fps is (10 sec: 49151.4, 60 sec: 51336.4, 300 sec: 48874.3). Total num frames: 1781989376. Throughput: 0: 11935.2. Samples: 445540864. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:15,767][1648981] Avg episode reward: [(0, '949.510')] [2024-06-15 21:52:15,876][1651669] Updated weights for policy 0, policy_version 870118 (0.0014) [2024-06-15 21:52:20,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1782054912. Throughput: 0: 12094.7. Samples: 445622272. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:20,767][1648981] Avg episode reward: [(0, '934.970')] [2024-06-15 21:52:21,907][1651669] Updated weights for policy 0, policy_version 870176 (0.0011) [2024-06-15 21:52:23,670][1651669] Updated weights for policy 0, policy_version 870258 (0.0122) [2024-06-15 21:52:25,228][1651669] Updated weights for policy 0, policy_version 870323 (0.0013) [2024-06-15 21:52:25,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 50244.8, 300 sec: 48763.2). Total num frames: 1782448128. Throughput: 0: 11844.3. Samples: 445655552. 
Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:25,767][1648981] Avg episode reward: [(0, '919.450')] [2024-06-15 21:52:26,760][1651669] Updated weights for policy 0, policy_version 870387 (0.0012) [2024-06-15 21:52:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 1782579200. Throughput: 0: 12197.0. Samples: 445730816. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:30,767][1648981] Avg episode reward: [(0, '968.540')] [2024-06-15 21:52:31,660][1651669] Updated weights for policy 0, policy_version 870411 (0.0011) [2024-06-15 21:52:33,322][1651669] Updated weights for policy 0, policy_version 870482 (0.0111) [2024-06-15 21:52:34,585][1651669] Updated weights for policy 0, policy_version 870546 (0.0012) [2024-06-15 21:52:35,701][1651669] Updated weights for policy 0, policy_version 870608 (0.0014) [2024-06-15 21:52:35,766][1648981] Fps is (10 sec: 55705.7, 60 sec: 50790.6, 300 sec: 48985.4). Total num frames: 1783005184. Throughput: 0: 12037.7. Samples: 445806592. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:35,767][1648981] Avg episode reward: [(0, '938.120')] [2024-06-15 21:52:36,672][1651669] Updated weights for policy 0, policy_version 870652 (0.0023) [2024-06-15 21:52:40,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.8, 300 sec: 47986.3). Total num frames: 1783103488. Throughput: 0: 12288.1. Samples: 445847040. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:40,767][1648981] Avg episode reward: [(0, '945.090')] [2024-06-15 21:52:42,711][1651669] Updated weights for policy 0, policy_version 870711 (0.0102) [2024-06-15 21:52:43,007][1651274] Signal inference workers to stop experience collection... (45700 times) [2024-06-15 21:52:43,035][1651669] InferenceWorker_p0-w0: stopping experience collection (45700 times) [2024-06-15 21:52:43,204][1651274] Signal inference workers to resume experience collection... 
(45700 times) [2024-06-15 21:52:43,205][1651669] InferenceWorker_p0-w0: resuming experience collection (45700 times) [2024-06-15 21:52:44,281][1651669] Updated weights for policy 0, policy_version 870769 (0.0010) [2024-06-15 21:52:45,347][1651669] Updated weights for policy 0, policy_version 870832 (0.0011) [2024-06-15 21:52:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 1783496704. Throughput: 0: 12071.8. Samples: 445916672. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:45,767][1648981] Avg episode reward: [(0, '941.220')] [2024-06-15 21:52:46,442][1651669] Updated weights for policy 0, policy_version 870880 (0.0085) [2024-06-15 21:52:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49698.2, 300 sec: 48207.8). Total num frames: 1783627776. Throughput: 0: 12640.7. Samples: 445995520. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:50,767][1648981] Avg episode reward: [(0, '981.670')] [2024-06-15 21:52:53,007][1651669] Updated weights for policy 0, policy_version 870944 (0.0015) [2024-06-15 21:52:54,126][1651669] Updated weights for policy 0, policy_version 870995 (0.0015) [2024-06-15 21:52:55,414][1651669] Updated weights for policy 0, policy_version 871058 (0.0012) [2024-06-15 21:52:55,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.1, 300 sec: 48874.3). Total num frames: 1783955456. Throughput: 0: 12390.4. Samples: 446029824. Policy #0 lag: (min: 1.0, avg: 58.6, max: 257.0) [2024-06-15 21:52:55,767][1648981] Avg episode reward: [(0, '936.160')] [2024-06-15 21:52:56,238][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000871104_1784020992.pth... 
[2024-06-15 21:52:56,287][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000865280_1772093440.pth [2024-06-15 21:52:57,213][1651669] Updated weights for policy 0, policy_version 871123 (0.0011) [2024-06-15 21:53:00,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50790.4, 300 sec: 48319.5). Total num frames: 1784152064. Throughput: 0: 12561.1. Samples: 446106112. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:00,767][1648981] Avg episode reward: [(0, '973.840')] [2024-06-15 21:53:02,785][1651669] Updated weights for policy 0, policy_version 871174 (0.0096) [2024-06-15 21:53:03,999][1651669] Updated weights for policy 0, policy_version 871228 (0.0011) [2024-06-15 21:53:05,405][1651669] Updated weights for policy 0, policy_version 871296 (0.0032) [2024-06-15 21:53:05,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1784446976. Throughput: 0: 12390.4. Samples: 446179840. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:05,767][1648981] Avg episode reward: [(0, '998.930')] [2024-06-15 21:53:07,740][1651669] Updated weights for policy 0, policy_version 871376 (0.0012) [2024-06-15 21:53:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 52428.8, 300 sec: 48541.1). Total num frames: 1784676352. Throughput: 0: 12470.0. Samples: 446216704. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:10,767][1648981] Avg episode reward: [(0, '984.700')] [2024-06-15 21:53:13,245][1651669] Updated weights for policy 0, policy_version 871456 (0.0014) [2024-06-15 21:53:14,904][1651669] Updated weights for policy 0, policy_version 871522 (0.0106) [2024-06-15 21:53:15,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49152.1, 300 sec: 48874.3). Total num frames: 1784938496. Throughput: 0: 12583.8. Samples: 446297088. 
Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:15,767][1648981] Avg episode reward: [(0, '997.840')] [2024-06-15 21:53:16,235][1651669] Updated weights for policy 0, policy_version 871584 (0.0020) [2024-06-15 21:53:18,707][1651274] Signal inference workers to stop experience collection... (45750 times) [2024-06-15 21:53:18,741][1651669] InferenceWorker_p0-w0: stopping experience collection (45750 times) [2024-06-15 21:53:18,924][1651274] Signal inference workers to resume experience collection... (45750 times) [2024-06-15 21:53:18,925][1651669] InferenceWorker_p0-w0: resuming experience collection (45750 times) [2024-06-15 21:53:18,927][1651669] Updated weights for policy 0, policy_version 871648 (0.0011) [2024-06-15 21:53:19,632][1651669] Updated weights for policy 0, policy_version 871676 (0.0010) [2024-06-15 21:53:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 52428.8, 300 sec: 48874.3). Total num frames: 1785200640. Throughput: 0: 12595.2. Samples: 446373376. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:20,767][1648981] Avg episode reward: [(0, '1041.480')] [2024-06-15 21:53:24,522][1651669] Updated weights for policy 0, policy_version 871731 (0.0017) [2024-06-15 21:53:25,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1785397248. Throughput: 0: 12663.5. Samples: 446416896. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:25,767][1648981] Avg episode reward: [(0, '1041.480')] [2024-06-15 21:53:26,066][1651669] Updated weights for policy 0, policy_version 871799 (0.0014) [2024-06-15 21:53:27,470][1651669] Updated weights for policy 0, policy_version 871860 (0.0013) [2024-06-15 21:53:29,839][1651669] Updated weights for policy 0, policy_version 871890 (0.0015) [2024-06-15 21:53:30,718][1651669] Updated weights for policy 0, policy_version 871936 (0.0012) [2024-06-15 21:53:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 52428.8, 300 sec: 48874.3). 
Total num frames: 1785724928. Throughput: 0: 12470.1. Samples: 446477824. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:30,767][1648981] Avg episode reward: [(0, '1019.490')] [2024-06-15 21:53:35,719][1651669] Updated weights for policy 0, policy_version 872023 (0.0012) [2024-06-15 21:53:35,767][1648981] Fps is (10 sec: 49150.3, 60 sec: 48059.5, 300 sec: 48541.0). Total num frames: 1785888768. Throughput: 0: 12583.7. Samples: 446561792. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:35,767][1648981] Avg episode reward: [(0, '976.890')] [2024-06-15 21:53:37,259][1651669] Updated weights for policy 0, policy_version 872083 (0.0113) [2024-06-15 21:53:38,270][1651669] Updated weights for policy 0, policy_version 872124 (0.0030) [2024-06-15 21:53:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 51882.6, 300 sec: 49318.6). Total num frames: 1786216448. Throughput: 0: 12424.5. Samples: 446588928. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:40,767][1648981] Avg episode reward: [(0, '975.230')] [2024-06-15 21:53:40,890][1651669] Updated weights for policy 0, policy_version 872182 (0.0016) [2024-06-15 21:53:45,766][1648981] Fps is (10 sec: 45876.9, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1786347520. Throughput: 0: 12583.8. Samples: 446672384. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:45,767][1648981] Avg episode reward: [(0, '934.210')] [2024-06-15 21:53:45,836][1651669] Updated weights for policy 0, policy_version 872240 (0.0123) [2024-06-15 21:53:47,105][1651669] Updated weights for policy 0, policy_version 872304 (0.0013) [2024-06-15 21:53:48,695][1651669] Updated weights for policy 0, policy_version 872372 (0.0012) [2024-06-15 21:53:50,119][1651669] Updated weights for policy 0, policy_version 872419 (0.0013) [2024-06-15 21:53:50,766][1648981] Fps is (10 sec: 55705.3, 60 sec: 52428.7, 300 sec: 49763.0). Total num frames: 1786773504. Throughput: 0: 12219.7. 
Samples: 446729728. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:50,767][1648981] Avg episode reward: [(0, '928.800')] [2024-06-15 21:53:55,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 46967.3, 300 sec: 48429.9). Total num frames: 1786773504. Throughput: 0: 12333.4. Samples: 446771712. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:53:55,768][1648981] Avg episode reward: [(0, '919.920')] [2024-06-15 21:53:56,516][1651669] Updated weights for policy 0, policy_version 872480 (0.0013) [2024-06-15 21:53:57,851][1651274] Signal inference workers to stop experience collection... (45800 times) [2024-06-15 21:53:57,884][1651669] Updated weights for policy 0, policy_version 872547 (0.0012) [2024-06-15 21:53:57,898][1651669] InferenceWorker_p0-w0: stopping experience collection (45800 times) [2024-06-15 21:53:58,031][1651274] Signal inference workers to resume experience collection... (45800 times) [2024-06-15 21:53:58,032][1651669] InferenceWorker_p0-w0: resuming experience collection (45800 times) [2024-06-15 21:53:59,898][1651669] Updated weights for policy 0, policy_version 872628 (0.0012) [2024-06-15 21:54:00,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 50790.4, 300 sec: 49429.7). Total num frames: 1787199488. Throughput: 0: 12140.1. Samples: 446843392. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:00,767][1648981] Avg episode reward: [(0, '907.010')] [2024-06-15 21:54:01,877][1651669] Updated weights for policy 0, policy_version 872696 (0.0013) [2024-06-15 21:54:05,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 47513.6, 300 sec: 48763.2). Total num frames: 1787297792. Throughput: 0: 12003.6. Samples: 446913536. 
Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:05,767][1648981] Avg episode reward: [(0, '898.910')] [2024-06-15 21:54:08,463][1651669] Updated weights for policy 0, policy_version 872755 (0.0013) [2024-06-15 21:54:10,017][1651669] Updated weights for policy 0, policy_version 872832 (0.0106) [2024-06-15 21:54:10,766][1648981] Fps is (10 sec: 39322.1, 60 sec: 48606.0, 300 sec: 48985.4). Total num frames: 1787592704. Throughput: 0: 11923.9. Samples: 446953472. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:10,767][1648981] Avg episode reward: [(0, '899.700')] [2024-06-15 21:54:12,415][1651669] Updated weights for policy 0, policy_version 872901 (0.0013) [2024-06-15 21:54:13,986][1651669] Updated weights for policy 0, policy_version 872957 (0.0012) [2024-06-15 21:54:15,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.7, 300 sec: 48985.4). Total num frames: 1787822080. Throughput: 0: 11867.0. Samples: 447011840. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:15,767][1648981] Avg episode reward: [(0, '863.570')] [2024-06-15 21:54:19,794][1651669] Updated weights for policy 0, policy_version 873024 (0.0108) [2024-06-15 21:54:20,770][1648981] Fps is (10 sec: 42582.8, 60 sec: 46964.7, 300 sec: 48652.4). Total num frames: 1788018688. Throughput: 0: 11763.8. Samples: 447091200. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:20,771][1648981] Avg episode reward: [(0, '919.010')] [2024-06-15 21:54:21,319][1651669] Updated weights for policy 0, policy_version 873088 (0.0013) [2024-06-15 21:54:22,873][1651669] Updated weights for policy 0, policy_version 873148 (0.0011) [2024-06-15 21:54:25,208][1651669] Updated weights for policy 0, policy_version 873200 (0.0013) [2024-06-15 21:54:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49151.9, 300 sec: 49207.5). Total num frames: 1788346368. Throughput: 0: 11776.0. Samples: 447118848. 
Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:25,767][1648981] Avg episode reward: [(0, '919.010')] [2024-06-15 21:54:29,782][1651669] Updated weights for policy 0, policy_version 873221 (0.0013) [2024-06-15 21:54:30,766][1648981] Fps is (10 sec: 42613.6, 60 sec: 45329.1, 300 sec: 48318.9). Total num frames: 1788444672. Throughput: 0: 11730.5. Samples: 447200256. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:30,767][1648981] Avg episode reward: [(0, '943.580')] [2024-06-15 21:54:31,348][1651669] Updated weights for policy 0, policy_version 873296 (0.0094) [2024-06-15 21:54:32,944][1651669] Updated weights for policy 0, policy_version 873362 (0.0012) [2024-06-15 21:54:34,895][1651669] Updated weights for policy 0, policy_version 873424 (0.0011) [2024-06-15 21:54:35,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 49152.0, 300 sec: 49096.4). Total num frames: 1788837888. Throughput: 0: 11798.7. Samples: 447260672. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:35,767][1648981] Avg episode reward: [(0, '986.680')] [2024-06-15 21:54:35,959][1651669] Updated weights for policy 0, policy_version 873471 (0.0012) [2024-06-15 21:54:40,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 44236.7, 300 sec: 47985.7). Total num frames: 1788870656. Throughput: 0: 11821.6. Samples: 447303680. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:40,767][1648981] Avg episode reward: [(0, '1010.740')] [2024-06-15 21:54:40,807][1651274] Signal inference workers to stop experience collection... (45850 times) [2024-06-15 21:54:40,879][1651669] InferenceWorker_p0-w0: stopping experience collection (45850 times) [2024-06-15 21:54:41,121][1651274] Signal inference workers to resume experience collection... 
(45850 times) [2024-06-15 21:54:41,121][1651669] InferenceWorker_p0-w0: resuming experience collection (45850 times) [2024-06-15 21:54:42,574][1651669] Updated weights for policy 0, policy_version 873552 (0.0134) [2024-06-15 21:54:44,939][1651669] Updated weights for policy 0, policy_version 873657 (0.0134) [2024-06-15 21:54:45,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 48605.9, 300 sec: 48763.2). Total num frames: 1789263872. Throughput: 0: 11525.7. Samples: 447362048. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:45,767][1648981] Avg episode reward: [(0, '991.740')] [2024-06-15 21:54:46,932][1651669] Updated weights for policy 0, policy_version 873698 (0.0013) [2024-06-15 21:54:47,421][1651669] Updated weights for policy 0, policy_version 873728 (0.0011) [2024-06-15 21:54:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 43690.7, 300 sec: 47985.7). Total num frames: 1789394944. Throughput: 0: 11764.6. Samples: 447442944. Policy #0 lag: (min: 49.0, avg: 210.4, max: 305.0) [2024-06-15 21:54:50,767][1648981] Avg episode reward: [(0, '944.070')] [2024-06-15 21:54:53,992][1651669] Updated weights for policy 0, policy_version 873809 (0.0012) [2024-06-15 21:54:55,134][1651669] Updated weights for policy 0, policy_version 873857 (0.0012) [2024-06-15 21:54:55,767][1648981] Fps is (10 sec: 42597.4, 60 sec: 48605.9, 300 sec: 48652.1). Total num frames: 1789689856. Throughput: 0: 11730.4. Samples: 447481344. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:54:55,767][1648981] Avg episode reward: [(0, '938.550')] [2024-06-15 21:54:56,289][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000873904_1789755392.pth... 
[2024-06-15 21:54:56,327][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000868144_1777958912.pth [2024-06-15 21:54:56,564][1651669] Updated weights for policy 0, policy_version 873911 (0.0011) [2024-06-15 21:54:58,392][1651669] Updated weights for policy 0, policy_version 873977 (0.0011) [2024-06-15 21:55:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45329.1, 300 sec: 47985.7). Total num frames: 1789919232. Throughput: 0: 11844.3. Samples: 447544832. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:00,767][1648981] Avg episode reward: [(0, '875.800')] [2024-06-15 21:55:03,469][1651669] Updated weights for policy 0, policy_version 874033 (0.0012) [2024-06-15 21:55:04,469][1651669] Updated weights for policy 0, policy_version 874080 (0.0143) [2024-06-15 21:55:05,696][1651669] Updated weights for policy 0, policy_version 874144 (0.0012) [2024-06-15 21:55:05,770][1648981] Fps is (10 sec: 55706.6, 60 sec: 49152.0, 300 sec: 49096.5). Total num frames: 1790246912. Throughput: 0: 11811.1. Samples: 447622656. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:05,770][1648981] Avg episode reward: [(0, '878.100')] [2024-06-15 21:55:08,259][1651669] Updated weights for policy 0, policy_version 874208 (0.0013) [2024-06-15 21:55:10,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 47513.5, 300 sec: 48431.3). Total num frames: 1790443520. Throughput: 0: 11935.3. Samples: 447655936. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:10,767][1648981] Avg episode reward: [(0, '924.320')] [2024-06-15 21:55:12,407][1651669] Updated weights for policy 0, policy_version 874256 (0.0016) [2024-06-15 21:55:14,294][1651669] Updated weights for policy 0, policy_version 874336 (0.0151) [2024-06-15 21:55:15,153][1651274] Signal inference workers to stop experience collection... 
(45900 times) [2024-06-15 21:55:15,180][1651669] InferenceWorker_p0-w0: stopping experience collection (45900 times) [2024-06-15 21:55:15,382][1651274] Signal inference workers to resume experience collection... (45900 times) [2024-06-15 21:55:15,383][1651669] InferenceWorker_p0-w0: resuming experience collection (45900 times) [2024-06-15 21:55:15,767][1648981] Fps is (10 sec: 52427.2, 60 sec: 49151.8, 300 sec: 49096.4). Total num frames: 1790771200. Throughput: 0: 12037.6. Samples: 447741952. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:15,767][1648981] Avg episode reward: [(0, '967.770')] [2024-06-15 21:55:15,828][1651669] Updated weights for policy 0, policy_version 874402 (0.0014) [2024-06-15 21:55:18,515][1651669] Updated weights for policy 0, policy_version 874448 (0.0013) [2024-06-15 21:55:19,356][1651669] Updated weights for policy 0, policy_version 874490 (0.0012) [2024-06-15 21:55:20,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49154.9, 300 sec: 48874.3). Total num frames: 1790967808. Throughput: 0: 12242.6. Samples: 447811584. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:20,767][1648981] Avg episode reward: [(0, '949.040')] [2024-06-15 21:55:24,021][1651669] Updated weights for policy 0, policy_version 874560 (0.0129) [2024-06-15 21:55:25,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1791229952. Throughput: 0: 12367.7. Samples: 447860224. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:25,767][1648981] Avg episode reward: [(0, '965.370')] [2024-06-15 21:55:26,029][1651669] Updated weights for policy 0, policy_version 874640 (0.0013) [2024-06-15 21:55:27,144][1651669] Updated weights for policy 0, policy_version 874688 (0.0011) [2024-06-15 21:55:29,857][1651669] Updated weights for policy 0, policy_version 874743 (0.0012) [2024-06-15 21:55:30,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50790.4, 300 sec: 49318.6). 
Total num frames: 1791492096. Throughput: 0: 12356.3. Samples: 447918080. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:30,767][1648981] Avg episode reward: [(0, '1017.220')] [2024-06-15 21:55:35,180][1651669] Updated weights for policy 0, policy_version 874803 (0.0012) [2024-06-15 21:55:35,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46967.6, 300 sec: 48542.2). Total num frames: 1791655936. Throughput: 0: 12288.0. Samples: 447995904. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:35,767][1648981] Avg episode reward: [(0, '936.170')] [2024-06-15 21:55:36,913][1651669] Updated weights for policy 0, policy_version 874880 (0.0012) [2024-06-15 21:55:40,165][1651669] Updated weights for policy 0, policy_version 874963 (0.0013) [2024-06-15 21:55:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 51336.6, 300 sec: 49208.5). Total num frames: 1791950848. Throughput: 0: 11969.5. Samples: 448019968. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:40,767][1648981] Avg episode reward: [(0, '906.820')] [2024-06-15 21:55:45,766][1648981] Fps is (10 sec: 36044.9, 60 sec: 45875.2, 300 sec: 48096.7). Total num frames: 1792016384. Throughput: 0: 12401.8. Samples: 448102912. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:45,767][1648981] Avg episode reward: [(0, '817.800')] [2024-06-15 21:55:45,835][1651669] Updated weights for policy 0, policy_version 875024 (0.0011) [2024-06-15 21:55:47,629][1651669] Updated weights for policy 0, policy_version 875090 (0.0012) [2024-06-15 21:55:49,648][1651669] Updated weights for policy 0, policy_version 875184 (0.0022) [2024-06-15 21:55:50,471][1651669] Updated weights for policy 0, policy_version 875216 (0.0011) [2024-06-15 21:55:50,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 50790.5, 300 sec: 49318.7). Total num frames: 1792442368. Throughput: 0: 12015.0. Samples: 448163328. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:50,767][1648981] Avg episode reward: [(0, '858.170')] [2024-06-15 21:55:55,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 47513.7, 300 sec: 48207.8). Total num frames: 1792540672. Throughput: 0: 12140.0. Samples: 448202240. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:55:55,767][1648981] Avg episode reward: [(0, '855.640')] [2024-06-15 21:55:56,825][1651669] Updated weights for policy 0, policy_version 875283 (0.0011) [2024-06-15 21:55:57,163][1651274] Signal inference workers to stop experience collection... (45950 times) [2024-06-15 21:55:57,206][1651669] InferenceWorker_p0-w0: stopping experience collection (45950 times) [2024-06-15 21:55:57,433][1651274] Signal inference workers to resume experience collection... (45950 times) [2024-06-15 21:55:57,433][1651669] InferenceWorker_p0-w0: resuming experience collection (45950 times) [2024-06-15 21:55:58,785][1651669] Updated weights for policy 0, policy_version 875345 (0.0014) [2024-06-15 21:56:00,661][1651669] Updated weights for policy 0, policy_version 875424 (0.0012) [2024-06-15 21:56:00,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 49151.9, 300 sec: 49096.5). Total num frames: 1792868352. Throughput: 0: 11935.3. Samples: 448279040. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:00,767][1648981] Avg episode reward: [(0, '901.170')] [2024-06-15 21:56:02,381][1651669] Updated weights for policy 0, policy_version 875504 (0.0011) [2024-06-15 21:56:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46967.5, 300 sec: 48318.9). Total num frames: 1793064960. Throughput: 0: 11958.0. Samples: 448349696. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:05,767][1648981] Avg episode reward: [(0, '924.050')] [2024-06-15 21:56:08,886][1651669] Updated weights for policy 0, policy_version 875552 (0.0012) [2024-06-15 21:56:10,477][1651669] Updated weights for policy 0, policy_version 875616 (0.0015) [2024-06-15 21:56:10,766][1648981] Fps is (10 sec: 39322.2, 60 sec: 46967.4, 300 sec: 48652.2). Total num frames: 1793261568. Throughput: 0: 11787.4. Samples: 448390656. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:10,767][1648981] Avg episode reward: [(0, '947.220')] [2024-06-15 21:56:12,504][1651669] Updated weights for policy 0, policy_version 875696 (0.0012) [2024-06-15 21:56:14,371][1651669] Updated weights for policy 0, policy_version 875769 (0.0014) [2024-06-15 21:56:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46967.8, 300 sec: 48430.0). Total num frames: 1793589248. Throughput: 0: 11582.6. Samples: 448439296. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:15,767][1648981] Avg episode reward: [(0, '953.500')] [2024-06-15 21:56:20,766][1648981] Fps is (10 sec: 36044.6, 60 sec: 44236.8, 300 sec: 48096.9). Total num frames: 1793622016. Throughput: 0: 11730.5. Samples: 448523776. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:20,767][1648981] Avg episode reward: [(0, '944.700')] [2024-06-15 21:56:21,213][1651669] Updated weights for policy 0, policy_version 875808 (0.0011) [2024-06-15 21:56:22,973][1651669] Updated weights for policy 0, policy_version 875876 (0.0165) [2024-06-15 21:56:24,679][1651669] Updated weights for policy 0, policy_version 875953 (0.0013) [2024-06-15 21:56:25,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 46967.4, 300 sec: 48541.0). Total num frames: 1794048000. Throughput: 0: 11696.3. Samples: 448546304. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:25,767][1648981] Avg episode reward: [(0, '901.280')] [2024-06-15 21:56:25,905][1651669] Updated weights for policy 0, policy_version 876008 (0.0100) [2024-06-15 21:56:30,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 43690.5, 300 sec: 47985.7). Total num frames: 1794113536. Throughput: 0: 11582.5. Samples: 448624128. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:30,767][1648981] Avg episode reward: [(0, '934.750')] [2024-06-15 21:56:31,637][1651669] Updated weights for policy 0, policy_version 876034 (0.0024) [2024-06-15 21:56:33,206][1651669] Updated weights for policy 0, policy_version 876096 (0.0124) [2024-06-15 21:56:35,162][1651669] Updated weights for policy 0, policy_version 876160 (0.0015) [2024-06-15 21:56:35,345][1651274] Signal inference workers to stop experience collection... (46000 times) [2024-06-15 21:56:35,380][1651669] InferenceWorker_p0-w0: stopping experience collection (46000 times) [2024-06-15 21:56:35,646][1651274] Signal inference workers to resume experience collection... (46000 times) [2024-06-15 21:56:35,647][1651669] InferenceWorker_p0-w0: resuming experience collection (46000 times) [2024-06-15 21:56:35,766][1648981] Fps is (10 sec: 36044.9, 60 sec: 45875.2, 300 sec: 48096.8). Total num frames: 1794408448. Throughput: 0: 11559.8. Samples: 448683520. Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:35,767][1648981] Avg episode reward: [(0, '931.620')] [2024-06-15 21:56:36,450][1651669] Updated weights for policy 0, policy_version 876213 (0.0012) [2024-06-15 21:56:37,568][1651669] Updated weights for policy 0, policy_version 876280 (0.0077) [2024-06-15 21:56:40,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 44782.9, 300 sec: 47985.7). Total num frames: 1794637824. Throughput: 0: 11594.0. Samples: 448723968. 
Policy #0 lag: (min: 15.0, avg: 70.3, max: 271.0) [2024-06-15 21:56:40,767][1648981] Avg episode reward: [(0, '938.890')] [2024-06-15 21:56:43,677][1651669] Updated weights for policy 0, policy_version 876322 (0.0012) [2024-06-15 21:56:45,650][1651669] Updated weights for policy 0, policy_version 876400 (0.0013) [2024-06-15 21:56:45,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 47513.7, 300 sec: 48207.9). Total num frames: 1794867200. Throughput: 0: 11537.1. Samples: 448798208. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:56:45,767][1648981] Avg episode reward: [(0, '981.160')] [2024-06-15 21:56:47,476][1651669] Updated weights for policy 0, policy_version 876467 (0.0012) [2024-06-15 21:56:48,862][1651669] Updated weights for policy 0, policy_version 876542 (0.0012) [2024-06-15 21:56:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45329.0, 300 sec: 47985.7). Total num frames: 1795162112. Throughput: 0: 11343.7. Samples: 448860160. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:56:50,767][1648981] Avg episode reward: [(0, '985.260')] [2024-06-15 21:56:55,603][1651669] Updated weights for policy 0, policy_version 876608 (0.0014) [2024-06-15 21:56:55,766][1648981] Fps is (10 sec: 42597.8, 60 sec: 45875.2, 300 sec: 48096.7). Total num frames: 1795293184. Throughput: 0: 11491.5. Samples: 448907776. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:56:55,767][1648981] Avg episode reward: [(0, '958.110')] [2024-06-15 21:56:56,561][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000876640_1795358720.pth... 
[2024-06-15 21:56:56,701][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000871104_1784020992.pth [2024-06-15 21:56:57,620][1651669] Updated weights for policy 0, policy_version 876688 (0.0012) [2024-06-15 21:56:59,238][1651669] Updated weights for policy 0, policy_version 876752 (0.0014) [2024-06-15 21:57:00,205][1651669] Updated weights for policy 0, policy_version 876798 (0.0050) [2024-06-15 21:57:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 46967.6, 300 sec: 48096.8). Total num frames: 1795686400. Throughput: 0: 11491.5. Samples: 448956416. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:00,767][1648981] Avg episode reward: [(0, '970.780')] [2024-06-15 21:57:05,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 43690.7, 300 sec: 47985.7). Total num frames: 1795686400. Throughput: 0: 11491.6. Samples: 449040896. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:05,767][1648981] Avg episode reward: [(0, '964.610')] [2024-06-15 21:57:07,304][1651669] Updated weights for policy 0, policy_version 876866 (0.0012) [2024-06-15 21:57:08,887][1651669] Updated weights for policy 0, policy_version 876932 (0.0011) [2024-06-15 21:57:10,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 46967.4, 300 sec: 47763.5). Total num frames: 1796079616. Throughput: 0: 11616.7. Samples: 449069056. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:10,767][1648981] Avg episode reward: [(0, '978.700')] [2024-06-15 21:57:10,904][1651274] Signal inference workers to stop experience collection... (46050 times) [2024-06-15 21:57:11,005][1651669] InferenceWorker_p0-w0: stopping experience collection (46050 times) [2024-06-15 21:57:11,008][1651669] Updated weights for policy 0, policy_version 877013 (0.0015) [2024-06-15 21:57:11,229][1651274] Signal inference workers to resume experience collection... 
(46050 times) [2024-06-15 21:57:11,232][1651669] InferenceWorker_p0-w0: resuming experience collection (46050 times) [2024-06-15 21:57:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 43690.6, 300 sec: 47985.7). Total num frames: 1796210688. Throughput: 0: 11446.1. Samples: 449139200. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:15,767][1648981] Avg episode reward: [(0, '974.820')] [2024-06-15 21:57:17,065][1651669] Updated weights for policy 0, policy_version 877060 (0.0014) [2024-06-15 21:57:18,263][1651669] Updated weights for policy 0, policy_version 877122 (0.0013) [2024-06-15 21:57:19,879][1651669] Updated weights for policy 0, policy_version 877192 (0.0024) [2024-06-15 21:57:20,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 48605.9, 300 sec: 47763.5). Total num frames: 1796538368. Throughput: 0: 11571.2. Samples: 449204224. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:20,767][1648981] Avg episode reward: [(0, '982.360')] [2024-06-15 21:57:21,537][1651669] Updated weights for policy 0, policy_version 877264 (0.0013) [2024-06-15 21:57:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 44782.9, 300 sec: 47985.7). Total num frames: 1796734976. Throughput: 0: 11434.7. Samples: 449238528. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:25,767][1648981] Avg episode reward: [(0, '968.260')] [2024-06-15 21:57:27,858][1651669] Updated weights for policy 0, policy_version 877317 (0.0010) [2024-06-15 21:57:29,117][1651669] Updated weights for policy 0, policy_version 877377 (0.0012) [2024-06-15 21:57:30,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 48059.9, 300 sec: 47430.3). Total num frames: 1796997120. Throughput: 0: 11650.8. Samples: 449322496. 
Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:30,767][1648981] Avg episode reward: [(0, '937.900')] [2024-06-15 21:57:31,048][1651669] Updated weights for policy 0, policy_version 877459 (0.0011) [2024-06-15 21:57:32,163][1651669] Updated weights for policy 0, policy_version 877504 (0.0011) [2024-06-15 21:57:33,581][1651669] Updated weights for policy 0, policy_version 877556 (0.0025) [2024-06-15 21:57:35,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1797259264. Throughput: 0: 11650.8. Samples: 449384448. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:35,767][1648981] Avg episode reward: [(0, '850.130')] [2024-06-15 21:57:39,948][1651669] Updated weights for policy 0, policy_version 877600 (0.0090) [2024-06-15 21:57:40,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 45875.2, 300 sec: 47097.1). Total num frames: 1797390336. Throughput: 0: 11548.5. Samples: 449427456. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:40,767][1648981] Avg episode reward: [(0, '878.900')] [2024-06-15 21:57:41,696][1651669] Updated weights for policy 0, policy_version 877681 (0.0012) [2024-06-15 21:57:43,011][1651669] Updated weights for policy 0, policy_version 877744 (0.0013) [2024-06-15 21:57:44,568][1651669] Updated weights for policy 0, policy_version 877808 (0.0016) [2024-06-15 21:57:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1797783552. Throughput: 0: 11810.1. Samples: 449487872. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:45,767][1648981] Avg episode reward: [(0, '914.400')] [2024-06-15 21:57:50,735][1651669] Updated weights for policy 0, policy_version 877856 (0.0013) [2024-06-15 21:57:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 47097.1). Total num frames: 1797849088. Throughput: 0: 11810.1. Samples: 449572352. 
Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:50,767][1648981] Avg episode reward: [(0, '923.960')] [2024-06-15 21:57:51,213][1651274] Signal inference workers to stop experience collection... (46100 times) [2024-06-15 21:57:51,254][1651669] InferenceWorker_p0-w0: stopping experience collection (46100 times) [2024-06-15 21:57:51,381][1651274] Signal inference workers to resume experience collection... (46100 times) [2024-06-15 21:57:51,382][1651669] InferenceWorker_p0-w0: resuming experience collection (46100 times) [2024-06-15 21:57:51,851][1651669] Updated weights for policy 0, policy_version 877905 (0.0013) [2024-06-15 21:57:53,311][1651669] Updated weights for policy 0, policy_version 877968 (0.0011) [2024-06-15 21:57:54,932][1651669] Updated weights for policy 0, policy_version 878032 (0.0150) [2024-06-15 21:57:55,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49698.2, 300 sec: 47874.6). Total num frames: 1798275072. Throughput: 0: 11935.3. Samples: 449606144. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:57:55,767][1648981] Avg episode reward: [(0, '947.700')] [2024-06-15 21:57:55,866][1651669] Updated weights for policy 0, policy_version 878077 (0.0012) [2024-06-15 21:58:00,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 43690.7, 300 sec: 46986.0). Total num frames: 1798307840. Throughput: 0: 12071.8. Samples: 449682432. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:00,767][1648981] Avg episode reward: [(0, '974.440')] [2024-06-15 21:58:02,228][1651669] Updated weights for policy 0, policy_version 878144 (0.0013) [2024-06-15 21:58:03,583][1651669] Updated weights for policy 0, policy_version 878208 (0.0142) [2024-06-15 21:58:04,783][1651669] Updated weights for policy 0, policy_version 878260 (0.0013) [2024-06-15 21:58:05,523][1651669] Updated weights for policy 0, policy_version 878294 (0.0013) [2024-06-15 21:58:05,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 51336.6, 300 sec: 47763.5). 
Total num frames: 1798766592. Throughput: 0: 11992.2. Samples: 449743872. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:05,767][1648981] Avg episode reward: [(0, '1002.970')] [2024-06-15 21:58:10,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 45875.2, 300 sec: 47097.0). Total num frames: 1798832128. Throughput: 0: 12151.5. Samples: 449785344. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:10,767][1648981] Avg episode reward: [(0, '1005.750')] [2024-06-15 21:58:12,339][1651669] Updated weights for policy 0, policy_version 878368 (0.0146) [2024-06-15 21:58:14,191][1651669] Updated weights for policy 0, policy_version 878434 (0.0013) [2024-06-15 21:58:15,621][1651669] Updated weights for policy 0, policy_version 878497 (0.0012) [2024-06-15 21:58:15,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 49152.1, 300 sec: 47319.2). Total num frames: 1799159808. Throughput: 0: 11685.0. Samples: 449848320. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:15,767][1648981] Avg episode reward: [(0, '1006.970')] [2024-06-15 21:58:17,421][1651669] Updated weights for policy 0, policy_version 878582 (0.0014) [2024-06-15 21:58:20,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 46967.3, 300 sec: 47319.2). Total num frames: 1799356416. Throughput: 0: 11980.7. Samples: 449923584. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:20,767][1648981] Avg episode reward: [(0, '1039.430')] [2024-06-15 21:58:23,912][1651669] Updated weights for policy 0, policy_version 878625 (0.0017) [2024-06-15 21:58:25,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 46967.5, 300 sec: 46874.9). Total num frames: 1799553024. Throughput: 0: 11923.9. Samples: 449964032. 
Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:25,767][1648981] Avg episode reward: [(0, '1037.370')] [2024-06-15 21:58:26,217][1651669] Updated weights for policy 0, policy_version 878709 (0.0012) [2024-06-15 21:58:26,501][1651274] Signal inference workers to stop experience collection... (46150 times) [2024-06-15 21:58:26,558][1651669] InferenceWorker_p0-w0: stopping experience collection (46150 times) [2024-06-15 21:58:26,718][1651274] Signal inference workers to resume experience collection... (46150 times) [2024-06-15 21:58:26,719][1651669] InferenceWorker_p0-w0: resuming experience collection (46150 times) [2024-06-15 21:58:27,611][1651669] Updated weights for policy 0, policy_version 878769 (0.0016) [2024-06-15 21:58:29,222][1651669] Updated weights for policy 0, policy_version 878848 (0.0029) [2024-06-15 21:58:30,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.5, 300 sec: 47430.3). Total num frames: 1799880704. Throughput: 0: 11696.3. Samples: 450014208. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:30,767][1648981] Avg episode reward: [(0, '1029.340')] [2024-06-15 21:58:35,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 46652.7). Total num frames: 1799979008. Throughput: 0: 11821.5. Samples: 450104320. Policy #0 lag: (min: 79.0, avg: 137.7, max: 335.0) [2024-06-15 21:58:35,767][1648981] Avg episode reward: [(0, '1044.520')] [2024-06-15 21:58:36,372][1651669] Updated weights for policy 0, policy_version 878928 (0.0014) [2024-06-15 21:58:38,413][1651669] Updated weights for policy 0, policy_version 879008 (0.0012) [2024-06-15 21:58:39,940][1651669] Updated weights for policy 0, policy_version 879074 (0.0012) [2024-06-15 21:58:40,767][1648981] Fps is (10 sec: 52429.7, 60 sec: 50244.2, 300 sec: 47652.4). Total num frames: 1800404992. Throughput: 0: 11537.0. Samples: 450125312. 
Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:58:40,767][1648981] Avg episode reward: [(0, '1064.500')] [2024-06-15 21:58:45,160][1651669] Updated weights for policy 0, policy_version 879120 (0.0020) [2024-06-15 21:58:45,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 44782.9, 300 sec: 46430.6). Total num frames: 1800470528. Throughput: 0: 11923.9. Samples: 450219008. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:58:45,767][1648981] Avg episode reward: [(0, '1098.580')] [2024-06-15 21:58:47,085][1651669] Updated weights for policy 0, policy_version 879185 (0.0012) [2024-06-15 21:58:49,285][1651669] Updated weights for policy 0, policy_version 879266 (0.0071) [2024-06-15 21:58:50,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 50244.3, 300 sec: 47763.6). Total num frames: 1800863744. Throughput: 0: 11719.1. Samples: 450271232. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:58:50,767][1648981] Avg episode reward: [(0, '1101.050')] [2024-06-15 21:58:50,933][1651669] Updated weights for policy 0, policy_version 879344 (0.0011) [2024-06-15 21:58:55,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 44236.8, 300 sec: 46541.7). Total num frames: 1800929280. Throughput: 0: 11798.8. Samples: 450316288. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:58:55,767][1648981] Avg episode reward: [(0, '1101.050')] [2024-06-15 21:58:56,002][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000879376_1800962048.pth... 
[2024-06-15 21:58:56,126][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000873904_1789755392.pth [2024-06-15 21:58:56,534][1651669] Updated weights for policy 0, policy_version 879395 (0.0013) [2024-06-15 21:58:58,217][1651669] Updated weights for policy 0, policy_version 879459 (0.0012) [2024-06-15 21:58:59,591][1651669] Updated weights for policy 0, policy_version 879520 (0.0015) [2024-06-15 21:59:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50790.4, 300 sec: 47652.5). Total num frames: 1801355264. Throughput: 0: 11935.3. Samples: 450385408. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:00,767][1648981] Avg episode reward: [(0, '1091.800')] [2024-06-15 21:59:00,782][1651274] Signal inference workers to stop experience collection... (46200 times) [2024-06-15 21:59:00,943][1651669] InferenceWorker_p0-w0: stopping experience collection (46200 times) [2024-06-15 21:59:01,019][1651274] Signal inference workers to resume experience collection... (46200 times) [2024-06-15 21:59:01,078][1651669] InferenceWorker_p0-w0: resuming experience collection (46200 times) [2024-06-15 21:59:01,110][1651669] Updated weights for policy 0, policy_version 879584 (0.0012) [2024-06-15 21:59:05,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 44782.9, 300 sec: 46986.0). Total num frames: 1801453568. Throughput: 0: 11924.0. Samples: 450460160. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:05,767][1648981] Avg episode reward: [(0, '1141.780')] [2024-06-15 21:59:08,042][1651669] Updated weights for policy 0, policy_version 879652 (0.0014) [2024-06-15 21:59:09,571][1651669] Updated weights for policy 0, policy_version 879714 (0.0014) [2024-06-15 21:59:10,766][1648981] Fps is (10 sec: 39321.2, 60 sec: 48605.9, 300 sec: 47208.1). Total num frames: 1801748480. Throughput: 0: 11764.6. Samples: 450493440. 
Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:10,767][1648981] Avg episode reward: [(0, '1173.530')] [2024-06-15 21:59:11,017][1651669] Updated weights for policy 0, policy_version 879780 (0.0020) [2024-06-15 21:59:12,977][1651669] Updated weights for policy 0, policy_version 879864 (0.0016) [2024-06-15 21:59:15,770][1648981] Fps is (10 sec: 52409.9, 60 sec: 46964.6, 300 sec: 47319.2). Total num frames: 1801977856. Throughput: 0: 11957.2. Samples: 450552320. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:15,771][1648981] Avg episode reward: [(0, '1206.190')] [2024-06-15 21:59:19,395][1651669] Updated weights for policy 0, policy_version 879893 (0.0053) [2024-06-15 21:59:20,766][1648981] Fps is (10 sec: 36044.8, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1802108928. Throughput: 0: 11707.7. Samples: 450631168. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:20,767][1648981] Avg episode reward: [(0, '1218.540')] [2024-06-15 21:59:21,603][1651669] Updated weights for policy 0, policy_version 879984 (0.0013) [2024-06-15 21:59:23,204][1651669] Updated weights for policy 0, policy_version 880048 (0.0011) [2024-06-15 21:59:25,766][1648981] Fps is (10 sec: 52447.8, 60 sec: 49152.0, 300 sec: 47652.4). Total num frames: 1802502144. Throughput: 0: 11741.9. Samples: 450653696. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:25,767][1648981] Avg episode reward: [(0, '1213.440')] [2024-06-15 21:59:30,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 43690.9, 300 sec: 46319.6). Total num frames: 1802502144. Throughput: 0: 11343.6. Samples: 450729472. 
Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:30,767][1648981] Avg episode reward: [(0, '1196.130')] [2024-06-15 21:59:30,970][1651669] Updated weights for policy 0, policy_version 880131 (0.0012) [2024-06-15 21:59:32,530][1651669] Updated weights for policy 0, policy_version 880193 (0.0013) [2024-06-15 21:59:34,064][1651669] Updated weights for policy 0, policy_version 880256 (0.0054) [2024-06-15 21:59:35,766][1648981] Fps is (10 sec: 39321.5, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 1802895360. Throughput: 0: 11468.8. Samples: 450787328. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:35,767][1648981] Avg episode reward: [(0, '1219.300')] [2024-06-15 21:59:35,932][1651669] Updated weights for policy 0, policy_version 880322 (0.0011) [2024-06-15 21:59:40,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 43690.8, 300 sec: 46652.7). Total num frames: 1803026432. Throughput: 0: 11241.3. Samples: 450822144. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:40,767][1648981] Avg episode reward: [(0, '1169.200')] [2024-06-15 21:59:42,618][1651669] Updated weights for policy 0, policy_version 880385 (0.0012) [2024-06-15 21:59:43,913][1651274] Signal inference workers to stop experience collection... (46250 times) [2024-06-15 21:59:43,973][1651669] InferenceWorker_p0-w0: stopping experience collection (46250 times) [2024-06-15 21:59:44,158][1651274] Signal inference workers to resume experience collection... (46250 times) [2024-06-15 21:59:44,160][1651669] InferenceWorker_p0-w0: resuming experience collection (46250 times) [2024-06-15 21:59:45,012][1651669] Updated weights for policy 0, policy_version 880480 (0.0127) [2024-06-15 21:59:45,766][1648981] Fps is (10 sec: 36044.7, 60 sec: 46421.3, 300 sec: 46986.0). Total num frames: 1803255808. Throughput: 0: 11355.0. Samples: 450896384. 
Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:45,767][1648981] Avg episode reward: [(0, '1162.570')] [2024-06-15 21:59:47,093][1651669] Updated weights for policy 0, policy_version 880560 (0.0136) [2024-06-15 21:59:48,340][1651669] Updated weights for policy 0, policy_version 880628 (0.0012) [2024-06-15 21:59:50,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 44782.9, 300 sec: 46986.0). Total num frames: 1803550720. Throughput: 0: 11150.2. Samples: 450961920. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:50,767][1648981] Avg episode reward: [(0, '1195.530')] [2024-06-15 21:59:54,868][1651669] Updated weights for policy 0, policy_version 880677 (0.0012) [2024-06-15 21:59:55,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 45875.3, 300 sec: 46652.7). Total num frames: 1803681792. Throughput: 0: 11355.0. Samples: 451004416. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 21:59:55,767][1648981] Avg episode reward: [(0, '1130.410')] [2024-06-15 21:59:56,815][1651669] Updated weights for policy 0, policy_version 880752 (0.0012) [2024-06-15 21:59:58,600][1651669] Updated weights for policy 0, policy_version 880827 (0.0126) [2024-06-15 22:00:00,014][1651669] Updated weights for policy 0, policy_version 880893 (0.0012) [2024-06-15 22:00:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45329.0, 300 sec: 46874.9). Total num frames: 1804075008. Throughput: 0: 11219.4. Samples: 451057152. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 22:00:00,767][1648981] Avg episode reward: [(0, '1143.380')] [2024-06-15 22:00:05,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 44236.7, 300 sec: 46319.5). Total num frames: 1804107776. Throughput: 0: 11355.0. Samples: 451142144. 
Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 22:00:05,767][1648981] Avg episode reward: [(0, '1115.370')] [2024-06-15 22:00:06,612][1651669] Updated weights for policy 0, policy_version 880944 (0.0012) [2024-06-15 22:00:08,215][1651669] Updated weights for policy 0, policy_version 881024 (0.0012) [2024-06-15 22:00:09,885][1651669] Updated weights for policy 0, policy_version 881088 (0.0012) [2024-06-15 22:00:10,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 46421.2, 300 sec: 46652.8). Total num frames: 1804533760. Throughput: 0: 11502.9. Samples: 451171328. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 22:00:10,768][1648981] Avg episode reward: [(0, '1127.060')] [2024-06-15 22:00:11,141][1651669] Updated weights for policy 0, policy_version 881137 (0.0012) [2024-06-15 22:00:15,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 43693.3, 300 sec: 46208.4). Total num frames: 1804599296. Throughput: 0: 11548.4. Samples: 451249152. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 22:00:15,767][1648981] Avg episode reward: [(0, '1120.720')] [2024-06-15 22:00:17,859][1651669] Updated weights for policy 0, policy_version 881216 (0.0011) [2024-06-15 22:00:19,604][1651669] Updated weights for policy 0, policy_version 881280 (0.0012) [2024-06-15 22:00:19,749][1651274] Signal inference workers to stop experience collection... (46300 times) [2024-06-15 22:00:19,780][1651669] InferenceWorker_p0-w0: stopping experience collection (46300 times) [2024-06-15 22:00:19,944][1651274] Signal inference workers to resume experience collection... (46300 times) [2024-06-15 22:00:19,945][1651669] InferenceWorker_p0-w0: resuming experience collection (46300 times) [2024-06-15 22:00:20,766][1648981] Fps is (10 sec: 42599.5, 60 sec: 47513.7, 300 sec: 46541.7). Total num frames: 1804959744. Throughput: 0: 11514.3. Samples: 451305472. 
Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 22:00:20,767][1648981] Avg episode reward: [(0, '1092.180')] [2024-06-15 22:00:21,347][1651669] Updated weights for policy 0, policy_version 881360 (0.0101) [2024-06-15 22:00:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 43690.6, 300 sec: 46208.4). Total num frames: 1805123584. Throughput: 0: 11434.6. Samples: 451336704. Policy #0 lag: (min: 75.0, avg: 140.1, max: 331.0) [2024-06-15 22:00:25,767][1648981] Avg episode reward: [(0, '1066.840')] [2024-06-15 22:00:28,206][1651669] Updated weights for policy 0, policy_version 881429 (0.0136) [2024-06-15 22:00:29,860][1651669] Updated weights for policy 0, policy_version 881493 (0.0016) [2024-06-15 22:00:30,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 47513.6, 300 sec: 46430.6). Total num frames: 1805352960. Throughput: 0: 11537.1. Samples: 451415552. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:00:30,767][1648981] Avg episode reward: [(0, '1070.750')] [2024-06-15 22:00:31,690][1651669] Updated weights for policy 0, policy_version 881568 (0.0083) [2024-06-15 22:00:33,226][1651669] Updated weights for policy 0, policy_version 881634 (0.0013) [2024-06-15 22:00:35,767][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.1, 300 sec: 46430.6). Total num frames: 1805647872. Throughput: 0: 11502.9. Samples: 451479552. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:00:35,767][1648981] Avg episode reward: [(0, '1064.020')] [2024-06-15 22:00:38,568][1651669] Updated weights for policy 0, policy_version 881675 (0.0037) [2024-06-15 22:00:39,953][1651669] Updated weights for policy 0, policy_version 881722 (0.0013) [2024-06-15 22:00:40,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 46421.4, 300 sec: 46763.8). Total num frames: 1805811712. Throughput: 0: 11525.7. Samples: 451523072. 
Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:00:40,767][1648981] Avg episode reward: [(0, '1052.390')] [2024-06-15 22:00:41,494][1651669] Updated weights for policy 0, policy_version 881788 (0.0014) [2024-06-15 22:00:43,340][1651669] Updated weights for policy 0, policy_version 881848 (0.0012) [2024-06-15 22:00:44,699][1651669] Updated weights for policy 0, policy_version 881904 (0.0013) [2024-06-15 22:00:45,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48605.9, 300 sec: 46541.7). Total num frames: 1806172160. Throughput: 0: 11605.3. Samples: 451579392. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:00:45,767][1648981] Avg episode reward: [(0, '1058.010')] [2024-06-15 22:00:49,651][1651669] Updated weights for policy 0, policy_version 881926 (0.0010) [2024-06-15 22:00:50,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45875.2, 300 sec: 46652.8). Total num frames: 1806303232. Throughput: 0: 11628.1. Samples: 451665408. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:00:50,767][1648981] Avg episode reward: [(0, '1023.590')] [2024-06-15 22:00:51,471][1651669] Updated weights for policy 0, policy_version 882000 (0.0012) [2024-06-15 22:00:53,283][1651669] Updated weights for policy 0, policy_version 882065 (0.0011) [2024-06-15 22:00:55,261][1651669] Updated weights for policy 0, policy_version 882131 (0.0014) [2024-06-15 22:00:55,767][1648981] Fps is (10 sec: 45872.5, 60 sec: 49151.5, 300 sec: 46652.7). Total num frames: 1806630912. Throughput: 0: 11639.4. Samples: 451695104. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:00:55,768][1648981] Avg episode reward: [(0, '1028.150')] [2024-06-15 22:00:56,233][1651669] Updated weights for policy 0, policy_version 882171 (0.0011) [2024-06-15 22:00:56,324][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000882176_1806696448.pth... 
[2024-06-15 22:00:56,379][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000876640_1795358720.pth [2024-06-15 22:01:00,377][1651274] Signal inference workers to stop experience collection... (46350 times) [2024-06-15 22:01:00,428][1651669] InferenceWorker_p0-w0: stopping experience collection (46350 times) [2024-06-15 22:01:00,559][1651274] Signal inference workers to resume experience collection... (46350 times) [2024-06-15 22:01:00,560][1651669] InferenceWorker_p0-w0: resuming experience collection (46350 times) [2024-06-15 22:01:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 44236.8, 300 sec: 46319.5). Total num frames: 1806729216. Throughput: 0: 11594.0. Samples: 451770880. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:00,767][1648981] Avg episode reward: [(0, '1009.620')] [2024-06-15 22:01:01,390][1651669] Updated weights for policy 0, policy_version 882233 (0.0044) [2024-06-15 22:01:02,988][1651669] Updated weights for policy 0, policy_version 882288 (0.0013) [2024-06-15 22:01:04,282][1651669] Updated weights for policy 0, policy_version 882337 (0.0013) [2024-06-15 22:01:05,345][1651669] Updated weights for policy 0, policy_version 882384 (0.0009) [2024-06-15 22:01:05,766][1648981] Fps is (10 sec: 49155.1, 60 sec: 50244.4, 300 sec: 46986.0). Total num frames: 1807122432. Throughput: 0: 11844.3. Samples: 451838464. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:05,767][1648981] Avg episode reward: [(0, '973.110')] [2024-06-15 22:01:06,532][1651669] Updated weights for policy 0, policy_version 882429 (0.0012) [2024-06-15 22:01:10,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 44783.1, 300 sec: 46208.4). Total num frames: 1807220736. Throughput: 0: 11980.8. Samples: 451875840. 
Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:10,767][1648981] Avg episode reward: [(0, '987.860')] [2024-06-15 22:01:11,683][1651669] Updated weights for policy 0, policy_version 882480 (0.0105) [2024-06-15 22:01:13,362][1651669] Updated weights for policy 0, policy_version 882553 (0.0043) [2024-06-15 22:01:15,339][1651669] Updated weights for policy 0, policy_version 882612 (0.0011) [2024-06-15 22:01:15,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 50244.3, 300 sec: 47430.3). Total num frames: 1807613952. Throughput: 0: 11969.4. Samples: 451954176. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:15,767][1648981] Avg episode reward: [(0, '996.510')] [2024-06-15 22:01:16,507][1651669] Updated weights for policy 0, policy_version 882642 (0.0012) [2024-06-15 22:01:17,618][1651669] Updated weights for policy 0, policy_version 882686 (0.0011) [2024-06-15 22:01:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46421.3, 300 sec: 46430.6). Total num frames: 1807745024. Throughput: 0: 12253.9. Samples: 452030976. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:20,767][1648981] Avg episode reward: [(0, '960.010')] [2024-06-15 22:01:22,117][1651669] Updated weights for policy 0, policy_version 882746 (0.0012) [2024-06-15 22:01:23,200][1651669] Updated weights for policy 0, policy_version 882788 (0.0010) [2024-06-15 22:01:24,748][1651669] Updated weights for policy 0, policy_version 882836 (0.0047) [2024-06-15 22:01:25,552][1651669] Updated weights for policy 0, policy_version 882880 (0.0013) [2024-06-15 22:01:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1808138240. Throughput: 0: 12071.8. Samples: 452066304. 
Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:25,767][1648981] Avg episode reward: [(0, '952.610')] [2024-06-15 22:01:27,375][1651669] Updated weights for policy 0, policy_version 882936 (0.0012) [2024-06-15 22:01:30,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48605.8, 300 sec: 46986.0). Total num frames: 1808269312. Throughput: 0: 12674.8. Samples: 452149760. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:30,767][1648981] Avg episode reward: [(0, '926.540')] [2024-06-15 22:01:32,317][1651669] Updated weights for policy 0, policy_version 882996 (0.0018) [2024-06-15 22:01:33,221][1651669] Updated weights for policy 0, policy_version 883030 (0.0013) [2024-06-15 22:01:33,988][1651669] Updated weights for policy 0, policy_version 883071 (0.0011) [2024-06-15 22:01:35,672][1651669] Updated weights for policy 0, policy_version 883134 (0.0012) [2024-06-15 22:01:35,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1808662528. Throughput: 0: 12231.1. Samples: 452215808. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:35,767][1648981] Avg episode reward: [(0, '929.880')] [2024-06-15 22:01:37,668][1651274] Signal inference workers to stop experience collection... (46400 times) [2024-06-15 22:01:37,744][1651669] InferenceWorker_p0-w0: stopping experience collection (46400 times) [2024-06-15 22:01:37,888][1651274] Signal inference workers to resume experience collection... (46400 times) [2024-06-15 22:01:37,888][1651669] InferenceWorker_p0-w0: resuming experience collection (46400 times) [2024-06-15 22:01:38,169][1651669] Updated weights for policy 0, policy_version 883200 (0.0013) [2024-06-15 22:01:40,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 49698.2, 300 sec: 47208.1). Total num frames: 1808793600. Throughput: 0: 12379.2. Samples: 452252160. 
Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:40,767][1648981] Avg episode reward: [(0, '955.160')] [2024-06-15 22:01:44,702][1651669] Updated weights for policy 0, policy_version 883284 (0.0015) [2024-06-15 22:01:45,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 48059.8, 300 sec: 47097.1). Total num frames: 1809055744. Throughput: 0: 12276.6. Samples: 452323328. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:45,767][1648981] Avg episode reward: [(0, '965.460')] [2024-06-15 22:01:46,363][1651669] Updated weights for policy 0, policy_version 883345 (0.0012) [2024-06-15 22:01:47,309][1651669] Updated weights for policy 0, policy_version 883392 (0.0012) [2024-06-15 22:01:50,270][1651669] Updated weights for policy 0, policy_version 883453 (0.0014) [2024-06-15 22:01:50,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1809317888. Throughput: 0: 12265.2. Samples: 452390400. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:50,767][1648981] Avg episode reward: [(0, '950.340')] [2024-06-15 22:01:54,785][1651669] Updated weights for policy 0, policy_version 883506 (0.0011) [2024-06-15 22:01:55,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 47514.0, 300 sec: 46763.8). Total num frames: 1809481728. Throughput: 0: 12435.9. Samples: 452435456. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:01:55,767][1648981] Avg episode reward: [(0, '996.160')] [2024-06-15 22:01:56,345][1651669] Updated weights for policy 0, policy_version 883568 (0.0032) [2024-06-15 22:01:57,633][1651669] Updated weights for policy 0, policy_version 883616 (0.0011) [2024-06-15 22:01:59,612][1651669] Updated weights for policy 0, policy_version 883652 (0.0020) [2024-06-15 22:02:00,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 51336.6, 300 sec: 47874.6). Total num frames: 1809809408. Throughput: 0: 12208.4. Samples: 452503552. 
Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:02:00,767][1648981] Avg episode reward: [(0, '1041.250')] [2024-06-15 22:02:00,900][1651669] Updated weights for policy 0, policy_version 883708 (0.0013) [2024-06-15 22:02:05,765][1651669] Updated weights for policy 0, policy_version 883776 (0.0012) [2024-06-15 22:02:05,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 47513.6, 300 sec: 47097.1). Total num frames: 1809973248. Throughput: 0: 12151.5. Samples: 452577792. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:02:05,767][1648981] Avg episode reward: [(0, '1033.850')] [2024-06-15 22:02:07,126][1651669] Updated weights for policy 0, policy_version 883829 (0.0017) [2024-06-15 22:02:08,801][1651669] Updated weights for policy 0, policy_version 883873 (0.0035) [2024-06-15 22:02:10,535][1651669] Updated weights for policy 0, policy_version 883924 (0.0040) [2024-06-15 22:02:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 51336.5, 300 sec: 47763.5). Total num frames: 1810300928. Throughput: 0: 12140.1. Samples: 452612608. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:02:10,767][1648981] Avg episode reward: [(0, '1027.860')] [2024-06-15 22:02:11,508][1651669] Updated weights for policy 0, policy_version 883968 (0.0017) [2024-06-15 22:02:15,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.7, 300 sec: 47319.2). Total num frames: 1810497536. Throughput: 0: 12117.3. Samples: 452695040. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:02:15,767][1648981] Avg episode reward: [(0, '1046.520')] [2024-06-15 22:02:15,828][1651669] Updated weights for policy 0, policy_version 884038 (0.0052) [2024-06-15 22:02:17,134][1651669] Updated weights for policy 0, policy_version 884096 (0.0015) [2024-06-15 22:02:19,415][1651669] Updated weights for policy 0, policy_version 884152 (0.0011) [2024-06-15 22:02:20,156][1651274] Signal inference workers to stop experience collection... 
(46450 times) [2024-06-15 22:02:20,251][1651669] InferenceWorker_p0-w0: stopping experience collection (46450 times) [2024-06-15 22:02:20,332][1651274] Signal inference workers to resume experience collection... (46450 times) [2024-06-15 22:02:20,332][1651669] InferenceWorker_p0-w0: resuming experience collection (46450 times) [2024-06-15 22:02:20,773][1648981] Fps is (10 sec: 52393.9, 60 sec: 51330.8, 300 sec: 47762.5). Total num frames: 1810825216. Throughput: 0: 12229.3. Samples: 452766208. Policy #0 lag: (min: 79.0, avg: 131.7, max: 335.0) [2024-06-15 22:02:20,774][1648981] Avg episode reward: [(0, '1015.970')] [2024-06-15 22:02:21,306][1651669] Updated weights for policy 0, policy_version 884212 (0.0013) [2024-06-15 22:02:25,362][1651669] Updated weights for policy 0, policy_version 884262 (0.0017) [2024-06-15 22:02:25,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 47513.5, 300 sec: 47430.3). Total num frames: 1810989056. Throughput: 0: 12299.3. Samples: 452805632. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:02:25,767][1648981] Avg episode reward: [(0, '1031.290')] [2024-06-15 22:02:26,728][1651669] Updated weights for policy 0, policy_version 884336 (0.0012) [2024-06-15 22:02:28,847][1651669] Updated weights for policy 0, policy_version 884368 (0.0011) [2024-06-15 22:02:29,947][1651669] Updated weights for policy 0, policy_version 884416 (0.0012) [2024-06-15 22:02:30,766][1648981] Fps is (10 sec: 49184.2, 60 sec: 50790.3, 300 sec: 47652.4). Total num frames: 1811316736. Throughput: 0: 12526.9. Samples: 452887040. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:02:30,767][1648981] Avg episode reward: [(0, '1000.660')] [2024-06-15 22:02:31,837][1651669] Updated weights for policy 0, policy_version 884478 (0.0012) [2024-06-15 22:02:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 1811513344. Throughput: 0: 12720.3. Samples: 452962816. 
Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:02:35,767][1648981] Avg episode reward: [(0, '1046.090')] [2024-06-15 22:02:36,179][1651669] Updated weights for policy 0, policy_version 884560 (0.0013) [2024-06-15 22:02:39,232][1651669] Updated weights for policy 0, policy_version 884624 (0.0014) [2024-06-15 22:02:40,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 50244.1, 300 sec: 47541.4). Total num frames: 1811808256. Throughput: 0: 12595.2. Samples: 453002240. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:02:40,767][1648981] Avg episode reward: [(0, '1033.580')] [2024-06-15 22:02:41,974][1651669] Updated weights for policy 0, policy_version 884704 (0.0012) [2024-06-15 22:02:44,922][1651669] Updated weights for policy 0, policy_version 884741 (0.0013) [2024-06-15 22:02:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.1, 300 sec: 48096.8). Total num frames: 1812037632. Throughput: 0: 12731.7. Samples: 453076480. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:02:45,767][1648981] Avg episode reward: [(0, '994.610')] [2024-06-15 22:02:45,837][1651669] Updated weights for policy 0, policy_version 884792 (0.0013) [2024-06-15 22:02:46,982][1651669] Updated weights for policy 0, policy_version 884848 (0.0012) [2024-06-15 22:02:49,943][1651669] Updated weights for policy 0, policy_version 884896 (0.0011) [2024-06-15 22:02:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.2, 300 sec: 47652.4). Total num frames: 1812332544. Throughput: 0: 12856.9. Samples: 453156352. 
Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:02:50,767][1648981] Avg episode reward: [(0, '976.760')] [2024-06-15 22:02:51,647][1651669] Updated weights for policy 0, policy_version 884930 (0.0012) [2024-06-15 22:02:52,930][1651669] Updated weights for policy 0, policy_version 884980 (0.0031) [2024-06-15 22:02:54,802][1651669] Updated weights for policy 0, policy_version 885030 (0.0011) [2024-06-15 22:02:55,766][1648981] Fps is (10 sec: 55705.5, 60 sec: 51882.7, 300 sec: 48430.0). Total num frames: 1812594688. Throughput: 0: 12959.3. Samples: 453195776. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:02:55,767][1648981] Avg episode reward: [(0, '924.680')] [2024-06-15 22:02:56,300][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000885088_1812660224.pth... [2024-06-15 22:02:56,503][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000879376_1800962048.pth [2024-06-15 22:02:57,072][1651669] Updated weights for policy 0, policy_version 885118 (0.0013) [2024-06-15 22:03:00,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 48605.8, 300 sec: 47319.2). Total num frames: 1812725760. Throughput: 0: 12663.5. Samples: 453264896. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:00,767][1648981] Avg episode reward: [(0, '922.620')] [2024-06-15 22:03:01,366][1651274] Signal inference workers to stop experience collection... (46500 times) [2024-06-15 22:03:01,395][1651669] InferenceWorker_p0-w0: stopping experience collection (46500 times) [2024-06-15 22:03:01,528][1651274] Signal inference workers to resume experience collection... 
(46500 times) [2024-06-15 22:03:01,529][1651669] InferenceWorker_p0-w0: resuming experience collection (46500 times) [2024-06-15 22:03:02,063][1651669] Updated weights for policy 0, policy_version 885183 (0.0116) [2024-06-15 22:03:04,174][1651669] Updated weights for policy 0, policy_version 885246 (0.0012) [2024-06-15 22:03:05,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 51336.4, 300 sec: 48207.8). Total num frames: 1813053440. Throughput: 0: 12631.2. Samples: 453334528. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:05,767][1648981] Avg episode reward: [(0, '941.680')] [2024-06-15 22:03:05,864][1651669] Updated weights for policy 0, policy_version 885296 (0.0014) [2024-06-15 22:03:07,310][1651669] Updated weights for policy 0, policy_version 885355 (0.0012) [2024-06-15 22:03:10,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49151.9, 300 sec: 47763.5). Total num frames: 1813250048. Throughput: 0: 12504.2. Samples: 453368320. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:10,767][1648981] Avg episode reward: [(0, '944.800')] [2024-06-15 22:03:12,011][1651669] Updated weights for policy 0, policy_version 885381 (0.0012) [2024-06-15 22:03:13,974][1651669] Updated weights for policy 0, policy_version 885446 (0.0158) [2024-06-15 22:03:15,275][1651669] Updated weights for policy 0, policy_version 885520 (0.0013) [2024-06-15 22:03:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 51336.5, 300 sec: 48207.9). Total num frames: 1813577728. Throughput: 0: 12572.5. Samples: 453452800. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:15,767][1648981] Avg episode reward: [(0, '912.550')] [2024-06-15 22:03:18,217][1651669] Updated weights for policy 0, policy_version 885630 (0.0011) [2024-06-15 22:03:20,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49157.2, 300 sec: 48207.8). Total num frames: 1813774336. Throughput: 0: 12185.5. Samples: 453511168. 
Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:20,767][1648981] Avg episode reward: [(0, '898.190')] [2024-06-15 22:03:24,151][1651669] Updated weights for policy 0, policy_version 885680 (0.0013) [2024-06-15 22:03:25,767][1648981] Fps is (10 sec: 39321.0, 60 sec: 49698.1, 300 sec: 47763.6). Total num frames: 1813970944. Throughput: 0: 12390.4. Samples: 453559808. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:25,767][1648981] Avg episode reward: [(0, '902.480')] [2024-06-15 22:03:26,462][1651669] Updated weights for policy 0, policy_version 885761 (0.0012) [2024-06-15 22:03:27,802][1651669] Updated weights for policy 0, policy_version 885824 (0.0138) [2024-06-15 22:03:29,275][1651669] Updated weights for policy 0, policy_version 885884 (0.0013) [2024-06-15 22:03:30,766][1648981] Fps is (10 sec: 52430.5, 60 sec: 49698.2, 300 sec: 48541.1). Total num frames: 1814298624. Throughput: 0: 11992.2. Samples: 453616128. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:30,767][1648981] Avg episode reward: [(0, '893.020')] [2024-06-15 22:03:35,432][1651669] Updated weights for policy 0, policy_version 885936 (0.0010) [2024-06-15 22:03:35,766][1648981] Fps is (10 sec: 45876.1, 60 sec: 48605.9, 300 sec: 47541.4). Total num frames: 1814429696. Throughput: 0: 12049.1. Samples: 453698560. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:35,767][1648981] Avg episode reward: [(0, '850.890')] [2024-06-15 22:03:36,632][1651669] Updated weights for policy 0, policy_version 885969 (0.0011) [2024-06-15 22:03:38,038][1651669] Updated weights for policy 0, policy_version 886032 (0.0011) [2024-06-15 22:03:38,770][1651274] Signal inference workers to stop experience collection... (46550 times) [2024-06-15 22:03:38,811][1651669] InferenceWorker_p0-w0: stopping experience collection (46550 times) [2024-06-15 22:03:39,005][1651274] Signal inference workers to resume experience collection... 
(46550 times) [2024-06-15 22:03:39,006][1651669] InferenceWorker_p0-w0: resuming experience collection (46550 times) [2024-06-15 22:03:39,252][1651669] Updated weights for policy 0, policy_version 886086 (0.0012) [2024-06-15 22:03:40,769][1648981] Fps is (10 sec: 52412.2, 60 sec: 50241.7, 300 sec: 48651.6). Total num frames: 1814822912. Throughput: 0: 11877.6. Samples: 453730304. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:40,773][1648981] Avg episode reward: [(0, '850.060')] [2024-06-15 22:03:45,177][1651669] Updated weights for policy 0, policy_version 886145 (0.0011) [2024-06-15 22:03:45,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46967.5, 300 sec: 47430.3). Total num frames: 1814855680. Throughput: 0: 11980.8. Samples: 453804032. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:45,767][1648981] Avg episode reward: [(0, '846.430')] [2024-06-15 22:03:46,367][1651669] Updated weights for policy 0, policy_version 886195 (0.0013) [2024-06-15 22:03:48,671][1651669] Updated weights for policy 0, policy_version 886258 (0.0011) [2024-06-15 22:03:50,477][1651669] Updated weights for policy 0, policy_version 886338 (0.0012) [2024-06-15 22:03:50,766][1648981] Fps is (10 sec: 42611.7, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1815248896. Throughput: 0: 11935.3. Samples: 453871616. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:50,767][1648981] Avg episode reward: [(0, '830.160')] [2024-06-15 22:03:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 47430.3). Total num frames: 1815347200. Throughput: 0: 12026.3. Samples: 453909504. 
Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:03:55,767][1648981] Avg episode reward: [(0, '830.750')] [2024-06-15 22:03:56,040][1651669] Updated weights for policy 0, policy_version 886416 (0.0027) [2024-06-15 22:03:58,773][1651669] Updated weights for policy 0, policy_version 886465 (0.0015) [2024-06-15 22:04:00,108][1651669] Updated weights for policy 0, policy_version 886528 (0.0013) [2024-06-15 22:04:00,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 1815642112. Throughput: 0: 11821.5. Samples: 453984768. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:04:00,767][1648981] Avg episode reward: [(0, '836.800')] [2024-06-15 22:04:01,808][1651669] Updated weights for policy 0, policy_version 886608 (0.0013) [2024-06-15 22:04:05,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 1815871488. Throughput: 0: 12128.8. Samples: 454056960. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:04:05,767][1648981] Avg episode reward: [(0, '874.120')] [2024-06-15 22:04:06,402][1651669] Updated weights for policy 0, policy_version 886672 (0.0013) [2024-06-15 22:04:07,413][1651669] Updated weights for policy 0, policy_version 886720 (0.0119) [2024-06-15 22:04:10,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.7, 300 sec: 47875.2). Total num frames: 1816100864. Throughput: 0: 11787.4. Samples: 454090240. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:04:10,767][1648981] Avg episode reward: [(0, '853.920')] [2024-06-15 22:04:10,904][1651669] Updated weights for policy 0, policy_version 886773 (0.0025) [2024-06-15 22:04:13,078][1651669] Updated weights for policy 0, policy_version 886868 (0.0012) [2024-06-15 22:04:13,830][1651669] Updated weights for policy 0, policy_version 886907 (0.0145) [2024-06-15 22:04:15,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1816395776. 
Throughput: 0: 11923.9. Samples: 454152704. Policy #0 lag: (min: 3.0, avg: 82.8, max: 259.0) [2024-06-15 22:04:15,767][1648981] Avg episode reward: [(0, '900.350')] [2024-06-15 22:04:17,809][1651669] Updated weights for policy 0, policy_version 886944 (0.0011) [2024-06-15 22:04:20,028][1651669] Updated weights for policy 0, policy_version 886992 (0.0012) [2024-06-15 22:04:20,170][1651274] Signal inference workers to stop experience collection... (46600 times) [2024-06-15 22:04:20,232][1651669] InferenceWorker_p0-w0: stopping experience collection (46600 times) [2024-06-15 22:04:20,426][1651274] Signal inference workers to resume experience collection... (46600 times) [2024-06-15 22:04:20,427][1651669] InferenceWorker_p0-w0: resuming experience collection (46600 times) [2024-06-15 22:04:20,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46967.7, 300 sec: 47763.5). Total num frames: 1816592384. Throughput: 0: 11969.4. Samples: 454237184. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:20,767][1648981] Avg episode reward: [(0, '858.730')] [2024-06-15 22:04:21,579][1651669] Updated weights for policy 0, policy_version 887044 (0.0011) [2024-06-15 22:04:23,145][1651669] Updated weights for policy 0, policy_version 887109 (0.0012) [2024-06-15 22:04:24,347][1651669] Updated weights for policy 0, policy_version 887168 (0.0029) [2024-06-15 22:04:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49152.1, 300 sec: 48874.3). Total num frames: 1816920064. Throughput: 0: 11776.8. Samples: 454260224. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:25,767][1648981] Avg episode reward: [(0, '859.770')] [2024-06-15 22:04:29,660][1651669] Updated weights for policy 0, policy_version 887220 (0.0011) [2024-06-15 22:04:30,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1817051136. Throughput: 0: 12026.3. Samples: 454345216. 
Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:30,767][1648981] Avg episode reward: [(0, '903.720')] [2024-06-15 22:04:31,047][1651669] Updated weights for policy 0, policy_version 887248 (0.0011) [2024-06-15 22:04:32,629][1651669] Updated weights for policy 0, policy_version 887312 (0.0012) [2024-06-15 22:04:34,692][1651669] Updated weights for policy 0, policy_version 887394 (0.0126) [2024-06-15 22:04:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 1817444352. Throughput: 0: 11855.6. Samples: 454405120. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:35,767][1648981] Avg episode reward: [(0, '922.600')] [2024-06-15 22:04:40,090][1651669] Updated weights for policy 0, policy_version 887456 (0.0011) [2024-06-15 22:04:40,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45331.4, 300 sec: 48430.0). Total num frames: 1817542656. Throughput: 0: 12060.4. Samples: 454452224. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:40,767][1648981] Avg episode reward: [(0, '943.900')] [2024-06-15 22:04:40,853][1651669] Updated weights for policy 0, policy_version 887486 (0.0010) [2024-06-15 22:04:42,373][1651669] Updated weights for policy 0, policy_version 887536 (0.0096) [2024-06-15 22:04:45,303][1651669] Updated weights for policy 0, policy_version 887648 (0.0012) [2024-06-15 22:04:45,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 51336.5, 300 sec: 48763.2). Total num frames: 1817935872. Throughput: 0: 11855.6. Samples: 454518272. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:45,767][1648981] Avg episode reward: [(0, '970.550')] [2024-06-15 22:04:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45329.1, 300 sec: 48430.0). Total num frames: 1817968640. Throughput: 0: 11912.5. Samples: 454593024. 
Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:50,767][1648981] Avg episode reward: [(0, '1019.610')] [2024-06-15 22:04:51,082][1651669] Updated weights for policy 0, policy_version 887699 (0.0012) [2024-06-15 22:04:53,338][1651669] Updated weights for policy 0, policy_version 887746 (0.0015) [2024-06-15 22:04:54,733][1651669] Updated weights for policy 0, policy_version 887808 (0.0034) [2024-06-15 22:04:55,767][1648981] Fps is (10 sec: 36044.3, 60 sec: 49151.9, 300 sec: 48207.8). Total num frames: 1818296320. Throughput: 0: 11969.4. Samples: 454628864. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:04:55,767][1648981] Avg episode reward: [(0, '1021.200')] [2024-06-15 22:04:56,220][1651669] Updated weights for policy 0, policy_version 887859 (0.0181) [2024-06-15 22:04:56,384][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000887872_1818361856.pth... [2024-06-15 22:04:56,534][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000882176_1806696448.pth [2024-06-15 22:04:57,013][1651274] Signal inference workers to stop experience collection... (46650 times) [2024-06-15 22:04:57,054][1651669] InferenceWorker_p0-w0: stopping experience collection (46650 times) [2024-06-15 22:04:57,362][1651274] Signal inference workers to resume experience collection... (46650 times) [2024-06-15 22:04:57,363][1651669] InferenceWorker_p0-w0: resuming experience collection (46650 times) [2024-06-15 22:04:58,118][1651669] Updated weights for policy 0, policy_version 887933 (0.0133) [2024-06-15 22:05:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 48763.2). Total num frames: 1818492928. Throughput: 0: 11798.7. Samples: 454683648. 
Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:00,767][1648981] Avg episode reward: [(0, '1025.180')] [2024-06-15 22:05:03,699][1651669] Updated weights for policy 0, policy_version 887991 (0.0111) [2024-06-15 22:05:05,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 47513.7, 300 sec: 48096.8). Total num frames: 1818722304. Throughput: 0: 11662.2. Samples: 454761984. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:05,767][1648981] Avg episode reward: [(0, '1032.600')] [2024-06-15 22:05:05,876][1651669] Updated weights for policy 0, policy_version 888050 (0.0124) [2024-06-15 22:05:07,879][1651669] Updated weights for policy 0, policy_version 888115 (0.0010) [2024-06-15 22:05:09,574][1651669] Updated weights for policy 0, policy_version 888186 (0.0011) [2024-06-15 22:05:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 1819017216. Throughput: 0: 11639.5. Samples: 454784000. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:10,767][1648981] Avg episode reward: [(0, '1031.350')] [2024-06-15 22:05:14,916][1651669] Updated weights for policy 0, policy_version 888240 (0.0013) [2024-06-15 22:05:15,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 45875.2, 300 sec: 48096.8). Total num frames: 1819148288. Throughput: 0: 11650.9. Samples: 454869504. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:15,767][1648981] Avg episode reward: [(0, '1049.720')] [2024-06-15 22:05:16,625][1651669] Updated weights for policy 0, policy_version 888292 (0.0012) [2024-06-15 22:05:18,405][1651669] Updated weights for policy 0, policy_version 888368 (0.0035) [2024-06-15 22:05:19,640][1651669] Updated weights for policy 0, policy_version 888416 (0.0013) [2024-06-15 22:05:20,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1819541504. Throughput: 0: 11605.3. Samples: 454927360. 
Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:20,767][1648981] Avg episode reward: [(0, '1040.380')] [2024-06-15 22:05:25,039][1651669] Updated weights for policy 0, policy_version 888464 (0.0013) [2024-06-15 22:05:25,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 45329.1, 300 sec: 48430.0). Total num frames: 1819639808. Throughput: 0: 11548.4. Samples: 454971904. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:25,767][1648981] Avg episode reward: [(0, '1060.860')] [2024-06-15 22:05:26,790][1651669] Updated weights for policy 0, policy_version 888544 (0.0012) [2024-06-15 22:05:28,534][1651669] Updated weights for policy 0, policy_version 888608 (0.0011) [2024-06-15 22:05:29,700][1651669] Updated weights for policy 0, policy_version 888645 (0.0011) [2024-06-15 22:05:30,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1820033024. Throughput: 0: 11628.1. Samples: 455041536. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:30,767][1648981] Avg episode reward: [(0, '1024.450')] [2024-06-15 22:05:35,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 43690.7, 300 sec: 48318.9). Total num frames: 1820065792. Throughput: 0: 11730.5. Samples: 455120896. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:35,767][1648981] Avg episode reward: [(0, '1077.760')] [2024-06-15 22:05:36,496][1651669] Updated weights for policy 0, policy_version 888738 (0.0012) [2024-06-15 22:05:37,750][1651669] Updated weights for policy 0, policy_version 888800 (0.0020) [2024-06-15 22:05:38,638][1651669] Updated weights for policy 0, policy_version 888838 (0.0011) [2024-06-15 22:05:38,933][1651274] Signal inference workers to stop experience collection... (46700 times) [2024-06-15 22:05:38,972][1651669] InferenceWorker_p0-w0: stopping experience collection (46700 times) [2024-06-15 22:05:39,134][1651274] Signal inference workers to resume experience collection... 
(46700 times) [2024-06-15 22:05:39,144][1651669] InferenceWorker_p0-w0: resuming experience collection (46700 times) [2024-06-15 22:05:40,642][1651669] Updated weights for policy 0, policy_version 888928 (0.0012) [2024-06-15 22:05:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 48652.1). Total num frames: 1820524544. Throughput: 0: 11730.5. Samples: 455156736. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:40,767][1648981] Avg episode reward: [(0, '1057.850')] [2024-06-15 22:05:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 44236.8, 300 sec: 48430.0). Total num frames: 1820590080. Throughput: 0: 12162.8. Samples: 455230976. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:45,767][1648981] Avg episode reward: [(0, '973.570')] [2024-06-15 22:05:46,362][1651669] Updated weights for policy 0, policy_version 888964 (0.0010) [2024-06-15 22:05:47,283][1651669] Updated weights for policy 0, policy_version 889018 (0.0033) [2024-06-15 22:05:48,858][1651669] Updated weights for policy 0, policy_version 889056 (0.0013) [2024-06-15 22:05:50,750][1651669] Updated weights for policy 0, policy_version 889143 (0.0043) [2024-06-15 22:05:50,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 49698.1, 300 sec: 48541.2). Total num frames: 1820950528. Throughput: 0: 12026.3. Samples: 455303168. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:50,767][1648981] Avg episode reward: [(0, '970.780')] [2024-06-15 22:05:51,843][1651669] Updated weights for policy 0, policy_version 889200 (0.0011) [2024-06-15 22:05:55,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 46967.4, 300 sec: 48763.2). Total num frames: 1821114368. Throughput: 0: 12299.3. Samples: 455337472. 
Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:05:55,767][1648981] Avg episode reward: [(0, '995.890')] [2024-06-15 22:05:58,096][1651669] Updated weights for policy 0, policy_version 889269 (0.0013) [2024-06-15 22:05:59,858][1651669] Updated weights for policy 0, policy_version 889315 (0.0023) [2024-06-15 22:06:00,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48059.6, 300 sec: 48318.9). Total num frames: 1821376512. Throughput: 0: 12105.9. Samples: 455414272. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:06:00,767][1648981] Avg episode reward: [(0, '956.540')] [2024-06-15 22:06:01,996][1651669] Updated weights for policy 0, policy_version 889408 (0.0019) [2024-06-15 22:06:03,251][1651669] Updated weights for policy 0, policy_version 889472 (0.0012) [2024-06-15 22:06:05,767][1648981] Fps is (10 sec: 52429.4, 60 sec: 48605.7, 300 sec: 48874.3). Total num frames: 1821638656. Throughput: 0: 12333.5. Samples: 455482368. Policy #0 lag: (min: 24.0, avg: 117.8, max: 280.0) [2024-06-15 22:06:05,767][1648981] Avg episode reward: [(0, '953.670')] [2024-06-15 22:06:08,971][1651669] Updated weights for policy 0, policy_version 889529 (0.0012) [2024-06-15 22:06:10,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 1821835264. Throughput: 0: 12231.1. Samples: 455522304. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:10,766][1651669] Updated weights for policy 0, policy_version 889571 (0.0034) [2024-06-15 22:06:10,767][1648981] Avg episode reward: [(0, '956.440')] [2024-06-15 22:06:11,674][1651669] Updated weights for policy 0, policy_version 889616 (0.0024) [2024-06-15 22:06:12,714][1651669] Updated weights for policy 0, policy_version 889664 (0.0012) [2024-06-15 22:06:13,794][1651669] Updated weights for policy 0, policy_version 889718 (0.0012) [2024-06-15 22:06:15,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1822162944. 
Throughput: 0: 12253.9. Samples: 455592960. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:15,767][1648981] Avg episode reward: [(0, '955.090')] [2024-06-15 22:06:19,021][1651274] Signal inference workers to stop experience collection... (46750 times) [2024-06-15 22:06:19,067][1651669] InferenceWorker_p0-w0: stopping experience collection (46750 times) [2024-06-15 22:06:19,069][1651669] Updated weights for policy 0, policy_version 889763 (0.0012) [2024-06-15 22:06:19,281][1651274] Signal inference workers to resume experience collection... (46750 times) [2024-06-15 22:06:19,281][1651669] InferenceWorker_p0-w0: resuming experience collection (46750 times) [2024-06-15 22:06:20,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46421.3, 300 sec: 48096.8). Total num frames: 1822326784. Throughput: 0: 12299.4. Samples: 455674368. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:20,767][1648981] Avg episode reward: [(0, '961.120')] [2024-06-15 22:06:21,322][1651669] Updated weights for policy 0, policy_version 889840 (0.0037) [2024-06-15 22:06:22,866][1651669] Updated weights for policy 0, policy_version 889904 (0.0011) [2024-06-15 22:06:24,580][1651669] Updated weights for policy 0, policy_version 889983 (0.0012) [2024-06-15 22:06:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50790.4, 300 sec: 48874.3). Total num frames: 1822687232. Throughput: 0: 12117.3. Samples: 455702016. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:25,767][1648981] Avg episode reward: [(0, '938.350')] [2024-06-15 22:06:30,324][1651669] Updated weights for policy 0, policy_version 890048 (0.0169) [2024-06-15 22:06:30,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1822818304. Throughput: 0: 12117.3. Samples: 455776256. 
Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:30,767][1648981] Avg episode reward: [(0, '998.830')] [2024-06-15 22:06:32,395][1651669] Updated weights for policy 0, policy_version 890106 (0.0013) [2024-06-15 22:06:34,108][1651669] Updated weights for policy 0, policy_version 890164 (0.0044) [2024-06-15 22:06:35,635][1651669] Updated weights for policy 0, policy_version 890240 (0.0013) [2024-06-15 22:06:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 52428.8, 300 sec: 48874.3). Total num frames: 1823211520. Throughput: 0: 11946.7. Samples: 455840768. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:35,767][1648981] Avg episode reward: [(0, '1039.760')] [2024-06-15 22:06:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 45875.2, 300 sec: 48207.8). Total num frames: 1823277056. Throughput: 0: 12185.7. Samples: 455885824. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:40,767][1648981] Avg episode reward: [(0, '1027.030')] [2024-06-15 22:06:41,109][1651669] Updated weights for policy 0, policy_version 890298 (0.0116) [2024-06-15 22:06:42,817][1651669] Updated weights for policy 0, policy_version 890368 (0.0012) [2024-06-15 22:06:45,670][1651669] Updated weights for policy 0, policy_version 890432 (0.0139) [2024-06-15 22:06:45,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 1823604736. Throughput: 0: 12106.0. Samples: 455959040. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:45,767][1648981] Avg episode reward: [(0, '1034.340')] [2024-06-15 22:06:46,652][1651669] Updated weights for policy 0, policy_version 890487 (0.0012) [2024-06-15 22:06:50,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 46967.4, 300 sec: 48430.0). Total num frames: 1823768576. Throughput: 0: 12231.1. Samples: 456032768. 
Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:50,767][1648981] Avg episode reward: [(0, '1084.970')] [2024-06-15 22:06:51,776][1651669] Updated weights for policy 0, policy_version 890560 (0.0016) [2024-06-15 22:06:53,751][1651669] Updated weights for policy 0, policy_version 890622 (0.0025) [2024-06-15 22:06:55,766][1648981] Fps is (10 sec: 42597.8, 60 sec: 48605.9, 300 sec: 48207.8). Total num frames: 1824030720. Throughput: 0: 11923.9. Samples: 456058880. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:06:55,767][1648981] Avg episode reward: [(0, '1106.550')] [2024-06-15 22:06:56,108][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000890656_1824063488.pth... [2024-06-15 22:06:56,270][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000885088_1812660224.pth [2024-06-15 22:06:57,290][1651669] Updated weights for policy 0, policy_version 890704 (0.0087) [2024-06-15 22:07:00,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48059.9, 300 sec: 48430.0). Total num frames: 1824260096. Throughput: 0: 12014.9. Samples: 456133632. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:00,767][1648981] Avg episode reward: [(0, '1090.700')] [2024-06-15 22:07:01,891][1651274] Signal inference workers to stop experience collection... (46800 times) [2024-06-15 22:07:01,945][1651669] InferenceWorker_p0-w0: stopping experience collection (46800 times) [2024-06-15 22:07:02,238][1651274] Signal inference workers to resume experience collection... 
(46800 times) [2024-06-15 22:07:02,239][1651669] InferenceWorker_p0-w0: resuming experience collection (46800 times) [2024-06-15 22:07:02,241][1651669] Updated weights for policy 0, policy_version 890768 (0.0013) [2024-06-15 22:07:04,310][1651669] Updated weights for policy 0, policy_version 890837 (0.0015) [2024-06-15 22:07:05,257][1651669] Updated weights for policy 0, policy_version 890876 (0.0014) [2024-06-15 22:07:05,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 1824522240. Throughput: 0: 11798.7. Samples: 456205312. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:05,767][1648981] Avg episode reward: [(0, '1135.970')] [2024-06-15 22:07:07,874][1651669] Updated weights for policy 0, policy_version 890944 (0.0013) [2024-06-15 22:07:08,920][1651669] Updated weights for policy 0, policy_version 890995 (0.0012) [2024-06-15 22:07:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1824784384. Throughput: 0: 11764.6. Samples: 456231424. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:10,767][1648981] Avg episode reward: [(0, '1118.290')] [2024-06-15 22:07:13,385][1651669] Updated weights for policy 0, policy_version 891025 (0.0011) [2024-06-15 22:07:14,706][1651669] Updated weights for policy 0, policy_version 891088 (0.0013) [2024-06-15 22:07:15,767][1648981] Fps is (10 sec: 49149.8, 60 sec: 47513.2, 300 sec: 48097.8). Total num frames: 1825013760. Throughput: 0: 12174.1. Samples: 456324096. 
Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:15,769][1648981] Avg episode reward: [(0, '1124.550')] [2024-06-15 22:07:15,907][1651669] Updated weights for policy 0, policy_version 891133 (0.0014) [2024-06-15 22:07:17,216][1651669] Updated weights for policy 0, policy_version 891184 (0.0012) [2024-06-15 22:07:18,412][1651669] Updated weights for policy 0, policy_version 891223 (0.0012) [2024-06-15 22:07:19,279][1651669] Updated weights for policy 0, policy_version 891263 (0.0010) [2024-06-15 22:07:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 1825308672. Throughput: 0: 12288.0. Samples: 456393728. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:20,767][1648981] Avg episode reward: [(0, '1124.550')] [2024-06-15 22:07:24,777][1651669] Updated weights for policy 0, policy_version 891315 (0.0017) [2024-06-15 22:07:25,766][1648981] Fps is (10 sec: 49153.7, 60 sec: 46967.3, 300 sec: 48096.8). Total num frames: 1825505280. Throughput: 0: 12356.2. Samples: 456441856. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:25,767][1648981] Avg episode reward: [(0, '1147.070')] [2024-06-15 22:07:26,670][1651669] Updated weights for policy 0, policy_version 891393 (0.0141) [2024-06-15 22:07:28,396][1651669] Updated weights for policy 0, policy_version 891459 (0.0013) [2024-06-15 22:07:29,618][1651669] Updated weights for policy 0, policy_version 891520 (0.0013) [2024-06-15 22:07:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.2, 300 sec: 48541.1). Total num frames: 1825832960. Throughput: 0: 11935.3. Samples: 456496128. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:30,767][1648981] Avg episode reward: [(0, '1118.350')] [2024-06-15 22:07:35,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 45329.0, 300 sec: 47874.6). Total num frames: 1825931264. Throughput: 0: 12322.1. Samples: 456587264. 
Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:35,767][1648981] Avg episode reward: [(0, '1074.000')] [2024-06-15 22:07:35,854][1651669] Updated weights for policy 0, policy_version 891584 (0.0017) [2024-06-15 22:07:37,434][1651669] Updated weights for policy 0, policy_version 891654 (0.0088) [2024-06-15 22:07:37,719][1651274] Signal inference workers to stop experience collection... (46850 times) [2024-06-15 22:07:37,760][1651669] InferenceWorker_p0-w0: stopping experience collection (46850 times) [2024-06-15 22:07:38,046][1651274] Signal inference workers to resume experience collection... (46850 times) [2024-06-15 22:07:38,047][1651669] InferenceWorker_p0-w0: resuming experience collection (46850 times) [2024-06-15 22:07:39,873][1651669] Updated weights for policy 0, policy_version 891713 (0.0013) [2024-06-15 22:07:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 50244.2, 300 sec: 48318.9). Total num frames: 1826291712. Throughput: 0: 12231.1. Samples: 456609280. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:40,767][1648981] Avg episode reward: [(0, '1123.500')] [2024-06-15 22:07:41,153][1651669] Updated weights for policy 0, policy_version 891765 (0.0011) [2024-06-15 22:07:45,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 45875.3, 300 sec: 47541.4). Total num frames: 1826357248. Throughput: 0: 12379.0. Samples: 456690688. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:45,767][1648981] Avg episode reward: [(0, '1075.450')] [2024-06-15 22:07:46,218][1651669] Updated weights for policy 0, policy_version 891808 (0.0012) [2024-06-15 22:07:47,375][1651669] Updated weights for policy 0, policy_version 891858 (0.0011) [2024-06-15 22:07:49,030][1651669] Updated weights for policy 0, policy_version 891936 (0.0010) [2024-06-15 22:07:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1826750464. Throughput: 0: 12288.0. Samples: 456758272. 
Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:50,767][1648981] Avg episode reward: [(0, '1063.160')] [2024-06-15 22:07:52,041][1651669] Updated weights for policy 0, policy_version 892016 (0.0131) [2024-06-15 22:07:55,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 47513.7, 300 sec: 47985.7). Total num frames: 1826881536. Throughput: 0: 12435.9. Samples: 456791040. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:07:55,767][1648981] Avg episode reward: [(0, '1082.140')] [2024-06-15 22:07:57,052][1651669] Updated weights for policy 0, policy_version 892064 (0.0011) [2024-06-15 22:07:58,831][1651669] Updated weights for policy 0, policy_version 892144 (0.0015) [2024-06-15 22:08:00,374][1651669] Updated weights for policy 0, policy_version 892213 (0.0011) [2024-06-15 22:08:00,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 50244.3, 300 sec: 48207.8). Total num frames: 1827274752. Throughput: 0: 12015.1. Samples: 456864768. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:08:00,767][1648981] Avg episode reward: [(0, '1061.730')] [2024-06-15 22:08:02,476][1651669] Updated weights for policy 0, policy_version 892256 (0.0016) [2024-06-15 22:08:05,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1827405824. Throughput: 0: 12014.9. Samples: 456934400. Policy #0 lag: (min: 31.0, avg: 124.7, max: 287.0) [2024-06-15 22:08:05,767][1648981] Avg episode reward: [(0, '1041.730')] [2024-06-15 22:08:08,033][1651669] Updated weights for policy 0, policy_version 892305 (0.0012) [2024-06-15 22:08:09,090][1651669] Updated weights for policy 0, policy_version 892357 (0.0014) [2024-06-15 22:08:10,349][1651669] Updated weights for policy 0, policy_version 892416 (0.0012) [2024-06-15 22:08:10,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 47874.6). Total num frames: 1827700736. Throughput: 0: 11935.3. Samples: 456978944. 
Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:10,767][1648981] Avg episode reward: [(0, '1018.180')] [2024-06-15 22:08:11,594][1651669] Updated weights for policy 0, policy_version 892478 (0.0017) [2024-06-15 22:08:13,587][1651669] Updated weights for policy 0, policy_version 892532 (0.0015) [2024-06-15 22:08:15,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48606.2, 300 sec: 47985.7). Total num frames: 1827930112. Throughput: 0: 12276.6. Samples: 457048576. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:15,767][1648981] Avg episode reward: [(0, '980.410')] [2024-06-15 22:08:18,434][1651669] Updated weights for policy 0, policy_version 892562 (0.0012) [2024-06-15 22:08:19,304][1651274] Signal inference workers to stop experience collection... (46900 times) [2024-06-15 22:08:19,371][1651669] InferenceWorker_p0-w0: stopping experience collection (46900 times) [2024-06-15 22:08:19,504][1651274] Signal inference workers to resume experience collection... (46900 times) [2024-06-15 22:08:19,504][1651669] InferenceWorker_p0-w0: resuming experience collection (46900 times) [2024-06-15 22:08:20,226][1651669] Updated weights for policy 0, policy_version 892640 (0.0011) [2024-06-15 22:08:20,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 1828159488. Throughput: 0: 11867.0. Samples: 457121280. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:20,767][1648981] Avg episode reward: [(0, '982.140')] [2024-06-15 22:08:22,420][1651669] Updated weights for policy 0, policy_version 892731 (0.0011) [2024-06-15 22:08:24,657][1651669] Updated weights for policy 0, policy_version 892771 (0.0010) [2024-06-15 22:08:25,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 1828454400. Throughput: 0: 12083.2. Samples: 457153024. 
Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:25,767][1648981] Avg episode reward: [(0, '971.450')] [2024-06-15 22:08:30,416][1651669] Updated weights for policy 0, policy_version 892834 (0.0017) [2024-06-15 22:08:30,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 45329.2, 300 sec: 47874.6). Total num frames: 1828552704. Throughput: 0: 12049.1. Samples: 457232896. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:30,767][1648981] Avg episode reward: [(0, '949.710')] [2024-06-15 22:08:32,151][1651669] Updated weights for policy 0, policy_version 892912 (0.0075) [2024-06-15 22:08:33,854][1651669] Updated weights for policy 0, policy_version 892986 (0.0207) [2024-06-15 22:08:35,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 49152.0, 300 sec: 47653.0). Total num frames: 1828880384. Throughput: 0: 11832.9. Samples: 457290752. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:35,767][1648981] Avg episode reward: [(0, '906.600')] [2024-06-15 22:08:36,455][1651669] Updated weights for policy 0, policy_version 893051 (0.0017) [2024-06-15 22:08:40,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 45329.1, 300 sec: 47985.7). Total num frames: 1829011456. Throughput: 0: 12037.7. Samples: 457332736. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:40,767][1648981] Avg episode reward: [(0, '902.590')] [2024-06-15 22:08:42,281][1651669] Updated weights for policy 0, policy_version 893136 (0.0012) [2024-06-15 22:08:43,693][1651669] Updated weights for policy 0, policy_version 893200 (0.0011) [2024-06-15 22:08:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 50244.2, 300 sec: 47874.6). Total num frames: 1829371904. Throughput: 0: 11639.5. Samples: 457388544. 
Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:45,767][1648981] Avg episode reward: [(0, '885.770')] [2024-06-15 22:08:46,823][1651669] Updated weights for policy 0, policy_version 893264 (0.0015) [2024-06-15 22:08:50,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1829502976. Throughput: 0: 11958.0. Samples: 457472512. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:50,767][1648981] Avg episode reward: [(0, '874.660')] [2024-06-15 22:08:51,309][1651669] Updated weights for policy 0, policy_version 893315 (0.0069) [2024-06-15 22:08:52,708][1651669] Updated weights for policy 0, policy_version 893374 (0.0012) [2024-06-15 22:08:54,058][1651669] Updated weights for policy 0, policy_version 893414 (0.0012) [2024-06-15 22:08:55,520][1651669] Updated weights for policy 0, policy_version 893475 (0.0011) [2024-06-15 22:08:55,767][1648981] Fps is (10 sec: 49150.4, 60 sec: 49697.9, 300 sec: 48207.8). Total num frames: 1829863424. Throughput: 0: 11696.3. Samples: 457505280. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:08:55,767][1648981] Avg episode reward: [(0, '886.350')] [2024-06-15 22:08:56,053][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000893504_1829896192.pth... [2024-06-15 22:08:56,092][1651669] Updated weights for policy 0, policy_version 893504 (0.0015) [2024-06-15 22:08:56,112][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000887872_1818361856.pth [2024-06-15 22:08:57,546][1651274] Signal inference workers to stop experience collection... (46950 times) [2024-06-15 22:08:57,602][1651669] InferenceWorker_p0-w0: stopping experience collection (46950 times) [2024-06-15 22:08:57,777][1651274] Signal inference workers to resume experience collection... 
(46950 times) [2024-06-15 22:08:57,778][1651669] InferenceWorker_p0-w0: resuming experience collection (46950 times) [2024-06-15 22:09:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1830027264. Throughput: 0: 11764.6. Samples: 457577984. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:00,767][1648981] Avg episode reward: [(0, '870.490')] [2024-06-15 22:09:02,116][1651669] Updated weights for policy 0, policy_version 893569 (0.0012) [2024-06-15 22:09:03,524][1651669] Updated weights for policy 0, policy_version 893622 (0.0013) [2024-06-15 22:09:04,965][1651669] Updated weights for policy 0, policy_version 893680 (0.0054) [2024-06-15 22:09:05,766][1648981] Fps is (10 sec: 45876.9, 60 sec: 48605.9, 300 sec: 48207.8). Total num frames: 1830322176. Throughput: 0: 11696.4. Samples: 457647616. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:05,767][1648981] Avg episode reward: [(0, '864.170')] [2024-06-15 22:09:06,670][1651669] Updated weights for policy 0, policy_version 893750 (0.0014) [2024-06-15 22:09:09,345][1651669] Updated weights for policy 0, policy_version 893793 (0.0119) [2024-06-15 22:09:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1830551552. Throughput: 0: 11753.3. Samples: 457681920. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:10,767][1648981] Avg episode reward: [(0, '862.580')] [2024-06-15 22:09:14,380][1651669] Updated weights for policy 0, policy_version 893864 (0.0090) [2024-06-15 22:09:15,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 1830748160. Throughput: 0: 11616.7. Samples: 457755648. 
Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:15,767][1648981] Avg episode reward: [(0, '835.100')] [2024-06-15 22:09:15,828][1651669] Updated weights for policy 0, policy_version 893923 (0.0011) [2024-06-15 22:09:18,022][1651669] Updated weights for policy 0, policy_version 894009 (0.0011) [2024-06-15 22:09:20,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 46421.3, 300 sec: 47541.4). Total num frames: 1830944768. Throughput: 0: 11616.7. Samples: 457813504. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:20,767][1648981] Avg episode reward: [(0, '786.690')] [2024-06-15 22:09:21,566][1651669] Updated weights for policy 0, policy_version 894049 (0.0013) [2024-06-15 22:09:25,766][1648981] Fps is (10 sec: 36044.5, 60 sec: 44236.8, 300 sec: 47652.4). Total num frames: 1831108608. Throughput: 0: 11571.2. Samples: 457853440. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:25,767][1648981] Avg episode reward: [(0, '835.870')] [2024-06-15 22:09:26,887][1651669] Updated weights for policy 0, policy_version 894160 (0.0013) [2024-06-15 22:09:29,705][1651669] Updated weights for policy 0, policy_version 894256 (0.0171) [2024-06-15 22:09:30,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 1831469056. Throughput: 0: 11514.3. Samples: 457906688. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:30,767][1648981] Avg episode reward: [(0, '810.030')] [2024-06-15 22:09:33,120][1651669] Updated weights for policy 0, policy_version 894290 (0.0011) [2024-06-15 22:09:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 45329.1, 300 sec: 47652.4). Total num frames: 1831600128. Throughput: 0: 11434.7. Samples: 457987072. 
Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:35,767][1648981] Avg episode reward: [(0, '804.980')] [2024-06-15 22:09:37,589][1651669] Updated weights for policy 0, policy_version 894352 (0.0013) [2024-06-15 22:09:39,584][1651669] Updated weights for policy 0, policy_version 894419 (0.0014) [2024-06-15 22:09:39,889][1651274] Signal inference workers to stop experience collection... (47000 times) [2024-06-15 22:09:39,959][1651669] InferenceWorker_p0-w0: stopping experience collection (47000 times) [2024-06-15 22:09:40,154][1651274] Signal inference workers to resume experience collection... (47000 times) [2024-06-15 22:09:40,155][1651669] InferenceWorker_p0-w0: resuming experience collection (47000 times) [2024-06-15 22:09:40,766][1648981] Fps is (10 sec: 39321.0, 60 sec: 47513.5, 300 sec: 47208.1). Total num frames: 1831862272. Throughput: 0: 11446.1. Samples: 458020352. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:40,767][1648981] Avg episode reward: [(0, '820.910')] [2024-06-15 22:09:41,028][1651669] Updated weights for policy 0, policy_version 894480 (0.0014) [2024-06-15 22:09:42,316][1651669] Updated weights for policy 0, policy_version 894528 (0.0012) [2024-06-15 22:09:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1832124416. Throughput: 0: 11377.8. Samples: 458089984. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:45,767][1648981] Avg episode reward: [(0, '812.330')] [2024-06-15 22:09:48,771][1651669] Updated weights for policy 0, policy_version 894608 (0.0016) [2024-06-15 22:09:50,116][1651669] Updated weights for policy 0, policy_version 894672 (0.0011) [2024-06-15 22:09:50,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1832321024. Throughput: 0: 11298.1. Samples: 458156032. 
Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:50,767][1648981] Avg episode reward: [(0, '802.960')] [2024-06-15 22:09:52,195][1651669] Updated weights for policy 0, policy_version 894752 (0.0013) [2024-06-15 22:09:55,766][1648981] Fps is (10 sec: 39321.4, 60 sec: 44237.0, 300 sec: 47541.4). Total num frames: 1832517632. Throughput: 0: 11104.7. Samples: 458181632. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:09:55,767][1648981] Avg episode reward: [(0, '756.970')] [2024-06-15 22:09:56,836][1651669] Updated weights for policy 0, policy_version 894832 (0.0149) [2024-06-15 22:10:00,767][1648981] Fps is (10 sec: 39321.1, 60 sec: 44782.8, 300 sec: 47430.3). Total num frames: 1832714240. Throughput: 0: 11229.8. Samples: 458260992. Policy #0 lag: (min: 15.0, avg: 75.5, max: 271.0) [2024-06-15 22:10:00,767][1648981] Avg episode reward: [(0, '782.450')] [2024-06-15 22:10:01,247][1651669] Updated weights for policy 0, policy_version 894912 (0.0016) [2024-06-15 22:10:02,522][1651669] Updated weights for policy 0, policy_version 894960 (0.0012) [2024-06-15 22:10:04,218][1651669] Updated weights for policy 0, policy_version 895024 (0.0010) [2024-06-15 22:10:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45329.0, 300 sec: 47541.4). Total num frames: 1833041920. Throughput: 0: 11309.5. Samples: 458322432. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:05,767][1648981] Avg episode reward: [(0, '751.780')] [2024-06-15 22:10:08,240][1651669] Updated weights for policy 0, policy_version 895077 (0.0013) [2024-06-15 22:10:10,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 43690.6, 300 sec: 47541.4). Total num frames: 1833172992. Throughput: 0: 11207.1. Samples: 458357760. 
Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:10,767][1648981] Avg episode reward: [(0, '774.410')] [2024-06-15 22:10:11,852][1651669] Updated weights for policy 0, policy_version 895136 (0.0014) [2024-06-15 22:10:13,216][1651669] Updated weights for policy 0, policy_version 895200 (0.0014) [2024-06-15 22:10:14,009][1651669] Updated weights for policy 0, policy_version 895232 (0.0011) [2024-06-15 22:10:15,317][1651669] Updated weights for policy 0, policy_version 895295 (0.0013) [2024-06-15 22:10:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1833566208. Throughput: 0: 11673.6. Samples: 458432000. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:15,767][1648981] Avg episode reward: [(0, '805.940')] [2024-06-15 22:10:19,631][1651669] Updated weights for policy 0, policy_version 895360 (0.0013) [2024-06-15 22:10:20,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 45875.2, 300 sec: 47652.5). Total num frames: 1833697280. Throughput: 0: 11537.1. Samples: 458506240. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:20,767][1648981] Avg episode reward: [(0, '822.030')] [2024-06-15 22:10:21,760][1651274] Signal inference workers to stop experience collection... (47050 times) [2024-06-15 22:10:21,825][1651669] InferenceWorker_p0-w0: stopping experience collection (47050 times) [2024-06-15 22:10:21,940][1651274] Signal inference workers to resume experience collection... (47050 times) [2024-06-15 22:10:21,941][1651669] InferenceWorker_p0-w0: resuming experience collection (47050 times) [2024-06-15 22:10:22,477][1651669] Updated weights for policy 0, policy_version 895410 (0.0018) [2024-06-15 22:10:24,010][1651669] Updated weights for policy 0, policy_version 895486 (0.0021) [2024-06-15 22:10:25,632][1651669] Updated weights for policy 0, policy_version 895537 (0.0018) [2024-06-15 22:10:25,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49152.0, 300 sec: 47541.4). 
Total num frames: 1834057728. Throughput: 0: 11628.1. Samples: 458543616. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:25,767][1648981] Avg episode reward: [(0, '856.330')] [2024-06-15 22:10:29,236][1651669] Updated weights for policy 0, policy_version 895585 (0.0013) [2024-06-15 22:10:30,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1834221568. Throughput: 0: 11787.4. Samples: 458620416. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:30,767][1648981] Avg episode reward: [(0, '881.070')] [2024-06-15 22:10:32,418][1651669] Updated weights for policy 0, policy_version 895633 (0.0011) [2024-06-15 22:10:33,892][1651669] Updated weights for policy 0, policy_version 895712 (0.0011) [2024-06-15 22:10:35,309][1651669] Updated weights for policy 0, policy_version 895760 (0.0019) [2024-06-15 22:10:35,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 49152.1, 300 sec: 47541.4). Total num frames: 1834549248. Throughput: 0: 11969.4. Samples: 458694656. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:35,767][1648981] Avg episode reward: [(0, '873.840')] [2024-06-15 22:10:39,015][1651669] Updated weights for policy 0, policy_version 895809 (0.0014) [2024-06-15 22:10:40,421][1651669] Updated weights for policy 0, policy_version 895872 (0.0011) [2024-06-15 22:10:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48059.8, 300 sec: 47985.7). Total num frames: 1834745856. Throughput: 0: 12265.3. Samples: 458733568. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:40,767][1648981] Avg episode reward: [(0, '882.880')] [2024-06-15 22:10:43,859][1651669] Updated weights for policy 0, policy_version 895928 (0.0013) [2024-06-15 22:10:45,211][1651669] Updated weights for policy 0, policy_version 895991 (0.0012) [2024-06-15 22:10:45,767][1648981] Fps is (10 sec: 45873.6, 60 sec: 48059.6, 300 sec: 47652.4). Total num frames: 1835008000. Throughput: 0: 12105.9. 
Samples: 458805760. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:45,768][1648981] Avg episode reward: [(0, '845.980')] [2024-06-15 22:10:46,187][1651669] Updated weights for policy 0, policy_version 896032 (0.0048) [2024-06-15 22:10:50,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 47513.6, 300 sec: 47652.5). Total num frames: 1835171840. Throughput: 0: 12447.3. Samples: 458882560. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:50,767][1648981] Avg episode reward: [(0, '817.610')] [2024-06-15 22:10:50,782][1651669] Updated weights for policy 0, policy_version 896082 (0.0012) [2024-06-15 22:10:54,090][1651669] Updated weights for policy 0, policy_version 896160 (0.0013) [2024-06-15 22:10:55,630][1651669] Updated weights for policy 0, policy_version 896209 (0.0012) [2024-06-15 22:10:55,766][1648981] Fps is (10 sec: 42599.3, 60 sec: 48605.9, 300 sec: 47652.5). Total num frames: 1835433984. Throughput: 0: 12379.0. Samples: 458914816. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:10:55,767][1648981] Avg episode reward: [(0, '829.600')] [2024-06-15 22:10:56,146][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000896240_1835499520.pth... [2024-06-15 22:10:56,188][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000890656_1824063488.pth [2024-06-15 22:10:56,195][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000896240_1835499520.pth [2024-06-15 22:10:56,516][1651669] Updated weights for policy 0, policy_version 896256 (0.0012) [2024-06-15 22:10:59,019][1651669] Updated weights for policy 0, policy_version 896317 (0.0013) [2024-06-15 22:11:00,768][1648981] Fps is (10 sec: 49142.4, 60 sec: 49150.5, 300 sec: 47541.1). Total num frames: 1835663360. Throughput: 0: 12003.0. Samples: 458972160. 
Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:00,769][1648981] Avg episode reward: [(0, '809.360')] [2024-06-15 22:11:02,666][1651274] Signal inference workers to stop experience collection... (47100 times) [2024-06-15 22:11:02,706][1651669] InferenceWorker_p0-w0: stopping experience collection (47100 times) [2024-06-15 22:11:02,982][1651274] Signal inference workers to resume experience collection... (47100 times) [2024-06-15 22:11:02,983][1651669] InferenceWorker_p0-w0: resuming experience collection (47100 times) [2024-06-15 22:11:03,676][1651669] Updated weights for policy 0, policy_version 896382 (0.0014) [2024-06-15 22:11:05,564][1651669] Updated weights for policy 0, policy_version 896432 (0.0037) [2024-06-15 22:11:05,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 47513.6, 300 sec: 47652.4). Total num frames: 1835892736. Throughput: 0: 11889.8. Samples: 459041280. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:05,767][1648981] Avg episode reward: [(0, '857.480')] [2024-06-15 22:11:08,208][1651669] Updated weights for policy 0, policy_version 896496 (0.0012) [2024-06-15 22:11:10,766][1648981] Fps is (10 sec: 45883.7, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1836122112. Throughput: 0: 11798.8. Samples: 459074560. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:10,767][1648981] Avg episode reward: [(0, '853.500')] [2024-06-15 22:11:10,784][1651669] Updated weights for policy 0, policy_version 896548 (0.0015) [2024-06-15 22:11:14,263][1651669] Updated weights for policy 0, policy_version 896608 (0.0017) [2024-06-15 22:11:15,707][1651669] Updated weights for policy 0, policy_version 896672 (0.0024) [2024-06-15 22:11:15,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 46967.4, 300 sec: 47652.4). Total num frames: 1836384256. Throughput: 0: 11878.4. Samples: 459154944. 
Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:15,767][1648981] Avg episode reward: [(0, '832.020')] [2024-06-15 22:11:18,005][1651669] Updated weights for policy 0, policy_version 896710 (0.0014) [2024-06-15 22:11:19,185][1651669] Updated weights for policy 0, policy_version 896762 (0.0010) [2024-06-15 22:11:20,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1836580864. Throughput: 0: 11901.1. Samples: 459230208. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:20,767][1648981] Avg episode reward: [(0, '831.170')] [2024-06-15 22:11:21,813][1651669] Updated weights for policy 0, policy_version 896828 (0.0010) [2024-06-15 22:11:24,855][1651669] Updated weights for policy 0, policy_version 896867 (0.0020) [2024-06-15 22:11:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46967.5, 300 sec: 47652.4). Total num frames: 1836875776. Throughput: 0: 11855.6. Samples: 459267072. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:25,767][1648981] Avg episode reward: [(0, '842.210')] [2024-06-15 22:11:26,248][1651669] Updated weights for policy 0, policy_version 896937 (0.0013) [2024-06-15 22:11:28,370][1651669] Updated weights for policy 0, policy_version 896977 (0.0011) [2024-06-15 22:11:29,371][1651669] Updated weights for policy 0, policy_version 897024 (0.0012) [2024-06-15 22:11:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47097.1). Total num frames: 1837105152. Throughput: 0: 11821.6. Samples: 459337728. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:30,767][1648981] Avg episode reward: [(0, '892.170')] [2024-06-15 22:11:35,085][1651669] Updated weights for policy 0, policy_version 897090 (0.0032) [2024-06-15 22:11:35,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 45875.1, 300 sec: 47541.4). Total num frames: 1837301760. Throughput: 0: 11844.3. Samples: 459415552. 
Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:35,767][1648981] Avg episode reward: [(0, '959.640')] [2024-06-15 22:11:36,018][1651669] Updated weights for policy 0, policy_version 897139 (0.0011) [2024-06-15 22:11:37,432][1651669] Updated weights for policy 0, policy_version 897200 (0.0028) [2024-06-15 22:11:39,443][1651669] Updated weights for policy 0, policy_version 897233 (0.0012) [2024-06-15 22:11:40,409][1651669] Updated weights for policy 0, policy_version 897279 (0.0013) [2024-06-15 22:11:40,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1837629440. Throughput: 0: 11889.8. Samples: 459449856. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:40,767][1648981] Avg episode reward: [(0, '915.600')] [2024-06-15 22:11:43,192][1651669] Updated weights for policy 0, policy_version 897341 (0.0027) [2024-06-15 22:11:45,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 45875.3, 300 sec: 47430.3). Total num frames: 1837760512. Throughput: 0: 12197.5. Samples: 459521024. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:45,767][1648981] Avg episode reward: [(0, '892.640')] [2024-06-15 22:11:45,797][1651274] Signal inference workers to stop experience collection... (47150 times) [2024-06-15 22:11:45,858][1651669] InferenceWorker_p0-w0: stopping experience collection (47150 times) [2024-06-15 22:11:46,013][1651274] Signal inference workers to resume experience collection... (47150 times) [2024-06-15 22:11:46,014][1651669] InferenceWorker_p0-w0: resuming experience collection (47150 times) [2024-06-15 22:11:46,841][1651669] Updated weights for policy 0, policy_version 897395 (0.0013) [2024-06-15 22:11:47,951][1651669] Updated weights for policy 0, policy_version 897443 (0.0013) [2024-06-15 22:11:50,439][1651669] Updated weights for policy 0, policy_version 897504 (0.0011) [2024-06-15 22:11:50,769][1648981] Fps is (10 sec: 45862.3, 60 sec: 48603.5, 300 sec: 47652.0). 
Total num frames: 1838088192. Throughput: 0: 12207.6. Samples: 459590656. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:50,770][1648981] Avg episode reward: [(0, '914.000')] [2024-06-15 22:11:53,139][1651669] Updated weights for policy 0, policy_version 897541 (0.0053) [2024-06-15 22:11:55,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1838284800. Throughput: 0: 12458.7. Samples: 459635200. Policy #0 lag: (min: 62.0, avg: 130.2, max: 303.0) [2024-06-15 22:11:55,767][1648981] Avg episode reward: [(0, '952.290')] [2024-06-15 22:11:56,069][1651669] Updated weights for policy 0, policy_version 897602 (0.0010) [2024-06-15 22:11:57,306][1651669] Updated weights for policy 0, policy_version 897664 (0.0012) [2024-06-15 22:11:58,694][1651669] Updated weights for policy 0, policy_version 897724 (0.0024) [2024-06-15 22:12:00,742][1651669] Updated weights for policy 0, policy_version 897782 (0.0019) [2024-06-15 22:12:00,766][1648981] Fps is (10 sec: 55721.3, 60 sec: 49699.7, 300 sec: 47874.6). Total num frames: 1838645248. Throughput: 0: 12231.1. Samples: 459705344. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:00,767][1648981] Avg episode reward: [(0, '997.570')] [2024-06-15 22:12:04,965][1651669] Updated weights for policy 0, policy_version 897826 (0.0012) [2024-06-15 22:12:05,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48605.8, 300 sec: 47541.4). Total num frames: 1838809088. Throughput: 0: 12117.3. Samples: 459775488. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:05,767][1648981] Avg episode reward: [(0, '951.140')] [2024-06-15 22:12:07,125][1651669] Updated weights for policy 0, policy_version 897857 (0.0013) [2024-06-15 22:12:09,275][1651669] Updated weights for policy 0, policy_version 897968 (0.0095) [2024-06-15 22:12:10,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 1839071232. Throughput: 0: 12140.1. 
Samples: 459813376. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:10,767][1648981] Avg episode reward: [(0, '990.060')] [2024-06-15 22:12:10,821][1651669] Updated weights for policy 0, policy_version 897987 (0.0010) [2024-06-15 22:12:11,728][1651669] Updated weights for policy 0, policy_version 898039 (0.0014) [2024-06-15 22:12:15,231][1651669] Updated weights for policy 0, policy_version 898080 (0.0010) [2024-06-15 22:12:15,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48605.9, 300 sec: 47430.3). Total num frames: 1839300608. Throughput: 0: 12367.7. Samples: 459894272. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:15,767][1648981] Avg episode reward: [(0, '988.250')] [2024-06-15 22:12:16,138][1651669] Updated weights for policy 0, policy_version 898111 (0.0010) [2024-06-15 22:12:18,284][1651669] Updated weights for policy 0, policy_version 898162 (0.0012) [2024-06-15 22:12:19,588][1651669] Updated weights for policy 0, policy_version 898225 (0.0012) [2024-06-15 22:12:20,767][1648981] Fps is (10 sec: 52426.6, 60 sec: 50243.9, 300 sec: 47763.5). Total num frames: 1839595520. Throughput: 0: 12128.6. Samples: 459961344. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:20,768][1648981] Avg episode reward: [(0, '1055.920')] [2024-06-15 22:12:21,772][1651669] Updated weights for policy 0, policy_version 898260 (0.0010) [2024-06-15 22:12:25,193][1651669] Updated weights for policy 0, policy_version 898305 (0.0015) [2024-06-15 22:12:25,618][1651274] Signal inference workers to stop experience collection... (47200 times) [2024-06-15 22:12:25,684][1651669] InferenceWorker_p0-w0: stopping experience collection (47200 times) [2024-06-15 22:12:25,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 47208.1). Total num frames: 1839759360. Throughput: 0: 12356.3. Samples: 460005888. 
Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:25,767][1648981] Avg episode reward: [(0, '1025.580')] [2024-06-15 22:12:25,968][1651274] Signal inference workers to resume experience collection... (47200 times) [2024-06-15 22:12:25,968][1651669] InferenceWorker_p0-w0: resuming experience collection (47200 times) [2024-06-15 22:12:26,694][1651669] Updated weights for policy 0, policy_version 898365 (0.0111) [2024-06-15 22:12:28,949][1651669] Updated weights for policy 0, policy_version 898416 (0.0014) [2024-06-15 22:12:30,663][1651669] Updated weights for policy 0, policy_version 898481 (0.0014) [2024-06-15 22:12:30,766][1648981] Fps is (10 sec: 49154.0, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1840087040. Throughput: 0: 12299.4. Samples: 460074496. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:30,767][1648981] Avg episode reward: [(0, '1038.330')] [2024-06-15 22:12:33,047][1651669] Updated weights for policy 0, policy_version 898531 (0.0011) [2024-06-15 22:12:35,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49152.0, 300 sec: 47319.2). Total num frames: 1840250880. Throughput: 0: 12425.3. Samples: 460149760. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:35,767][1648981] Avg episode reward: [(0, '1040.370')] [2024-06-15 22:12:36,308][1651669] Updated weights for policy 0, policy_version 898576 (0.0013) [2024-06-15 22:12:38,963][1651669] Updated weights for policy 0, policy_version 898626 (0.0016) [2024-06-15 22:12:40,491][1651669] Updated weights for policy 0, policy_version 898691 (0.0011) [2024-06-15 22:12:40,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48605.8, 300 sec: 48096.7). Total num frames: 1840545792. Throughput: 0: 12322.1. Samples: 460189696. 
Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:40,767][1648981] Avg episode reward: [(0, '1069.390')] [2024-06-15 22:12:41,703][1651669] Updated weights for policy 0, policy_version 898747 (0.0013) [2024-06-15 22:12:43,761][1651669] Updated weights for policy 0, policy_version 898784 (0.0139) [2024-06-15 22:12:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1840775168. Throughput: 0: 12140.1. Samples: 460251648. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:45,767][1648981] Avg episode reward: [(0, '1069.670')] [2024-06-15 22:12:46,975][1651669] Updated weights for policy 0, policy_version 898833 (0.0013) [2024-06-15 22:12:47,946][1651669] Updated weights for policy 0, policy_version 898880 (0.0016) [2024-06-15 22:12:50,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49154.3, 300 sec: 47985.7). Total num frames: 1841037312. Throughput: 0: 12333.5. Samples: 460330496. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:50,767][1648981] Avg episode reward: [(0, '1063.630')] [2024-06-15 22:12:51,162][1651669] Updated weights for policy 0, policy_version 898961 (0.0098) [2024-06-15 22:12:54,939][1651669] Updated weights for policy 0, policy_version 899057 (0.0079) [2024-06-15 22:12:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.3, 300 sec: 47541.4). Total num frames: 1841299456. Throughput: 0: 12197.0. Samples: 460362240. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:12:55,767][1648981] Avg episode reward: [(0, '1034.510')] [2024-06-15 22:12:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000899072_1841299456.pth... 
[2024-06-15 22:12:55,827][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000893504_1829896192.pth [2024-06-15 22:12:57,883][1651669] Updated weights for policy 0, policy_version 899088 (0.0013) [2024-06-15 22:12:58,674][1651669] Updated weights for policy 0, policy_version 899125 (0.0012) [2024-06-15 22:13:00,058][1651669] Updated weights for policy 0, policy_version 899152 (0.0012) [2024-06-15 22:13:00,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 1841496064. Throughput: 0: 12322.1. Samples: 460448768. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:00,767][1648981] Avg episode reward: [(0, '1064.750')] [2024-06-15 22:13:01,716][1651669] Updated weights for policy 0, policy_version 899218 (0.0012) [2024-06-15 22:13:05,564][1651669] Updated weights for policy 0, policy_version 899285 (0.0043) [2024-06-15 22:13:05,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.1, 300 sec: 47652.5). Total num frames: 1841758208. Throughput: 0: 12242.6. Samples: 460512256. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:05,767][1648981] Avg episode reward: [(0, '1048.160')] [2024-06-15 22:13:05,789][1651274] Signal inference workers to stop experience collection... (47250 times) [2024-06-15 22:13:05,820][1651669] InferenceWorker_p0-w0: stopping experience collection (47250 times) [2024-06-15 22:13:06,008][1651274] Signal inference workers to resume experience collection... (47250 times) [2024-06-15 22:13:06,009][1651669] InferenceWorker_p0-w0: resuming experience collection (47250 times) [2024-06-15 22:13:08,364][1651669] Updated weights for policy 0, policy_version 899344 (0.0012) [2024-06-15 22:13:10,767][1648981] Fps is (10 sec: 45873.8, 60 sec: 48059.5, 300 sec: 47541.3). Total num frames: 1841954816. Throughput: 0: 12094.5. Samples: 460550144. 
Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:10,767][1648981] Avg episode reward: [(0, '1069.730')] [2024-06-15 22:13:11,436][1651669] Updated weights for policy 0, policy_version 899398 (0.0027) [2024-06-15 22:13:12,708][1651669] Updated weights for policy 0, policy_version 899456 (0.0014) [2024-06-15 22:13:14,152][1651669] Updated weights for policy 0, policy_version 899520 (0.0013) [2024-06-15 22:13:15,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48605.8, 300 sec: 47652.4). Total num frames: 1842216960. Throughput: 0: 12060.4. Samples: 460617216. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:15,767][1648981] Avg episode reward: [(0, '1075.590')] [2024-06-15 22:13:17,546][1651669] Updated weights for policy 0, policy_version 899582 (0.0010) [2024-06-15 22:13:20,022][1651669] Updated weights for policy 0, policy_version 899639 (0.0012) [2024-06-15 22:13:20,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 48060.0, 300 sec: 47541.4). Total num frames: 1842479104. Throughput: 0: 12014.9. Samples: 460690432. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:20,767][1648981] Avg episode reward: [(0, '1053.720')] [2024-06-15 22:13:22,616][1651669] Updated weights for policy 0, policy_version 899696 (0.0012) [2024-06-15 22:13:24,448][1651669] Updated weights for policy 0, policy_version 899773 (0.0113) [2024-06-15 22:13:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 48096.7). Total num frames: 1842741248. Throughput: 0: 11980.8. Samples: 460728832. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:25,767][1648981] Avg episode reward: [(0, '1002.340')] [2024-06-15 22:13:27,881][1651669] Updated weights for policy 0, policy_version 899810 (0.0011) [2024-06-15 22:13:30,520][1651669] Updated weights for policy 0, policy_version 899872 (0.0012) [2024-06-15 22:13:30,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 47513.7, 300 sec: 47652.5). Total num frames: 1842937856. 
Throughput: 0: 12333.5. Samples: 460806656. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:30,767][1648981] Avg episode reward: [(0, '1024.440')] [2024-06-15 22:13:32,439][1651669] Updated weights for policy 0, policy_version 899906 (0.0015) [2024-06-15 22:13:33,803][1651669] Updated weights for policy 0, policy_version 899969 (0.0012) [2024-06-15 22:13:35,108][1651669] Updated weights for policy 0, policy_version 900025 (0.0012) [2024-06-15 22:13:35,774][1648981] Fps is (10 sec: 52387.6, 60 sec: 50237.7, 300 sec: 48317.6). Total num frames: 1843265536. Throughput: 0: 12047.0. Samples: 460872704. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:35,775][1648981] Avg episode reward: [(0, '961.900')] [2024-06-15 22:13:39,146][1651669] Updated weights for policy 0, policy_version 900082 (0.0108) [2024-06-15 22:13:40,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1843396608. Throughput: 0: 12128.7. Samples: 460908032. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:40,767][1648981] Avg episode reward: [(0, '963.620')] [2024-06-15 22:13:41,164][1651669] Updated weights for policy 0, policy_version 900113 (0.0013) [2024-06-15 22:13:43,516][1651669] Updated weights for policy 0, policy_version 900165 (0.0020) [2024-06-15 22:13:44,768][1651669] Updated weights for policy 0, policy_version 900225 (0.0012) [2024-06-15 22:13:45,766][1648981] Fps is (10 sec: 45911.5, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1843724288. Throughput: 0: 11935.3. Samples: 460985856. Policy #0 lag: (min: 8.0, avg: 103.5, max: 264.0) [2024-06-15 22:13:45,767][1648981] Avg episode reward: [(0, '1014.570')] [2024-06-15 22:13:46,147][1651669] Updated weights for policy 0, policy_version 900286 (0.0135) [2024-06-15 22:13:48,709][1651274] Signal inference workers to stop experience collection... 
(47300 times) [2024-06-15 22:13:48,726][1651669] InferenceWorker_p0-w0: stopping experience collection (47300 times) [2024-06-15 22:13:49,016][1651274] Signal inference workers to resume experience collection... (47300 times) [2024-06-15 22:13:49,017][1651669] InferenceWorker_p0-w0: resuming experience collection (47300 times) [2024-06-15 22:13:49,941][1651669] Updated weights for policy 0, policy_version 900340 (0.0034) [2024-06-15 22:13:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 47652.5). Total num frames: 1843920896. Throughput: 0: 12026.3. Samples: 461053440. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:13:50,767][1648981] Avg episode reward: [(0, '1002.280')] [2024-06-15 22:13:52,444][1651669] Updated weights for policy 0, policy_version 900374 (0.0024) [2024-06-15 22:13:54,499][1651669] Updated weights for policy 0, policy_version 900436 (0.0012) [2024-06-15 22:13:55,524][1651669] Updated weights for policy 0, policy_version 900496 (0.0011) [2024-06-15 22:13:55,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 1844215808. Throughput: 0: 11992.2. Samples: 461089792. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:13:55,767][1648981] Avg episode reward: [(0, '991.710')] [2024-06-15 22:13:56,485][1651669] Updated weights for policy 0, policy_version 900534 (0.0010) [2024-06-15 22:13:59,301][1651669] Updated weights for policy 0, policy_version 900579 (0.0012) [2024-06-15 22:14:00,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 49151.7, 300 sec: 47874.6). Total num frames: 1844445184. Throughput: 0: 12504.1. Samples: 461179904. 
Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:00,767][1648981] Avg episode reward: [(0, '900.360')] [2024-06-15 22:14:01,725][1651669] Updated weights for policy 0, policy_version 900624 (0.0016) [2024-06-15 22:14:02,985][1651669] Updated weights for policy 0, policy_version 900668 (0.0031) [2024-06-15 22:14:05,654][1651669] Updated weights for policy 0, policy_version 900736 (0.0012) [2024-06-15 22:14:05,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 47985.7). Total num frames: 1844707328. Throughput: 0: 12379.0. Samples: 461247488. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:05,767][1648981] Avg episode reward: [(0, '940.900')] [2024-06-15 22:14:07,156][1651669] Updated weights for policy 0, policy_version 900795 (0.0011) [2024-06-15 22:14:10,340][1651669] Updated weights for policy 0, policy_version 900832 (0.0011) [2024-06-15 22:14:10,766][1648981] Fps is (10 sec: 49153.4, 60 sec: 49698.4, 300 sec: 48096.8). Total num frames: 1844936704. Throughput: 0: 12322.1. Samples: 461283328. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:10,767][1648981] Avg episode reward: [(0, '963.700')] [2024-06-15 22:14:12,612][1651669] Updated weights for policy 0, policy_version 900870 (0.0012) [2024-06-15 22:14:13,737][1651669] Updated weights for policy 0, policy_version 900923 (0.0012) [2024-06-15 22:14:15,359][1651669] Updated weights for policy 0, policy_version 900961 (0.0012) [2024-06-15 22:14:15,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1845198848. Throughput: 0: 12344.9. Samples: 461362176. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:15,767][1648981] Avg episode reward: [(0, '944.860')] [2024-06-15 22:14:16,943][1651669] Updated weights for policy 0, policy_version 901028 (0.0032) [2024-06-15 22:14:20,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49152.0, 300 sec: 48541.1). Total num frames: 1845428224. 
Throughput: 0: 12506.4. Samples: 461435392. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:20,767][1648981] Avg episode reward: [(0, '944.870')] [2024-06-15 22:14:20,893][1651669] Updated weights for policy 0, policy_version 901090 (0.0017) [2024-06-15 22:14:23,987][1651669] Updated weights for policy 0, policy_version 901125 (0.0011) [2024-06-15 22:14:25,164][1651669] Updated weights for policy 0, policy_version 901170 (0.0011) [2024-06-15 22:14:25,771][1648981] Fps is (10 sec: 42577.8, 60 sec: 48055.8, 300 sec: 47984.9). Total num frames: 1845624832. Throughput: 0: 12605.2. Samples: 461475328. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:25,772][1648981] Avg episode reward: [(0, '963.710')] [2024-06-15 22:14:26,668][1651669] Updated weights for policy 0, policy_version 901235 (0.0015) [2024-06-15 22:14:26,983][1651274] Signal inference workers to stop experience collection... (47350 times) [2024-06-15 22:14:27,016][1651669] InferenceWorker_p0-w0: stopping experience collection (47350 times) [2024-06-15 22:14:27,246][1651274] Signal inference workers to resume experience collection... (47350 times) [2024-06-15 22:14:27,247][1651669] InferenceWorker_p0-w0: resuming experience collection (47350 times) [2024-06-15 22:14:28,366][1651669] Updated weights for policy 0, policy_version 901309 (0.0020) [2024-06-15 22:14:30,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 49151.9, 300 sec: 48430.0). Total num frames: 1845886976. Throughput: 0: 12367.6. Samples: 461542400. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:30,767][1648981] Avg episode reward: [(0, '978.240')] [2024-06-15 22:14:31,915][1651669] Updated weights for policy 0, policy_version 901373 (0.0095) [2024-06-15 22:14:35,766][1648981] Fps is (10 sec: 45897.5, 60 sec: 46973.6, 300 sec: 48207.8). Total num frames: 1846083584. Throughput: 0: 12572.4. Samples: 461619200. 
Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:35,767][1648981] Avg episode reward: [(0, '976.920')] [2024-06-15 22:14:36,113][1651669] Updated weights for policy 0, policy_version 901424 (0.0012) [2024-06-15 22:14:38,200][1651669] Updated weights for policy 0, policy_version 901509 (0.0012) [2024-06-15 22:14:39,474][1651669] Updated weights for policy 0, policy_version 901562 (0.0011) [2024-06-15 22:14:40,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1846411264. Throughput: 0: 12276.6. Samples: 461642240. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:40,767][1648981] Avg episode reward: [(0, '976.920')] [2024-06-15 22:14:43,133][1651669] Updated weights for policy 0, policy_version 901616 (0.0012) [2024-06-15 22:14:45,767][1648981] Fps is (10 sec: 45874.4, 60 sec: 46967.3, 300 sec: 48207.8). Total num frames: 1846542336. Throughput: 0: 11912.6. Samples: 461715968. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:45,767][1648981] Avg episode reward: [(0, '967.980')] [2024-06-15 22:14:46,671][1651669] Updated weights for policy 0, policy_version 901640 (0.0012) [2024-06-15 22:14:47,936][1651669] Updated weights for policy 0, policy_version 901700 (0.0048) [2024-06-15 22:14:49,835][1651669] Updated weights for policy 0, policy_version 901777 (0.0012) [2024-06-15 22:14:50,625][1651669] Updated weights for policy 0, policy_version 901819 (0.0012) [2024-06-15 22:14:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1846935552. Throughput: 0: 11923.9. Samples: 461784064. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:50,767][1648981] Avg episode reward: [(0, '974.480')] [2024-06-15 22:14:53,538][1651669] Updated weights for policy 0, policy_version 901881 (0.0173) [2024-06-15 22:14:55,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 47513.2, 300 sec: 48652.1). Total num frames: 1847066624. 
Throughput: 0: 12162.7. Samples: 461830656. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:14:55,767][1648981] Avg episode reward: [(0, '941.810')] [2024-06-15 22:14:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000901888_1847066624.pth... [2024-06-15 22:14:55,814][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000896240_1835499520.pth [2024-06-15 22:14:57,915][1651669] Updated weights for policy 0, policy_version 901938 (0.0011) [2024-06-15 22:14:59,420][1651669] Updated weights for policy 0, policy_version 902016 (0.0011) [2024-06-15 22:15:00,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 49152.1, 300 sec: 48652.1). Total num frames: 1847394304. Throughput: 0: 12014.9. Samples: 461902848. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:00,767][1648981] Avg episode reward: [(0, '979.470')] [2024-06-15 22:15:01,018][1651669] Updated weights for policy 0, policy_version 902068 (0.0011) [2024-06-15 22:15:04,021][1651669] Updated weights for policy 0, policy_version 902115 (0.0032) [2024-06-15 22:15:05,766][1648981] Fps is (10 sec: 52431.2, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1847590912. Throughput: 0: 12185.6. Samples: 461983744. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:05,767][1648981] Avg episode reward: [(0, '953.180')] [2024-06-15 22:15:07,358][1651669] Updated weights for policy 0, policy_version 902161 (0.0037) [2024-06-15 22:15:07,827][1651274] Signal inference workers to stop experience collection... (47400 times) [2024-06-15 22:15:07,880][1651669] InferenceWorker_p0-w0: stopping experience collection (47400 times) [2024-06-15 22:15:08,132][1651274] Signal inference workers to resume experience collection... 
(47400 times) [2024-06-15 22:15:08,132][1651669] InferenceWorker_p0-w0: resuming experience collection (47400 times) [2024-06-15 22:15:09,133][1651669] Updated weights for policy 0, policy_version 902240 (0.0120) [2024-06-15 22:15:10,330][1651669] Updated weights for policy 0, policy_version 902291 (0.0012) [2024-06-15 22:15:10,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 49697.8, 300 sec: 48652.1). Total num frames: 1847918592. Throughput: 0: 11982.0. Samples: 462014464. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:10,767][1648981] Avg episode reward: [(0, '972.800')] [2024-06-15 22:15:11,071][1651669] Updated weights for policy 0, policy_version 902333 (0.0012) [2024-06-15 22:15:15,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48606.0, 300 sec: 48874.3). Total num frames: 1848115200. Throughput: 0: 12242.5. Samples: 462093312. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:15,767][1648981] Avg episode reward: [(0, '952.740')] [2024-06-15 22:15:17,528][1651669] Updated weights for policy 0, policy_version 902401 (0.0014) [2024-06-15 22:15:19,570][1651669] Updated weights for policy 0, policy_version 902480 (0.0012) [2024-06-15 22:15:20,767][1648981] Fps is (10 sec: 42598.7, 60 sec: 48605.7, 300 sec: 48430.0). Total num frames: 1848344576. Throughput: 0: 12037.6. Samples: 462160896. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:20,767][1648981] Avg episode reward: [(0, '969.770')] [2024-06-15 22:15:21,170][1651669] Updated weights for policy 0, policy_version 902544 (0.0013) [2024-06-15 22:15:22,368][1651669] Updated weights for policy 0, policy_version 902591 (0.0015) [2024-06-15 22:15:25,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 49155.9, 300 sec: 48652.1). Total num frames: 1848573952. Throughput: 0: 12242.5. Samples: 462193152. 
Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:25,767][1648981] Avg episode reward: [(0, '963.640')] [2024-06-15 22:15:26,036][1651669] Updated weights for policy 0, policy_version 902649 (0.0013) [2024-06-15 22:15:30,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1848770560. Throughput: 0: 12367.7. Samples: 462272512. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:30,767][1648981] Avg episode reward: [(0, '898.960')] [2024-06-15 22:15:31,213][1651669] Updated weights for policy 0, policy_version 902740 (0.0012) [2024-06-15 22:15:32,591][1651669] Updated weights for policy 0, policy_version 902786 (0.0012) [2024-06-15 22:15:33,643][1651669] Updated weights for policy 0, policy_version 902841 (0.0023) [2024-06-15 22:15:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1849032704. Throughput: 0: 12231.1. Samples: 462334464. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:35,767][1648981] Avg episode reward: [(0, '855.140')] [2024-06-15 22:15:37,254][1651669] Updated weights for policy 0, policy_version 902880 (0.0017) [2024-06-15 22:15:40,324][1651669] Updated weights for policy 0, policy_version 902928 (0.0012) [2024-06-15 22:15:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 46967.5, 300 sec: 48207.9). Total num frames: 1849229312. Throughput: 0: 12060.6. Samples: 462373376. Policy #0 lag: (min: 40.0, avg: 150.2, max: 296.0) [2024-06-15 22:15:40,767][1648981] Avg episode reward: [(0, '863.880')] [2024-06-15 22:15:41,671][1651669] Updated weights for policy 0, policy_version 902978 (0.0011) [2024-06-15 22:15:43,210][1651669] Updated weights for policy 0, policy_version 903032 (0.0015) [2024-06-15 22:15:44,290][1651669] Updated weights for policy 0, policy_version 903075 (0.0013) [2024-06-15 22:15:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.4, 300 sec: 48763.2). Total num frames: 1849556992. 
Throughput: 0: 11889.8. Samples: 462437888. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:15:45,767][1648981] Avg episode reward: [(0, '901.070')] [2024-06-15 22:15:47,356][1651274] Signal inference workers to stop experience collection... (47450 times) [2024-06-15 22:15:47,427][1651669] InferenceWorker_p0-w0: stopping experience collection (47450 times) [2024-06-15 22:15:47,439][1651669] Updated weights for policy 0, policy_version 903106 (0.0012) [2024-06-15 22:15:47,718][1651274] Signal inference workers to resume experience collection... (47450 times) [2024-06-15 22:15:47,719][1651669] InferenceWorker_p0-w0: resuming experience collection (47450 times) [2024-06-15 22:15:50,123][1651669] Updated weights for policy 0, policy_version 903184 (0.0021) [2024-06-15 22:15:50,766][1648981] Fps is (10 sec: 55705.6, 60 sec: 47513.6, 300 sec: 48652.2). Total num frames: 1849786368. Throughput: 0: 11923.9. Samples: 462520320. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:15:50,767][1648981] Avg episode reward: [(0, '905.170')] [2024-06-15 22:15:55,166][1651669] Updated weights for policy 0, policy_version 903313 (0.0014) [2024-06-15 22:15:55,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49698.5, 300 sec: 48763.5). Total num frames: 1850048512. Throughput: 0: 11776.1. Samples: 462544384. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:15:55,767][1648981] Avg episode reward: [(0, '945.140')] [2024-06-15 22:15:58,263][1651669] Updated weights for policy 0, policy_version 903362 (0.0012) [2024-06-15 22:15:59,303][1651669] Updated weights for policy 0, policy_version 903420 (0.0012) [2024-06-15 22:16:00,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46967.6, 300 sec: 48541.1). Total num frames: 1850212352. Throughput: 0: 11889.8. Samples: 462628352. 
Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:00,767][1648981] Avg episode reward: [(0, '949.290')] [2024-06-15 22:16:01,674][1651669] Updated weights for policy 0, policy_version 903472 (0.0013) [2024-06-15 22:16:03,602][1651669] Updated weights for policy 0, policy_version 903541 (0.0013) [2024-06-15 22:16:05,540][1651669] Updated weights for policy 0, policy_version 903568 (0.0012) [2024-06-15 22:16:05,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48605.9, 300 sec: 48763.2). Total num frames: 1850507264. Throughput: 0: 11935.4. Samples: 462697984. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:05,767][1648981] Avg episode reward: [(0, '929.330')] [2024-06-15 22:16:09,555][1651669] Updated weights for policy 0, policy_version 903632 (0.0058) [2024-06-15 22:16:10,498][1651669] Updated weights for policy 0, policy_version 903680 (0.0012) [2024-06-15 22:16:10,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46967.8, 300 sec: 48652.2). Total num frames: 1850736640. Throughput: 0: 12003.6. Samples: 462733312. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:10,767][1648981] Avg episode reward: [(0, '889.420')] [2024-06-15 22:16:13,117][1651669] Updated weights for policy 0, policy_version 903730 (0.0012) [2024-06-15 22:16:14,757][1651669] Updated weights for policy 0, policy_version 903795 (0.0015) [2024-06-15 22:16:15,774][1648981] Fps is (10 sec: 49114.9, 60 sec: 48053.6, 300 sec: 48873.1). Total num frames: 1850998784. Throughput: 0: 11876.4. Samples: 462807040. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:15,774][1648981] Avg episode reward: [(0, '887.100')] [2024-06-15 22:16:16,036][1651669] Updated weights for policy 0, policy_version 903824 (0.0023) [2024-06-15 22:16:17,161][1651669] Updated weights for policy 0, policy_version 903872 (0.0013) [2024-06-15 22:16:20,767][1648981] Fps is (10 sec: 42597.4, 60 sec: 46967.6, 300 sec: 48430.0). Total num frames: 1851162624. 
Throughput: 0: 12083.2. Samples: 462878208. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:20,767][1648981] Avg episode reward: [(0, '892.850')] [2024-06-15 22:16:21,605][1651669] Updated weights for policy 0, policy_version 903936 (0.0012) [2024-06-15 22:16:24,797][1651669] Updated weights for policy 0, policy_version 904001 (0.0023) [2024-06-15 22:16:25,767][1648981] Fps is (10 sec: 45908.8, 60 sec: 48059.6, 300 sec: 48652.1). Total num frames: 1851457536. Throughput: 0: 12060.4. Samples: 462916096. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:25,768][1648981] Avg episode reward: [(0, '891.020')] [2024-06-15 22:16:26,101][1651669] Updated weights for policy 0, policy_version 904053 (0.0011) [2024-06-15 22:16:27,437][1651669] Updated weights for policy 0, policy_version 904081 (0.0048) [2024-06-15 22:16:27,891][1651274] Signal inference workers to stop experience collection... (47500 times) [2024-06-15 22:16:27,957][1651669] InferenceWorker_p0-w0: stopping experience collection (47500 times) [2024-06-15 22:16:28,204][1651274] Signal inference workers to resume experience collection... (47500 times) [2024-06-15 22:16:28,204][1651669] InferenceWorker_p0-w0: resuming experience collection (47500 times) [2024-06-15 22:16:28,529][1651669] Updated weights for policy 0, policy_version 904126 (0.0012) [2024-06-15 22:16:30,766][1648981] Fps is (10 sec: 49152.9, 60 sec: 48059.7, 300 sec: 48652.1). Total num frames: 1851654144. Throughput: 0: 12049.1. Samples: 462980096. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:30,767][1648981] Avg episode reward: [(0, '870.630')] [2024-06-15 22:16:32,348][1651669] Updated weights for policy 0, policy_version 904176 (0.0015) [2024-06-15 22:16:35,039][1651669] Updated weights for policy 0, policy_version 904227 (0.0011) [2024-06-15 22:16:35,766][1648981] Fps is (10 sec: 45876.3, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1851916288. Throughput: 0: 12014.9. 
Samples: 463060992. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:35,767][1648981] Avg episode reward: [(0, '874.710')] [2024-06-15 22:16:36,350][1651669] Updated weights for policy 0, policy_version 904276 (0.0010) [2024-06-15 22:16:37,712][1651669] Updated weights for policy 0, policy_version 904336 (0.0012) [2024-06-15 22:16:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1852178432. Throughput: 0: 12105.9. Samples: 463089152. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:40,767][1648981] Avg episode reward: [(0, '857.490')] [2024-06-15 22:16:42,957][1651669] Updated weights for policy 0, policy_version 904389 (0.0014) [2024-06-15 22:16:45,275][1651669] Updated weights for policy 0, policy_version 904451 (0.0012) [2024-06-15 22:16:45,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 46421.3, 300 sec: 48319.4). Total num frames: 1852342272. Throughput: 0: 11935.3. Samples: 463165440. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:45,767][1648981] Avg episode reward: [(0, '821.550')] [2024-06-15 22:16:47,106][1651669] Updated weights for policy 0, policy_version 904528 (0.0015) [2024-06-15 22:16:48,636][1651669] Updated weights for policy 0, policy_version 904594 (0.0081) [2024-06-15 22:16:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48605.9, 300 sec: 48874.3). Total num frames: 1852702720. Throughput: 0: 11821.5. Samples: 463229952. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:50,767][1648981] Avg episode reward: [(0, '817.800')] [2024-06-15 22:16:53,720][1651669] Updated weights for policy 0, policy_version 904641 (0.0015) [2024-06-15 22:16:54,604][1651669] Updated weights for policy 0, policy_version 904699 (0.0015) [2024-06-15 22:16:55,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 46421.3, 300 sec: 48096.8). Total num frames: 1852833792. Throughput: 0: 11980.8. Samples: 463272448. 
Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:16:55,767][1648981] Avg episode reward: [(0, '841.400')] [2024-06-15 22:16:55,788][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000904704_1852833792.pth... [2024-06-15 22:16:55,982][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000899072_1841299456.pth [2024-06-15 22:16:56,848][1651669] Updated weights for policy 0, policy_version 904742 (0.0011) [2024-06-15 22:16:58,581][1651669] Updated weights for policy 0, policy_version 904822 (0.0116) [2024-06-15 22:17:00,156][1651669] Updated weights for policy 0, policy_version 904886 (0.0081) [2024-06-15 22:17:00,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1853227008. Throughput: 0: 11914.5. Samples: 463343104. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:00,767][1648981] Avg episode reward: [(0, '818.620')] [2024-06-15 22:17:04,601][1651669] Updated weights for policy 0, policy_version 904928 (0.0033) [2024-06-15 22:17:05,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 48430.0). Total num frames: 1853358080. Throughput: 0: 12049.1. Samples: 463420416. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:05,767][1648981] Avg episode reward: [(0, '800.010')] [2024-06-15 22:17:06,819][1651669] Updated weights for policy 0, policy_version 904965 (0.0012) [2024-06-15 22:17:08,163][1651669] Updated weights for policy 0, policy_version 905027 (0.0012) [2024-06-15 22:17:08,490][1651274] Signal inference workers to stop experience collection... (47550 times) [2024-06-15 22:17:08,531][1651669] InferenceWorker_p0-w0: stopping experience collection (47550 times) [2024-06-15 22:17:08,673][1651274] Signal inference workers to resume experience collection... 
(47550 times) [2024-06-15 22:17:08,674][1651669] InferenceWorker_p0-w0: resuming experience collection (47550 times) [2024-06-15 22:17:09,105][1651669] Updated weights for policy 0, policy_version 905083 (0.0012) [2024-06-15 22:17:10,509][1651669] Updated weights for policy 0, policy_version 905139 (0.0012) [2024-06-15 22:17:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.2, 300 sec: 48985.4). Total num frames: 1853751296. Throughput: 0: 11980.9. Samples: 463455232. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:10,767][1648981] Avg episode reward: [(0, '772.380')] [2024-06-15 22:17:14,099][1651669] Updated weights for policy 0, policy_version 905153 (0.0009) [2024-06-15 22:17:15,574][1651669] Updated weights for policy 0, policy_version 905216 (0.0016) [2024-06-15 22:17:15,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48065.8, 300 sec: 48430.1). Total num frames: 1853882368. Throughput: 0: 12424.5. Samples: 463539200. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:15,771][1648981] Avg episode reward: [(0, '775.280')] [2024-06-15 22:17:18,288][1651669] Updated weights for policy 0, policy_version 905281 (0.0012) [2024-06-15 22:17:19,498][1651669] Updated weights for policy 0, policy_version 905344 (0.0011) [2024-06-15 22:17:20,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1854177280. Throughput: 0: 12162.8. Samples: 463608320. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:20,767][1648981] Avg episode reward: [(0, '762.580')] [2024-06-15 22:17:21,482][1651669] Updated weights for policy 0, policy_version 905406 (0.0037) [2024-06-15 22:17:25,391][1651669] Updated weights for policy 0, policy_version 905463 (0.0131) [2024-06-15 22:17:25,767][1648981] Fps is (10 sec: 52427.5, 60 sec: 49152.0, 300 sec: 48541.0). Total num frames: 1854406656. Throughput: 0: 12572.4. Samples: 463654912. 
Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:25,767][1648981] Avg episode reward: [(0, '771.460')] [2024-06-15 22:17:27,522][1651669] Updated weights for policy 0, policy_version 905475 (0.0044) [2024-06-15 22:17:28,799][1651669] Updated weights for policy 0, policy_version 905533 (0.0010) [2024-06-15 22:17:29,966][1651669] Updated weights for policy 0, policy_version 905588 (0.0017) [2024-06-15 22:17:30,767][1648981] Fps is (10 sec: 49151.8, 60 sec: 50244.1, 300 sec: 48874.3). Total num frames: 1854668800. Throughput: 0: 12356.3. Samples: 463721472. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:30,767][1648981] Avg episode reward: [(0, '776.620')] [2024-06-15 22:17:31,180][1651669] Updated weights for policy 0, policy_version 905618 (0.0015) [2024-06-15 22:17:34,316][1651669] Updated weights for policy 0, policy_version 905665 (0.0011) [2024-06-15 22:17:35,766][1648981] Fps is (10 sec: 52430.0, 60 sec: 50244.3, 300 sec: 48763.2). Total num frames: 1854930944. Throughput: 0: 12640.7. Samples: 463798784. Policy #0 lag: (min: 17.0, avg: 169.6, max: 287.0) [2024-06-15 22:17:35,767][1648981] Avg episode reward: [(0, '770.450')] [2024-06-15 22:17:37,434][1651669] Updated weights for policy 0, policy_version 905734 (0.0012) [2024-06-15 22:17:39,801][1651669] Updated weights for policy 0, policy_version 905808 (0.0015) [2024-06-15 22:17:40,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1855160320. Throughput: 0: 12435.9. Samples: 463832064. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:17:40,767][1648981] Avg episode reward: [(0, '787.470')] [2024-06-15 22:17:40,817][1651669] Updated weights for policy 0, policy_version 905853 (0.0060) [2024-06-15 22:17:45,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 50244.3, 300 sec: 48541.1). Total num frames: 1855356928. Throughput: 0: 12640.7. Samples: 463911936. 
Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:17:45,767][1648981] Avg episode reward: [(0, '764.360')] [2024-06-15 22:17:45,963][1651669] Updated weights for policy 0, policy_version 905940 (0.0054) [2024-06-15 22:17:48,300][1651669] Updated weights for policy 0, policy_version 906000 (0.0014) [2024-06-15 22:17:49,651][1651669] Updated weights for policy 0, policy_version 906047 (0.0011) [2024-06-15 22:17:50,374][1651274] Signal inference workers to stop experience collection... (47600 times) [2024-06-15 22:17:50,426][1651669] InferenceWorker_p0-w0: stopping experience collection (47600 times) [2024-06-15 22:17:50,766][1648981] Fps is (10 sec: 45875.8, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1855619072. Throughput: 0: 12470.0. Samples: 463981568. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:17:50,767][1648981] Avg episode reward: [(0, '782.770')] [2024-06-15 22:17:50,789][1651274] Signal inference workers to resume experience collection... (47600 times) [2024-06-15 22:17:50,789][1651669] InferenceWorker_p0-w0: resuming experience collection (47600 times) [2024-06-15 22:17:51,316][1651669] Updated weights for policy 0, policy_version 906104 (0.0012) [2024-06-15 22:17:52,911][1651669] Updated weights for policy 0, policy_version 906168 (0.0185) [2024-06-15 22:17:55,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 48652.1). Total num frames: 1855848448. Throughput: 0: 12447.3. Samples: 464015360. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:17:55,767][1648981] Avg episode reward: [(0, '786.690')] [2024-06-15 22:17:57,622][1651669] Updated weights for policy 0, policy_version 906213 (0.0032) [2024-06-15 22:17:59,957][1651669] Updated weights for policy 0, policy_version 906277 (0.0014) [2024-06-15 22:18:00,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 48652.1). Total num frames: 1856110592. Throughput: 0: 12310.8. Samples: 464093184. 
Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:00,767][1648981] Avg episode reward: [(0, '762.400')] [2024-06-15 22:18:01,286][1651669] Updated weights for policy 0, policy_version 906327 (0.0013) [2024-06-15 22:18:02,252][1651669] Updated weights for policy 0, policy_version 906368 (0.0009) [2024-06-15 22:18:04,006][1651669] Updated weights for policy 0, policy_version 906418 (0.0042) [2024-06-15 22:18:05,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 50244.4, 300 sec: 48874.4). Total num frames: 1856372736. Throughput: 0: 12447.4. Samples: 464168448. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:05,767][1648981] Avg episode reward: [(0, '738.960')] [2024-06-15 22:18:07,874][1651669] Updated weights for policy 0, policy_version 906470 (0.0011) [2024-06-15 22:18:09,618][1651669] Updated weights for policy 0, policy_version 906512 (0.0020) [2024-06-15 22:18:10,744][1651669] Updated weights for policy 0, policy_version 906559 (0.0071) [2024-06-15 22:18:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1856634880. Throughput: 0: 12322.2. Samples: 464209408. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:10,767][1648981] Avg episode reward: [(0, '752.460')] [2024-06-15 22:18:13,012][1651669] Updated weights for policy 0, policy_version 906624 (0.0012) [2024-06-15 22:18:14,608][1651669] Updated weights for policy 0, policy_version 906687 (0.0124) [2024-06-15 22:18:15,767][1648981] Fps is (10 sec: 52427.4, 60 sec: 50244.2, 300 sec: 48874.3). Total num frames: 1856897024. Throughput: 0: 12060.5. Samples: 464264192. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:15,767][1648981] Avg episode reward: [(0, '748.260')] [2024-06-15 22:18:18,920][1651669] Updated weights for policy 0, policy_version 906744 (0.0010) [2024-06-15 22:18:20,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48606.0, 300 sec: 48652.2). Total num frames: 1857093632. 
Throughput: 0: 12276.6. Samples: 464351232. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:20,767][1648981] Avg episode reward: [(0, '792.830')] [2024-06-15 22:18:21,207][1651669] Updated weights for policy 0, policy_version 906811 (0.0015) [2024-06-15 22:18:23,455][1651669] Updated weights for policy 0, policy_version 906870 (0.0012) [2024-06-15 22:18:25,045][1651669] Updated weights for policy 0, policy_version 906931 (0.0010) [2024-06-15 22:18:25,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.5, 300 sec: 49096.4). Total num frames: 1857421312. Throughput: 0: 12253.9. Samples: 464383488. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:25,767][1648981] Avg episode reward: [(0, '806.130')] [2024-06-15 22:18:28,823][1651669] Updated weights for policy 0, policy_version 906960 (0.0010) [2024-06-15 22:18:29,674][1651669] Updated weights for policy 0, policy_version 907008 (0.0011) [2024-06-15 22:18:30,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 48606.0, 300 sec: 48542.4). Total num frames: 1857585152. Throughput: 0: 12265.3. Samples: 464463872. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:30,767][1648981] Avg episode reward: [(0, '830.810')] [2024-06-15 22:18:31,496][1651669] Updated weights for policy 0, policy_version 907062 (0.0012) [2024-06-15 22:18:33,735][1651274] Signal inference workers to stop experience collection... (47650 times) [2024-06-15 22:18:33,759][1651669] InferenceWorker_p0-w0: stopping experience collection (47650 times) [2024-06-15 22:18:34,014][1651274] Signal inference workers to resume experience collection... (47650 times) [2024-06-15 22:18:34,015][1651669] InferenceWorker_p0-w0: resuming experience collection (47650 times) [2024-06-15 22:18:34,017][1651669] Updated weights for policy 0, policy_version 907104 (0.0011) [2024-06-15 22:18:35,768][1648981] Fps is (10 sec: 45866.6, 60 sec: 49150.5, 300 sec: 49096.1). Total num frames: 1857880064. Throughput: 0: 12196.5. 
Samples: 464530432. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:35,769][1648981] Avg episode reward: [(0, '853.290')] [2024-06-15 22:18:36,138][1651669] Updated weights for policy 0, policy_version 907184 (0.0023) [2024-06-15 22:18:40,091][1651669] Updated weights for policy 0, policy_version 907221 (0.0011) [2024-06-15 22:18:40,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48059.7, 300 sec: 48541.0). Total num frames: 1858043904. Throughput: 0: 12288.0. Samples: 464568320. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:40,767][1648981] Avg episode reward: [(0, '854.420')] [2024-06-15 22:18:41,489][1651669] Updated weights for policy 0, policy_version 907296 (0.0013) [2024-06-15 22:18:44,259][1651669] Updated weights for policy 0, policy_version 907332 (0.0024) [2024-06-15 22:18:45,766][1648981] Fps is (10 sec: 45884.0, 60 sec: 49698.2, 300 sec: 48874.3). Total num frames: 1858338816. Throughput: 0: 12344.9. Samples: 464648704. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:45,767][1648981] Avg episode reward: [(0, '857.150')] [2024-06-15 22:18:45,979][1651669] Updated weights for policy 0, policy_version 907408 (0.0012) [2024-06-15 22:18:50,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 47513.6, 300 sec: 48318.9). Total num frames: 1858469888. Throughput: 0: 12265.2. Samples: 464720384. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:50,767][1648981] Avg episode reward: [(0, '849.830')] [2024-06-15 22:18:52,113][1651669] Updated weights for policy 0, policy_version 907520 (0.0148) [2024-06-15 22:18:53,190][1651669] Updated weights for policy 0, policy_version 907576 (0.0013) [2024-06-15 22:18:55,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1858764800. Throughput: 0: 12014.9. Samples: 464750080. 
Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:18:55,767][1648981] Avg episode reward: [(0, '851.570')] [2024-06-15 22:18:55,931][1651669] Updated weights for policy 0, policy_version 907621 (0.0012) [2024-06-15 22:18:56,078][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000907632_1858830336.pth... [2024-06-15 22:18:56,187][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000901888_1847066624.pth [2024-06-15 22:18:57,761][1651669] Updated weights for policy 0, policy_version 907712 (0.0017) [2024-06-15 22:19:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1858994176. Throughput: 0: 12322.2. Samples: 464818688. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:19:00,767][1648981] Avg episode reward: [(0, '871.460')] [2024-06-15 22:19:02,727][1651669] Updated weights for policy 0, policy_version 907768 (0.0096) [2024-06-15 22:19:04,052][1651669] Updated weights for policy 0, policy_version 907813 (0.0012) [2024-06-15 22:19:05,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48059.6, 300 sec: 48541.1). Total num frames: 1859256320. Throughput: 0: 12151.5. Samples: 464898048. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:19:05,767][1648981] Avg episode reward: [(0, '864.020')] [2024-06-15 22:19:06,334][1651669] Updated weights for policy 0, policy_version 907856 (0.0013) [2024-06-15 22:19:07,384][1651669] Updated weights for policy 0, policy_version 907905 (0.0015) [2024-06-15 22:19:08,826][1651669] Updated weights for policy 0, policy_version 907968 (0.0012) [2024-06-15 22:19:10,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 48059.6, 300 sec: 48541.0). Total num frames: 1859518464. Throughput: 0: 12105.9. Samples: 464928256. 
Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:19:10,768][1648981] Avg episode reward: [(0, '862.440')] [2024-06-15 22:19:12,788][1651669] Updated weights for policy 0, policy_version 908031 (0.0013) [2024-06-15 22:19:14,336][1651274] Signal inference workers to stop experience collection... (47700 times) [2024-06-15 22:19:14,404][1651669] InferenceWorker_p0-w0: stopping experience collection (47700 times) [2024-06-15 22:19:14,574][1651274] Signal inference workers to resume experience collection... (47700 times) [2024-06-15 22:19:14,576][1651669] InferenceWorker_p0-w0: resuming experience collection (47700 times) [2024-06-15 22:19:15,716][1651669] Updated weights for policy 0, policy_version 908091 (0.0134) [2024-06-15 22:19:15,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 47513.7, 300 sec: 48541.1). Total num frames: 1859747840. Throughput: 0: 12208.4. Samples: 465013248. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:19:15,767][1648981] Avg episode reward: [(0, '898.330')] [2024-06-15 22:19:17,865][1651669] Updated weights for policy 0, policy_version 908146 (0.0012) [2024-06-15 22:19:19,492][1651669] Updated weights for policy 0, policy_version 908219 (0.0011) [2024-06-15 22:19:20,766][1648981] Fps is (10 sec: 52430.1, 60 sec: 49152.0, 300 sec: 48875.1). Total num frames: 1860042752. Throughput: 0: 12015.5. Samples: 465071104. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:19:20,767][1648981] Avg episode reward: [(0, '919.970')] [2024-06-15 22:19:23,958][1651669] Updated weights for policy 0, policy_version 908278 (0.0013) [2024-06-15 22:19:25,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1860173824. Throughput: 0: 12026.3. Samples: 465109504. 
Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:19:25,767][1648981] Avg episode reward: [(0, '925.120')] [2024-06-15 22:19:26,414][1651669] Updated weights for policy 0, policy_version 908322 (0.0050) [2024-06-15 22:19:28,095][1651669] Updated weights for policy 0, policy_version 908384 (0.0018) [2024-06-15 22:19:29,711][1651669] Updated weights for policy 0, policy_version 908448 (0.0012) [2024-06-15 22:19:30,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49698.2, 300 sec: 49096.5). Total num frames: 1860567040. Throughput: 0: 11787.4. Samples: 465179136. Policy #0 lag: (min: 31.0, avg: 125.2, max: 287.0) [2024-06-15 22:19:30,767][1648981] Avg episode reward: [(0, '922.910')] [2024-06-15 22:19:33,897][1651669] Updated weights for policy 0, policy_version 908483 (0.0012) [2024-06-15 22:19:35,234][1651669] Updated weights for policy 0, policy_version 908543 (0.0013) [2024-06-15 22:19:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 46969.0, 300 sec: 48430.0). Total num frames: 1860698112. Throughput: 0: 11901.2. Samples: 465255936. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:19:35,767][1648981] Avg episode reward: [(0, '932.680')] [2024-06-15 22:19:38,085][1651669] Updated weights for policy 0, policy_version 908596 (0.0014) [2024-06-15 22:19:39,710][1651669] Updated weights for policy 0, policy_version 908661 (0.0012) [2024-06-15 22:19:40,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 49152.1, 300 sec: 48985.4). Total num frames: 1860993024. Throughput: 0: 12014.9. Samples: 465290752. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:19:40,767][1648981] Avg episode reward: [(0, '943.260')] [2024-06-15 22:19:40,984][1651669] Updated weights for policy 0, policy_version 908705 (0.0011) [2024-06-15 22:19:45,573][1651669] Updated weights for policy 0, policy_version 908770 (0.0012) [2024-06-15 22:19:45,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 1861156864. 
Throughput: 0: 12003.6. Samples: 465358848. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:19:45,767][1648981] Avg episode reward: [(0, '912.410')] [2024-06-15 22:19:48,800][1651669] Updated weights for policy 0, policy_version 908816 (0.0023) [2024-06-15 22:19:50,346][1651669] Updated weights for policy 0, policy_version 908868 (0.0171) [2024-06-15 22:19:50,766][1648981] Fps is (10 sec: 39321.6, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1861386240. Throughput: 0: 11764.6. Samples: 465427456. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:19:50,767][1648981] Avg episode reward: [(0, '906.950')] [2024-06-15 22:19:51,656][1651669] Updated weights for policy 0, policy_version 908928 (0.0012) [2024-06-15 22:19:53,033][1651669] Updated weights for policy 0, policy_version 908989 (0.0013) [2024-06-15 22:19:55,706][1651274] Signal inference workers to stop experience collection... (47750 times) [2024-06-15 22:19:55,756][1651669] InferenceWorker_p0-w0: stopping experience collection (47750 times) [2024-06-15 22:19:55,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 47513.6, 300 sec: 48207.9). Total num frames: 1861615616. Throughput: 0: 11707.8. Samples: 465455104. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:19:55,767][1648981] Avg episode reward: [(0, '903.470')] [2024-06-15 22:19:55,969][1651274] Signal inference workers to resume experience collection... (47750 times) [2024-06-15 22:19:55,970][1651669] InferenceWorker_p0-w0: resuming experience collection (47750 times) [2024-06-15 22:19:56,685][1651669] Updated weights for policy 0, policy_version 909042 (0.0012) [2024-06-15 22:19:59,446][1651669] Updated weights for policy 0, policy_version 909074 (0.0011) [2024-06-15 22:20:00,557][1651669] Updated weights for policy 0, policy_version 909120 (0.0011) [2024-06-15 22:20:00,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1861877760. Throughput: 0: 11787.4. 
Samples: 465543680. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:00,767][1648981] Avg episode reward: [(0, '941.590')] [2024-06-15 22:20:02,019][1651669] Updated weights for policy 0, policy_version 909172 (0.0011) [2024-06-15 22:20:03,289][1651669] Updated weights for policy 0, policy_version 909232 (0.0012) [2024-06-15 22:20:05,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 48059.6, 300 sec: 48207.9). Total num frames: 1862139904. Throughput: 0: 12117.3. Samples: 465616384. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:05,767][1648981] Avg episode reward: [(0, '939.480')] [2024-06-15 22:20:07,007][1651669] Updated weights for policy 0, policy_version 909297 (0.0015) [2024-06-15 22:20:10,555][1651669] Updated weights for policy 0, policy_version 909345 (0.0011) [2024-06-15 22:20:10,767][1648981] Fps is (10 sec: 45873.6, 60 sec: 46967.4, 300 sec: 48207.8). Total num frames: 1862336512. Throughput: 0: 12037.6. Samples: 465651200. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:10,768][1648981] Avg episode reward: [(0, '931.110')] [2024-06-15 22:20:12,056][1651669] Updated weights for policy 0, policy_version 909396 (0.0011) [2024-06-15 22:20:14,540][1651669] Updated weights for policy 0, policy_version 909502 (0.0016) [2024-06-15 22:20:15,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1862664192. Throughput: 0: 11855.6. Samples: 465712640. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:15,767][1648981] Avg episode reward: [(0, '932.340')] [2024-06-15 22:20:18,915][1651669] Updated weights for policy 0, policy_version 909561 (0.0017) [2024-06-15 22:20:20,766][1648981] Fps is (10 sec: 45876.5, 60 sec: 45875.2, 300 sec: 48207.9). Total num frames: 1862795264. Throughput: 0: 11980.8. Samples: 465795072. 
Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:20,767][1648981] Avg episode reward: [(0, '910.650')] [2024-06-15 22:20:21,724][1651669] Updated weights for policy 0, policy_version 909604 (0.0013) [2024-06-15 22:20:22,703][1651669] Updated weights for policy 0, policy_version 909648 (0.0014) [2024-06-15 22:20:24,428][1651669] Updated weights for policy 0, policy_version 909712 (0.0013) [2024-06-15 22:20:25,767][1648981] Fps is (10 sec: 52425.1, 60 sec: 50243.7, 300 sec: 48874.2). Total num frames: 1863188480. Throughput: 0: 11969.2. Samples: 465829376. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:25,768][1648981] Avg episode reward: [(0, '943.180')] [2024-06-15 22:20:28,702][1651669] Updated weights for policy 0, policy_version 909776 (0.0012) [2024-06-15 22:20:30,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 45875.1, 300 sec: 48430.0). Total num frames: 1863319552. Throughput: 0: 11958.0. Samples: 465896960. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:30,767][1648981] Avg episode reward: [(0, '900.570')] [2024-06-15 22:20:32,158][1651669] Updated weights for policy 0, policy_version 909843 (0.0102) [2024-06-15 22:20:34,084][1651669] Updated weights for policy 0, policy_version 909906 (0.0011) [2024-06-15 22:20:34,932][1651274] Signal inference workers to stop experience collection... (47800 times) [2024-06-15 22:20:35,002][1651669] InferenceWorker_p0-w0: stopping experience collection (47800 times) [2024-06-15 22:20:35,178][1651274] Signal inference workers to resume experience collection... (47800 times) [2024-06-15 22:20:35,186][1651669] InferenceWorker_p0-w0: resuming experience collection (47800 times) [2024-06-15 22:20:35,766][1648981] Fps is (10 sec: 42601.3, 60 sec: 48605.8, 300 sec: 48763.2). Total num frames: 1863614464. Throughput: 0: 11946.7. Samples: 465965056. 
Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:35,767][1648981] Avg episode reward: [(0, '893.410')] [2024-06-15 22:20:35,792][1651669] Updated weights for policy 0, policy_version 909971 (0.0014) [2024-06-15 22:20:39,523][1651669] Updated weights for policy 0, policy_version 910025 (0.0090) [2024-06-15 22:20:40,594][1651669] Updated weights for policy 0, policy_version 910079 (0.0013) [2024-06-15 22:20:40,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 47513.6, 300 sec: 48430.0). Total num frames: 1863843840. Throughput: 0: 12117.3. Samples: 466000384. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:40,767][1648981] Avg episode reward: [(0, '848.540')] [2024-06-15 22:20:44,222][1651669] Updated weights for policy 0, policy_version 910129 (0.0096) [2024-06-15 22:20:45,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1864040448. Throughput: 0: 11821.5. Samples: 466075648. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:45,767][1648981] Avg episode reward: [(0, '849.300')] [2024-06-15 22:20:46,165][1651669] Updated weights for policy 0, policy_version 910208 (0.0012) [2024-06-15 22:20:47,644][1651669] Updated weights for policy 0, policy_version 910260 (0.0011) [2024-06-15 22:20:50,766][1648981] Fps is (10 sec: 39321.2, 60 sec: 47513.5, 300 sec: 48096.7). Total num frames: 1864237056. Throughput: 0: 11719.1. Samples: 466143744. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:50,767][1648981] Avg episode reward: [(0, '899.530')] [2024-06-15 22:20:51,717][1651669] Updated weights for policy 0, policy_version 910307 (0.0011) [2024-06-15 22:20:55,138][1651669] Updated weights for policy 0, policy_version 910369 (0.0120) [2024-06-15 22:20:55,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 47513.7, 300 sec: 48318.9). Total num frames: 1864466432. Throughput: 0: 11787.5. Samples: 466181632. 
Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:20:55,767][1648981] Avg episode reward: [(0, '890.480')] [2024-06-15 22:20:56,250][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000910416_1864531968.pth... [2024-06-15 22:20:56,251][1651669] Updated weights for policy 0, policy_version 910416 (0.0013) [2024-06-15 22:20:56,459][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000904704_1852833792.pth [2024-06-15 22:20:58,721][1651669] Updated weights for policy 0, policy_version 910512 (0.0010) [2024-06-15 22:21:00,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1864761344. Throughput: 0: 11832.9. Samples: 466245120. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:21:00,767][1648981] Avg episode reward: [(0, '897.350')] [2024-06-15 22:21:02,938][1651669] Updated weights for policy 0, policy_version 910565 (0.0012) [2024-06-15 22:21:05,767][1648981] Fps is (10 sec: 42597.0, 60 sec: 45875.1, 300 sec: 47985.6). Total num frames: 1864892416. Throughput: 0: 11810.1. Samples: 466326528. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:21:05,767][1648981] Avg episode reward: [(0, '892.630')] [2024-06-15 22:21:07,088][1651669] Updated weights for policy 0, policy_version 910642 (0.0014) [2024-06-15 22:21:08,679][1651669] Updated weights for policy 0, policy_version 910706 (0.0012) [2024-06-15 22:21:10,105][1651669] Updated weights for policy 0, policy_version 910773 (0.0120) [2024-06-15 22:21:10,766][1648981] Fps is (10 sec: 52427.8, 60 sec: 49152.2, 300 sec: 48431.2). Total num frames: 1865285632. Throughput: 0: 11525.9. Samples: 466348032. 
Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:21:10,767][1648981] Avg episode reward: [(0, '904.750')] [2024-06-15 22:21:14,435][1651669] Updated weights for policy 0, policy_version 910817 (0.0012) [2024-06-15 22:21:15,766][1648981] Fps is (10 sec: 52430.2, 60 sec: 45875.2, 300 sec: 48318.9). Total num frames: 1865416704. Throughput: 0: 11696.4. Samples: 466423296. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:21:15,767][1648981] Avg episode reward: [(0, '866.960')] [2024-06-15 22:21:18,243][1651274] Signal inference workers to stop experience collection... (47850 times) [2024-06-15 22:21:18,296][1651669] InferenceWorker_p0-w0: stopping experience collection (47850 times) [2024-06-15 22:21:18,550][1651274] Signal inference workers to resume experience collection... (47850 times) [2024-06-15 22:21:18,551][1651669] InferenceWorker_p0-w0: resuming experience collection (47850 times) [2024-06-15 22:21:18,663][1651669] Updated weights for policy 0, policy_version 910899 (0.0014) [2024-06-15 22:21:19,483][1651669] Updated weights for policy 0, policy_version 910932 (0.0012) [2024-06-15 22:21:20,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 48319.0). Total num frames: 1865711616. Throughput: 0: 11719.1. Samples: 466492416. Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:21:20,767][1648981] Avg episode reward: [(0, '900.370')] [2024-06-15 22:21:21,010][1651669] Updated weights for policy 0, policy_version 911008 (0.0095) [2024-06-15 22:21:24,966][1651669] Updated weights for policy 0, policy_version 911058 (0.0012) [2024-06-15 22:21:25,760][1651669] Updated weights for policy 0, policy_version 911096 (0.0013) [2024-06-15 22:21:25,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 45329.6, 300 sec: 48318.9). Total num frames: 1865908224. Throughput: 0: 11707.7. Samples: 466527232. 
Policy #0 lag: (min: 15.0, avg: 113.5, max: 271.0) [2024-06-15 22:21:25,767][1648981] Avg episode reward: [(0, '937.030')] [2024-06-15 22:21:28,548][1651669] Updated weights for policy 0, policy_version 911124 (0.0013) [2024-06-15 22:21:30,066][1651669] Updated weights for policy 0, policy_version 911177 (0.0013) [2024-06-15 22:21:30,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 46967.6, 300 sec: 48207.8). Total num frames: 1866137600. Throughput: 0: 11707.7. Samples: 466602496. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:21:30,767][1648981] Avg episode reward: [(0, '964.580')] [2024-06-15 22:21:31,335][1651669] Updated weights for policy 0, policy_version 911234 (0.0011) [2024-06-15 22:21:32,377][1651669] Updated weights for policy 0, policy_version 911290 (0.0013) [2024-06-15 22:21:35,442][1651669] Updated weights for policy 0, policy_version 911350 (0.0114) [2024-06-15 22:21:35,769][1648981] Fps is (10 sec: 55688.6, 60 sec: 47511.2, 300 sec: 48429.5). Total num frames: 1866465280. Throughput: 0: 11934.5. Samples: 466680832. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:21:35,770][1648981] Avg episode reward: [(0, '932.950')] [2024-06-15 22:21:38,747][1651669] Updated weights for policy 0, policy_version 911395 (0.0013) [2024-06-15 22:21:39,845][1651669] Updated weights for policy 0, policy_version 911440 (0.0012) [2024-06-15 22:21:40,766][1648981] Fps is (10 sec: 55705.4, 60 sec: 47513.6, 300 sec: 48652.2). Total num frames: 1866694656. Throughput: 0: 12128.7. Samples: 466727424. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:21:40,767][1648981] Avg episode reward: [(0, '922.530')] [2024-06-15 22:21:41,702][1651669] Updated weights for policy 0, policy_version 911507 (0.0026) [2024-06-15 22:21:45,135][1651669] Updated weights for policy 0, policy_version 911574 (0.0012) [2024-06-15 22:21:45,766][1648981] Fps is (10 sec: 49167.0, 60 sec: 48605.8, 300 sec: 48318.9). Total num frames: 1866956800. 
Throughput: 0: 12162.8. Samples: 466792448. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:21:45,767][1648981] Avg episode reward: [(0, '921.870')] [2024-06-15 22:21:49,056][1651669] Updated weights for policy 0, policy_version 911618 (0.0016) [2024-06-15 22:21:50,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 48059.9, 300 sec: 48430.0). Total num frames: 1867120640. Throughput: 0: 11946.7. Samples: 466864128. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:21:50,767][1648981] Avg episode reward: [(0, '943.730')] [2024-06-15 22:21:51,048][1651669] Updated weights for policy 0, policy_version 911681 (0.0010) [2024-06-15 22:21:52,398][1651669] Updated weights for policy 0, policy_version 911747 (0.0014) [2024-06-15 22:21:53,615][1651669] Updated weights for policy 0, policy_version 911808 (0.0013) [2024-06-15 22:21:55,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48605.7, 300 sec: 47985.7). Total num frames: 1867382784. Throughput: 0: 12140.1. Samples: 466894336. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:21:55,767][1648981] Avg episode reward: [(0, '917.600')] [2024-06-15 22:21:55,812][1651274] Signal inference workers to stop experience collection... (47900 times) [2024-06-15 22:21:55,871][1651669] InferenceWorker_p0-w0: stopping experience collection (47900 times) [2024-06-15 22:21:55,977][1651274] Signal inference workers to resume experience collection... (47900 times) [2024-06-15 22:21:55,978][1651669] InferenceWorker_p0-w0: resuming experience collection (47900 times) [2024-06-15 22:21:56,972][1651669] Updated weights for policy 0, policy_version 911872 (0.0153) [2024-06-15 22:22:00,766][1648981] Fps is (10 sec: 42597.6, 60 sec: 46421.1, 300 sec: 48096.7). Total num frames: 1867546624. Throughput: 0: 12231.1. Samples: 466973696. 
Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:00,767][1648981] Avg episode reward: [(0, '932.280')] [2024-06-15 22:22:01,532][1651669] Updated weights for policy 0, policy_version 911923 (0.0012) [2024-06-15 22:22:03,145][1651669] Updated weights for policy 0, policy_version 912000 (0.0013) [2024-06-15 22:22:04,647][1651669] Updated weights for policy 0, policy_version 912058 (0.0092) [2024-06-15 22:22:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.4, 300 sec: 47985.7). Total num frames: 1867907072. Throughput: 0: 12083.2. Samples: 467036160. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:05,767][1648981] Avg episode reward: [(0, '956.600')] [2024-06-15 22:22:08,111][1651669] Updated weights for policy 0, policy_version 912112 (0.0019) [2024-06-15 22:22:10,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1868038144. Throughput: 0: 12174.2. Samples: 467075072. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:10,767][1648981] Avg episode reward: [(0, '931.420')] [2024-06-15 22:22:11,929][1651669] Updated weights for policy 0, policy_version 912160 (0.0012) [2024-06-15 22:22:13,475][1651669] Updated weights for policy 0, policy_version 912231 (0.0015) [2024-06-15 22:22:15,340][1651669] Updated weights for policy 0, policy_version 912314 (0.0015) [2024-06-15 22:22:15,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.2, 300 sec: 48318.9). Total num frames: 1868431360. Throughput: 0: 12140.1. Samples: 467148800. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:15,767][1648981] Avg episode reward: [(0, '937.410')] [2024-06-15 22:22:19,141][1651669] Updated weights for policy 0, policy_version 912368 (0.0013) [2024-06-15 22:22:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1868562432. Throughput: 0: 12197.8. Samples: 467229696. 
Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:20,767][1648981] Avg episode reward: [(0, '929.720')] [2024-06-15 22:22:22,412][1651669] Updated weights for policy 0, policy_version 912403 (0.0012) [2024-06-15 22:22:23,597][1651669] Updated weights for policy 0, policy_version 912464 (0.0021) [2024-06-15 22:22:24,707][1651669] Updated weights for policy 0, policy_version 912517 (0.0011) [2024-06-15 22:22:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 48318.9). Total num frames: 1868922880. Throughput: 0: 11946.7. Samples: 467265024. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:25,767][1648981] Avg episode reward: [(0, '940.830')] [2024-06-15 22:22:25,916][1651669] Updated weights for policy 0, policy_version 912574 (0.0021) [2024-06-15 22:22:29,966][1651669] Updated weights for policy 0, policy_version 912633 (0.0016) [2024-06-15 22:22:30,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 49151.9, 300 sec: 47985.7). Total num frames: 1869086720. Throughput: 0: 12140.1. Samples: 467338752. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:30,767][1648981] Avg episode reward: [(0, '940.010')] [2024-06-15 22:22:32,964][1651669] Updated weights for policy 0, policy_version 912688 (0.0081) [2024-06-15 22:22:33,811][1651274] Signal inference workers to stop experience collection... (47950 times) [2024-06-15 22:22:33,841][1651669] InferenceWorker_p0-w0: stopping experience collection (47950 times) [2024-06-15 22:22:33,979][1651274] Signal inference workers to resume experience collection... (47950 times) [2024-06-15 22:22:33,982][1651669] InferenceWorker_p0-w0: resuming experience collection (47950 times) [2024-06-15 22:22:34,429][1651669] Updated weights for policy 0, policy_version 912753 (0.0040) [2024-06-15 22:22:35,580][1651669] Updated weights for policy 0, policy_version 912823 (0.0013) [2024-06-15 22:22:35,766][1648981] Fps is (10 sec: 55705.3, 60 sec: 50246.8, 300 sec: 48541.1). 
Total num frames: 1869479936. Throughput: 0: 12231.1. Samples: 467414528. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:35,767][1648981] Avg episode reward: [(0, '917.380')] [2024-06-15 22:22:40,235][1651669] Updated weights for policy 0, policy_version 912892 (0.0019) [2024-06-15 22:22:40,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48605.9, 300 sec: 48318.9). Total num frames: 1869611008. Throughput: 0: 12470.1. Samples: 467455488. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:40,767][1648981] Avg episode reward: [(0, '905.260')] [2024-06-15 22:22:43,240][1651669] Updated weights for policy 0, policy_version 912929 (0.0068) [2024-06-15 22:22:44,980][1651669] Updated weights for policy 0, policy_version 912999 (0.0012) [2024-06-15 22:22:45,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 49152.1, 300 sec: 48430.0). Total num frames: 1869905920. Throughput: 0: 12197.0. Samples: 467522560. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:45,767][1648981] Avg episode reward: [(0, '875.680')] [2024-06-15 22:22:46,126][1651669] Updated weights for policy 0, policy_version 913058 (0.0011) [2024-06-15 22:22:50,177][1651669] Updated weights for policy 0, policy_version 913104 (0.0012) [2024-06-15 22:22:50,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 48207.9). Total num frames: 1870069760. Throughput: 0: 12561.1. Samples: 467601408. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:50,767][1648981] Avg episode reward: [(0, '900.680')] [2024-06-15 22:22:51,315][1651669] Updated weights for policy 0, policy_version 913147 (0.0016) [2024-06-15 22:22:54,215][1651669] Updated weights for policy 0, policy_version 913216 (0.0112) [2024-06-15 22:22:55,623][1651669] Updated weights for policy 0, policy_version 913269 (0.0013) [2024-06-15 22:22:55,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1870364672. Throughput: 0: 12515.5. 
Samples: 467638272. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:22:55,767][1648981] Avg episode reward: [(0, '900.130')] [2024-06-15 22:22:55,847][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000913280_1870397440.pth... [2024-06-15 22:22:55,898][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000907632_1858830336.pth [2024-06-15 22:22:57,254][1651669] Updated weights for policy 0, policy_version 913298 (0.0025) [2024-06-15 22:23:00,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 49698.2, 300 sec: 47985.7). Total num frames: 1870528512. Throughput: 0: 12322.1. Samples: 467703296. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:23:00,767][1648981] Avg episode reward: [(0, '872.770')] [2024-06-15 22:23:02,178][1651669] Updated weights for policy 0, policy_version 913345 (0.0012) [2024-06-15 22:23:03,471][1651669] Updated weights for policy 0, policy_version 913398 (0.0012) [2024-06-15 22:23:05,237][1651669] Updated weights for policy 0, policy_version 913456 (0.0012) [2024-06-15 22:23:05,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1870790656. Throughput: 0: 12037.7. Samples: 467771392. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:23:05,767][1648981] Avg episode reward: [(0, '884.310')] [2024-06-15 22:23:07,101][1651669] Updated weights for policy 0, policy_version 913533 (0.0009) [2024-06-15 22:23:09,222][1651669] Updated weights for policy 0, policy_version 913584 (0.0019) [2024-06-15 22:23:10,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 47985.7). Total num frames: 1871052800. Throughput: 0: 12003.6. Samples: 467805184. 
Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:23:10,767][1648981] Avg episode reward: [(0, '901.070')] [2024-06-15 22:23:14,860][1651669] Updated weights for policy 0, policy_version 913648 (0.0014) [2024-06-15 22:23:15,546][1651274] Signal inference workers to stop experience collection... (48000 times) [2024-06-15 22:23:15,621][1651669] InferenceWorker_p0-w0: stopping experience collection (48000 times) [2024-06-15 22:23:15,767][1648981] Fps is (10 sec: 39320.7, 60 sec: 45875.0, 300 sec: 47763.5). Total num frames: 1871183872. Throughput: 0: 12014.9. Samples: 467879424. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:23:15,768][1648981] Avg episode reward: [(0, '898.120')] [2024-06-15 22:23:15,994][1651274] Signal inference workers to resume experience collection... (48000 times) [2024-06-15 22:23:15,995][1651669] InferenceWorker_p0-w0: resuming experience collection (48000 times) [2024-06-15 22:23:16,597][1651669] Updated weights for policy 0, policy_version 913697 (0.0013) [2024-06-15 22:23:18,205][1651669] Updated weights for policy 0, policy_version 913761 (0.0114) [2024-06-15 22:23:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 47763.5). Total num frames: 1871511552. Throughput: 0: 11639.5. Samples: 467938304. Policy #0 lag: (min: 13.0, avg: 95.1, max: 269.0) [2024-06-15 22:23:20,767][1648981] Avg episode reward: [(0, '894.060')] [2024-06-15 22:23:21,055][1651669] Updated weights for policy 0, policy_version 913840 (0.0117) [2024-06-15 22:23:25,766][1648981] Fps is (10 sec: 42599.6, 60 sec: 44782.9, 300 sec: 47541.4). Total num frames: 1871609856. Throughput: 0: 11502.9. Samples: 467973120. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:23:25,767][1648981] Avg episode reward: [(0, '909.980')] [2024-06-15 22:23:26,067][1651669] Updated weights for policy 0, policy_version 913888 (0.0013) [2024-06-15 22:23:27,803][1651669] Updated weights for policy 0, policy_version 913939 (0.0012) [2024-06-15 22:23:29,675][1651669] Updated weights for policy 0, policy_version 914016 (0.0011) [2024-06-15 22:23:30,490][1651669] Updated weights for policy 0, policy_version 914047 (0.0010) [2024-06-15 22:23:30,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 48059.6, 300 sec: 47763.8). Total num frames: 1871970304. Throughput: 0: 11548.4. Samples: 468042240. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:23:30,768][1648981] Avg episode reward: [(0, '952.210')] [2024-06-15 22:23:35,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 43690.7, 300 sec: 47652.5). Total num frames: 1872101376. Throughput: 0: 11320.9. Samples: 468110848. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:23:35,767][1648981] Avg episode reward: [(0, '994.660')] [2024-06-15 22:23:36,560][1651669] Updated weights for policy 0, policy_version 914116 (0.0013) [2024-06-15 22:23:38,661][1651669] Updated weights for policy 0, policy_version 914177 (0.0046) [2024-06-15 22:23:40,423][1651669] Updated weights for policy 0, policy_version 914241 (0.0011) [2024-06-15 22:23:40,767][1648981] Fps is (10 sec: 42598.6, 60 sec: 46421.2, 300 sec: 47652.4). Total num frames: 1872396288. Throughput: 0: 11400.5. Samples: 468151296. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:23:40,767][1648981] Avg episode reward: [(0, '1015.440')] [2024-06-15 22:23:43,900][1651669] Updated weights for policy 0, policy_version 914320 (0.0012) [2024-06-15 22:23:45,183][1651669] Updated weights for policy 0, policy_version 914368 (0.0011) [2024-06-15 22:23:45,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 45329.0, 300 sec: 47985.7). Total num frames: 1872625664. 
Throughput: 0: 11161.6. Samples: 468205568. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:23:45,767][1648981] Avg episode reward: [(0, '929.130')] [2024-06-15 22:23:49,071][1651669] Updated weights for policy 0, policy_version 914431 (0.0024) [2024-06-15 22:23:50,766][1648981] Fps is (10 sec: 39322.7, 60 sec: 45329.0, 300 sec: 47541.4). Total num frames: 1872789504. Throughput: 0: 11491.6. Samples: 468288512. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:23:50,767][1648981] Avg episode reward: [(0, '951.620')] [2024-06-15 22:23:51,960][1651669] Updated weights for policy 0, policy_version 914497 (0.0150) [2024-06-15 22:23:53,111][1651669] Updated weights for policy 0, policy_version 914554 (0.0139) [2024-06-15 22:23:55,600][1651669] Updated weights for policy 0, policy_version 914608 (0.0036) [2024-06-15 22:23:55,767][1648981] Fps is (10 sec: 49150.9, 60 sec: 45875.1, 300 sec: 47874.6). Total num frames: 1873117184. Throughput: 0: 11366.3. Samples: 468316672. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:23:55,767][1648981] Avg episode reward: [(0, '935.500')] [2024-06-15 22:23:59,669][1651274] Signal inference workers to stop experience collection... (48050 times) [2024-06-15 22:23:59,684][1651669] InferenceWorker_p0-w0: stopping experience collection (48050 times) [2024-06-15 22:23:59,862][1651274] Signal inference workers to resume experience collection... (48050 times) [2024-06-15 22:23:59,864][1651669] InferenceWorker_p0-w0: resuming experience collection (48050 times) [2024-06-15 22:24:00,152][1651669] Updated weights for policy 0, policy_version 914672 (0.0012) [2024-06-15 22:24:00,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 45875.2, 300 sec: 47541.4). Total num frames: 1873281024. Throughput: 0: 11491.6. Samples: 468396544. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:00,767][1648981] Avg episode reward: [(0, '958.980')] [2024-06-15 22:24:01,316][1651669] Updated weights for policy 0, policy_version 914705 (0.0029) [2024-06-15 22:24:02,932][1651669] Updated weights for policy 0, policy_version 914769 (0.0011) [2024-06-15 22:24:03,838][1651669] Updated weights for policy 0, policy_version 914816 (0.0012) [2024-06-15 22:24:05,766][1648981] Fps is (10 sec: 49153.0, 60 sec: 46967.5, 300 sec: 47763.6). Total num frames: 1873608704. Throughput: 0: 11650.8. Samples: 468462592. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:05,767][1648981] Avg episode reward: [(0, '966.580')] [2024-06-15 22:24:05,985][1651669] Updated weights for policy 0, policy_version 914864 (0.0011) [2024-06-15 22:24:10,740][1651669] Updated weights for policy 0, policy_version 914912 (0.0015) [2024-06-15 22:24:10,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 44782.9, 300 sec: 47430.3). Total num frames: 1873739776. Throughput: 0: 11685.0. Samples: 468498944. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:10,767][1648981] Avg episode reward: [(0, '981.240')] [2024-06-15 22:24:11,620][1651669] Updated weights for policy 0, policy_version 914945 (0.0020) [2024-06-15 22:24:13,279][1651669] Updated weights for policy 0, policy_version 915011 (0.0010) [2024-06-15 22:24:14,707][1651669] Updated weights for policy 0, policy_version 915070 (0.0012) [2024-06-15 22:24:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.9, 300 sec: 47541.4). Total num frames: 1874067456. Throughput: 0: 11650.9. Samples: 468566528. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:15,767][1648981] Avg episode reward: [(0, '938.550')] [2024-06-15 22:24:17,065][1651669] Updated weights for policy 0, policy_version 915108 (0.0017) [2024-06-15 22:24:20,542][1651669] Updated weights for policy 0, policy_version 915152 (0.0058) [2024-06-15 22:24:20,767][1648981] Fps is (10 sec: 49151.1, 60 sec: 45328.9, 300 sec: 47652.4). Total num frames: 1874231296. Throughput: 0: 11958.0. Samples: 468648960. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:20,768][1648981] Avg episode reward: [(0, '957.020')] [2024-06-15 22:24:22,989][1651669] Updated weights for policy 0, policy_version 915216 (0.0019) [2024-06-15 22:24:24,232][1651669] Updated weights for policy 0, policy_version 915267 (0.0082) [2024-06-15 22:24:25,428][1651669] Updated weights for policy 0, policy_version 915328 (0.0011) [2024-06-15 22:24:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.2, 300 sec: 47541.4). Total num frames: 1874591744. Throughput: 0: 11901.2. Samples: 468686848. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:25,767][1648981] Avg episode reward: [(0, '947.210')] [2024-06-15 22:24:27,639][1651669] Updated weights for policy 0, policy_version 915376 (0.0011) [2024-06-15 22:24:30,766][1648981] Fps is (10 sec: 49153.3, 60 sec: 45875.4, 300 sec: 47541.4). Total num frames: 1874722816. Throughput: 0: 12310.8. Samples: 468759552. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:30,767][1648981] Avg episode reward: [(0, '991.110')] [2024-06-15 22:24:31,364][1651669] Updated weights for policy 0, policy_version 915424 (0.0014) [2024-06-15 22:24:34,489][1651669] Updated weights for policy 0, policy_version 915488 (0.0111) [2024-06-15 22:24:35,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.0, 300 sec: 47652.5). Total num frames: 1875050496. Throughput: 0: 11889.8. Samples: 468823552. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:35,767][1648981] Avg episode reward: [(0, '988.870')] [2024-06-15 22:24:35,916][1651669] Updated weights for policy 0, policy_version 915553 (0.0012) [2024-06-15 22:24:37,834][1651669] Updated weights for policy 0, policy_version 915591 (0.0021) [2024-06-15 22:24:38,393][1651274] Signal inference workers to stop experience collection... (48100 times) [2024-06-15 22:24:38,437][1651669] InferenceWorker_p0-w0: stopping experience collection (48100 times) [2024-06-15 22:24:38,725][1651274] Signal inference workers to resume experience collection... (48100 times) [2024-06-15 22:24:38,726][1651669] InferenceWorker_p0-w0: resuming experience collection (48100 times) [2024-06-15 22:24:38,952][1651669] Updated weights for policy 0, policy_version 915642 (0.0012) [2024-06-15 22:24:40,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 47513.8, 300 sec: 47763.5). Total num frames: 1875247104. Throughput: 0: 12219.8. Samples: 468866560. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:40,767][1648981] Avg episode reward: [(0, '1004.660')] [2024-06-15 22:24:42,231][1651669] Updated weights for policy 0, policy_version 915702 (0.0013) [2024-06-15 22:24:44,640][1651669] Updated weights for policy 0, policy_version 915750 (0.0012) [2024-06-15 22:24:45,767][1648981] Fps is (10 sec: 49151.2, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1875542016. Throughput: 0: 12162.8. Samples: 468943872. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:45,767][1648981] Avg episode reward: [(0, '1011.640')] [2024-06-15 22:24:46,322][1651669] Updated weights for policy 0, policy_version 915824 (0.0012) [2024-06-15 22:24:49,786][1651669] Updated weights for policy 0, policy_version 915892 (0.0011) [2024-06-15 22:24:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1875771392. Throughput: 0: 12231.1. Samples: 469012992. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:50,767][1648981] Avg episode reward: [(0, '1008.660')] [2024-06-15 22:24:52,526][1651669] Updated weights for policy 0, policy_version 915936 (0.0011) [2024-06-15 22:24:54,225][1651669] Updated weights for policy 0, policy_version 915970 (0.0010) [2024-06-15 22:24:55,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 48059.8, 300 sec: 47874.6). Total num frames: 1876000768. Throughput: 0: 12435.9. Samples: 469058560. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:24:55,767][1648981] Avg episode reward: [(0, '991.400')] [2024-06-15 22:24:56,223][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000916048_1876066304.pth... [2024-06-15 22:24:56,364][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000910416_1864531968.pth [2024-06-15 22:24:56,628][1651669] Updated weights for policy 0, policy_version 916064 (0.0129) [2024-06-15 22:25:00,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49152.1, 300 sec: 47763.5). Total num frames: 1876230144. Throughput: 0: 12367.7. Samples: 469123072. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:25:00,767][1648981] Avg episode reward: [(0, '1037.430')] [2024-06-15 22:25:00,888][1651669] Updated weights for policy 0, policy_version 916144 (0.0099) [2024-06-15 22:25:04,333][1651669] Updated weights for policy 0, policy_version 916208 (0.0010) [2024-06-15 22:25:04,768][1651669] Updated weights for policy 0, policy_version 916224 (0.0024) [2024-06-15 22:25:05,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 47513.6, 300 sec: 47874.7). Total num frames: 1876459520. Throughput: 0: 12265.3. Samples: 469200896. 
Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:25:05,767][1648981] Avg episode reward: [(0, '1028.710')] [2024-06-15 22:25:06,664][1651669] Updated weights for policy 0, policy_version 916288 (0.0011) [2024-06-15 22:25:07,947][1651669] Updated weights for policy 0, policy_version 916345 (0.0014) [2024-06-15 22:25:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49152.0, 300 sec: 47541.4). Total num frames: 1876688896. Throughput: 0: 12071.8. Samples: 469230080. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:25:10,767][1648981] Avg episode reward: [(0, '1070.670')] [2024-06-15 22:25:11,508][1651669] Updated weights for policy 0, policy_version 916385 (0.0011) [2024-06-15 22:25:13,466][1651669] Updated weights for policy 0, policy_version 916418 (0.0015) [2024-06-15 22:25:15,766][1648981] Fps is (10 sec: 49151.4, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1876951040. Throughput: 0: 12231.1. Samples: 469309952. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:25:15,767][1648981] Avg episode reward: [(0, '1054.680')] [2024-06-15 22:25:16,104][1651669] Updated weights for policy 0, policy_version 916499 (0.0015) [2024-06-15 22:25:17,960][1651669] Updated weights for policy 0, policy_version 916576 (0.0012) [2024-06-15 22:25:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.3, 300 sec: 47541.5). Total num frames: 1877213184. Throughput: 0: 12242.5. Samples: 469374464. Policy #0 lag: (min: 15.0, avg: 89.7, max: 271.0) [2024-06-15 22:25:20,767][1648981] Avg episode reward: [(0, '1059.290')] [2024-06-15 22:25:22,155][1651274] Signal inference workers to stop experience collection... (48150 times) [2024-06-15 22:25:22,191][1651669] InferenceWorker_p0-w0: stopping experience collection (48150 times) [2024-06-15 22:25:22,429][1651274] Signal inference workers to resume experience collection... 
(48150 times) [2024-06-15 22:25:22,430][1651669] InferenceWorker_p0-w0: resuming experience collection (48150 times) [2024-06-15 22:25:22,807][1651669] Updated weights for policy 0, policy_version 916640 (0.0014) [2024-06-15 22:25:25,093][1651669] Updated weights for policy 0, policy_version 916688 (0.0014) [2024-06-15 22:25:25,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 46967.5, 300 sec: 47763.6). Total num frames: 1877409792. Throughput: 0: 12174.2. Samples: 469414400. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:25:25,767][1648981] Avg episode reward: [(0, '1065.420')] [2024-06-15 22:25:26,517][1651669] Updated weights for policy 0, policy_version 916737 (0.0014) [2024-06-15 22:25:28,873][1651669] Updated weights for policy 0, policy_version 916832 (0.0012) [2024-06-15 22:25:30,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 50244.3, 300 sec: 47874.6). Total num frames: 1877737472. Throughput: 0: 11741.9. Samples: 469472256. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:25:30,766][1648981] Avg episode reward: [(0, '1043.460')] [2024-06-15 22:25:33,562][1651669] Updated weights for policy 0, policy_version 916896 (0.0014) [2024-06-15 22:25:35,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 47541.4). Total num frames: 1877868544. Throughput: 0: 12071.8. Samples: 469556224. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:25:35,767][1648981] Avg episode reward: [(0, '1067.780')] [2024-06-15 22:25:36,358][1651669] Updated weights for policy 0, policy_version 916962 (0.0012) [2024-06-15 22:25:38,139][1651669] Updated weights for policy 0, policy_version 917025 (0.0014) [2024-06-15 22:25:38,748][1651669] Updated weights for policy 0, policy_version 917056 (0.0011) [2024-06-15 22:25:40,638][1651669] Updated weights for policy 0, policy_version 917115 (0.0014) [2024-06-15 22:25:40,766][1648981] Fps is (10 sec: 52427.7, 60 sec: 50244.2, 300 sec: 48207.8). Total num frames: 1878261760. 
Throughput: 0: 11867.0. Samples: 469592576. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:25:40,767][1648981] Avg episode reward: [(0, '1052.290')] [2024-06-15 22:25:44,206][1651669] Updated weights for policy 0, policy_version 917155 (0.0012) [2024-06-15 22:25:45,766][1648981] Fps is (10 sec: 55705.7, 60 sec: 48059.9, 300 sec: 48096.8). Total num frames: 1878425600. Throughput: 0: 12185.6. Samples: 469671424. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:25:45,767][1648981] Avg episode reward: [(0, '1029.540')] [2024-06-15 22:25:46,411][1651669] Updated weights for policy 0, policy_version 917232 (0.0012) [2024-06-15 22:25:48,065][1651669] Updated weights for policy 0, policy_version 917281 (0.0016) [2024-06-15 22:25:49,977][1651669] Updated weights for policy 0, policy_version 917328 (0.0013) [2024-06-15 22:25:50,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 48318.9). Total num frames: 1878720512. Throughput: 0: 12140.1. Samples: 469747200. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:25:50,767][1648981] Avg episode reward: [(0, '1038.380')] [2024-06-15 22:25:51,167][1651669] Updated weights for policy 0, policy_version 917374 (0.0013) [2024-06-15 22:25:54,447][1651669] Updated weights for policy 0, policy_version 917433 (0.0012) [2024-06-15 22:25:55,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48606.0, 300 sec: 47985.7). Total num frames: 1878917120. Throughput: 0: 12447.3. Samples: 469790208. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:25:55,767][1648981] Avg episode reward: [(0, '1058.670')] [2024-06-15 22:25:56,816][1651669] Updated weights for policy 0, policy_version 917495 (0.0012) [2024-06-15 22:25:58,656][1651669] Updated weights for policy 0, policy_version 917536 (0.0011) [2024-06-15 22:26:00,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49698.1, 300 sec: 48541.1). Total num frames: 1879212032. Throughput: 0: 12265.3. Samples: 469861888. 
Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:00,767][1648981] Avg episode reward: [(0, '1096.500')] [2024-06-15 22:26:00,782][1651669] Updated weights for policy 0, policy_version 917600 (0.0014) [2024-06-15 22:26:01,625][1651669] Updated weights for policy 0, policy_version 917632 (0.0017) [2024-06-15 22:26:03,823][1651274] Signal inference workers to stop experience collection... (48200 times) [2024-06-15 22:26:03,872][1651669] InferenceWorker_p0-w0: stopping experience collection (48200 times) [2024-06-15 22:26:04,004][1651274] Signal inference workers to resume experience collection... (48200 times) [2024-06-15 22:26:04,005][1651669] InferenceWorker_p0-w0: resuming experience collection (48200 times) [2024-06-15 22:26:04,444][1651669] Updated weights for policy 0, policy_version 917680 (0.0013) [2024-06-15 22:26:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1879441408. Throughput: 0: 12572.5. Samples: 469940224. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:05,767][1648981] Avg episode reward: [(0, '1093.490')] [2024-06-15 22:26:05,801][1651669] Updated weights for policy 0, policy_version 917698 (0.0012) [2024-06-15 22:26:09,774][1651669] Updated weights for policy 0, policy_version 917762 (0.0028) [2024-06-15 22:26:10,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49698.1, 300 sec: 48318.9). Total num frames: 1879670784. Throughput: 0: 12401.8. Samples: 469972480. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:10,767][1648981] Avg episode reward: [(0, '1057.050')] [2024-06-15 22:26:11,035][1651669] Updated weights for policy 0, policy_version 917824 (0.0012) [2024-06-15 22:26:12,509][1651669] Updated weights for policy 0, policy_version 917888 (0.0137) [2024-06-15 22:26:15,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49698.2, 300 sec: 48207.8). Total num frames: 1879932928. Throughput: 0: 12697.6. Samples: 470043648. 
Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:15,767][1648981] Avg episode reward: [(0, '1078.450')] [2024-06-15 22:26:16,061][1651669] Updated weights for policy 0, policy_version 917949 (0.0012) [2024-06-15 22:26:18,517][1651669] Updated weights for policy 0, policy_version 918008 (0.0012) [2024-06-15 22:26:20,767][1648981] Fps is (10 sec: 42598.0, 60 sec: 48059.7, 300 sec: 48096.7). Total num frames: 1880096768. Throughput: 0: 12435.9. Samples: 470115840. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:20,767][1648981] Avg episode reward: [(0, '1075.750')] [2024-06-15 22:26:21,811][1651669] Updated weights for policy 0, policy_version 918066 (0.0137) [2024-06-15 22:26:23,508][1651669] Updated weights for policy 0, policy_version 918140 (0.0011) [2024-06-15 22:26:25,767][1648981] Fps is (10 sec: 42596.9, 60 sec: 49151.7, 300 sec: 48207.8). Total num frames: 1880358912. Throughput: 0: 12287.9. Samples: 470145536. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:25,767][1648981] Avg episode reward: [(0, '1061.360')] [2024-06-15 22:26:26,318][1651669] Updated weights for policy 0, policy_version 918177 (0.0011) [2024-06-15 22:26:28,514][1651669] Updated weights for policy 0, policy_version 918224 (0.0132) [2024-06-15 22:26:29,570][1651669] Updated weights for policy 0, policy_version 918271 (0.0098) [2024-06-15 22:26:30,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 48059.7, 300 sec: 47986.2). Total num frames: 1880621056. Throughput: 0: 12174.2. Samples: 470219264. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:30,767][1648981] Avg episode reward: [(0, '1056.810')] [2024-06-15 22:26:32,740][1651669] Updated weights for policy 0, policy_version 918336 (0.0091) [2024-06-15 22:26:33,889][1651669] Updated weights for policy 0, policy_version 918393 (0.0011) [2024-06-15 22:26:35,766][1648981] Fps is (10 sec: 52431.2, 60 sec: 50244.4, 300 sec: 48096.8). Total num frames: 1880883200. 
Throughput: 0: 12379.0. Samples: 470304256. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:35,767][1648981] Avg episode reward: [(0, '1107.990')] [2024-06-15 22:26:37,425][1651669] Updated weights for policy 0, policy_version 918456 (0.0062) [2024-06-15 22:26:39,415][1651669] Updated weights for policy 0, policy_version 918522 (0.0158) [2024-06-15 22:26:40,778][1648981] Fps is (10 sec: 52366.6, 60 sec: 48050.3, 300 sec: 48094.8). Total num frames: 1881145344. Throughput: 0: 12091.4. Samples: 470334464. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:40,779][1648981] Avg episode reward: [(0, '1105.500')] [2024-06-15 22:26:43,564][1651669] Updated weights for policy 0, policy_version 918592 (0.0036) [2024-06-15 22:26:44,359][1651274] Signal inference workers to stop experience collection... (48250 times) [2024-06-15 22:26:44,423][1651669] InferenceWorker_p0-w0: stopping experience collection (48250 times) [2024-06-15 22:26:44,640][1651274] Signal inference workers to resume experience collection... (48250 times) [2024-06-15 22:26:44,640][1651669] InferenceWorker_p0-w0: resuming experience collection (48250 times) [2024-06-15 22:26:44,821][1651669] Updated weights for policy 0, policy_version 918649 (0.0012) [2024-06-15 22:26:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 49698.1, 300 sec: 48430.0). Total num frames: 1881407488. Throughput: 0: 12003.6. Samples: 470402048. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:45,767][1648981] Avg episode reward: [(0, '1089.620')] [2024-06-15 22:26:48,946][1651669] Updated weights for policy 0, policy_version 918705 (0.0015) [2024-06-15 22:26:50,483][1651669] Updated weights for policy 0, policy_version 918768 (0.0027) [2024-06-15 22:26:50,766][1648981] Fps is (10 sec: 52490.7, 60 sec: 49151.9, 300 sec: 48430.0). Total num frames: 1881669632. Throughput: 0: 11867.0. Samples: 470474240. 
Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:50,767][1648981] Avg episode reward: [(0, '1130.060')] [2024-06-15 22:26:53,963][1651669] Updated weights for policy 0, policy_version 918820 (0.0024) [2024-06-15 22:26:55,697][1651669] Updated weights for policy 0, policy_version 918903 (0.0015) [2024-06-15 22:26:55,767][1648981] Fps is (10 sec: 49150.4, 60 sec: 49697.9, 300 sec: 48652.1). Total num frames: 1881899008. Throughput: 0: 12140.0. Samples: 470518784. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:26:55,768][1648981] Avg episode reward: [(0, '1102.290')] [2024-06-15 22:26:55,846][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000918912_1881931776.pth... [2024-06-15 22:26:55,897][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000913280_1870397440.pth [2024-06-15 22:26:59,337][1651669] Updated weights for policy 0, policy_version 918944 (0.0095) [2024-06-15 22:27:00,766][1648981] Fps is (10 sec: 42598.8, 60 sec: 48059.8, 300 sec: 48096.8). Total num frames: 1882095616. Throughput: 0: 12128.7. Samples: 470589440. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:27:00,767][1648981] Avg episode reward: [(0, '1134.730')] [2024-06-15 22:27:01,333][1651669] Updated weights for policy 0, policy_version 919024 (0.0089) [2024-06-15 22:27:05,216][1651669] Updated weights for policy 0, policy_version 919072 (0.0042) [2024-06-15 22:27:05,766][1648981] Fps is (10 sec: 39323.2, 60 sec: 47513.7, 300 sec: 48318.9). Total num frames: 1882292224. Throughput: 0: 12071.9. Samples: 470659072. 
Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:27:05,767][1648981] Avg episode reward: [(0, '1128.060')] [2024-06-15 22:27:06,612][1651669] Updated weights for policy 0, policy_version 919136 (0.0012) [2024-06-15 22:27:10,125][1651669] Updated weights for policy 0, policy_version 919184 (0.0012) [2024-06-15 22:27:10,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 47513.6, 300 sec: 47763.5). Total num frames: 1882521600. Throughput: 0: 12151.6. Samples: 470692352. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:27:10,767][1648981] Avg episode reward: [(0, '1114.870')] [2024-06-15 22:27:11,775][1651669] Updated weights for policy 0, policy_version 919250 (0.0119) [2024-06-15 22:27:15,514][1651669] Updated weights for policy 0, policy_version 919298 (0.0012) [2024-06-15 22:27:15,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 1882750976. Throughput: 0: 12071.8. Samples: 470762496. Policy #0 lag: (min: 7.0, avg: 102.9, max: 263.0) [2024-06-15 22:27:15,767][1648981] Avg episode reward: [(0, '1118.540')] [2024-06-15 22:27:16,918][1651669] Updated weights for policy 0, policy_version 919360 (0.0011) [2024-06-15 22:27:18,228][1651669] Updated weights for policy 0, policy_version 919424 (0.0024) [2024-06-15 22:27:20,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 48059.8, 300 sec: 47652.4). Total num frames: 1882980352. Throughput: 0: 11764.6. Samples: 470833664. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:20,767][1648981] Avg episode reward: [(0, '1090.100')] [2024-06-15 22:27:23,026][1651669] Updated weights for policy 0, policy_version 919504 (0.0136) [2024-06-15 22:27:25,618][1651669] Updated weights for policy 0, policy_version 919554 (0.0016) [2024-06-15 22:27:25,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48060.0, 300 sec: 47985.7). Total num frames: 1883242496. Throughput: 0: 11847.4. Samples: 470867456. 
Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:25,767][1648981] Avg episode reward: [(0, '1110.110')] [2024-06-15 22:27:26,016][1651274] Signal inference workers to stop experience collection... (48300 times) [2024-06-15 22:27:26,041][1651669] InferenceWorker_p0-w0: stopping experience collection (48300 times) [2024-06-15 22:27:26,292][1651274] Signal inference workers to resume experience collection... (48300 times) [2024-06-15 22:27:26,293][1651669] InferenceWorker_p0-w0: resuming experience collection (48300 times) [2024-06-15 22:27:28,332][1651669] Updated weights for policy 0, policy_version 919632 (0.0010) [2024-06-15 22:27:29,303][1651669] Updated weights for policy 0, policy_version 919676 (0.0086) [2024-06-15 22:27:30,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.7, 300 sec: 47541.4). Total num frames: 1883504640. Throughput: 0: 11923.9. Samples: 470938624. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:30,767][1648981] Avg episode reward: [(0, '1083.190')] [2024-06-15 22:27:31,923][1651669] Updated weights for policy 0, policy_version 919714 (0.0012) [2024-06-15 22:27:33,544][1651669] Updated weights for policy 0, policy_version 919761 (0.0095) [2024-06-15 22:27:35,767][1648981] Fps is (10 sec: 55704.6, 60 sec: 48605.6, 300 sec: 48096.7). Total num frames: 1883799552. Throughput: 0: 12128.7. Samples: 471020032. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:35,767][1648981] Avg episode reward: [(0, '1074.080')] [2024-06-15 22:27:35,774][1651669] Updated weights for policy 0, policy_version 919840 (0.0015) [2024-06-15 22:27:36,518][1651669] Updated weights for policy 0, policy_version 919872 (0.0012) [2024-06-15 22:27:39,650][1651669] Updated weights for policy 0, policy_version 919931 (0.0019) [2024-06-15 22:27:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48069.2, 300 sec: 47874.6). Total num frames: 1884028928. Throughput: 0: 12015.0. Samples: 471059456. 
Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:40,767][1648981] Avg episode reward: [(0, '1058.020')] [2024-06-15 22:27:42,180][1651669] Updated weights for policy 0, policy_version 919999 (0.0012) [2024-06-15 22:27:45,231][1651669] Updated weights for policy 0, policy_version 920061 (0.0013) [2024-06-15 22:27:45,766][1648981] Fps is (10 sec: 49153.1, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1884291072. Throughput: 0: 12117.3. Samples: 471134720. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:45,767][1648981] Avg episode reward: [(0, '1078.930')] [2024-06-15 22:27:46,965][1651669] Updated weights for policy 0, policy_version 920112 (0.0013) [2024-06-15 22:27:49,638][1651669] Updated weights for policy 0, policy_version 920148 (0.0012) [2024-06-15 22:27:50,619][1651669] Updated weights for policy 0, policy_version 920192 (0.0013) [2024-06-15 22:27:50,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 1884553216. Throughput: 0: 12105.9. Samples: 471203840. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:50,767][1648981] Avg episode reward: [(0, '1099.560')] [2024-06-15 22:27:53,050][1651669] Updated weights for policy 0, policy_version 920247 (0.0011) [2024-06-15 22:27:54,879][1651669] Updated weights for policy 0, policy_version 920314 (0.0012) [2024-06-15 22:27:55,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48606.1, 300 sec: 48430.0). Total num frames: 1884815360. Throughput: 0: 12344.9. Samples: 471247872. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:27:55,767][1648981] Avg episode reward: [(0, '1068.840')] [2024-06-15 22:27:57,585][1651669] Updated weights for policy 0, policy_version 920382 (0.0012) [2024-06-15 22:28:00,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 48605.9, 300 sec: 48207.9). Total num frames: 1885011968. Throughput: 0: 12401.8. Samples: 471320576. 
Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:00,767][1648981] Avg episode reward: [(0, '988.960')] [2024-06-15 22:28:01,321][1651669] Updated weights for policy 0, policy_version 920443 (0.0012) [2024-06-15 22:28:03,891][1651669] Updated weights for policy 0, policy_version 920506 (0.0011) [2024-06-15 22:28:05,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 49698.0, 300 sec: 48207.8). Total num frames: 1885274112. Throughput: 0: 12447.3. Samples: 471393792. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:05,767][1648981] Avg episode reward: [(0, '1016.830')] [2024-06-15 22:28:05,862][1651669] Updated weights for policy 0, policy_version 920548 (0.0012) [2024-06-15 22:28:07,589][1651669] Updated weights for policy 0, policy_version 920592 (0.0012) [2024-06-15 22:28:10,677][1651274] Signal inference workers to stop experience collection... (48350 times) [2024-06-15 22:28:10,695][1651669] Updated weights for policy 0, policy_version 920641 (0.0011) [2024-06-15 22:28:10,715][1651669] InferenceWorker_p0-w0: stopping experience collection (48350 times) [2024-06-15 22:28:10,766][1648981] Fps is (10 sec: 45874.7, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1885470720. Throughput: 0: 12413.2. Samples: 471426048. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:10,767][1648981] Avg episode reward: [(0, '995.950')] [2024-06-15 22:28:10,843][1651274] Signal inference workers to resume experience collection... (48350 times) [2024-06-15 22:28:10,844][1651669] InferenceWorker_p0-w0: resuming experience collection (48350 times) [2024-06-15 22:28:12,077][1651669] Updated weights for policy 0, policy_version 920701 (0.0035) [2024-06-15 22:28:14,395][1651669] Updated weights for policy 0, policy_version 920752 (0.0034) [2024-06-15 22:28:15,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 50244.3, 300 sec: 48318.9). Total num frames: 1885765632. Throughput: 0: 12424.5. Samples: 471497728. 
Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:15,767][1648981] Avg episode reward: [(0, '952.590')] [2024-06-15 22:28:15,968][1651669] Updated weights for policy 0, policy_version 920800 (0.0095) [2024-06-15 22:28:19,093][1651669] Updated weights for policy 0, policy_version 920868 (0.0011) [2024-06-15 22:28:20,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 50244.3, 300 sec: 48763.2). Total num frames: 1885995008. Throughput: 0: 12310.8. Samples: 471574016. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:20,767][1648981] Avg episode reward: [(0, '953.130')] [2024-06-15 22:28:22,326][1651669] Updated weights for policy 0, policy_version 920919 (0.0012) [2024-06-15 22:28:23,328][1651669] Updated weights for policy 0, policy_version 920960 (0.0019) [2024-06-15 22:28:25,321][1651669] Updated weights for policy 0, policy_version 921015 (0.0011) [2024-06-15 22:28:25,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 50244.3, 300 sec: 48430.0). Total num frames: 1886257152. Throughput: 0: 12231.1. Samples: 471609856. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:25,767][1648981] Avg episode reward: [(0, '943.750')] [2024-06-15 22:28:26,625][1651669] Updated weights for policy 0, policy_version 921056 (0.0011) [2024-06-15 22:28:30,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 49698.1, 300 sec: 48763.2). Total num frames: 1886486528. Throughput: 0: 12151.5. Samples: 471681536. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:30,767][1648981] Avg episode reward: [(0, '908.660')] [2024-06-15 22:28:30,805][1651669] Updated weights for policy 0, policy_version 921140 (0.0127) [2024-06-15 22:28:33,766][1651669] Updated weights for policy 0, policy_version 921200 (0.0159) [2024-06-15 22:28:35,611][1651669] Updated weights for policy 0, policy_version 921249 (0.0039) [2024-06-15 22:28:35,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48606.1, 300 sec: 48541.1). Total num frames: 1886715904. 
Throughput: 0: 12219.8. Samples: 471753728. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:35,767][1648981] Avg episode reward: [(0, '930.990')] [2024-06-15 22:28:37,557][1651669] Updated weights for policy 0, policy_version 921313 (0.0012) [2024-06-15 22:28:38,146][1651669] Updated weights for policy 0, policy_version 921342 (0.0012) [2024-06-15 22:28:40,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 1886978048. Throughput: 0: 12049.1. Samples: 471790080. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:40,767][1648981] Avg episode reward: [(0, '928.560')] [2024-06-15 22:28:41,066][1651669] Updated weights for policy 0, policy_version 921392 (0.0011) [2024-06-15 22:28:43,503][1651669] Updated weights for policy 0, policy_version 921413 (0.0056) [2024-06-15 22:28:45,087][1651669] Updated weights for policy 0, policy_version 921473 (0.0012) [2024-06-15 22:28:45,768][1648981] Fps is (10 sec: 52419.0, 60 sec: 49150.5, 300 sec: 48985.1). Total num frames: 1887240192. Throughput: 0: 12150.9. Samples: 471867392. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:45,769][1648981] Avg episode reward: [(0, '923.110')] [2024-06-15 22:28:46,371][1651669] Updated weights for policy 0, policy_version 921534 (0.0014) [2024-06-15 22:28:48,399][1651669] Updated weights for policy 0, policy_version 921588 (0.0014) [2024-06-15 22:28:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 1887436800. Throughput: 0: 12106.0. Samples: 471938560. 
Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:50,767][1648981] Avg episode reward: [(0, '916.520')] [2024-06-15 22:28:51,104][1651669] Updated weights for policy 0, policy_version 921616 (0.0012) [2024-06-15 22:28:54,919][1651669] Updated weights for policy 0, policy_version 921680 (0.0014) [2024-06-15 22:28:55,427][1651274] Signal inference workers to stop experience collection... (48400 times) [2024-06-15 22:28:55,468][1651669] InferenceWorker_p0-w0: stopping experience collection (48400 times) [2024-06-15 22:28:55,619][1651274] Signal inference workers to resume experience collection... (48400 times) [2024-06-15 22:28:55,619][1651669] InferenceWorker_p0-w0: resuming experience collection (48400 times) [2024-06-15 22:28:55,774][1648981] Fps is (10 sec: 42573.3, 60 sec: 47507.5, 300 sec: 48761.9). Total num frames: 1887666176. Throughput: 0: 12229.0. Samples: 471976448. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:28:55,775][1648981] Avg episode reward: [(0, '919.240')] [2024-06-15 22:28:56,048][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000921728_1887698944.pth... [2024-06-15 22:28:56,252][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000916048_1876066304.pth [2024-06-15 22:28:56,649][1651669] Updated weights for policy 0, policy_version 921746 (0.0012) [2024-06-15 22:28:59,028][1651669] Updated weights for policy 0, policy_version 921809 (0.0013) [2024-06-15 22:28:59,940][1651669] Updated weights for policy 0, policy_version 921852 (0.0011) [2024-06-15 22:29:00,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1887961088. Throughput: 0: 12151.5. Samples: 472044544. 
Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:29:00,767][1648981] Avg episode reward: [(0, '920.900')] [2024-06-15 22:29:02,123][1651669] Updated weights for policy 0, policy_version 921905 (0.0122) [2024-06-15 22:29:05,386][1651669] Updated weights for policy 0, policy_version 921941 (0.0014) [2024-06-15 22:29:05,766][1648981] Fps is (10 sec: 49189.8, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1888157696. Throughput: 0: 12424.5. Samples: 472133120. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:29:05,767][1648981] Avg episode reward: [(0, '905.140')] [2024-06-15 22:29:07,212][1651669] Updated weights for policy 0, policy_version 922018 (0.0226) [2024-06-15 22:29:10,095][1651669] Updated weights for policy 0, policy_version 922096 (0.0013) [2024-06-15 22:29:10,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1888485376. Throughput: 0: 12253.9. Samples: 472161280. Policy #0 lag: (min: 17.0, avg: 151.3, max: 321.0) [2024-06-15 22:29:10,767][1648981] Avg episode reward: [(0, '911.280')] [2024-06-15 22:29:11,456][1651669] Updated weights for policy 0, policy_version 922128 (0.0013) [2024-06-15 22:29:15,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 47513.6, 300 sec: 48763.3). Total num frames: 1888616448. Throughput: 0: 12424.5. Samples: 472240640. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:15,767][1648981] Avg episode reward: [(0, '903.220')] [2024-06-15 22:29:15,775][1651669] Updated weights for policy 0, policy_version 922177 (0.0106) [2024-06-15 22:29:17,316][1651669] Updated weights for policy 0, policy_version 922242 (0.0012) [2024-06-15 22:29:20,181][1651669] Updated weights for policy 0, policy_version 922305 (0.0129) [2024-06-15 22:29:20,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1888911360. Throughput: 0: 12265.2. Samples: 472305664. 
Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:20,767][1648981] Avg episode reward: [(0, '864.700')] [2024-06-15 22:29:21,517][1651669] Updated weights for policy 0, policy_version 922368 (0.0012) [2024-06-15 22:29:23,399][1651669] Updated weights for policy 0, policy_version 922423 (0.0013) [2024-06-15 22:29:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1889140736. Throughput: 0: 12197.0. Samples: 472338944. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:25,767][1648981] Avg episode reward: [(0, '912.290')] [2024-06-15 22:29:27,181][1651669] Updated weights for policy 0, policy_version 922468 (0.0048) [2024-06-15 22:29:29,291][1651669] Updated weights for policy 0, policy_version 922549 (0.0014) [2024-06-15 22:29:30,766][1648981] Fps is (10 sec: 49152.2, 60 sec: 48605.9, 300 sec: 48652.2). Total num frames: 1889402880. Throughput: 0: 12197.5. Samples: 472416256. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:30,767][1648981] Avg episode reward: [(0, '941.460')] [2024-06-15 22:29:31,282][1651669] Updated weights for policy 0, policy_version 922597 (0.0015) [2024-06-15 22:29:33,492][1651669] Updated weights for policy 0, policy_version 922656 (0.0014) [2024-06-15 22:29:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1889665024. Throughput: 0: 12288.0. Samples: 472491520. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:35,767][1648981] Avg episode reward: [(0, '995.560')] [2024-06-15 22:29:37,255][1651274] Signal inference workers to stop experience collection... (48450 times) [2024-06-15 22:29:37,272][1651669] Updated weights for policy 0, policy_version 922693 (0.0011) [2024-06-15 22:29:37,337][1651669] InferenceWorker_p0-w0: stopping experience collection (48450 times) [2024-06-15 22:29:37,470][1651274] Signal inference workers to resume experience collection... 
(48450 times) [2024-06-15 22:29:37,471][1651669] InferenceWorker_p0-w0: resuming experience collection (48450 times) [2024-06-15 22:29:38,864][1651669] Updated weights for policy 0, policy_version 922757 (0.0161) [2024-06-15 22:29:40,156][1651669] Updated weights for policy 0, policy_version 922810 (0.0011) [2024-06-15 22:29:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.0, 300 sec: 48763.3). Total num frames: 1889927168. Throughput: 0: 12335.6. Samples: 472531456. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:40,767][1648981] Avg episode reward: [(0, '1010.280')] [2024-06-15 22:29:42,628][1651669] Updated weights for policy 0, policy_version 922864 (0.0032) [2024-06-15 22:29:43,799][1651669] Updated weights for policy 0, policy_version 922901 (0.0011) [2024-06-15 22:29:44,368][1651669] Updated weights for policy 0, policy_version 922939 (0.0010) [2024-06-15 22:29:45,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49153.5, 300 sec: 48874.3). Total num frames: 1890189312. Throughput: 0: 12424.5. Samples: 472603648. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:45,767][1648981] Avg episode reward: [(0, '1010.270')] [2024-06-15 22:29:48,419][1651669] Updated weights for policy 0, policy_version 922992 (0.0122) [2024-06-15 22:29:49,919][1651669] Updated weights for policy 0, policy_version 923056 (0.0018) [2024-06-15 22:29:50,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.2, 300 sec: 48985.4). Total num frames: 1890451456. Throughput: 0: 12140.1. Samples: 472679424. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:50,767][1648981] Avg episode reward: [(0, '1061.330')] [2024-06-15 22:29:52,343][1651669] Updated weights for policy 0, policy_version 923091 (0.0020) [2024-06-15 22:29:53,722][1651669] Updated weights for policy 0, policy_version 923152 (0.0013) [2024-06-15 22:29:55,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50797.0, 300 sec: 49096.5). Total num frames: 1890713600. 
Throughput: 0: 12356.3. Samples: 472717312. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:29:55,767][1648981] Avg episode reward: [(0, '1025.690')] [2024-06-15 22:29:57,587][1651669] Updated weights for policy 0, policy_version 923205 (0.0013) [2024-06-15 22:29:59,126][1651669] Updated weights for policy 0, policy_version 923280 (0.0012) [2024-06-15 22:30:00,223][1651669] Updated weights for policy 0, policy_version 923327 (0.0013) [2024-06-15 22:30:00,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.1, 300 sec: 49207.5). Total num frames: 1890975744. Throughput: 0: 12526.9. Samples: 472804352. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:00,767][1648981] Avg episode reward: [(0, '1060.340')] [2024-06-15 22:30:03,864][1651669] Updated weights for policy 0, policy_version 923408 (0.0014) [2024-06-15 22:30:05,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 51336.6, 300 sec: 49318.6). Total num frames: 1891237888. Throughput: 0: 12470.0. Samples: 472866816. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:05,767][1648981] Avg episode reward: [(0, '1081.620')] [2024-06-15 22:30:08,398][1651669] Updated weights for policy 0, policy_version 923477 (0.0093) [2024-06-15 22:30:10,219][1651669] Updated weights for policy 0, policy_version 923553 (0.0012) [2024-06-15 22:30:10,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 50244.3, 300 sec: 49318.6). Total num frames: 1891500032. Throughput: 0: 12868.3. Samples: 472918016. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:10,767][1648981] Avg episode reward: [(0, '1083.340')] [2024-06-15 22:30:13,325][1651669] Updated weights for policy 0, policy_version 923616 (0.0012) [2024-06-15 22:30:13,422][1651274] Signal inference workers to stop experience collection... 
(48500 times) [2024-06-15 22:30:13,522][1651669] InferenceWorker_p0-w0: stopping experience collection (48500 times) [2024-06-15 22:30:13,737][1651274] Signal inference workers to resume experience collection... (48500 times) [2024-06-15 22:30:13,737][1651669] InferenceWorker_p0-w0: resuming experience collection (48500 times) [2024-06-15 22:30:15,484][1651669] Updated weights for policy 0, policy_version 923696 (0.0223) [2024-06-15 22:30:15,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 51882.6, 300 sec: 49207.5). Total num frames: 1891729408. Throughput: 0: 12629.3. Samples: 472984576. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:15,767][1648981] Avg episode reward: [(0, '1078.530')] [2024-06-15 22:30:19,249][1651669] Updated weights for policy 0, policy_version 923729 (0.0010) [2024-06-15 22:30:20,609][1651669] Updated weights for policy 0, policy_version 923784 (0.0012) [2024-06-15 22:30:20,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 50244.2, 300 sec: 49207.5). Total num frames: 1891926016. Throughput: 0: 12617.9. Samples: 473059328. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:20,767][1648981] Avg episode reward: [(0, '1086.630')] [2024-06-15 22:30:21,548][1651669] Updated weights for policy 0, policy_version 923830 (0.0011) [2024-06-15 22:30:24,439][1651669] Updated weights for policy 0, policy_version 923888 (0.0012) [2024-06-15 22:30:25,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 50790.4, 300 sec: 48985.4). Total num frames: 1892188160. Throughput: 0: 12606.6. Samples: 473098752. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:25,767][1648981] Avg episode reward: [(0, '1085.840')] [2024-06-15 22:30:26,180][1651669] Updated weights for policy 0, policy_version 923952 (0.0014) [2024-06-15 22:30:30,766][1648981] Fps is (10 sec: 39321.9, 60 sec: 48605.8, 300 sec: 48985.4). Total num frames: 1892319232. Throughput: 0: 12504.2. Samples: 473166336. 
Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:30,767][1648981] Avg episode reward: [(0, '1143.980')] [2024-06-15 22:30:31,118][1651669] Updated weights for policy 0, policy_version 924003 (0.0099) [2024-06-15 22:30:32,897][1651669] Updated weights for policy 0, policy_version 924067 (0.0129) [2024-06-15 22:30:34,999][1651669] Updated weights for policy 0, policy_version 924131 (0.0012) [2024-06-15 22:30:35,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 50244.3, 300 sec: 48874.3). Total num frames: 1892679680. Throughput: 0: 12219.7. Samples: 473229312. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:35,767][1648981] Avg episode reward: [(0, '1137.440')] [2024-06-15 22:30:36,730][1651669] Updated weights for policy 0, policy_version 924161 (0.0015) [2024-06-15 22:30:38,327][1651669] Updated weights for policy 0, policy_version 924220 (0.0016) [2024-06-15 22:30:40,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 48763.2). Total num frames: 1892810752. Throughput: 0: 12106.0. Samples: 473262080. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:40,767][1648981] Avg episode reward: [(0, '1137.940')] [2024-06-15 22:30:42,793][1651669] Updated weights for policy 0, policy_version 924272 (0.0013) [2024-06-15 22:30:43,942][1651669] Updated weights for policy 0, policy_version 924323 (0.0012) [2024-06-15 22:30:45,317][1651669] Updated weights for policy 0, policy_version 924385 (0.0010) [2024-06-15 22:30:45,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49698.1, 300 sec: 48985.4). Total num frames: 1893171200. Throughput: 0: 12003.6. Samples: 473344512. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:45,767][1648981] Avg episode reward: [(0, '1134.910')] [2024-06-15 22:30:47,501][1651669] Updated weights for policy 0, policy_version 924432 (0.0013) [2024-06-15 22:30:50,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1893335040. 
Throughput: 0: 12424.5. Samples: 473425920. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:50,767][1648981] Avg episode reward: [(0, '1169.430')] [2024-06-15 22:30:51,409][1651669] Updated weights for policy 0, policy_version 924481 (0.0012) [2024-06-15 22:30:52,559][1651274] Signal inference workers to stop experience collection... (48550 times) [2024-06-15 22:30:52,605][1651669] InferenceWorker_p0-w0: stopping experience collection (48550 times) [2024-06-15 22:30:52,909][1651274] Signal inference workers to resume experience collection... (48550 times) [2024-06-15 22:30:52,910][1651669] InferenceWorker_p0-w0: resuming experience collection (48550 times) [2024-06-15 22:30:53,137][1651669] Updated weights for policy 0, policy_version 924562 (0.0013) [2024-06-15 22:30:54,629][1651669] Updated weights for policy 0, policy_version 924614 (0.0021) [2024-06-15 22:30:55,743][1651669] Updated weights for policy 0, policy_version 924672 (0.0012) [2024-06-15 22:30:55,766][1648981] Fps is (10 sec: 55705.5, 60 sec: 50244.2, 300 sec: 49207.5). Total num frames: 1893728256. Throughput: 0: 11980.8. Samples: 473457152. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:30:55,767][1648981] Avg episode reward: [(0, '1140.060')] [2024-06-15 22:30:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000924672_1893728256.pth... [2024-06-15 22:30:55,814][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000918912_1881931776.pth [2024-06-15 22:30:55,819][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000924672_1893728256.pth [2024-06-15 22:30:59,462][1651669] Updated weights for policy 0, policy_version 924735 (0.0012) [2024-06-15 22:31:00,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.8, 300 sec: 48874.3). Total num frames: 1893859328. Throughput: 0: 12106.0. Samples: 473529344. 
Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:31:00,767][1648981] Avg episode reward: [(0, '1142.060')] [2024-06-15 22:31:02,886][1651669] Updated weights for policy 0, policy_version 924798 (0.0011) [2024-06-15 22:31:04,565][1651669] Updated weights for policy 0, policy_version 924859 (0.0015) [2024-06-15 22:31:05,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 49096.5). Total num frames: 1894154240. Throughput: 0: 12071.8. Samples: 473602560. Policy #0 lag: (min: 47.0, avg: 180.1, max: 303.0) [2024-06-15 22:31:05,767][1648981] Avg episode reward: [(0, '1136.870')] [2024-06-15 22:31:06,610][1651669] Updated weights for policy 0, policy_version 924925 (0.0013) [2024-06-15 22:31:09,725][1651669] Updated weights for policy 0, policy_version 924964 (0.0011) [2024-06-15 22:31:10,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.6, 300 sec: 48985.4). Total num frames: 1894383616. Throughput: 0: 12060.4. Samples: 473641472. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:10,767][1648981] Avg episode reward: [(0, '1149.270')] [2024-06-15 22:31:12,852][1651669] Updated weights for policy 0, policy_version 925014 (0.0011) [2024-06-15 22:31:13,495][1651669] Updated weights for policy 0, policy_version 925053 (0.0020) [2024-06-15 22:31:15,367][1651669] Updated weights for policy 0, policy_version 925104 (0.0010) [2024-06-15 22:31:15,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 48059.7, 300 sec: 49207.6). Total num frames: 1894612992. Throughput: 0: 12333.5. Samples: 473721344. 
Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:15,767][1648981] Avg episode reward: [(0, '1046.160')] [2024-06-15 22:31:16,436][1651669] Updated weights for policy 0, policy_version 925152 (0.0012) [2024-06-15 22:31:19,024][1651669] Updated weights for policy 0, policy_version 925188 (0.0011) [2024-06-15 22:31:20,047][1651669] Updated weights for policy 0, policy_version 925247 (0.0028) [2024-06-15 22:31:20,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 49698.2, 300 sec: 49318.7). Total num frames: 1894907904. Throughput: 0: 12697.6. Samples: 473800704. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:20,767][1648981] Avg episode reward: [(0, '1041.270')] [2024-06-15 22:31:23,086][1651669] Updated weights for policy 0, policy_version 925310 (0.0012) [2024-06-15 22:31:25,110][1651669] Updated weights for policy 0, policy_version 925347 (0.0013) [2024-06-15 22:31:25,766][1648981] Fps is (10 sec: 55705.4, 60 sec: 49698.1, 300 sec: 49318.6). Total num frames: 1895170048. Throughput: 0: 12743.1. Samples: 473835520. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:25,767][1648981] Avg episode reward: [(0, '1059.040')] [2024-06-15 22:31:26,320][1651669] Updated weights for policy 0, policy_version 925392 (0.0012) [2024-06-15 22:31:27,421][1651669] Updated weights for policy 0, policy_version 925440 (0.0013) [2024-06-15 22:31:30,568][1651669] Updated weights for policy 0, policy_version 925488 (0.0010) [2024-06-15 22:31:30,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 51336.5, 300 sec: 49207.5). Total num frames: 1895399424. Throughput: 0: 12629.3. Samples: 473912832. 
Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:30,767][1648981] Avg episode reward: [(0, '1005.170')] [2024-06-15 22:31:32,165][1651669] Updated weights for policy 0, policy_version 925520 (0.0010) [2024-06-15 22:31:34,896][1651669] Updated weights for policy 0, policy_version 925584 (0.0012) [2024-06-15 22:31:35,020][1651274] Signal inference workers to stop experience collection... (48600 times) [2024-06-15 22:31:35,071][1651669] InferenceWorker_p0-w0: stopping experience collection (48600 times) [2024-06-15 22:31:35,312][1651274] Signal inference workers to resume experience collection... (48600 times) [2024-06-15 22:31:35,313][1651669] InferenceWorker_p0-w0: resuming experience collection (48600 times) [2024-06-15 22:31:35,777][1648981] Fps is (10 sec: 49098.6, 60 sec: 49689.1, 300 sec: 49207.7). Total num frames: 1895661568. Throughput: 0: 12387.4. Samples: 473983488. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:35,778][1648981] Avg episode reward: [(0, '973.220')] [2024-06-15 22:31:36,497][1651669] Updated weights for policy 0, policy_version 925633 (0.0123) [2024-06-15 22:31:37,569][1651669] Updated weights for policy 0, policy_version 925681 (0.0017) [2024-06-15 22:31:40,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 50790.4, 300 sec: 48985.4). Total num frames: 1895858176. Throughput: 0: 12458.7. Samples: 474017792. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:40,767][1648981] Avg episode reward: [(0, '921.270')] [2024-06-15 22:31:41,121][1651669] Updated weights for policy 0, policy_version 925728 (0.0014) [2024-06-15 22:31:41,796][1651669] Updated weights for policy 0, policy_version 925757 (0.0010) [2024-06-15 22:31:43,567][1651669] Updated weights for policy 0, policy_version 925818 (0.0016) [2024-06-15 22:31:45,766][1648981] Fps is (10 sec: 45925.3, 60 sec: 49152.0, 300 sec: 48985.4). Total num frames: 1896120320. Throughput: 0: 12686.2. Samples: 474100224. 
Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:45,767][1648981] Avg episode reward: [(0, '924.110')] [2024-06-15 22:31:46,095][1651669] Updated weights for policy 0, policy_version 925859 (0.0012) [2024-06-15 22:31:47,986][1651669] Updated weights for policy 0, policy_version 925936 (0.0012) [2024-06-15 22:31:50,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 50244.3, 300 sec: 48985.4). Total num frames: 1896349696. Throughput: 0: 12515.6. Samples: 474165760. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:50,767][1648981] Avg episode reward: [(0, '916.680')] [2024-06-15 22:31:51,699][1651669] Updated weights for policy 0, policy_version 925968 (0.0029) [2024-06-15 22:31:54,185][1651669] Updated weights for policy 0, policy_version 926021 (0.0011) [2024-06-15 22:31:55,543][1651669] Updated weights for policy 0, policy_version 926078 (0.0012) [2024-06-15 22:31:55,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48059.8, 300 sec: 49207.5). Total num frames: 1896611840. Throughput: 0: 12458.7. Samples: 474202112. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:31:55,767][1648981] Avg episode reward: [(0, '921.850')] [2024-06-15 22:31:57,505][1651669] Updated weights for policy 0, policy_version 926137 (0.0017) [2024-06-15 22:31:58,564][1651669] Updated weights for policy 0, policy_version 926178 (0.0012) [2024-06-15 22:32:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 49429.7). Total num frames: 1896873984. Throughput: 0: 12208.4. Samples: 474270720. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:00,767][1648981] Avg episode reward: [(0, '908.660')] [2024-06-15 22:32:03,148][1651669] Updated weights for policy 0, policy_version 926233 (0.0073) [2024-06-15 22:32:04,603][1651669] Updated weights for policy 0, policy_version 926273 (0.0012) [2024-06-15 22:32:05,767][1648981] Fps is (10 sec: 49151.9, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 1897103360. 
Throughput: 0: 12117.3. Samples: 474345984. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:05,768][1648981] Avg episode reward: [(0, '912.380')] [2024-06-15 22:32:05,981][1651669] Updated weights for policy 0, policy_version 926336 (0.0018) [2024-06-15 22:32:08,929][1651669] Updated weights for policy 0, policy_version 926417 (0.0013) [2024-06-15 22:32:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.3, 300 sec: 49651.8). Total num frames: 1897398272. Throughput: 0: 12140.1. Samples: 474381824. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:10,767][1648981] Avg episode reward: [(0, '896.210')] [2024-06-15 22:32:13,798][1651669] Updated weights for policy 0, policy_version 926480 (0.0051) [2024-06-15 22:32:15,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 49429.7). Total num frames: 1897562112. Throughput: 0: 12185.6. Samples: 474461184. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:15,767][1648981] Avg episode reward: [(0, '881.380')] [2024-06-15 22:32:15,933][1651669] Updated weights for policy 0, policy_version 926560 (0.0126) [2024-06-15 22:32:18,268][1651274] Signal inference workers to stop experience collection... (48650 times) [2024-06-15 22:32:18,298][1651669] InferenceWorker_p0-w0: stopping experience collection (48650 times) [2024-06-15 22:32:18,496][1651274] Signal inference workers to resume experience collection... (48650 times) [2024-06-15 22:32:18,497][1651669] InferenceWorker_p0-w0: resuming experience collection (48650 times) [2024-06-15 22:32:18,643][1651669] Updated weights for policy 0, policy_version 926627 (0.0031) [2024-06-15 22:32:19,986][1651669] Updated weights for policy 0, policy_version 926704 (0.0011) [2024-06-15 22:32:20,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50244.3, 300 sec: 49762.9). Total num frames: 1897922560. Throughput: 0: 12040.6. Samples: 474525184. 
Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:20,767][1648981] Avg episode reward: [(0, '899.760')] [2024-06-15 22:32:25,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 46967.5, 300 sec: 49096.5). Total num frames: 1897988096. Throughput: 0: 12185.6. Samples: 474566144. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:25,767][1648981] Avg episode reward: [(0, '940.990')] [2024-06-15 22:32:25,806][1651669] Updated weights for policy 0, policy_version 926753 (0.0012) [2024-06-15 22:32:27,608][1651669] Updated weights for policy 0, policy_version 926832 (0.0013) [2024-06-15 22:32:29,992][1651669] Updated weights for policy 0, policy_version 926896 (0.0023) [2024-06-15 22:32:30,766][1648981] Fps is (10 sec: 39321.7, 60 sec: 48605.9, 300 sec: 49207.6). Total num frames: 1898315776. Throughput: 0: 11901.2. Samples: 474635776. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:30,767][1648981] Avg episode reward: [(0, '925.360')] [2024-06-15 22:32:35,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 46429.7, 300 sec: 48874.3). Total num frames: 1898446848. Throughput: 0: 12060.4. Samples: 474708480. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:35,767][1648981] Avg episode reward: [(0, '914.120')] [2024-06-15 22:32:36,656][1651669] Updated weights for policy 0, policy_version 926977 (0.0013) [2024-06-15 22:32:38,100][1651669] Updated weights for policy 0, policy_version 927040 (0.0025) [2024-06-15 22:32:40,230][1651669] Updated weights for policy 0, policy_version 927107 (0.0014) [2024-06-15 22:32:40,766][1648981] Fps is (10 sec: 45874.5, 60 sec: 48605.8, 300 sec: 49096.4). Total num frames: 1898774528. Throughput: 0: 12026.3. Samples: 474743296. 
Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:40,767][1648981] Avg episode reward: [(0, '902.670')] [2024-06-15 22:32:41,149][1651669] Updated weights for policy 0, policy_version 927156 (0.0039) [2024-06-15 22:32:42,717][1651669] Updated weights for policy 0, policy_version 927218 (0.0012) [2024-06-15 22:32:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 47513.6, 300 sec: 48874.3). Total num frames: 1898971136. Throughput: 0: 11969.4. Samples: 474809344. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:45,767][1648981] Avg episode reward: [(0, '942.230')] [2024-06-15 22:32:47,716][1651669] Updated weights for policy 0, policy_version 927252 (0.0078) [2024-06-15 22:32:49,160][1651669] Updated weights for policy 0, policy_version 927312 (0.0010) [2024-06-15 22:32:50,162][1651669] Updated weights for policy 0, policy_version 927360 (0.0026) [2024-06-15 22:32:50,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 48059.7, 300 sec: 48874.3). Total num frames: 1899233280. Throughput: 0: 12014.9. Samples: 474886656. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:50,767][1648981] Avg episode reward: [(0, '916.780')] [2024-06-15 22:32:52,992][1651669] Updated weights for policy 0, policy_version 927440 (0.0014) [2024-06-15 22:32:53,952][1651669] Updated weights for policy 0, policy_version 927488 (0.0014) [2024-06-15 22:32:55,767][1648981] Fps is (10 sec: 52427.3, 60 sec: 48059.5, 300 sec: 49096.4). Total num frames: 1899495424. Throughput: 0: 11832.8. Samples: 474914304. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:32:55,767][1648981] Avg episode reward: [(0, '911.090')] [2024-06-15 22:32:55,772][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000927488_1899495424.pth... 
[2024-06-15 22:32:55,829][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000921728_1887698944.pth [2024-06-15 22:32:59,133][1651274] Signal inference workers to stop experience collection... (48700 times) [2024-06-15 22:32:59,188][1651669] InferenceWorker_p0-w0: stopping experience collection (48700 times) [2024-06-15 22:32:59,445][1651274] Signal inference workers to resume experience collection... (48700 times) [2024-06-15 22:32:59,445][1651669] InferenceWorker_p0-w0: resuming experience collection (48700 times) [2024-06-15 22:32:59,614][1651669] Updated weights for policy 0, policy_version 927538 (0.0014) [2024-06-15 22:33:00,767][1648981] Fps is (10 sec: 45873.7, 60 sec: 46967.2, 300 sec: 48874.3). Total num frames: 1899692032. Throughput: 0: 11798.7. Samples: 474992128. Policy #0 lag: (min: 15.0, avg: 132.9, max: 271.0) [2024-06-15 22:33:00,767][1648981] Avg episode reward: [(0, '897.790')] [2024-06-15 22:33:01,185][1651669] Updated weights for policy 0, policy_version 927607 (0.0015) [2024-06-15 22:33:03,311][1651669] Updated weights for policy 0, policy_version 927657 (0.0012) [2024-06-15 22:33:05,264][1651669] Updated weights for policy 0, policy_version 927739 (0.0013) [2024-06-15 22:33:05,766][1648981] Fps is (10 sec: 52430.6, 60 sec: 48605.9, 300 sec: 49318.6). Total num frames: 1900019712. Throughput: 0: 11753.3. Samples: 475054080. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:05,767][1648981] Avg episode reward: [(0, '875.870')] [2024-06-15 22:33:10,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 45329.2, 300 sec: 48652.2). Total num frames: 1900118016. Throughput: 0: 11867.0. Samples: 475100160. 
Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:10,767][1648981] Avg episode reward: [(0, '911.060')] [2024-06-15 22:33:11,144][1651669] Updated weights for policy 0, policy_version 927808 (0.0011) [2024-06-15 22:33:12,368][1651669] Updated weights for policy 0, policy_version 927872 (0.0010) [2024-06-15 22:33:15,561][1651669] Updated weights for policy 0, policy_version 927940 (0.0012) [2024-06-15 22:33:15,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 48059.7, 300 sec: 48985.4). Total num frames: 1900445696. Throughput: 0: 11787.4. Samples: 475166208. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:15,767][1648981] Avg episode reward: [(0, '974.250')] [2024-06-15 22:33:16,740][1651669] Updated weights for policy 0, policy_version 928000 (0.0099) [2024-06-15 22:33:20,766][1648981] Fps is (10 sec: 45874.9, 60 sec: 44236.8, 300 sec: 48541.1). Total num frames: 1900576768. Throughput: 0: 11946.7. Samples: 475246080. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:20,767][1648981] Avg episode reward: [(0, '972.210')] [2024-06-15 22:33:22,051][1651669] Updated weights for policy 0, policy_version 928067 (0.0040) [2024-06-15 22:33:24,383][1651669] Updated weights for policy 0, policy_version 928130 (0.0012) [2024-06-15 22:33:25,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49151.9, 300 sec: 48985.4). Total num frames: 1900937216. Throughput: 0: 11867.0. Samples: 475277312. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:25,767][1648981] Avg episode reward: [(0, '972.540')] [2024-06-15 22:33:26,299][1651669] Updated weights for policy 0, policy_version 928224 (0.0013) [2024-06-15 22:33:30,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 45875.2, 300 sec: 48652.1). Total num frames: 1901068288. Throughput: 0: 11992.2. Samples: 475348992. 
Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:30,767][1648981] Avg episode reward: [(0, '996.440')] [2024-06-15 22:33:31,555][1651669] Updated weights for policy 0, policy_version 928275 (0.0012) [2024-06-15 22:33:32,805][1651669] Updated weights for policy 0, policy_version 928336 (0.0070) [2024-06-15 22:33:33,877][1651669] Updated weights for policy 0, policy_version 928381 (0.0013) [2024-06-15 22:33:35,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 48605.9, 300 sec: 48763.2). Total num frames: 1901363200. Throughput: 0: 11946.7. Samples: 475424256. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:35,767][1648981] Avg episode reward: [(0, '1009.490')] [2024-06-15 22:33:36,351][1651274] Signal inference workers to stop experience collection... (48750 times) [2024-06-15 22:33:36,401][1651669] InferenceWorker_p0-w0: stopping experience collection (48750 times) [2024-06-15 22:33:36,405][1651669] Updated weights for policy 0, policy_version 928435 (0.0132) [2024-06-15 22:33:36,565][1651274] Signal inference workers to resume experience collection... (48750 times) [2024-06-15 22:33:36,566][1651669] InferenceWorker_p0-w0: resuming experience collection (48750 times) [2024-06-15 22:33:37,874][1651669] Updated weights for policy 0, policy_version 928502 (0.0016) [2024-06-15 22:33:40,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 46967.6, 300 sec: 48652.5). Total num frames: 1901592576. Throughput: 0: 11992.3. Samples: 475453952. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:40,767][1648981] Avg episode reward: [(0, '1004.280')] [2024-06-15 22:33:43,083][1651669] Updated weights for policy 0, policy_version 928545 (0.0011) [2024-06-15 22:33:44,797][1651669] Updated weights for policy 0, policy_version 928625 (0.0011) [2024-06-15 22:33:45,787][1648981] Fps is (10 sec: 49049.3, 60 sec: 48043.0, 300 sec: 48870.8). Total num frames: 1901854720. Throughput: 0: 11884.3. Samples: 475527168. 
Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:45,788][1648981] Avg episode reward: [(0, '982.730')] [2024-06-15 22:33:46,799][1651669] Updated weights for policy 0, policy_version 928674 (0.0013) [2024-06-15 22:33:47,908][1651669] Updated weights for policy 0, policy_version 928736 (0.0021) [2024-06-15 22:33:50,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.7, 300 sec: 48986.7). Total num frames: 1902116864. Throughput: 0: 12128.7. Samples: 475599872. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:50,767][1648981] Avg episode reward: [(0, '978.270')] [2024-06-15 22:33:53,951][1651669] Updated weights for policy 0, policy_version 928800 (0.0016) [2024-06-15 22:33:55,348][1651669] Updated weights for policy 0, policy_version 928864 (0.0009) [2024-06-15 22:33:55,766][1648981] Fps is (10 sec: 49254.4, 60 sec: 47513.8, 300 sec: 48763.2). Total num frames: 1902346240. Throughput: 0: 11992.1. Samples: 475639808. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:33:55,767][1648981] Avg episode reward: [(0, '1006.900')] [2024-06-15 22:33:56,105][1651669] Updated weights for policy 0, policy_version 928896 (0.0043) [2024-06-15 22:33:58,190][1651669] Updated weights for policy 0, policy_version 928956 (0.0013) [2024-06-15 22:34:00,284][1651669] Updated weights for policy 0, policy_version 929021 (0.0014) [2024-06-15 22:34:00,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.3, 300 sec: 49096.5). Total num frames: 1902641152. Throughput: 0: 11901.2. Samples: 475701760. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:00,767][1648981] Avg episode reward: [(0, '978.410')] [2024-06-15 22:34:05,766][1648981] Fps is (10 sec: 39322.3, 60 sec: 45329.1, 300 sec: 48318.9). Total num frames: 1902739456. Throughput: 0: 11832.9. Samples: 475778560. 
Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:05,767][1648981] Avg episode reward: [(0, '1034.510')] [2024-06-15 22:34:05,866][1651669] Updated weights for policy 0, policy_version 929088 (0.0011) [2024-06-15 22:34:08,334][1651669] Updated weights for policy 0, policy_version 929170 (0.0012) [2024-06-15 22:34:09,970][1651669] Updated weights for policy 0, policy_version 929235 (0.0013) [2024-06-15 22:34:10,794][1648981] Fps is (10 sec: 52282.2, 60 sec: 50766.6, 300 sec: 49313.9). Total num frames: 1903165440. Throughput: 0: 11927.9. Samples: 475814400. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:10,795][1648981] Avg episode reward: [(0, '983.630')] [2024-06-15 22:34:15,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 46421.5, 300 sec: 48541.1). Total num frames: 1903230976. Throughput: 0: 12140.1. Samples: 475895296. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:15,766][1648981] Avg episode reward: [(0, '969.170')] [2024-06-15 22:34:15,767][1651669] Updated weights for policy 0, policy_version 929312 (0.0133) [2024-06-15 22:34:17,704][1651274] Signal inference workers to stop experience collection... (48800 times) [2024-06-15 22:34:17,756][1651669] InferenceWorker_p0-w0: stopping experience collection (48800 times) [2024-06-15 22:34:17,775][1651669] Updated weights for policy 0, policy_version 929379 (0.0016) [2024-06-15 22:34:17,976][1651274] Signal inference workers to resume experience collection... (48800 times) [2024-06-15 22:34:17,983][1651669] InferenceWorker_p0-w0: resuming experience collection (48800 times) [2024-06-15 22:34:19,341][1651669] Updated weights for policy 0, policy_version 929456 (0.0131) [2024-06-15 22:34:20,766][1648981] Fps is (10 sec: 39432.1, 60 sec: 49698.1, 300 sec: 48874.3). Total num frames: 1903558656. Throughput: 0: 11844.3. Samples: 475957248. 
Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:20,767][1648981] Avg episode reward: [(0, '987.620')] [2024-06-15 22:34:21,207][1651669] Updated weights for policy 0, policy_version 929489 (0.0012) [2024-06-15 22:34:22,192][1651669] Updated weights for policy 0, policy_version 929536 (0.0023) [2024-06-15 22:34:25,766][1648981] Fps is (10 sec: 45874.2, 60 sec: 45875.2, 300 sec: 48430.0). Total num frames: 1903689728. Throughput: 0: 11935.3. Samples: 475991040. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:25,767][1648981] Avg episode reward: [(0, '975.750')] [2024-06-15 22:34:27,738][1651669] Updated weights for policy 0, policy_version 929590 (0.0012) [2024-06-15 22:34:29,545][1651669] Updated weights for policy 0, policy_version 929664 (0.0012) [2024-06-15 22:34:30,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 49151.9, 300 sec: 48652.1). Total num frames: 1904017408. Throughput: 0: 11940.8. Samples: 476064256. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:30,767][1648981] Avg episode reward: [(0, '953.490')] [2024-06-15 22:34:32,213][1651669] Updated weights for policy 0, policy_version 929744 (0.0014) [2024-06-15 22:34:33,334][1651669] Updated weights for policy 0, policy_version 929792 (0.0012) [2024-06-15 22:34:35,766][1648981] Fps is (10 sec: 52429.4, 60 sec: 47513.6, 300 sec: 48430.0). Total num frames: 1904214016. Throughput: 0: 11980.8. Samples: 476139008. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:35,767][1648981] Avg episode reward: [(0, '965.710')] [2024-06-15 22:34:39,471][1651669] Updated weights for policy 0, policy_version 929874 (0.0013) [2024-06-15 22:34:40,766][1648981] Fps is (10 sec: 45876.2, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1904476160. Throughput: 0: 11901.2. Samples: 476175360. 
Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:40,767][1648981] Avg episode reward: [(0, '955.490')] [2024-06-15 22:34:41,851][1651669] Updated weights for policy 0, policy_version 929980 (0.0122) [2024-06-15 22:34:44,251][1651669] Updated weights for policy 0, policy_version 930038 (0.0120) [2024-06-15 22:34:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 48076.5, 300 sec: 48430.0). Total num frames: 1904738304. Throughput: 0: 11901.2. Samples: 476237312. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:45,767][1648981] Avg episode reward: [(0, '951.420')] [2024-06-15 22:34:49,948][1651669] Updated weights for policy 0, policy_version 930096 (0.0014) [2024-06-15 22:34:50,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 1904902144. Throughput: 0: 11901.2. Samples: 476314112. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:50,767][1648981] Avg episode reward: [(0, '975.420')] [2024-06-15 22:34:51,668][1651669] Updated weights for policy 0, policy_version 930163 (0.0012) [2024-06-15 22:34:53,101][1651669] Updated weights for policy 0, policy_version 930239 (0.0152) [2024-06-15 22:34:55,510][1651669] Updated weights for policy 0, policy_version 930302 (0.0011) [2024-06-15 22:34:55,770][1648981] Fps is (10 sec: 52411.9, 60 sec: 48603.4, 300 sec: 48429.5). Total num frames: 1905262592. Throughput: 0: 11714.2. Samples: 476341248. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:34:55,772][1648981] Avg episode reward: [(0, '978.710')] [2024-06-15 22:34:55,778][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000930304_1905262592.pth... [2024-06-15 22:34:55,851][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000924672_1893728256.pth [2024-06-15 22:34:59,822][1651274] Signal inference workers to stop experience collection... 
(48850 times) [2024-06-15 22:34:59,886][1651669] InferenceWorker_p0-w0: stopping experience collection (48850 times) [2024-06-15 22:35:00,221][1651274] Signal inference workers to resume experience collection... (48850 times) [2024-06-15 22:35:00,221][1651669] InferenceWorker_p0-w0: resuming experience collection (48850 times) [2024-06-15 22:35:00,766][1648981] Fps is (10 sec: 42597.9, 60 sec: 44782.9, 300 sec: 47763.5). Total num frames: 1905328128. Throughput: 0: 11798.7. Samples: 476426240. Policy #0 lag: (min: 111.0, avg: 225.0, max: 367.0) [2024-06-15 22:35:00,767][1648981] Avg episode reward: [(0, '980.250')] [2024-06-15 22:35:01,482][1651669] Updated weights for policy 0, policy_version 930368 (0.0012) [2024-06-15 22:35:02,450][1651669] Updated weights for policy 0, policy_version 930416 (0.0012) [2024-06-15 22:35:03,971][1651669] Updated weights for policy 0, policy_version 930485 (0.0012) [2024-06-15 22:35:05,766][1648981] Fps is (10 sec: 39334.2, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1905655808. Throughput: 0: 11832.9. Samples: 476489728. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:05,767][1648981] Avg episode reward: [(0, '963.770')] [2024-06-15 22:35:06,406][1651669] Updated weights for policy 0, policy_version 930515 (0.0014) [2024-06-15 22:35:10,565][1651669] Updated weights for policy 0, policy_version 930576 (0.0013) [2024-06-15 22:35:10,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 44257.5, 300 sec: 47763.5). Total num frames: 1905819648. Throughput: 0: 11958.1. Samples: 476529152. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:10,767][1648981] Avg episode reward: [(0, '974.430')] [2024-06-15 22:35:12,738][1651669] Updated weights for policy 0, policy_version 930656 (0.0188) [2024-06-15 22:35:14,264][1651669] Updated weights for policy 0, policy_version 930725 (0.0021) [2024-06-15 22:35:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49151.9, 300 sec: 48318.9). 
Total num frames: 1906180096. Throughput: 0: 11719.2. Samples: 476591616. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:15,767][1648981] Avg episode reward: [(0, '994.870')] [2024-06-15 22:35:18,131][1651669] Updated weights for policy 0, policy_version 930804 (0.0088) [2024-06-15 22:35:20,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 45875.3, 300 sec: 47874.6). Total num frames: 1906311168. Throughput: 0: 12003.6. Samples: 476679168. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:20,767][1648981] Avg episode reward: [(0, '967.250')] [2024-06-15 22:35:21,658][1651669] Updated weights for policy 0, policy_version 930864 (0.0012) [2024-06-15 22:35:23,486][1651669] Updated weights for policy 0, policy_version 930914 (0.0012) [2024-06-15 22:35:25,373][1651669] Updated weights for policy 0, policy_version 930992 (0.0013) [2024-06-15 22:35:25,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50244.3, 300 sec: 48763.2). Total num frames: 1906704384. Throughput: 0: 11821.5. Samples: 476707328. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:25,767][1648981] Avg episode reward: [(0, '939.170')] [2024-06-15 22:35:29,469][1651669] Updated weights for policy 0, policy_version 931066 (0.0025) [2024-06-15 22:35:30,766][1648981] Fps is (10 sec: 52428.1, 60 sec: 46967.6, 300 sec: 47985.7). Total num frames: 1906835456. Throughput: 0: 11946.7. Samples: 476774912. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:30,767][1648981] Avg episode reward: [(0, '941.260')] [2024-06-15 22:35:32,364][1651669] Updated weights for policy 0, policy_version 931120 (0.0161) [2024-06-15 22:35:33,961][1651669] Updated weights for policy 0, policy_version 931168 (0.0010) [2024-06-15 22:35:35,271][1651669] Updated weights for policy 0, policy_version 931216 (0.0011) [2024-06-15 22:35:35,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 49152.0, 300 sec: 48652.2). Total num frames: 1907163136. Throughput: 0: 11855.6. 
Samples: 476847616. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:35,767][1648981] Avg episode reward: [(0, '924.410')] [2024-06-15 22:35:35,820][1651274] Signal inference workers to stop experience collection... (48900 times) [2024-06-15 22:35:35,860][1651669] InferenceWorker_p0-w0: stopping experience collection (48900 times) [2024-06-15 22:35:36,069][1651274] Signal inference workers to resume experience collection... (48900 times) [2024-06-15 22:35:36,071][1651669] InferenceWorker_p0-w0: resuming experience collection (48900 times) [2024-06-15 22:35:39,440][1651669] Updated weights for policy 0, policy_version 931267 (0.0019) [2024-06-15 22:35:40,586][1651669] Updated weights for policy 0, policy_version 931323 (0.0012) [2024-06-15 22:35:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 1907359744. Throughput: 0: 12038.5. Samples: 476882944. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:40,767][1648981] Avg episode reward: [(0, '940.230')] [2024-06-15 22:35:42,855][1651669] Updated weights for policy 0, policy_version 931362 (0.0010) [2024-06-15 22:35:44,050][1651669] Updated weights for policy 0, policy_version 931408 (0.0019) [2024-06-15 22:35:45,738][1651669] Updated weights for policy 0, policy_version 931472 (0.0139) [2024-06-15 22:35:45,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1907654656. Throughput: 0: 12014.9. Samples: 476966912. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:45,767][1648981] Avg episode reward: [(0, '943.310')] [2024-06-15 22:35:46,878][1651669] Updated weights for policy 0, policy_version 931520 (0.0129) [2024-06-15 22:35:50,661][1651669] Updated weights for policy 0, policy_version 931577 (0.0012) [2024-06-15 22:35:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49151.9, 300 sec: 47874.6). Total num frames: 1907851264. Throughput: 0: 12049.1. Samples: 477031936. 
Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:50,767][1648981] Avg episode reward: [(0, '954.030')] [2024-06-15 22:35:53,604][1651669] Updated weights for policy 0, policy_version 931622 (0.0082) [2024-06-15 22:35:55,231][1651669] Updated weights for policy 0, policy_version 931650 (0.0013) [2024-06-15 22:35:55,767][1648981] Fps is (10 sec: 42596.8, 60 sec: 46969.7, 300 sec: 48207.8). Total num frames: 1908080640. Throughput: 0: 12060.4. Samples: 477071872. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:35:55,767][1648981] Avg episode reward: [(0, '978.840')] [2024-06-15 22:35:56,085][1651669] Updated weights for policy 0, policy_version 931696 (0.0012) [2024-06-15 22:35:57,631][1651669] Updated weights for policy 0, policy_version 931752 (0.0024) [2024-06-15 22:36:00,755][1651669] Updated weights for policy 0, policy_version 931798 (0.0089) [2024-06-15 22:36:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49698.1, 300 sec: 47985.7). Total num frames: 1908310016. Throughput: 0: 12356.3. Samples: 477147648. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:00,767][1648981] Avg episode reward: [(0, '947.220')] [2024-06-15 22:36:03,830][1651669] Updated weights for policy 0, policy_version 931858 (0.0011) [2024-06-15 22:36:05,766][1648981] Fps is (10 sec: 49153.6, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 1908572160. Throughput: 0: 12140.0. Samples: 477225472. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:05,767][1648981] Avg episode reward: [(0, '948.730')] [2024-06-15 22:36:05,823][1651669] Updated weights for policy 0, policy_version 931921 (0.0014) [2024-06-15 22:36:07,472][1651669] Updated weights for policy 0, policy_version 932000 (0.0013) [2024-06-15 22:36:10,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.1, 300 sec: 48096.8). Total num frames: 1908801536. Throughput: 0: 12140.1. Samples: 477253632. 
Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:10,767][1648981] Avg episode reward: [(0, '966.330')] [2024-06-15 22:36:11,599][1651669] Updated weights for policy 0, policy_version 932038 (0.0012) [2024-06-15 22:36:12,812][1651669] Updated weights for policy 0, policy_version 932096 (0.0011) [2024-06-15 22:36:15,750][1651669] Updated weights for policy 0, policy_version 932154 (0.0011) [2024-06-15 22:36:15,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 1909030912. Throughput: 0: 12333.5. Samples: 477329920. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:15,767][1648981] Avg episode reward: [(0, '959.630')] [2024-06-15 22:36:17,459][1651669] Updated weights for policy 0, policy_version 932208 (0.0011) [2024-06-15 22:36:17,892][1651274] Signal inference workers to stop experience collection... (48950 times) [2024-06-15 22:36:17,958][1651669] InferenceWorker_p0-w0: stopping experience collection (48950 times) [2024-06-15 22:36:18,107][1651274] Signal inference workers to resume experience collection... (48950 times) [2024-06-15 22:36:18,108][1651669] InferenceWorker_p0-w0: resuming experience collection (48950 times) [2024-06-15 22:36:18,869][1651669] Updated weights for policy 0, policy_version 932272 (0.0012) [2024-06-15 22:36:20,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 50244.2, 300 sec: 47985.7). Total num frames: 1909325824. Throughput: 0: 12356.3. Samples: 477403648. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:20,767][1648981] Avg episode reward: [(0, '955.190')] [2024-06-15 22:36:22,210][1651669] Updated weights for policy 0, policy_version 932291 (0.0022) [2024-06-15 22:36:25,073][1651669] Updated weights for policy 0, policy_version 932359 (0.0014) [2024-06-15 22:36:25,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46967.5, 300 sec: 47874.6). Total num frames: 1909522432. Throughput: 0: 12322.1. Samples: 477437440. 
Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:25,767][1648981] Avg episode reward: [(0, '956.830')] [2024-06-15 22:36:26,274][1651669] Updated weights for policy 0, policy_version 932408 (0.0025) [2024-06-15 22:36:28,036][1651669] Updated weights for policy 0, policy_version 932464 (0.0013) [2024-06-15 22:36:29,674][1651669] Updated weights for policy 0, policy_version 932537 (0.0012) [2024-06-15 22:36:30,780][1648981] Fps is (10 sec: 52359.0, 60 sec: 50233.1, 300 sec: 48096.4). Total num frames: 1909850112. Throughput: 0: 11988.6. Samples: 477506560. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:30,780][1648981] Avg episode reward: [(0, '950.510')] [2024-06-15 22:36:34,278][1651669] Updated weights for policy 0, policy_version 932606 (0.0119) [2024-06-15 22:36:35,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 48096.8). Total num frames: 1910046720. Throughput: 0: 12367.6. Samples: 477588480. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:35,767][1648981] Avg episode reward: [(0, '938.780')] [2024-06-15 22:36:36,140][1651669] Updated weights for policy 0, policy_version 932664 (0.0014) [2024-06-15 22:36:37,779][1651669] Updated weights for policy 0, policy_version 932704 (0.0013) [2024-06-15 22:36:39,385][1651669] Updated weights for policy 0, policy_version 932769 (0.0098) [2024-06-15 22:36:40,766][1648981] Fps is (10 sec: 52498.8, 60 sec: 50244.3, 300 sec: 48318.9). Total num frames: 1910374400. Throughput: 0: 12254.0. Samples: 477623296. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:40,767][1648981] Avg episode reward: [(0, '920.480')] [2024-06-15 22:36:44,725][1651669] Updated weights for policy 0, policy_version 932833 (0.0011) [2024-06-15 22:36:45,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 47513.6, 300 sec: 47985.7). Total num frames: 1910505472. Throughput: 0: 12333.5. Samples: 477702656. 
Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:45,767][1648981] Avg episode reward: [(0, '894.360')] [2024-06-15 22:36:46,610][1651669] Updated weights for policy 0, policy_version 932880 (0.0012) [2024-06-15 22:36:47,985][1651669] Updated weights for policy 0, policy_version 932932 (0.0012) [2024-06-15 22:36:49,330][1651669] Updated weights for policy 0, policy_version 932992 (0.0011) [2024-06-15 22:36:50,674][1651669] Updated weights for policy 0, policy_version 933045 (0.0012) [2024-06-15 22:36:50,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 50244.3, 300 sec: 48318.9). Total num frames: 1910865920. Throughput: 0: 12014.9. Samples: 477766144. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:50,767][1648981] Avg episode reward: [(0, '883.290')] [2024-06-15 22:36:55,301][1651669] Updated weights for policy 0, policy_version 933088 (0.0140) [2024-06-15 22:36:55,789][1648981] Fps is (10 sec: 49042.9, 60 sec: 48588.2, 300 sec: 47871.0). Total num frames: 1910996992. Throughput: 0: 12395.7. Samples: 477811712. Policy #0 lag: (min: 47.0, avg: 112.6, max: 303.0) [2024-06-15 22:36:55,789][1648981] Avg episode reward: [(0, '846.320')] [2024-06-15 22:36:55,976][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000933120_1911029760.pth... [2024-06-15 22:36:56,067][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000927488_1899495424.pth [2024-06-15 22:36:57,667][1651669] Updated weights for policy 0, policy_version 933136 (0.0014) [2024-06-15 22:36:59,119][1651274] Signal inference workers to stop experience collection... (49000 times) [2024-06-15 22:36:59,179][1651669] InferenceWorker_p0-w0: stopping experience collection (49000 times) [2024-06-15 22:36:59,403][1651274] Signal inference workers to resume experience collection... 
(49000 times) [2024-06-15 22:36:59,404][1651669] InferenceWorker_p0-w0: resuming experience collection (49000 times) [2024-06-15 22:36:59,405][1651669] Updated weights for policy 0, policy_version 933200 (0.0011) [2024-06-15 22:37:00,738][1651669] Updated weights for policy 0, policy_version 933264 (0.0011) [2024-06-15 22:37:00,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 50244.4, 300 sec: 48207.8). Total num frames: 1911324672. Throughput: 0: 12185.6. Samples: 477878272. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:00,767][1648981] Avg episode reward: [(0, '829.580')] [2024-06-15 22:37:05,200][1651669] Updated weights for policy 0, policy_version 933316 (0.0021) [2024-06-15 22:37:05,766][1648981] Fps is (10 sec: 49261.5, 60 sec: 48605.9, 300 sec: 47763.5). Total num frames: 1911488512. Throughput: 0: 12231.1. Samples: 477954048. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:05,767][1648981] Avg episode reward: [(0, '855.180')] [2024-06-15 22:37:06,116][1651669] Updated weights for policy 0, policy_version 933368 (0.0014) [2024-06-15 22:37:09,062][1651669] Updated weights for policy 0, policy_version 933411 (0.0011) [2024-06-15 22:37:10,766][1648981] Fps is (10 sec: 42598.0, 60 sec: 49152.0, 300 sec: 48096.8). Total num frames: 1911750656. Throughput: 0: 12322.1. Samples: 477991936. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:10,767][1648981] Avg episode reward: [(0, '928.450')] [2024-06-15 22:37:10,775][1651669] Updated weights for policy 0, policy_version 933474 (0.0012) [2024-06-15 22:37:12,441][1651669] Updated weights for policy 0, policy_version 933539 (0.0045) [2024-06-15 22:37:15,775][1648981] Fps is (10 sec: 45836.6, 60 sec: 48599.1, 300 sec: 47540.0). Total num frames: 1911947264. Throughput: 0: 12141.4. Samples: 478052864. 
Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:15,776][1648981] Avg episode reward: [(0, '934.460')] [2024-06-15 22:37:16,817][1651669] Updated weights for policy 0, policy_version 933588 (0.0011) [2024-06-15 22:37:19,326][1651669] Updated weights for policy 0, policy_version 933633 (0.0014) [2024-06-15 22:37:20,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48059.7, 300 sec: 48207.8). Total num frames: 1912209408. Throughput: 0: 12049.1. Samples: 478130688. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:20,767][1648981] Avg episode reward: [(0, '952.510')] [2024-06-15 22:37:21,276][1651669] Updated weights for policy 0, policy_version 933716 (0.0012) [2024-06-15 22:37:22,528][1651669] Updated weights for policy 0, policy_version 933763 (0.0012) [2024-06-15 22:37:23,973][1651669] Updated weights for policy 0, policy_version 933824 (0.0011) [2024-06-15 22:37:25,770][1648981] Fps is (10 sec: 52452.9, 60 sec: 49148.9, 300 sec: 47985.1). Total num frames: 1912471552. Throughput: 0: 11911.5. Samples: 478159360. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:25,771][1648981] Avg episode reward: [(0, '991.490')] [2024-06-15 22:37:28,177][1651669] Updated weights for policy 0, policy_version 933882 (0.0019) [2024-06-15 22:37:30,477][1651669] Updated weights for policy 0, policy_version 933920 (0.0015) [2024-06-15 22:37:30,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 46977.9, 300 sec: 48207.8). Total num frames: 1912668160. Throughput: 0: 11980.8. Samples: 478241792. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:30,767][1648981] Avg episode reward: [(0, '952.580')] [2024-06-15 22:37:32,389][1651669] Updated weights for policy 0, policy_version 934007 (0.0013) [2024-06-15 22:37:34,478][1651669] Updated weights for policy 0, policy_version 934052 (0.0017) [2024-06-15 22:37:35,766][1648981] Fps is (10 sec: 52448.6, 60 sec: 49152.0, 300 sec: 48207.9). Total num frames: 1912995840. 
Throughput: 0: 12083.2. Samples: 478309888. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:35,767][1648981] Avg episode reward: [(0, '951.040')] [2024-06-15 22:37:37,571][1651669] Updated weights for policy 0, policy_version 934082 (0.0013) [2024-06-15 22:37:38,851][1651669] Updated weights for policy 0, policy_version 934139 (0.0012) [2024-06-15 22:37:40,708][1651274] Signal inference workers to stop experience collection... (49050 times) [2024-06-15 22:37:40,736][1651669] InferenceWorker_p0-w0: stopping experience collection (49050 times) [2024-06-15 22:37:40,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1913126912. Throughput: 0: 11986.7. Samples: 478350848. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:40,767][1648981] Avg episode reward: [(0, '934.340')] [2024-06-15 22:37:41,028][1651274] Signal inference workers to resume experience collection... (49050 times) [2024-06-15 22:37:41,028][1651669] InferenceWorker_p0-w0: resuming experience collection (49050 times) [2024-06-15 22:37:42,040][1651669] Updated weights for policy 0, policy_version 934205 (0.0124) [2024-06-15 22:37:43,235][1651669] Updated weights for policy 0, policy_version 934249 (0.0013) [2024-06-15 22:37:44,603][1651669] Updated weights for policy 0, policy_version 934289 (0.0012) [2024-06-15 22:37:45,766][1648981] Fps is (10 sec: 52429.7, 60 sec: 50244.4, 300 sec: 48430.0). Total num frames: 1913520128. Throughput: 0: 12003.6. Samples: 478418432. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:45,767][1648981] Avg episode reward: [(0, '936.800')] [2024-06-15 22:37:45,785][1651669] Updated weights for policy 0, policy_version 934336 (0.0011) [2024-06-15 22:37:49,200][1651669] Updated weights for policy 0, policy_version 934384 (0.0011) [2024-06-15 22:37:50,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 46421.3, 300 sec: 47985.7). Total num frames: 1913651200. Throughput: 0: 12128.7. 
Samples: 478499840. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:50,767][1648981] Avg episode reward: [(0, '943.020')] [2024-06-15 22:37:52,314][1651669] Updated weights for policy 0, policy_version 934448 (0.0014) [2024-06-15 22:37:53,586][1651669] Updated weights for policy 0, policy_version 934486 (0.0014) [2024-06-15 22:37:55,429][1651669] Updated weights for policy 0, policy_version 934560 (0.0145) [2024-06-15 22:37:55,767][1648981] Fps is (10 sec: 45874.0, 60 sec: 49716.4, 300 sec: 48430.0). Total num frames: 1913978880. Throughput: 0: 11992.2. Samples: 478531584. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:37:55,767][1648981] Avg episode reward: [(0, '964.720')] [2024-06-15 22:37:59,605][1651669] Updated weights for policy 0, policy_version 934609 (0.0017) [2024-06-15 22:38:00,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 47513.5, 300 sec: 47985.7). Total num frames: 1914175488. Throughput: 0: 12358.6. Samples: 478608896. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:00,767][1648981] Avg episode reward: [(0, '1015.000')] [2024-06-15 22:38:02,204][1651669] Updated weights for policy 0, policy_version 934660 (0.0023) [2024-06-15 22:38:03,306][1651669] Updated weights for policy 0, policy_version 934710 (0.0010) [2024-06-15 22:38:04,863][1651669] Updated weights for policy 0, policy_version 934755 (0.0012) [2024-06-15 22:38:05,782][1648981] Fps is (10 sec: 49074.8, 60 sec: 49685.0, 300 sec: 48649.5). Total num frames: 1914470400. Throughput: 0: 12260.9. Samples: 478682624. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:05,783][1648981] Avg episode reward: [(0, '1015.530')] [2024-06-15 22:38:06,127][1651669] Updated weights for policy 0, policy_version 934816 (0.0027) [2024-06-15 22:38:09,394][1651669] Updated weights for policy 0, policy_version 934851 (0.0028) [2024-06-15 22:38:10,770][1648981] Fps is (10 sec: 49133.4, 60 sec: 48602.8, 300 sec: 48207.2). 
Total num frames: 1914667008. Throughput: 0: 12492.8. Samples: 478721536. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:10,771][1648981] Avg episode reward: [(0, '979.260')] [2024-06-15 22:38:10,807][1651669] Updated weights for policy 0, policy_version 934904 (0.0013) [2024-06-15 22:38:13,313][1651669] Updated weights for policy 0, policy_version 934972 (0.0013) [2024-06-15 22:38:15,204][1651669] Updated weights for policy 0, policy_version 935024 (0.0038) [2024-06-15 22:38:15,766][1648981] Fps is (10 sec: 49231.0, 60 sec: 50251.5, 300 sec: 48763.3). Total num frames: 1914961920. Throughput: 0: 12344.9. Samples: 478797312. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:15,767][1648981] Avg episode reward: [(0, '952.220')] [2024-06-15 22:38:17,081][1651669] Updated weights for policy 0, policy_version 935101 (0.0012) [2024-06-15 22:38:20,766][1648981] Fps is (10 sec: 42614.5, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1915092992. Throughput: 0: 12390.4. Samples: 478867456. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:20,767][1648981] Avg episode reward: [(0, '947.540')] [2024-06-15 22:38:22,066][1651669] Updated weights for policy 0, policy_version 935161 (0.0012) [2024-06-15 22:38:23,363][1651274] Signal inference workers to stop experience collection... (49100 times) [2024-06-15 22:38:23,397][1651669] InferenceWorker_p0-w0: stopping experience collection (49100 times) [2024-06-15 22:38:23,597][1651274] Signal inference workers to resume experience collection... (49100 times) [2024-06-15 22:38:23,598][1651669] InferenceWorker_p0-w0: resuming experience collection (49100 times) [2024-06-15 22:38:24,233][1651669] Updated weights for policy 0, policy_version 935203 (0.0012) [2024-06-15 22:38:25,579][1651669] Updated weights for policy 0, policy_version 935233 (0.0015) [2024-06-15 22:38:25,766][1648981] Fps is (10 sec: 39321.0, 60 sec: 48062.8, 300 sec: 48430.0). 
Total num frames: 1915355136. Throughput: 0: 12276.6. Samples: 478903296. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:25,767][1648981] Avg episode reward: [(0, '965.540')] [2024-06-15 22:38:26,877][1651669] Updated weights for policy 0, policy_version 935296 (0.0012) [2024-06-15 22:38:28,103][1651669] Updated weights for policy 0, policy_version 935349 (0.0014) [2024-06-15 22:38:30,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49151.9, 300 sec: 48318.9). Total num frames: 1915617280. Throughput: 0: 12356.2. Samples: 478974464. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:30,767][1648981] Avg episode reward: [(0, '1026.460')] [2024-06-15 22:38:31,898][1651669] Updated weights for policy 0, policy_version 935392 (0.0028) [2024-06-15 22:38:34,475][1651669] Updated weights for policy 0, policy_version 935460 (0.0011) [2024-06-15 22:38:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1915879424. Throughput: 0: 12310.8. Samples: 479053824. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:35,767][1648981] Avg episode reward: [(0, '1043.070')] [2024-06-15 22:38:36,497][1651669] Updated weights for policy 0, policy_version 935506 (0.0011) [2024-06-15 22:38:38,024][1651669] Updated weights for policy 0, policy_version 935572 (0.0087) [2024-06-15 22:38:40,767][1648981] Fps is (10 sec: 52428.5, 60 sec: 50244.1, 300 sec: 48433.4). Total num frames: 1916141568. Throughput: 0: 12231.1. Samples: 479081984. 
Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:40,767][1648981] Avg episode reward: [(0, '1057.480')] [2024-06-15 22:38:42,203][1651669] Updated weights for policy 0, policy_version 935636 (0.0012) [2024-06-15 22:38:44,284][1651669] Updated weights for policy 0, policy_version 935696 (0.0015) [2024-06-15 22:38:45,158][1651669] Updated weights for policy 0, policy_version 935741 (0.0012) [2024-06-15 22:38:45,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1916403712. Throughput: 0: 12322.2. Samples: 479163392. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:45,767][1648981] Avg episode reward: [(0, '1089.540')] [2024-06-15 22:38:47,448][1651669] Updated weights for policy 0, policy_version 935796 (0.0022) [2024-06-15 22:38:49,046][1651669] Updated weights for policy 0, policy_version 935872 (0.0129) [2024-06-15 22:38:50,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 50244.3, 300 sec: 48541.1). Total num frames: 1916665856. Throughput: 0: 12189.9. Samples: 479230976. Policy #0 lag: (min: 12.0, avg: 105.6, max: 268.0) [2024-06-15 22:38:50,767][1648981] Avg episode reward: [(0, '1128.870')] [2024-06-15 22:38:53,709][1651669] Updated weights for policy 0, policy_version 935928 (0.0048) [2024-06-15 22:38:55,767][1648981] Fps is (10 sec: 45873.4, 60 sec: 48059.6, 300 sec: 48207.8). Total num frames: 1916862464. Throughput: 0: 12220.7. Samples: 479271424. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:38:55,767][1648981] Avg episode reward: [(0, '1143.440')] [2024-06-15 22:38:56,043][1651669] Updated weights for policy 0, policy_version 935994 (0.0058) [2024-06-15 22:38:56,103][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000936000_1916928000.pth... 
[2024-06-15 22:38:56,150][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000930304_1905262592.pth [2024-06-15 22:38:58,382][1651669] Updated weights for policy 0, policy_version 936033 (0.0013) [2024-06-15 22:39:00,649][1651669] Updated weights for policy 0, policy_version 936112 (0.0011) [2024-06-15 22:39:00,786][1648981] Fps is (10 sec: 49054.9, 60 sec: 49681.8, 300 sec: 48871.0). Total num frames: 1917157376. Throughput: 0: 12021.0. Samples: 479338496. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:00,787][1648981] Avg episode reward: [(0, '1128.720')] [2024-06-15 22:39:03,849][1651274] Signal inference workers to stop experience collection... (49150 times) [2024-06-15 22:39:03,924][1651669] InferenceWorker_p0-w0: stopping experience collection (49150 times) [2024-06-15 22:39:04,148][1651274] Signal inference workers to resume experience collection... (49150 times) [2024-06-15 22:39:04,170][1651669] InferenceWorker_p0-w0: resuming experience collection (49150 times) [2024-06-15 22:39:04,323][1651669] Updated weights for policy 0, policy_version 936161 (0.0013) [2024-06-15 22:39:05,766][1648981] Fps is (10 sec: 45876.7, 60 sec: 47526.2, 300 sec: 47990.2). Total num frames: 1917321216. Throughput: 0: 11958.1. Samples: 479405568. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:05,767][1648981] Avg episode reward: [(0, '1081.790')] [2024-06-15 22:39:07,512][1651669] Updated weights for policy 0, policy_version 936214 (0.0014) [2024-06-15 22:39:10,006][1651669] Updated weights for policy 0, policy_version 936288 (0.0150) [2024-06-15 22:39:10,766][1648981] Fps is (10 sec: 39399.3, 60 sec: 48062.7, 300 sec: 48541.0). Total num frames: 1917550592. Throughput: 0: 12003.5. Samples: 479443456. 
Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:10,767][1648981] Avg episode reward: [(0, '1068.160')] [2024-06-15 22:39:12,557][1651669] Updated weights for policy 0, policy_version 936383 (0.0019) [2024-06-15 22:39:15,788][1648981] Fps is (10 sec: 45777.3, 60 sec: 46950.6, 300 sec: 48204.3). Total num frames: 1917779968. Throughput: 0: 11736.3. Samples: 479502848. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:15,789][1648981] Avg episode reward: [(0, '1066.190')] [2024-06-15 22:39:16,340][1651669] Updated weights for policy 0, policy_version 936444 (0.0014) [2024-06-15 22:39:19,433][1651669] Updated weights for policy 0, policy_version 936511 (0.0013) [2024-06-15 22:39:20,770][1648981] Fps is (10 sec: 42582.8, 60 sec: 48056.8, 300 sec: 48429.4). Total num frames: 1917976576. Throughput: 0: 11740.9. Samples: 479582208. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:20,771][1648981] Avg episode reward: [(0, '1023.800')] [2024-06-15 22:39:22,074][1651669] Updated weights for policy 0, policy_version 936576 (0.0012) [2024-06-15 22:39:23,889][1651669] Updated weights for policy 0, policy_version 936639 (0.0164) [2024-06-15 22:39:25,766][1648981] Fps is (10 sec: 45973.1, 60 sec: 48059.7, 300 sec: 48207.9). Total num frames: 1918238720. Throughput: 0: 11525.7. Samples: 479600640. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:25,767][1648981] Avg episode reward: [(0, '1005.740')] [2024-06-15 22:39:27,867][1651669] Updated weights for policy 0, policy_version 936704 (0.0012) [2024-06-15 22:39:30,315][1651669] Updated weights for policy 0, policy_version 936762 (0.0012) [2024-06-15 22:39:30,766][1648981] Fps is (10 sec: 52448.2, 60 sec: 48059.8, 300 sec: 48430.0). Total num frames: 1918500864. Throughput: 0: 11548.4. Samples: 479683072. 
Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:30,767][1648981] Avg episode reward: [(0, '1009.940')] [2024-06-15 22:39:32,537][1651669] Updated weights for policy 0, policy_version 936822 (0.0012) [2024-06-15 22:39:34,202][1651669] Updated weights for policy 0, policy_version 936880 (0.0028) [2024-06-15 22:39:35,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1918763008. Throughput: 0: 11525.7. Samples: 479749632. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:35,767][1648981] Avg episode reward: [(0, '1044.920')] [2024-06-15 22:39:38,276][1651669] Updated weights for policy 0, policy_version 936906 (0.0055) [2024-06-15 22:39:39,233][1651669] Updated weights for policy 0, policy_version 936956 (0.0012) [2024-06-15 22:39:40,767][1648981] Fps is (10 sec: 45875.0, 60 sec: 46967.5, 300 sec: 48207.8). Total num frames: 1918959616. Throughput: 0: 11582.6. Samples: 479792640. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:40,767][1648981] Avg episode reward: [(0, '1024.300')] [2024-06-15 22:39:41,001][1651669] Updated weights for policy 0, policy_version 937008 (0.0012) [2024-06-15 22:39:43,770][1651669] Updated weights for policy 0, policy_version 937088 (0.0132) [2024-06-15 22:39:45,286][1651669] Updated weights for policy 0, policy_version 937148 (0.0012) [2024-06-15 22:39:45,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 48059.6, 300 sec: 48763.2). Total num frames: 1919287296. Throughput: 0: 11587.7. Samples: 479859712. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:45,767][1648981] Avg episode reward: [(0, '969.110')] [2024-06-15 22:39:48,913][1651274] Signal inference workers to stop experience collection... (49200 times) [2024-06-15 22:39:48,976][1651669] InferenceWorker_p0-w0: stopping experience collection (49200 times) [2024-06-15 22:39:49,104][1651274] Signal inference workers to resume experience collection... 
(49200 times) [2024-06-15 22:39:49,105][1651669] InferenceWorker_p0-w0: resuming experience collection (49200 times) [2024-06-15 22:39:49,674][1651669] Updated weights for policy 0, policy_version 937200 (0.0013) [2024-06-15 22:39:50,766][1648981] Fps is (10 sec: 45875.9, 60 sec: 45875.2, 300 sec: 47986.2). Total num frames: 1919418368. Throughput: 0: 11878.4. Samples: 479940096. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:50,767][1648981] Avg episode reward: [(0, '1031.590')] [2024-06-15 22:39:51,441][1651669] Updated weights for policy 0, policy_version 937264 (0.0012) [2024-06-15 22:39:53,852][1651669] Updated weights for policy 0, policy_version 937313 (0.0011) [2024-06-15 22:39:55,626][1651669] Updated weights for policy 0, policy_version 937377 (0.0012) [2024-06-15 22:39:55,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48059.9, 300 sec: 48874.3). Total num frames: 1919746048. Throughput: 0: 11935.3. Samples: 479980544. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:39:55,767][1648981] Avg episode reward: [(0, '1052.690')] [2024-06-15 22:39:59,904][1651669] Updated weights for policy 0, policy_version 937428 (0.0011) [2024-06-15 22:40:00,536][1651669] Updated weights for policy 0, policy_version 937472 (0.0013) [2024-06-15 22:40:00,770][1648981] Fps is (10 sec: 52408.5, 60 sec: 46433.7, 300 sec: 48429.4). Total num frames: 1919942656. Throughput: 0: 12201.7. Samples: 480051712. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:00,771][1648981] Avg episode reward: [(0, '1049.750')] [2024-06-15 22:40:04,189][1651669] Updated weights for policy 0, policy_version 937552 (0.0012) [2024-06-15 22:40:05,767][1648981] Fps is (10 sec: 49150.8, 60 sec: 48605.6, 300 sec: 48874.3). Total num frames: 1920237568. Throughput: 0: 12084.1. Samples: 480125952. 
Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:05,767][1648981] Avg episode reward: [(0, '1012.320')] [2024-06-15 22:40:06,083][1651669] Updated weights for policy 0, policy_version 937632 (0.0093) [2024-06-15 22:40:06,758][1651669] Updated weights for policy 0, policy_version 937662 (0.0011) [2024-06-15 22:40:10,766][1648981] Fps is (10 sec: 49171.0, 60 sec: 48059.8, 300 sec: 48318.9). Total num frames: 1920434176. Throughput: 0: 12515.6. Samples: 480163840. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:10,767][1648981] Avg episode reward: [(0, '1019.420')] [2024-06-15 22:40:10,984][1651669] Updated weights for policy 0, policy_version 937725 (0.0014) [2024-06-15 22:40:12,527][1651669] Updated weights for policy 0, policy_version 937785 (0.0011) [2024-06-15 22:40:15,766][1648981] Fps is (10 sec: 42599.9, 60 sec: 48076.9, 300 sec: 48652.1). Total num frames: 1920663552. Throughput: 0: 12390.4. Samples: 480240640. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:15,767][1648981] Avg episode reward: [(0, '1017.230')] [2024-06-15 22:40:16,793][1651669] Updated weights for policy 0, policy_version 937873 (0.0013) [2024-06-15 22:40:17,691][1651669] Updated weights for policy 0, policy_version 937920 (0.0015) [2024-06-15 22:40:20,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 48062.7, 300 sec: 47985.7). Total num frames: 1920860160. Throughput: 0: 12470.0. Samples: 480310784. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:20,767][1648981] Avg episode reward: [(0, '994.960')] [2024-06-15 22:40:22,347][1651669] Updated weights for policy 0, policy_version 937973 (0.0011) [2024-06-15 22:40:23,638][1651669] Updated weights for policy 0, policy_version 938032 (0.0011) [2024-06-15 22:40:25,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48605.9, 300 sec: 48541.1). Total num frames: 1921155072. Throughput: 0: 12242.5. Samples: 480343552. 
Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:25,767][1648981] Avg episode reward: [(0, '1041.920')] [2024-06-15 22:40:25,902][1651669] Updated weights for policy 0, policy_version 938067 (0.0014) [2024-06-15 22:40:27,163][1651274] Signal inference workers to stop experience collection... (49250 times) [2024-06-15 22:40:27,261][1651669] InferenceWorker_p0-w0: stopping experience collection (49250 times) [2024-06-15 22:40:27,489][1651274] Signal inference workers to resume experience collection... (49250 times) [2024-06-15 22:40:27,490][1651669] InferenceWorker_p0-w0: resuming experience collection (49250 times) [2024-06-15 22:40:27,645][1651669] Updated weights for policy 0, policy_version 938130 (0.0017) [2024-06-15 22:40:28,549][1651669] Updated weights for policy 0, policy_version 938176 (0.0022) [2024-06-15 22:40:30,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 48059.8, 300 sec: 48207.8). Total num frames: 1921384448. Throughput: 0: 12288.0. Samples: 480412672. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:30,767][1648981] Avg episode reward: [(0, '1033.510')] [2024-06-15 22:40:33,912][1651669] Updated weights for policy 0, policy_version 938256 (0.0011) [2024-06-15 22:40:34,753][1651669] Updated weights for policy 0, policy_version 938304 (0.0013) [2024-06-15 22:40:35,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1921646592. Throughput: 0: 12276.6. Samples: 480492544. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:35,767][1648981] Avg episode reward: [(0, '1030.920')] [2024-06-15 22:40:37,360][1651669] Updated weights for policy 0, policy_version 938368 (0.0014) [2024-06-15 22:40:38,773][1651669] Updated weights for policy 0, policy_version 938429 (0.0022) [2024-06-15 22:40:40,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 49152.1, 300 sec: 48318.9). Total num frames: 1921908736. Throughput: 0: 12071.8. Samples: 480523776. 
Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:40,767][1648981] Avg episode reward: [(0, '1068.300')] [2024-06-15 22:40:43,832][1651669] Updated weights for policy 0, policy_version 938486 (0.0013) [2024-06-15 22:40:45,571][1651669] Updated weights for policy 0, policy_version 938544 (0.0016) [2024-06-15 22:40:45,767][1648981] Fps is (10 sec: 49149.8, 60 sec: 47513.3, 300 sec: 48429.9). Total num frames: 1922138112. Throughput: 0: 12254.8. Samples: 480603136. Policy #0 lag: (min: 53.0, avg: 146.3, max: 309.0) [2024-06-15 22:40:45,767][1648981] Avg episode reward: [(0, '1066.610')] [2024-06-15 22:40:49,085][1651669] Updated weights for policy 0, policy_version 938641 (0.0013) [2024-06-15 22:40:49,937][1651669] Updated weights for policy 0, policy_version 938685 (0.0013) [2024-06-15 22:40:50,766][1648981] Fps is (10 sec: 52429.0, 60 sec: 50244.3, 300 sec: 48652.2). Total num frames: 1922433024. Throughput: 0: 11798.8. Samples: 480656896. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:40:50,767][1648981] Avg episode reward: [(0, '1062.690')] [2024-06-15 22:40:55,055][1651669] Updated weights for policy 0, policy_version 938752 (0.0013) [2024-06-15 22:40:55,778][1648981] Fps is (10 sec: 42549.9, 60 sec: 46958.2, 300 sec: 48317.0). Total num frames: 1922564096. Throughput: 0: 12102.8. Samples: 480708608. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:40:55,779][1648981] Avg episode reward: [(0, '1041.180')] [2024-06-15 22:40:55,793][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000938752_1922564096.pth... 
[2024-06-15 22:40:55,918][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000933120_1911029760.pth [2024-06-15 22:40:57,654][1651669] Updated weights for policy 0, policy_version 938816 (0.0012) [2024-06-15 22:41:00,036][1651669] Updated weights for policy 0, policy_version 938880 (0.0011) [2024-06-15 22:41:00,766][1648981] Fps is (10 sec: 42598.3, 60 sec: 48609.0, 300 sec: 48430.0). Total num frames: 1922859008. Throughput: 0: 11753.2. Samples: 480769536. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:00,767][1648981] Avg episode reward: [(0, '1077.100')] [2024-06-15 22:41:04,892][1651669] Updated weights for policy 0, policy_version 938945 (0.0012) [2024-06-15 22:41:05,768][1648981] Fps is (10 sec: 49210.1, 60 sec: 46967.7, 300 sec: 48318.9). Total num frames: 1923055616. Throughput: 0: 11923.9. Samples: 480847360. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:05,768][1648981] Avg episode reward: [(0, '1085.660')] [2024-06-15 22:41:05,964][1651669] Updated weights for policy 0, policy_version 939004 (0.0014) [2024-06-15 22:41:09,430][1651274] Signal inference workers to stop experience collection... (49300 times) [2024-06-15 22:41:09,496][1651669] InferenceWorker_p0-w0: stopping experience collection (49300 times) [2024-06-15 22:41:09,499][1651669] Updated weights for policy 0, policy_version 939077 (0.0037) [2024-06-15 22:41:09,662][1651274] Signal inference workers to resume experience collection... (49300 times) [2024-06-15 22:41:09,662][1651669] InferenceWorker_p0-w0: resuming experience collection (49300 times) [2024-06-15 22:41:10,513][1651669] Updated weights for policy 0, policy_version 939132 (0.0013) [2024-06-15 22:41:10,766][1648981] Fps is (10 sec: 49151.8, 60 sec: 48605.8, 300 sec: 48541.1). Total num frames: 1923350528. Throughput: 0: 11889.8. Samples: 480878592. 
Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:10,767][1648981] Avg episode reward: [(0, '1096.290')] [2024-06-15 22:41:12,690][1651669] Updated weights for policy 0, policy_version 939194 (0.0012) [2024-06-15 22:41:15,766][1648981] Fps is (10 sec: 42598.6, 60 sec: 46967.5, 300 sec: 47985.7). Total num frames: 1923481600. Throughput: 0: 11844.3. Samples: 480945664. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:15,767][1648981] Avg episode reward: [(0, '1105.780')] [2024-06-15 22:41:17,109][1651669] Updated weights for policy 0, policy_version 939248 (0.0012) [2024-06-15 22:41:19,425][1651669] Updated weights for policy 0, policy_version 939300 (0.0141) [2024-06-15 22:41:20,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 48605.8, 300 sec: 48318.9). Total num frames: 1923776512. Throughput: 0: 11537.1. Samples: 481011712. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:20,767][1648981] Avg episode reward: [(0, '1111.940')] [2024-06-15 22:41:21,315][1651669] Updated weights for policy 0, policy_version 939383 (0.0013) [2024-06-15 22:41:23,909][1651669] Updated weights for policy 0, policy_version 939440 (0.0144) [2024-06-15 22:41:25,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 47513.6, 300 sec: 47987.8). Total num frames: 1924005888. Throughput: 0: 11582.6. Samples: 481044992. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:25,767][1648981] Avg episode reward: [(0, '1099.960')] [2024-06-15 22:41:28,208][1651669] Updated weights for policy 0, policy_version 939488 (0.0015) [2024-06-15 22:41:29,596][1651669] Updated weights for policy 0, policy_version 939536 (0.0013) [2024-06-15 22:41:30,766][1648981] Fps is (10 sec: 49152.7, 60 sec: 48059.8, 300 sec: 48207.9). Total num frames: 1924268032. Throughput: 0: 11707.9. Samples: 481129984. 
Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:30,767][1648981] Avg episode reward: [(0, '1105.870')] [2024-06-15 22:41:31,569][1651669] Updated weights for policy 0, policy_version 939618 (0.0011) [2024-06-15 22:41:34,242][1651669] Updated weights for policy 0, policy_version 939664 (0.0089) [2024-06-15 22:41:35,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1924530176. Throughput: 0: 11969.4. Samples: 481195520. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:35,767][1648981] Avg episode reward: [(0, '1106.790')] [2024-06-15 22:41:39,194][1651669] Updated weights for policy 0, policy_version 939744 (0.0105) [2024-06-15 22:41:39,860][1651669] Updated weights for policy 0, policy_version 939776 (0.0017) [2024-06-15 22:41:40,766][1648981] Fps is (10 sec: 42598.2, 60 sec: 46421.4, 300 sec: 48096.8). Total num frames: 1924694016. Throughput: 0: 11733.6. Samples: 481236480. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:40,767][1648981] Avg episode reward: [(0, '1072.800')] [2024-06-15 22:41:41,869][1651669] Updated weights for policy 0, policy_version 939856 (0.0012) [2024-06-15 22:41:42,959][1651669] Updated weights for policy 0, policy_version 939902 (0.0013) [2024-06-15 22:41:45,767][1648981] Fps is (10 sec: 42597.7, 60 sec: 46967.7, 300 sec: 47763.5). Total num frames: 1924956160. Throughput: 0: 11889.7. Samples: 481304576. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:45,767][1648981] Avg episode reward: [(0, '1030.990')] [2024-06-15 22:41:46,550][1651669] Updated weights for policy 0, policy_version 939962 (0.0013) [2024-06-15 22:41:50,181][1651669] Updated weights for policy 0, policy_version 940016 (0.0019) [2024-06-15 22:41:50,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 45875.2, 300 sec: 48100.4). Total num frames: 1925185536. Throughput: 0: 11730.5. Samples: 481375232. 
Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:50,767][1648981] Avg episode reward: [(0, '1052.240')] [2024-06-15 22:41:51,782][1651669] Updated weights for policy 0, policy_version 940064 (0.0012) [2024-06-15 22:41:51,954][1651274] Signal inference workers to stop experience collection... (49350 times) [2024-06-15 22:41:52,026][1651669] InferenceWorker_p0-w0: stopping experience collection (49350 times) [2024-06-15 22:41:52,187][1651274] Signal inference workers to resume experience collection... (49350 times) [2024-06-15 22:41:52,188][1651669] InferenceWorker_p0-w0: resuming experience collection (49350 times) [2024-06-15 22:41:53,489][1651669] Updated weights for policy 0, policy_version 940129 (0.0012) [2024-06-15 22:41:55,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 48069.2, 300 sec: 47874.6). Total num frames: 1925447680. Throughput: 0: 11707.7. Samples: 481405440. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:41:55,767][1648981] Avg episode reward: [(0, '1012.850')] [2024-06-15 22:41:56,607][1651669] Updated weights for policy 0, policy_version 940194 (0.0012) [2024-06-15 22:41:59,631][1651669] Updated weights for policy 0, policy_version 940228 (0.0015) [2024-06-15 22:42:00,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 46967.5, 300 sec: 48096.8). Total num frames: 1925677056. Throughput: 0: 12174.2. Samples: 481493504. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:00,767][1648981] Avg episode reward: [(0, '1076.230')] [2024-06-15 22:42:02,166][1651669] Updated weights for policy 0, policy_version 940304 (0.0147) [2024-06-15 22:42:03,481][1651669] Updated weights for policy 0, policy_version 940352 (0.0011) [2024-06-15 22:42:04,814][1651669] Updated weights for policy 0, policy_version 940405 (0.0027) [2024-06-15 22:42:05,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48605.9, 300 sec: 48207.8). Total num frames: 1925971968. Throughput: 0: 12094.6. Samples: 481555968. 
Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:05,767][1648981] Avg episode reward: [(0, '1055.910')] [2024-06-15 22:42:06,988][1651669] Updated weights for policy 0, policy_version 940440 (0.0015) [2024-06-15 22:42:10,133][1651669] Updated weights for policy 0, policy_version 940496 (0.0012) [2024-06-15 22:42:10,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 47513.6, 300 sec: 48320.3). Total num frames: 1926201344. Throughput: 0: 12356.3. Samples: 481601024. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:10,767][1648981] Avg episode reward: [(0, '1012.500')] [2024-06-15 22:42:11,072][1651669] Updated weights for policy 0, policy_version 940541 (0.0015) [2024-06-15 22:42:13,197][1651669] Updated weights for policy 0, policy_version 940608 (0.0012) [2024-06-15 22:42:15,199][1651669] Updated weights for policy 0, policy_version 940666 (0.0012) [2024-06-15 22:42:15,770][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 1926496256. Throughput: 0: 11992.1. Samples: 481669632. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:15,770][1648981] Avg episode reward: [(0, '983.580')] [2024-06-15 22:42:17,824][1651669] Updated weights for policy 0, policy_version 940706 (0.0023) [2024-06-15 22:42:18,327][1651669] Updated weights for policy 0, policy_version 940734 (0.0013) [2024-06-15 22:42:20,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 48059.8, 300 sec: 48097.4). Total num frames: 1926660096. Throughput: 0: 12356.3. Samples: 481751552. 
Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:20,767][1648981] Avg episode reward: [(0, '1007.930')] [2024-06-15 22:42:21,385][1651669] Updated weights for policy 0, policy_version 940790 (0.0012) [2024-06-15 22:42:23,853][1651669] Updated weights for policy 0, policy_version 940848 (0.0039) [2024-06-15 22:42:25,367][1651669] Updated weights for policy 0, policy_version 940898 (0.0011) [2024-06-15 22:42:25,788][1648981] Fps is (10 sec: 49043.6, 60 sec: 49679.8, 300 sec: 48537.4). Total num frames: 1926987776. Throughput: 0: 12213.7. Samples: 481786368. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:25,789][1648981] Avg episode reward: [(0, '1027.170')] [2024-06-15 22:42:28,197][1651669] Updated weights for policy 0, policy_version 940960 (0.0031) [2024-06-15 22:42:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 48059.7, 300 sec: 47985.7). Total num frames: 1927151616. Throughput: 0: 12219.8. Samples: 481854464. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:30,767][1648981] Avg episode reward: [(0, '976.040')] [2024-06-15 22:42:31,616][1651669] Updated weights for policy 0, policy_version 941041 (0.0012) [2024-06-15 22:42:34,484][1651274] Signal inference workers to stop experience collection... (49400 times) [2024-06-15 22:42:34,531][1651669] InferenceWorker_p0-w0: stopping experience collection (49400 times) [2024-06-15 22:42:34,730][1651274] Signal inference workers to resume experience collection... (49400 times) [2024-06-15 22:42:34,730][1651669] InferenceWorker_p0-w0: resuming experience collection (49400 times) [2024-06-15 22:42:34,766][1651669] Updated weights for policy 0, policy_version 941104 (0.0123) [2024-06-15 22:42:35,778][1648981] Fps is (10 sec: 42642.5, 60 sec: 48050.3, 300 sec: 48428.1). Total num frames: 1927413760. Throughput: 0: 12398.5. Samples: 481933312. 
Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:35,779][1648981] Avg episode reward: [(0, '982.370')] [2024-06-15 22:42:36,060][1651669] Updated weights for policy 0, policy_version 941137 (0.0014) [2024-06-15 22:42:37,204][1651669] Updated weights for policy 0, policy_version 941184 (0.0011) [2024-06-15 22:42:39,722][1651669] Updated weights for policy 0, policy_version 941241 (0.0013) [2024-06-15 22:42:40,767][1648981] Fps is (10 sec: 52428.2, 60 sec: 49698.0, 300 sec: 47985.6). Total num frames: 1927675904. Throughput: 0: 12526.9. Samples: 481969152. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:40,767][1648981] Avg episode reward: [(0, '1011.920')] [2024-06-15 22:42:42,637][1651669] Updated weights for policy 0, policy_version 941296 (0.0015) [2024-06-15 22:42:45,766][1648981] Fps is (10 sec: 42648.7, 60 sec: 48059.9, 300 sec: 48096.8). Total num frames: 1927839744. Throughput: 0: 12174.2. Samples: 482041344. Policy #0 lag: (min: 95.0, avg: 241.0, max: 383.0) [2024-06-15 22:42:45,767][1648981] Avg episode reward: [(0, '992.340')] [2024-06-15 22:42:46,052][1651669] Updated weights for policy 0, policy_version 941345 (0.0011) [2024-06-15 22:42:47,825][1651669] Updated weights for policy 0, policy_version 941409 (0.0013) [2024-06-15 22:42:50,370][1651669] Updated weights for policy 0, policy_version 941488 (0.0012) [2024-06-15 22:42:50,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50244.3, 300 sec: 48207.9). Total num frames: 1928200192. Throughput: 0: 12197.0. Samples: 482104832. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:42:50,767][1648981] Avg episode reward: [(0, '1017.290')] [2024-06-15 22:42:53,335][1651669] Updated weights for policy 0, policy_version 941539 (0.0011) [2024-06-15 22:42:53,818][1651669] Updated weights for policy 0, policy_version 941568 (0.0012) [2024-06-15 22:42:55,767][1648981] Fps is (10 sec: 49147.6, 60 sec: 48059.0, 300 sec: 47985.5). Total num frames: 1928331264. 
Throughput: 0: 12105.7. Samples: 482145792. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:42:55,768][1648981] Avg episode reward: [(0, '976.570')] [2024-06-15 22:42:55,773][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000941568_1928331264.pth... [2024-06-15 22:42:56,054][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000936000_1916928000.pth [2024-06-15 22:42:57,932][1651669] Updated weights for policy 0, policy_version 941650 (0.0108) [2024-06-15 22:43:00,277][1651669] Updated weights for policy 0, policy_version 941698 (0.0011) [2024-06-15 22:43:00,775][1648981] Fps is (10 sec: 42563.1, 60 sec: 49145.2, 300 sec: 47986.9). Total num frames: 1928626176. Throughput: 0: 12172.0. Samples: 482217472. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:00,777][1648981] Avg episode reward: [(0, '946.860')] [2024-06-15 22:43:01,523][1651669] Updated weights for policy 0, policy_version 941750 (0.0014) [2024-06-15 22:43:03,116][1651669] Updated weights for policy 0, policy_version 941777 (0.0012) [2024-06-15 22:43:05,766][1648981] Fps is (10 sec: 52433.2, 60 sec: 48059.7, 300 sec: 48097.4). Total num frames: 1928855552. Throughput: 0: 12094.6. Samples: 482295808. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:05,767][1648981] Avg episode reward: [(0, '936.180')] [2024-06-15 22:43:07,319][1651669] Updated weights for policy 0, policy_version 941840 (0.0014) [2024-06-15 22:43:08,544][1651669] Updated weights for policy 0, policy_version 941890 (0.0010) [2024-06-15 22:43:10,766][1648981] Fps is (10 sec: 49192.2, 60 sec: 48605.8, 300 sec: 47985.6). Total num frames: 1929117696. Throughput: 0: 12123.3. Samples: 482331648. 
Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:10,767][1648981] Avg episode reward: [(0, '960.360')] [2024-06-15 22:43:11,230][1651669] Updated weights for policy 0, policy_version 941971 (0.0013) [2024-06-15 22:43:12,225][1651669] Updated weights for policy 0, policy_version 942015 (0.0012) [2024-06-15 22:43:14,343][1651669] Updated weights for policy 0, policy_version 942077 (0.0011) [2024-06-15 22:43:15,774][1648981] Fps is (10 sec: 52388.3, 60 sec: 48053.5, 300 sec: 48428.7). Total num frames: 1929379840. Throughput: 0: 12149.4. Samples: 482401280. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:15,775][1648981] Avg episode reward: [(0, '956.030')] [2024-06-15 22:43:18,195][1651274] Signal inference workers to stop experience collection... (49450 times) [2024-06-15 22:43:18,235][1651669] InferenceWorker_p0-w0: stopping experience collection (49450 times) [2024-06-15 22:43:18,494][1651274] Signal inference workers to resume experience collection... (49450 times) [2024-06-15 22:43:18,495][1651669] InferenceWorker_p0-w0: resuming experience collection (49450 times) [2024-06-15 22:43:19,040][1651669] Updated weights for policy 0, policy_version 942129 (0.0011) [2024-06-15 22:43:20,523][1651669] Updated weights for policy 0, policy_version 942208 (0.0010) [2024-06-15 22:43:20,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 49698.2, 300 sec: 48430.0). Total num frames: 1929641984. Throughput: 0: 12052.2. Samples: 482475520. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:20,767][1648981] Avg episode reward: [(0, '928.040')] [2024-06-15 22:43:24,523][1651669] Updated weights for policy 0, policy_version 942304 (0.0014) [2024-06-15 22:43:25,766][1648981] Fps is (10 sec: 52469.5, 60 sec: 48623.8, 300 sec: 48430.0). Total num frames: 1929904128. Throughput: 0: 12083.2. Samples: 482512896. 
Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:25,767][1648981] Avg episode reward: [(0, '853.060')] [2024-06-15 22:43:29,306][1651669] Updated weights for policy 0, policy_version 942337 (0.0012) [2024-06-15 22:43:30,405][1651669] Updated weights for policy 0, policy_version 942384 (0.0012) [2024-06-15 22:43:30,766][1648981] Fps is (10 sec: 36044.8, 60 sec: 47513.6, 300 sec: 47874.6). Total num frames: 1930002432. Throughput: 0: 12106.0. Samples: 482586112. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:30,767][1648981] Avg episode reward: [(0, '866.830')] [2024-06-15 22:43:32,231][1651669] Updated weights for policy 0, policy_version 942457 (0.0128) [2024-06-15 22:43:35,336][1651669] Updated weights for policy 0, policy_version 942531 (0.0012) [2024-06-15 22:43:35,767][1648981] Fps is (10 sec: 42597.6, 60 sec: 48615.3, 300 sec: 48096.8). Total num frames: 1930330112. Throughput: 0: 12003.5. Samples: 482644992. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:35,767][1648981] Avg episode reward: [(0, '897.700')] [2024-06-15 22:43:36,679][1651669] Updated weights for policy 0, policy_version 942592 (0.0012) [2024-06-15 22:43:40,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 46421.4, 300 sec: 47652.4). Total num frames: 1930461184. Throughput: 0: 11981.0. Samples: 482684928. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:40,767][1648981] Avg episode reward: [(0, '862.420')] [2024-06-15 22:43:41,688][1651669] Updated weights for policy 0, policy_version 942645 (0.0011) [2024-06-15 22:43:45,709][1651669] Updated weights for policy 0, policy_version 942724 (0.0021) [2024-06-15 22:43:45,766][1648981] Fps is (10 sec: 36045.5, 60 sec: 47513.6, 300 sec: 47541.4). Total num frames: 1930690560. Throughput: 0: 11903.3. Samples: 482753024. 
Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:45,767][1648981] Avg episode reward: [(0, '857.670')] [2024-06-15 22:43:47,693][1651669] Updated weights for policy 0, policy_version 942816 (0.0018) [2024-06-15 22:43:50,767][1648981] Fps is (10 sec: 49150.4, 60 sec: 45874.9, 300 sec: 47763.5). Total num frames: 1930952704. Throughput: 0: 11650.8. Samples: 482820096. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:50,769][1648981] Avg episode reward: [(0, '880.200')] [2024-06-15 22:43:51,489][1651669] Updated weights for policy 0, policy_version 942853 (0.0011) [2024-06-15 22:43:52,688][1651669] Updated weights for policy 0, policy_version 942904 (0.0016) [2024-06-15 22:43:54,792][1651669] Updated weights for policy 0, policy_version 942960 (0.0013) [2024-06-15 22:43:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48060.4, 300 sec: 47655.6). Total num frames: 1931214848. Throughput: 0: 11639.5. Samples: 482855424. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:43:55,767][1648981] Avg episode reward: [(0, '869.410')] [2024-06-15 22:43:57,630][1651669] Updated weights for policy 0, policy_version 943013 (0.0012) [2024-06-15 22:43:57,871][1651274] Signal inference workers to stop experience collection... (49500 times) [2024-06-15 22:43:57,921][1651669] InferenceWorker_p0-w0: stopping experience collection (49500 times) [2024-06-15 22:43:58,030][1651274] Signal inference workers to resume experience collection... (49500 times) [2024-06-15 22:43:58,031][1651669] InferenceWorker_p0-w0: resuming experience collection (49500 times) [2024-06-15 22:43:58,871][1651669] Updated weights for policy 0, policy_version 943076 (0.0011) [2024-06-15 22:43:59,434][1651669] Updated weights for policy 0, policy_version 943104 (0.0019) [2024-06-15 22:44:00,766][1648981] Fps is (10 sec: 52431.2, 60 sec: 47520.2, 300 sec: 47985.7). Total num frames: 1931476992. Throughput: 0: 11732.5. Samples: 482929152. 
Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:00,767][1648981] Avg episode reward: [(0, '893.070')] [2024-06-15 22:44:02,670][1651669] Updated weights for policy 0, policy_version 943165 (0.0082) [2024-06-15 22:44:04,705][1651669] Updated weights for policy 0, policy_version 943228 (0.0015) [2024-06-15 22:44:05,776][1648981] Fps is (10 sec: 52380.0, 60 sec: 48052.2, 300 sec: 48095.2). Total num frames: 1931739136. Throughput: 0: 11841.8. Samples: 483008512. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:05,777][1648981] Avg episode reward: [(0, '892.420')] [2024-06-15 22:44:07,688][1651669] Updated weights for policy 0, policy_version 943280 (0.0012) [2024-06-15 22:44:09,259][1651669] Updated weights for policy 0, policy_version 943360 (0.0171) [2024-06-15 22:44:10,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 48059.8, 300 sec: 48211.3). Total num frames: 1932001280. Throughput: 0: 11776.0. Samples: 483042816. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:10,767][1648981] Avg episode reward: [(0, '912.000')] [2024-06-15 22:44:14,097][1651669] Updated weights for policy 0, policy_version 943420 (0.0044) [2024-06-15 22:44:15,766][1648981] Fps is (10 sec: 49198.5, 60 sec: 47519.8, 300 sec: 48319.5). Total num frames: 1932230656. Throughput: 0: 11719.1. Samples: 483113472. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:15,767][1648981] Avg episode reward: [(0, '911.740')] [2024-06-15 22:44:15,982][1651669] Updated weights for policy 0, policy_version 943482 (0.0012) [2024-06-15 22:44:19,269][1651669] Updated weights for policy 0, policy_version 943520 (0.0012) [2024-06-15 22:44:20,767][1648981] Fps is (10 sec: 42597.1, 60 sec: 46421.1, 300 sec: 48096.7). Total num frames: 1932427264. Throughput: 0: 11923.9. Samples: 483181568. 
Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:20,767][1648981] Avg episode reward: [(0, '924.110')] [2024-06-15 22:44:20,988][1651669] Updated weights for policy 0, policy_version 943588 (0.0014) [2024-06-15 22:44:24,040][1651669] Updated weights for policy 0, policy_version 943634 (0.0133) [2024-06-15 22:44:25,031][1651669] Updated weights for policy 0, policy_version 943678 (0.0010) [2024-06-15 22:44:25,766][1648981] Fps is (10 sec: 42598.1, 60 sec: 45875.2, 300 sec: 47985.7). Total num frames: 1932656640. Throughput: 0: 11935.3. Samples: 483222016. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:25,767][1648981] Avg episode reward: [(0, '928.870')] [2024-06-15 22:44:26,717][1651669] Updated weights for policy 0, policy_version 943728 (0.0014) [2024-06-15 22:44:30,300][1651669] Updated weights for policy 0, policy_version 943792 (0.0020) [2024-06-15 22:44:30,773][1648981] Fps is (10 sec: 49119.8, 60 sec: 48600.3, 300 sec: 47984.6). Total num frames: 1932918784. Throughput: 0: 12024.5. Samples: 483294208. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:30,776][1648981] Avg episode reward: [(0, '924.900')] [2024-06-15 22:44:31,860][1651669] Updated weights for policy 0, policy_version 943857 (0.0013) [2024-06-15 22:44:35,058][1651669] Updated weights for policy 0, policy_version 943910 (0.0013) [2024-06-15 22:44:35,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 47513.7, 300 sec: 48207.8). Total num frames: 1933180928. Throughput: 0: 12094.7. Samples: 483364352. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:35,767][1648981] Avg episode reward: [(0, '950.260')] [2024-06-15 22:44:37,151][1651669] Updated weights for policy 0, policy_version 943968 (0.0013) [2024-06-15 22:44:39,931][1651274] Signal inference workers to stop experience collection... 
(49550 times) [2024-06-15 22:44:39,975][1651669] Updated weights for policy 0, policy_version 944018 (0.0011) [2024-06-15 22:44:39,995][1651669] InferenceWorker_p0-w0: stopping experience collection (49550 times) [2024-06-15 22:44:40,152][1651274] Signal inference workers to resume experience collection... (49550 times) [2024-06-15 22:44:40,153][1651669] InferenceWorker_p0-w0: resuming experience collection (49550 times) [2024-06-15 22:44:40,766][1648981] Fps is (10 sec: 49185.4, 60 sec: 49152.0, 300 sec: 47874.6). Total num frames: 1933410304. Throughput: 0: 12174.2. Samples: 483403264. Policy #0 lag: (min: 63.0, avg: 133.4, max: 255.0) [2024-06-15 22:44:40,767][1648981] Avg episode reward: [(0, '952.900')] [2024-06-15 22:44:41,261][1651669] Updated weights for policy 0, policy_version 944066 (0.0013) [2024-06-15 22:44:42,672][1651669] Updated weights for policy 0, policy_version 944119 (0.0029) [2024-06-15 22:44:45,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 48605.9, 300 sec: 48096.8). Total num frames: 1933606912. Throughput: 0: 12037.7. Samples: 483470848. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:44:45,767][1648981] Avg episode reward: [(0, '930.960')] [2024-06-15 22:44:46,071][1651669] Updated weights for policy 0, policy_version 944163 (0.0013) [2024-06-15 22:44:47,734][1651669] Updated weights for policy 0, policy_version 944208 (0.0011) [2024-06-15 22:44:48,869][1651669] Updated weights for policy 0, policy_version 944255 (0.0020) [2024-06-15 22:44:50,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.5, 300 sec: 48096.8). Total num frames: 1933934592. Throughput: 0: 12028.8. Samples: 483549696. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:44:50,767][1648981] Avg episode reward: [(0, '989.210')] [2024-06-15 22:44:51,046][1651669] Updated weights for policy 0, policy_version 944317 (0.0012) [2024-06-15 22:44:53,067][1651669] Updated weights for policy 0, policy_version 944371 (0.0015) [2024-06-15 22:44:55,773][1648981] Fps is (10 sec: 52395.1, 60 sec: 48600.7, 300 sec: 48096.3). Total num frames: 1934131200. Throughput: 0: 11967.7. Samples: 483581440. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:44:55,808][1648981] Avg episode reward: [(0, '1032.260')] [2024-06-15 22:44:56,133][1651669] Updated weights for policy 0, policy_version 944416 (0.0086) [2024-06-15 22:44:56,133][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000944416_1934163968.pth... [2024-06-15 22:44:56,288][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000938752_1922564096.pth [2024-06-15 22:44:58,857][1651669] Updated weights for policy 0, policy_version 944465 (0.0013) [2024-06-15 22:44:59,965][1651669] Updated weights for policy 0, policy_version 944512 (0.0010) [2024-06-15 22:45:00,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48605.8, 300 sec: 47985.7). Total num frames: 1934393344. Throughput: 0: 12208.3. Samples: 483662848. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:00,767][1648981] Avg episode reward: [(0, '1088.640')] [2024-06-15 22:45:01,636][1651669] Updated weights for policy 0, policy_version 944573 (0.0013) [2024-06-15 22:45:03,702][1651669] Updated weights for policy 0, policy_version 944636 (0.0015) [2024-06-15 22:45:05,768][1648981] Fps is (10 sec: 49173.2, 60 sec: 48065.6, 300 sec: 48096.4). Total num frames: 1934622720. Throughput: 0: 12242.0. Samples: 483732480. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:05,769][1648981] Avg episode reward: [(0, '1065.690')] [2024-06-15 22:45:07,748][1651669] Updated weights for policy 0, policy_version 944673 (0.0013) [2024-06-15 22:45:10,046][1651669] Updated weights for policy 0, policy_version 944722 (0.0011) [2024-06-15 22:45:10,766][1648981] Fps is (10 sec: 45875.5, 60 sec: 47513.6, 300 sec: 48096.8). Total num frames: 1934852096. Throughput: 0: 12231.1. Samples: 483772416. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:10,767][1648981] Avg episode reward: [(0, '1094.820')] [2024-06-15 22:45:11,171][1651669] Updated weights for policy 0, policy_version 944775 (0.0011) [2024-06-15 22:45:12,215][1651669] Updated weights for policy 0, policy_version 944826 (0.0012) [2024-06-15 22:45:13,125][1651669] Updated weights for policy 0, policy_version 944852 (0.0028) [2024-06-15 22:45:14,087][1651669] Updated weights for policy 0, policy_version 944895 (0.0017) [2024-06-15 22:45:15,766][1648981] Fps is (10 sec: 52439.9, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 1935147008. Throughput: 0: 12153.3. Samples: 483841024. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:15,767][1648981] Avg episode reward: [(0, '1082.270')] [2024-06-15 22:45:18,769][1651669] Updated weights for policy 0, policy_version 944959 (0.0012) [2024-06-15 22:45:20,363][1651274] Signal inference workers to stop experience collection... (49600 times) [2024-06-15 22:45:20,426][1651669] InferenceWorker_p0-w0: stopping experience collection (49600 times) [2024-06-15 22:45:20,622][1651274] Signal inference workers to resume experience collection... (49600 times) [2024-06-15 22:45:20,623][1651669] InferenceWorker_p0-w0: resuming experience collection (49600 times) [2024-06-15 22:45:20,813][1648981] Fps is (10 sec: 52185.0, 60 sec: 49114.0, 300 sec: 48200.2). Total num frames: 1935376384. Throughput: 0: 12366.2. Samples: 483921408. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:20,814][1648981] Avg episode reward: [(0, '1073.670')] [2024-06-15 22:45:20,824][1651669] Updated weights for policy 0, policy_version 945023 (0.0014) [2024-06-15 22:45:22,606][1651669] Updated weights for policy 0, policy_version 945072 (0.0011) [2024-06-15 22:45:23,858][1651669] Updated weights for policy 0, policy_version 945121 (0.0160) [2024-06-15 22:45:25,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 50244.2, 300 sec: 48430.0). Total num frames: 1935671296. Throughput: 0: 12231.1. Samples: 483953664. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:25,767][1648981] Avg episode reward: [(0, '1086.850')] [2024-06-15 22:45:28,500][1651669] Updated weights for policy 0, policy_version 945169 (0.0013) [2024-06-15 22:45:29,620][1651669] Updated weights for policy 0, policy_version 945216 (0.0010) [2024-06-15 22:45:30,766][1648981] Fps is (10 sec: 42797.8, 60 sec: 48065.1, 300 sec: 47985.7). Total num frames: 1935802368. Throughput: 0: 12379.0. Samples: 484027904. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:30,767][1648981] Avg episode reward: [(0, '1055.650')] [2024-06-15 22:45:32,119][1651669] Updated weights for policy 0, policy_version 945280 (0.0011) [2024-06-15 22:45:33,480][1651669] Updated weights for policy 0, policy_version 945344 (0.0014) [2024-06-15 22:45:35,029][1651669] Updated weights for policy 0, policy_version 945408 (0.0017) [2024-06-15 22:45:35,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50244.4, 300 sec: 48430.0). Total num frames: 1936195584. Throughput: 0: 12219.7. Samples: 484099584. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:35,767][1648981] Avg episode reward: [(0, '1061.900')] [2024-06-15 22:45:40,773][1648981] Fps is (10 sec: 52392.5, 60 sec: 48600.2, 300 sec: 48095.7). Total num frames: 1936326656. Throughput: 0: 12458.5. Samples: 484142080. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:40,774][1648981] Avg episode reward: [(0, '1030.710')] [2024-06-15 22:45:41,697][1651669] Updated weights for policy 0, policy_version 945473 (0.0015) [2024-06-15 22:45:43,251][1651669] Updated weights for policy 0, policy_version 945537 (0.0013) [2024-06-15 22:45:44,920][1651669] Updated weights for policy 0, policy_version 945601 (0.0012) [2024-06-15 22:45:45,766][1648981] Fps is (10 sec: 45875.1, 60 sec: 50790.4, 300 sec: 48207.8). Total num frames: 1936654336. Throughput: 0: 12128.7. Samples: 484208640. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:45,767][1648981] Avg episode reward: [(0, '987.570')] [2024-06-15 22:45:49,239][1651669] Updated weights for policy 0, policy_version 945666 (0.0014) [2024-06-15 22:45:50,540][1651669] Updated weights for policy 0, policy_version 945728 (0.0098) [2024-06-15 22:45:50,766][1648981] Fps is (10 sec: 52465.5, 60 sec: 48605.9, 300 sec: 48431.9). Total num frames: 1936850944. Throughput: 0: 12186.2. Samples: 484280832. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:50,767][1648981] Avg episode reward: [(0, '973.870')] [2024-06-15 22:45:54,617][1651669] Updated weights for policy 0, policy_version 945809 (0.0014) [2024-06-15 22:45:55,650][1651669] Updated weights for policy 0, policy_version 945856 (0.0138) [2024-06-15 22:45:55,766][1648981] Fps is (10 sec: 45874.6, 60 sec: 49703.4, 300 sec: 48318.9). Total num frames: 1937113088. Throughput: 0: 12344.8. Samples: 484327936. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:45:55,767][1648981] Avg episode reward: [(0, '1001.300')] [2024-06-15 22:45:56,821][1651669] Updated weights for policy 0, policy_version 945911 (0.0013) [2024-06-15 22:45:59,879][1651274] Signal inference workers to stop experience collection... 
(49650 times) [2024-06-15 22:45:59,954][1651669] InferenceWorker_p0-w0: stopping experience collection (49650 times) [2024-06-15 22:46:00,077][1651274] Signal inference workers to resume experience collection... (49650 times) [2024-06-15 22:46:00,078][1651669] InferenceWorker_p0-w0: resuming experience collection (49650 times) [2024-06-15 22:46:00,412][1651669] Updated weights for policy 0, policy_version 945968 (0.0100) [2024-06-15 22:46:00,784][1648981] Fps is (10 sec: 52337.8, 60 sec: 49683.7, 300 sec: 48538.2). Total num frames: 1937375232. Throughput: 0: 12260.5. Samples: 484392960. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:00,784][1648981] Avg episode reward: [(0, '1035.990')] [2024-06-15 22:46:04,305][1651669] Updated weights for policy 0, policy_version 946016 (0.0016) [2024-06-15 22:46:05,767][1648981] Fps is (10 sec: 42597.5, 60 sec: 48607.4, 300 sec: 48096.7). Total num frames: 1937539072. Throughput: 0: 12186.8. Samples: 484469248. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:05,769][1648981] Avg episode reward: [(0, '1037.750')] [2024-06-15 22:46:06,152][1651669] Updated weights for policy 0, policy_version 946082 (0.0011) [2024-06-15 22:46:08,004][1651669] Updated weights for policy 0, policy_version 946148 (0.0051) [2024-06-15 22:46:10,766][1648981] Fps is (10 sec: 42672.4, 60 sec: 49151.9, 300 sec: 48541.1). Total num frames: 1937801216. Throughput: 0: 11969.4. Samples: 484492288. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:10,767][1648981] Avg episode reward: [(0, '1045.880')] [2024-06-15 22:46:10,785][1651669] Updated weights for policy 0, policy_version 946195 (0.0016) [2024-06-15 22:46:14,652][1651669] Updated weights for policy 0, policy_version 946256 (0.0013) [2024-06-15 22:46:15,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 47513.6, 300 sec: 48207.8). Total num frames: 1937997824. Throughput: 0: 12253.9. Samples: 484579328. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:15,767][1648981] Avg episode reward: [(0, '1029.260')] [2024-06-15 22:46:16,480][1651669] Updated weights for policy 0, policy_version 946323 (0.0127) [2024-06-15 22:46:18,630][1651669] Updated weights for policy 0, policy_version 946387 (0.0016) [2024-06-15 22:46:20,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 48643.7, 300 sec: 48430.0). Total num frames: 1938292736. Throughput: 0: 11992.2. Samples: 484639232. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:20,767][1648981] Avg episode reward: [(0, '957.850')] [2024-06-15 22:46:21,077][1651669] Updated weights for policy 0, policy_version 946439 (0.0013) [2024-06-15 22:46:25,478][1651669] Updated weights for policy 0, policy_version 946514 (0.0011) [2024-06-15 22:46:25,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 46421.4, 300 sec: 48096.7). Total num frames: 1938456576. Throughput: 0: 12016.8. Samples: 484682752. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:25,767][1648981] Avg episode reward: [(0, '1002.110')] [2024-06-15 22:46:27,418][1651669] Updated weights for policy 0, policy_version 946592 (0.0202) [2024-06-15 22:46:30,215][1651669] Updated weights for policy 0, policy_version 946641 (0.0012) [2024-06-15 22:46:30,766][1648981] Fps is (10 sec: 45875.2, 60 sec: 49152.0, 300 sec: 48207.8). Total num frames: 1938751488. Throughput: 0: 12174.2. Samples: 484756480. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:30,767][1648981] Avg episode reward: [(0, '989.740')] [2024-06-15 22:46:31,700][1651669] Updated weights for policy 0, policy_version 946704 (0.0012) [2024-06-15 22:46:35,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 45875.1, 300 sec: 48318.9). Total num frames: 1938948096. Throughput: 0: 12288.0. Samples: 484833792. 
Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:35,767][1648981] Avg episode reward: [(0, '976.450')] [2024-06-15 22:46:36,484][1651669] Updated weights for policy 0, policy_version 946784 (0.0014) [2024-06-15 22:46:38,567][1651669] Updated weights for policy 0, policy_version 946870 (0.0027) [2024-06-15 22:46:40,774][1648981] Fps is (10 sec: 45840.1, 60 sec: 48059.2, 300 sec: 48317.7). Total num frames: 1939210240. Throughput: 0: 11842.3. Samples: 484860928. Policy #0 lag: (min: 15.0, avg: 122.4, max: 271.0) [2024-06-15 22:46:40,775][1648981] Avg episode reward: [(0, '992.780')] [2024-06-15 22:46:41,228][1651274] Signal inference workers to stop experience collection... (49700 times) [2024-06-15 22:46:41,317][1651669] InferenceWorker_p0-w0: stopping experience collection (49700 times) [2024-06-15 22:46:41,600][1651274] Signal inference workers to resume experience collection... (49700 times) [2024-06-15 22:46:41,607][1651669] InferenceWorker_p0-w0: resuming experience collection (49700 times) [2024-06-15 22:46:41,908][1651669] Updated weights for policy 0, policy_version 946928 (0.0013) [2024-06-15 22:46:43,698][1651669] Updated weights for policy 0, policy_version 946992 (0.0015) [2024-06-15 22:46:45,766][1648981] Fps is (10 sec: 52429.5, 60 sec: 46967.5, 300 sec: 48430.0). Total num frames: 1939472384. Throughput: 0: 11848.9. Samples: 484925952. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:46:45,767][1648981] Avg episode reward: [(0, '994.140')] [2024-06-15 22:46:47,961][1651669] Updated weights for policy 0, policy_version 947056 (0.0012) [2024-06-15 22:46:48,923][1651669] Updated weights for policy 0, policy_version 947092 (0.0013) [2024-06-15 22:46:50,767][1648981] Fps is (10 sec: 52467.9, 60 sec: 48059.5, 300 sec: 48430.0). Total num frames: 1939734528. Throughput: 0: 11912.6. Samples: 485005312. 
Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:46:50,767][1648981] Avg episode reward: [(0, '992.310')] [2024-06-15 22:46:52,451][1651669] Updated weights for policy 0, policy_version 947153 (0.0013) [2024-06-15 22:46:54,867][1651669] Updated weights for policy 0, policy_version 947248 (0.0030) [2024-06-15 22:46:55,766][1648981] Fps is (10 sec: 52427.9, 60 sec: 48059.8, 300 sec: 48541.1). Total num frames: 1939996672. Throughput: 0: 12140.1. Samples: 485038592. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:46:55,767][1648981] Avg episode reward: [(0, '944.880')] [2024-06-15 22:46:55,791][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000947264_1939996672.pth... [2024-06-15 22:46:55,870][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000941568_1928331264.pth [2024-06-15 22:46:59,007][1651669] Updated weights for policy 0, policy_version 947296 (0.0013) [2024-06-15 22:46:59,926][1651669] Updated weights for policy 0, policy_version 947344 (0.0012) [2024-06-15 22:47:00,766][1648981] Fps is (10 sec: 49152.8, 60 sec: 47527.3, 300 sec: 48318.9). Total num frames: 1940226048. Throughput: 0: 11912.5. Samples: 485115392. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:00,767][1648981] Avg episode reward: [(0, '974.710')] [2024-06-15 22:47:03,341][1651669] Updated weights for policy 0, policy_version 947425 (0.0012) [2024-06-15 22:47:04,923][1651669] Updated weights for policy 0, policy_version 947472 (0.0011) [2024-06-15 22:47:05,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 49152.2, 300 sec: 48430.0). Total num frames: 1940488192. Throughput: 0: 12071.8. Samples: 485182464. 
Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:05,767][1648981] Avg episode reward: [(0, '996.320')] [2024-06-15 22:47:05,915][1651669] Updated weights for policy 0, policy_version 947512 (0.0032) [2024-06-15 22:47:09,074][1651669] Updated weights for policy 0, policy_version 947553 (0.0022) [2024-06-15 22:47:10,767][1648981] Fps is (10 sec: 52427.8, 60 sec: 49151.8, 300 sec: 48318.9). Total num frames: 1940750336. Throughput: 0: 12094.5. Samples: 485227008. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:10,767][1648981] Avg episode reward: [(0, '985.650')] [2024-06-15 22:47:10,936][1651669] Updated weights for policy 0, policy_version 947645 (0.0012) [2024-06-15 22:47:14,358][1651669] Updated weights for policy 0, policy_version 947684 (0.0012) [2024-06-15 22:47:15,625][1651669] Updated weights for policy 0, policy_version 947728 (0.0037) [2024-06-15 22:47:15,766][1648981] Fps is (10 sec: 45875.7, 60 sec: 49152.0, 300 sec: 48430.0). Total num frames: 1940946944. Throughput: 0: 12003.6. Samples: 485296640. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:15,767][1648981] Avg episode reward: [(0, '992.060')] [2024-06-15 22:47:19,937][1651669] Updated weights for policy 0, policy_version 947792 (0.0012) [2024-06-15 22:47:20,766][1648981] Fps is (10 sec: 39322.3, 60 sec: 47513.6, 300 sec: 47989.3). Total num frames: 1941143552. Throughput: 0: 12071.8. Samples: 485377024. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:20,767][1648981] Avg episode reward: [(0, '993.440')] [2024-06-15 22:47:20,960][1651274] Signal inference workers to stop experience collection... (49750 times) [2024-06-15 22:47:21,033][1651669] InferenceWorker_p0-w0: stopping experience collection (49750 times) [2024-06-15 22:47:21,216][1651274] Signal inference workers to resume experience collection... 
(49750 times) [2024-06-15 22:47:21,219][1651669] InferenceWorker_p0-w0: resuming experience collection (49750 times) [2024-06-15 22:47:21,222][1651669] Updated weights for policy 0, policy_version 947856 (0.0011) [2024-06-15 22:47:23,794][1651669] Updated weights for policy 0, policy_version 947936 (0.0023) [2024-06-15 22:47:25,238][1651669] Updated weights for policy 0, policy_version 947970 (0.0012) [2024-06-15 22:47:25,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 50244.3, 300 sec: 48541.1). Total num frames: 1941471232. Throughput: 0: 12278.7. Samples: 485413376. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:25,767][1648981] Avg episode reward: [(0, '1027.890')] [2024-06-15 22:47:29,893][1651669] Updated weights for policy 0, policy_version 948034 (0.0012) [2024-06-15 22:47:30,778][1648981] Fps is (10 sec: 49094.4, 60 sec: 48050.3, 300 sec: 48207.8). Total num frames: 1941635072. Throughput: 0: 12500.9. Samples: 485488640. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:30,779][1648981] Avg episode reward: [(0, '1032.860')] [2024-06-15 22:47:31,498][1651669] Updated weights for policy 0, policy_version 948096 (0.0012) [2024-06-15 22:47:32,442][1651669] Updated weights for policy 0, policy_version 948150 (0.0013) [2024-06-15 22:47:34,900][1651669] Updated weights for policy 0, policy_version 948216 (0.0023) [2024-06-15 22:47:35,779][1648981] Fps is (10 sec: 52365.6, 60 sec: 50780.2, 300 sec: 48539.1). Total num frames: 1941995520. Throughput: 0: 12375.8. Samples: 485562368. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:35,780][1648981] Avg episode reward: [(0, '1030.870')] [2024-06-15 22:47:36,097][1651669] Updated weights for policy 0, policy_version 948260 (0.0011) [2024-06-15 22:47:40,777][1648981] Fps is (10 sec: 45879.9, 60 sec: 48057.3, 300 sec: 48317.2). Total num frames: 1942093824. Throughput: 0: 12421.6. Samples: 485597696. 
Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:40,778][1648981] Avg episode reward: [(0, '1009.160')] [2024-06-15 22:47:41,634][1651669] Updated weights for policy 0, policy_version 948336 (0.0013) [2024-06-15 22:47:43,037][1651669] Updated weights for policy 0, policy_version 948386 (0.0011) [2024-06-15 22:47:45,592][1651669] Updated weights for policy 0, policy_version 948435 (0.0012) [2024-06-15 22:47:45,768][1648981] Fps is (10 sec: 42641.0, 60 sec: 49150.2, 300 sec: 48207.5). Total num frames: 1942421504. Throughput: 0: 12321.6. Samples: 485669888. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:45,769][1648981] Avg episode reward: [(0, '971.880')] [2024-06-15 22:47:46,920][1651669] Updated weights for policy 0, policy_version 948498 (0.0012) [2024-06-15 22:47:50,766][1648981] Fps is (10 sec: 52485.5, 60 sec: 48059.9, 300 sec: 48430.1). Total num frames: 1942618112. Throughput: 0: 12424.6. Samples: 485741568. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:50,767][1648981] Avg episode reward: [(0, '972.030')] [2024-06-15 22:47:51,486][1651669] Updated weights for policy 0, policy_version 948547 (0.0012) [2024-06-15 22:47:53,029][1651669] Updated weights for policy 0, policy_version 948608 (0.0011) [2024-06-15 22:47:54,196][1651669] Updated weights for policy 0, policy_version 948663 (0.0012) [2024-06-15 22:47:55,766][1648981] Fps is (10 sec: 45884.6, 60 sec: 48059.8, 300 sec: 48320.3). Total num frames: 1942880256. Throughput: 0: 12208.4. Samples: 485776384. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:47:55,767][1648981] Avg episode reward: [(0, '961.890')] [2024-06-15 22:47:56,230][1651669] Updated weights for policy 0, policy_version 948692 (0.0030) [2024-06-15 22:47:57,516][1651669] Updated weights for policy 0, policy_version 948753 (0.0011) [2024-06-15 22:48:00,767][1648981] Fps is (10 sec: 52428.1, 60 sec: 48605.8, 300 sec: 48430.0). Total num frames: 1943142400. 
Throughput: 0: 12435.9. Samples: 485856256. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:00,767][1648981] Avg episode reward: [(0, '981.790')] [2024-06-15 22:48:02,072][1651274] Signal inference workers to stop experience collection... (49800 times) [2024-06-15 22:48:02,100][1651669] Updated weights for policy 0, policy_version 948802 (0.0011) [2024-06-15 22:48:02,145][1651669] InferenceWorker_p0-w0: stopping experience collection (49800 times) [2024-06-15 22:48:02,410][1651274] Signal inference workers to resume experience collection... (49800 times) [2024-06-15 22:48:02,411][1651669] InferenceWorker_p0-w0: resuming experience collection (49800 times) [2024-06-15 22:48:03,260][1651669] Updated weights for policy 0, policy_version 948848 (0.0025) [2024-06-15 22:48:04,983][1651669] Updated weights for policy 0, policy_version 948922 (0.0013) [2024-06-15 22:48:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48606.0, 300 sec: 48430.0). Total num frames: 1943404544. Throughput: 0: 12106.0. Samples: 485921792. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:05,767][1648981] Avg episode reward: [(0, '983.790')] [2024-06-15 22:48:08,156][1651669] Updated weights for policy 0, policy_version 948992 (0.0011) [2024-06-15 22:48:10,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 48606.0, 300 sec: 48431.3). Total num frames: 1943666688. Throughput: 0: 12117.3. Samples: 485958656. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:10,767][1648981] Avg episode reward: [(0, '965.020')] [2024-06-15 22:48:13,366][1651669] Updated weights for policy 0, policy_version 949072 (0.0132) [2024-06-15 22:48:14,970][1651669] Updated weights for policy 0, policy_version 949136 (0.0011) [2024-06-15 22:48:15,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 48605.8, 300 sec: 48207.8). Total num frames: 1943863296. Throughput: 0: 12109.1. Samples: 486033408. 
Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:15,767][1648981] Avg episode reward: [(0, '960.760')] [2024-06-15 22:48:17,849][1651669] Updated weights for policy 0, policy_version 949185 (0.0011) [2024-06-15 22:48:19,799][1651669] Updated weights for policy 0, policy_version 949264 (0.0089) [2024-06-15 22:48:20,772][1648981] Fps is (10 sec: 49126.7, 60 sec: 50240.0, 300 sec: 48318.1). Total num frames: 1944158208. Throughput: 0: 11925.7. Samples: 486098944. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:20,772][1648981] Avg episode reward: [(0, '966.930')] [2024-06-15 22:48:24,254][1651669] Updated weights for policy 0, policy_version 949329 (0.0011) [2024-06-15 22:48:25,280][1651669] Updated weights for policy 0, policy_version 949380 (0.0020) [2024-06-15 22:48:25,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.7, 300 sec: 48652.1). Total num frames: 1944354816. Throughput: 0: 12165.7. Samples: 486145024. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:25,767][1648981] Avg episode reward: [(0, '949.580')] [2024-06-15 22:48:28,820][1651669] Updated weights for policy 0, policy_version 949459 (0.0016) [2024-06-15 22:48:30,450][1651669] Updated weights for policy 0, policy_version 949523 (0.0013) [2024-06-15 22:48:30,766][1648981] Fps is (10 sec: 49177.6, 60 sec: 50254.1, 300 sec: 48541.1). Total num frames: 1944649728. Throughput: 0: 12026.9. Samples: 486211072. Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:30,767][1648981] Avg episode reward: [(0, '950.550')] [2024-06-15 22:48:35,770][1648981] Fps is (10 sec: 42582.4, 60 sec: 46427.7, 300 sec: 48540.5). Total num frames: 1944780800. Throughput: 0: 12036.7. Samples: 486283264. 
Policy #0 lag: (min: 79.0, avg: 182.0, max: 335.0) [2024-06-15 22:48:35,771][1648981] Avg episode reward: [(0, '970.990')] [2024-06-15 22:48:36,220][1651669] Updated weights for policy 0, policy_version 949624 (0.0013) [2024-06-15 22:48:38,082][1651669] Updated weights for policy 0, policy_version 949689 (0.0013) [2024-06-15 22:48:40,057][1651669] Updated weights for policy 0, policy_version 949728 (0.0012) [2024-06-15 22:48:40,261][1651274] Signal inference workers to stop experience collection... (49850 times) [2024-06-15 22:48:40,338][1651669] InferenceWorker_p0-w0: stopping experience collection (49850 times) [2024-06-15 22:48:40,504][1651274] Signal inference workers to resume experience collection... (49850 times) [2024-06-15 22:48:40,504][1651669] InferenceWorker_p0-w0: resuming experience collection (49850 times) [2024-06-15 22:48:40,766][1648981] Fps is (10 sec: 42598.7, 60 sec: 49707.1, 300 sec: 48763.2). Total num frames: 1945075712. Throughput: 0: 11969.4. Samples: 486315008. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:48:40,767][1648981] Avg episode reward: [(0, '999.530')] [2024-06-15 22:48:40,912][1651669] Updated weights for policy 0, policy_version 949758 (0.0011) [2024-06-15 22:48:42,359][1651669] Updated weights for policy 0, policy_version 949813 (0.0014) [2024-06-15 22:48:45,464][1651669] Updated weights for policy 0, policy_version 949857 (0.0015) [2024-06-15 22:48:45,766][1648981] Fps is (10 sec: 55726.9, 60 sec: 48607.5, 300 sec: 48763.3). Total num frames: 1945337856. Throughput: 0: 12049.1. Samples: 486398464. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:48:45,767][1648981] Avg episode reward: [(0, '1021.750')] [2024-06-15 22:48:46,922][1651669] Updated weights for policy 0, policy_version 949891 (0.0027) [2024-06-15 22:48:49,357][1651669] Updated weights for policy 0, policy_version 949954 (0.0050) [2024-06-15 22:48:50,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49698.1, 300 sec: 48763.2). 
Total num frames: 1945600000. Throughput: 0: 12049.0. Samples: 486464000. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:48:50,767][1648981] Avg episode reward: [(0, '1003.340')] [2024-06-15 22:48:50,796][1651669] Updated weights for policy 0, policy_version 950012 (0.0012) [2024-06-15 22:48:53,674][1651669] Updated weights for policy 0, policy_version 950080 (0.0013) [2024-06-15 22:48:55,778][1648981] Fps is (10 sec: 42547.0, 60 sec: 48050.1, 300 sec: 48428.0). Total num frames: 1945763840. Throughput: 0: 11932.1. Samples: 486495744. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:48:55,779][1648981] Avg episode reward: [(0, '976.980')] [2024-06-15 22:48:55,787][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000950080_1945763840.pth... [2024-06-15 22:48:55,934][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000944416_1934163968.pth [2024-06-15 22:48:57,207][1651669] Updated weights for policy 0, policy_version 950136 (0.0011) [2024-06-15 22:48:59,023][1651669] Updated weights for policy 0, policy_version 950195 (0.0011) [2024-06-15 22:49:00,573][1651669] Updated weights for policy 0, policy_version 950225 (0.0013) [2024-06-15 22:49:00,766][1648981] Fps is (10 sec: 45875.6, 60 sec: 48606.0, 300 sec: 48542.6). Total num frames: 1946058752. Throughput: 0: 12094.6. Samples: 486577664. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:00,767][1648981] Avg episode reward: [(0, '985.620')] [2024-06-15 22:49:01,503][1651669] Updated weights for policy 0, policy_version 950270 (0.0012) [2024-06-15 22:49:03,992][1651669] Updated weights for policy 0, policy_version 950327 (0.0013) [2024-06-15 22:49:05,766][1648981] Fps is (10 sec: 52492.2, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1946288128. Throughput: 0: 12312.2. Samples: 486652928. 
Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:05,767][1648981] Avg episode reward: [(0, '984.600')] [2024-06-15 22:49:07,250][1651669] Updated weights for policy 0, policy_version 950390 (0.0013) [2024-06-15 22:49:08,827][1651669] Updated weights for policy 0, policy_version 950421 (0.0015) [2024-06-15 22:49:10,235][1651669] Updated weights for policy 0, policy_version 950465 (0.0013) [2024-06-15 22:49:10,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 48605.9, 300 sec: 48652.1). Total num frames: 1946583040. Throughput: 0: 12174.2. Samples: 486692864. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:10,767][1648981] Avg episode reward: [(0, '1021.120')] [2024-06-15 22:49:12,911][1651669] Updated weights for policy 0, policy_version 950532 (0.0014) [2024-06-15 22:49:14,095][1651669] Updated weights for policy 0, policy_version 950582 (0.0022) [2024-06-15 22:49:15,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 49152.0, 300 sec: 48763.3). Total num frames: 1946812416. Throughput: 0: 12401.8. Samples: 486769152. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:15,767][1648981] Avg episode reward: [(0, '930.390')] [2024-06-15 22:49:17,501][1651669] Updated weights for policy 0, policy_version 950629 (0.0041) [2024-06-15 22:49:19,566][1651669] Updated weights for policy 0, policy_version 950712 (0.0016) [2024-06-15 22:49:20,766][1648981] Fps is (10 sec: 49152.4, 60 sec: 48610.1, 300 sec: 48874.3). Total num frames: 1947074560. Throughput: 0: 12402.8. Samples: 486841344. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:20,767][1648981] Avg episode reward: [(0, '968.690')] [2024-06-15 22:49:21,975][1651274] Signal inference workers to stop experience collection... 
(49900 times) [2024-06-15 22:49:21,995][1651669] Updated weights for policy 0, policy_version 950753 (0.0011) [2024-06-15 22:49:22,009][1651669] InferenceWorker_p0-w0: stopping experience collection (49900 times) [2024-06-15 22:49:22,170][1651274] Signal inference workers to resume experience collection... (49900 times) [2024-06-15 22:49:22,171][1651669] InferenceWorker_p0-w0: resuming experience collection (49900 times) [2024-06-15 22:49:23,300][1651669] Updated weights for policy 0, policy_version 950801 (0.0013) [2024-06-15 22:49:25,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 49698.2, 300 sec: 48875.4). Total num frames: 1947336704. Throughput: 0: 12572.4. Samples: 486880768. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:25,767][1648981] Avg episode reward: [(0, '972.500')] [2024-06-15 22:49:27,452][1651669] Updated weights for policy 0, policy_version 950853 (0.0011) [2024-06-15 22:49:29,361][1651669] Updated weights for policy 0, policy_version 950928 (0.0011) [2024-06-15 22:49:30,396][1651669] Updated weights for policy 0, policy_version 950968 (0.0011) [2024-06-15 22:49:30,766][1648981] Fps is (10 sec: 52428.5, 60 sec: 49152.0, 300 sec: 48874.3). Total num frames: 1947598848. Throughput: 0: 12344.9. Samples: 486953984. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:30,767][1648981] Avg episode reward: [(0, '932.160')] [2024-06-15 22:49:32,128][1651669] Updated weights for policy 0, policy_version 950997 (0.0016) [2024-06-15 22:49:33,022][1651669] Updated weights for policy 0, policy_version 951039 (0.0021) [2024-06-15 22:49:34,752][1651669] Updated weights for policy 0, policy_version 951089 (0.0016) [2024-06-15 22:49:35,768][1648981] Fps is (10 sec: 52421.9, 60 sec: 51338.7, 300 sec: 48985.2). Total num frames: 1947860992. Throughput: 0: 12651.7. Samples: 487033344. 
Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:35,769][1648981] Avg episode reward: [(0, '952.440')] [2024-06-15 22:49:38,091][1651669] Updated weights for policy 0, policy_version 951120 (0.0011) [2024-06-15 22:49:40,694][1651669] Updated weights for policy 0, policy_version 951201 (0.0019) [2024-06-15 22:49:40,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 49698.1, 300 sec: 48985.4). Total num frames: 1948057600. Throughput: 0: 12871.7. Samples: 487074816. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:40,767][1648981] Avg episode reward: [(0, '945.940')] [2024-06-15 22:49:42,934][1651669] Updated weights for policy 0, policy_version 951264 (0.0030) [2024-06-15 22:49:45,093][1651669] Updated weights for policy 0, policy_version 951312 (0.0022) [2024-06-15 22:49:45,766][1648981] Fps is (10 sec: 45881.5, 60 sec: 49698.2, 300 sec: 48763.2). Total num frames: 1948319744. Throughput: 0: 12379.0. Samples: 487134720. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:45,767][1648981] Avg episode reward: [(0, '953.290')] [2024-06-15 22:49:46,034][1651669] Updated weights for policy 0, policy_version 951360 (0.0012) [2024-06-15 22:49:50,057][1651669] Updated weights for policy 0, policy_version 951416 (0.0012) [2024-06-15 22:49:50,759][1651669] Updated weights for policy 0, policy_version 951443 (0.0010) [2024-06-15 22:49:50,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 49152.0, 300 sec: 48875.4). Total num frames: 1948549120. Throughput: 0: 12583.8. Samples: 487219200. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:50,767][1648981] Avg episode reward: [(0, '979.830')] [2024-06-15 22:49:51,828][1651669] Updated weights for policy 0, policy_version 951481 (0.0017) [2024-06-15 22:49:53,469][1651669] Updated weights for policy 0, policy_version 951548 (0.0146) [2024-06-15 22:49:55,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 51346.8, 300 sec: 48985.4). Total num frames: 1948844032. 
Throughput: 0: 12344.9. Samples: 487248384. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:49:55,767][1648981] Avg episode reward: [(0, '973.640')] [2024-06-15 22:49:55,988][1651669] Updated weights for policy 0, policy_version 951610 (0.0012) [2024-06-15 22:50:00,767][1648981] Fps is (10 sec: 45874.6, 60 sec: 49151.8, 300 sec: 48763.6). Total num frames: 1949007872. Throughput: 0: 12686.2. Samples: 487340032. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:00,767][1648981] Avg episode reward: [(0, '937.140')] [2024-06-15 22:50:00,929][1651669] Updated weights for policy 0, policy_version 951677 (0.0116) [2024-06-15 22:50:01,791][1651274] Signal inference workers to stop experience collection... (49950 times) [2024-06-15 22:50:01,853][1651669] InferenceWorker_p0-w0: stopping experience collection (49950 times) [2024-06-15 22:50:02,037][1651274] Signal inference workers to resume experience collection... (49950 times) [2024-06-15 22:50:02,038][1651669] InferenceWorker_p0-w0: resuming experience collection (49950 times) [2024-06-15 22:50:02,040][1651669] Updated weights for policy 0, policy_version 951728 (0.0043) [2024-06-15 22:50:03,913][1651669] Updated weights for policy 0, policy_version 951792 (0.0011) [2024-06-15 22:50:05,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 50790.3, 300 sec: 49096.4). Total num frames: 1949335552. Throughput: 0: 12367.6. Samples: 487397888. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:05,767][1648981] Avg episode reward: [(0, '938.000')] [2024-06-15 22:50:05,779][1651669] Updated weights for policy 0, policy_version 951840 (0.0086) [2024-06-15 22:50:10,766][1648981] Fps is (10 sec: 42598.9, 60 sec: 47513.6, 300 sec: 48430.0). Total num frames: 1949433856. Throughput: 0: 12310.8. Samples: 487434752. 
Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:10,767][1648981] Avg episode reward: [(0, '943.580')] [2024-06-15 22:50:11,079][1651669] Updated weights for policy 0, policy_version 951894 (0.0014) [2024-06-15 22:50:12,150][1651669] Updated weights for policy 0, policy_version 951940 (0.0048) [2024-06-15 22:50:13,386][1651669] Updated weights for policy 0, policy_version 952003 (0.0011) [2024-06-15 22:50:14,709][1651669] Updated weights for policy 0, policy_version 952060 (0.0011) [2024-06-15 22:50:15,774][1648981] Fps is (10 sec: 52388.0, 60 sec: 50783.8, 300 sec: 49102.9). Total num frames: 1949859840. Throughput: 0: 12376.9. Samples: 487511040. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:15,775][1648981] Avg episode reward: [(0, '963.260')] [2024-06-15 22:50:16,663][1651669] Updated weights for policy 0, policy_version 952120 (0.0095) [2024-06-15 22:50:20,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1949958144. Throughput: 0: 12322.5. Samples: 487587840. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:20,767][1648981] Avg episode reward: [(0, '941.840')] [2024-06-15 22:50:22,519][1651669] Updated weights for policy 0, policy_version 952176 (0.0013) [2024-06-15 22:50:24,609][1651669] Updated weights for policy 0, policy_version 952272 (0.0097) [2024-06-15 22:50:25,440][1651669] Updated weights for policy 0, policy_version 952318 (0.0012) [2024-06-15 22:50:25,767][1648981] Fps is (10 sec: 49189.4, 60 sec: 50244.1, 300 sec: 49318.6). Total num frames: 1950351360. Throughput: 0: 12162.8. Samples: 487622144. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:25,767][1648981] Avg episode reward: [(0, '939.560')] [2024-06-15 22:50:27,445][1651669] Updated weights for policy 0, policy_version 952376 (0.0013) [2024-06-15 22:50:30,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 48059.7, 300 sec: 48430.0). Total num frames: 1950482432. 
Throughput: 0: 12356.2. Samples: 487690752. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:30,767][1648981] Avg episode reward: [(0, '940.220')] [2024-06-15 22:50:33,192][1651669] Updated weights for policy 0, policy_version 952416 (0.0013) [2024-06-15 22:50:34,931][1651669] Updated weights for policy 0, policy_version 952496 (0.0028) [2024-06-15 22:50:35,766][1648981] Fps is (10 sec: 42599.7, 60 sec: 48607.0, 300 sec: 48986.5). Total num frames: 1950777344. Throughput: 0: 12106.0. Samples: 487763968. Policy #0 lag: (min: 28.0, avg: 144.2, max: 284.0) [2024-06-15 22:50:35,767][1648981] Avg episode reward: [(0, '911.820')] [2024-06-15 22:50:36,133][1651669] Updated weights for policy 0, policy_version 952548 (0.0010) [2024-06-15 22:50:37,264][1651669] Updated weights for policy 0, policy_version 952596 (0.0087) [2024-06-15 22:50:37,657][1651274] Signal inference workers to stop experience collection... (50000 times) [2024-06-15 22:50:37,700][1651669] InferenceWorker_p0-w0: stopping experience collection (50000 times) [2024-06-15 22:50:37,937][1651274] Signal inference workers to resume experience collection... (50000 times) [2024-06-15 22:50:37,938][1651669] InferenceWorker_p0-w0: resuming experience collection (50000 times) [2024-06-15 22:50:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49152.0, 300 sec: 48652.1). Total num frames: 1951006720. Throughput: 0: 12151.5. Samples: 487795200. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:50:40,767][1648981] Avg episode reward: [(0, '954.230')] [2024-06-15 22:50:43,649][1651669] Updated weights for policy 0, policy_version 952656 (0.0013) [2024-06-15 22:50:45,008][1651669] Updated weights for policy 0, policy_version 952720 (0.0013) [2024-06-15 22:50:45,767][1648981] Fps is (10 sec: 45874.1, 60 sec: 48605.7, 300 sec: 48763.2). Total num frames: 1951236096. Throughput: 0: 11992.2. Samples: 487879680. 
Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:50:45,767][1648981] Avg episode reward: [(0, '941.690')] [2024-06-15 22:50:46,833][1651669] Updated weights for policy 0, policy_version 952805 (0.0012) [2024-06-15 22:50:47,898][1651669] Updated weights for policy 0, policy_version 952854 (0.0088) [2024-06-15 22:50:50,775][1648981] Fps is (10 sec: 52382.9, 60 sec: 49690.8, 300 sec: 48872.9). Total num frames: 1951531008. Throughput: 0: 12251.5. Samples: 487949312. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:50:50,776][1648981] Avg episode reward: [(0, '907.130')] [2024-06-15 22:50:53,877][1651669] Updated weights for policy 0, policy_version 952897 (0.0010) [2024-06-15 22:50:55,387][1651669] Updated weights for policy 0, policy_version 952976 (0.0013) [2024-06-15 22:50:55,767][1648981] Fps is (10 sec: 45874.3, 60 sec: 47513.3, 300 sec: 48543.9). Total num frames: 1951694848. Throughput: 0: 12435.8. Samples: 487994368. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:50:55,767][1648981] Avg episode reward: [(0, '875.860')] [2024-06-15 22:50:56,122][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000953008_1951760384.pth... [2024-06-15 22:50:56,270][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000947264_1939996672.pth [2024-06-15 22:50:56,275][1651274] Saving a milestone train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/milestones/checkpoint_000953008_1951760384.pth [2024-06-15 22:50:57,529][1651669] Updated weights for policy 0, policy_version 953072 (0.0145) [2024-06-15 22:50:58,496][1651669] Updated weights for policy 0, policy_version 953104 (0.0012) [2024-06-15 22:51:00,766][1648981] Fps is (10 sec: 52475.3, 60 sec: 50790.6, 300 sec: 49207.6). Total num frames: 1952055296. Throughput: 0: 12096.7. Samples: 488055296. 
Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:00,767][1648981] Avg episode reward: [(0, '868.910')] [2024-06-15 22:51:04,336][1651669] Updated weights for policy 0, policy_version 953155 (0.0012) [2024-06-15 22:51:05,766][1648981] Fps is (10 sec: 45877.0, 60 sec: 46967.5, 300 sec: 48652.2). Total num frames: 1952153600. Throughput: 0: 12310.7. Samples: 488141824. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:05,767][1648981] Avg episode reward: [(0, '885.440')] [2024-06-15 22:51:06,019][1651669] Updated weights for policy 0, policy_version 953219 (0.0012) [2024-06-15 22:51:07,250][1651669] Updated weights for policy 0, policy_version 953280 (0.0090) [2024-06-15 22:51:08,213][1651669] Updated weights for policy 0, policy_version 953340 (0.0034) [2024-06-15 22:51:09,742][1651669] Updated weights for policy 0, policy_version 953403 (0.0015) [2024-06-15 22:51:10,778][1648981] Fps is (10 sec: 52368.0, 60 sec: 52418.7, 300 sec: 49427.8). Total num frames: 1952579584. Throughput: 0: 12284.9. Samples: 488175104. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:10,779][1648981] Avg episode reward: [(0, '879.070')] [2024-06-15 22:51:15,283][1651669] Updated weights for policy 0, policy_version 953443 (0.0034) [2024-06-15 22:51:15,773][1648981] Fps is (10 sec: 52395.2, 60 sec: 46968.6, 300 sec: 48762.2). Total num frames: 1952677888. Throughput: 0: 12729.9. Samples: 488263680. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:15,774][1648981] Avg episode reward: [(0, '887.900')] [2024-06-15 22:51:17,419][1651669] Updated weights for policy 0, policy_version 953520 (0.0129) [2024-06-15 22:51:17,433][1651274] Signal inference workers to stop experience collection... (50050 times) [2024-06-15 22:51:17,548][1651669] InferenceWorker_p0-w0: stopping experience collection (50050 times) [2024-06-15 22:51:17,663][1651274] Signal inference workers to resume experience collection... 
(50050 times) [2024-06-15 22:51:17,666][1651669] InferenceWorker_p0-w0: resuming experience collection (50050 times) [2024-06-15 22:51:18,608][1651669] Updated weights for policy 0, policy_version 953584 (0.0012) [2024-06-15 22:51:20,300][1651669] Updated weights for policy 0, policy_version 953648 (0.0010) [2024-06-15 22:51:20,766][1648981] Fps is (10 sec: 52489.6, 60 sec: 52428.8, 300 sec: 49651.9). Total num frames: 1953103872. Throughput: 0: 12458.7. Samples: 488324608. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:20,767][1648981] Avg episode reward: [(0, '869.610')] [2024-06-15 22:51:25,360][1651669] Updated weights for policy 0, policy_version 953683 (0.0012) [2024-06-15 22:51:25,786][1648981] Fps is (10 sec: 49086.2, 60 sec: 46952.2, 300 sec: 48871.0). Total num frames: 1953169408. Throughput: 0: 12748.9. Samples: 488369152. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:25,787][1648981] Avg episode reward: [(0, '848.530')] [2024-06-15 22:51:27,548][1651669] Updated weights for policy 0, policy_version 953761 (0.0013) [2024-06-15 22:51:29,058][1651669] Updated weights for policy 0, policy_version 953824 (0.0011) [2024-06-15 22:51:29,973][1651669] Updated weights for policy 0, policy_version 953857 (0.0011) [2024-06-15 22:51:30,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 51336.5, 300 sec: 49540.8). Total num frames: 1953562624. Throughput: 0: 12322.2. Samples: 488434176. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:30,767][1648981] Avg episode reward: [(0, '831.600')] [2024-06-15 22:51:35,591][1651669] Updated weights for policy 0, policy_version 953922 (0.0011) [2024-06-15 22:51:35,766][1648981] Fps is (10 sec: 45966.3, 60 sec: 47513.6, 300 sec: 48875.6). Total num frames: 1953628160. Throughput: 0: 12654.6. Samples: 488518656. 
Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:35,767][1648981] Avg episode reward: [(0, '839.120')] [2024-06-15 22:51:37,134][1651669] Updated weights for policy 0, policy_version 953984 (0.0012) [2024-06-15 22:51:38,861][1651669] Updated weights for policy 0, policy_version 954050 (0.0260) [2024-06-15 22:51:39,858][1651669] Updated weights for policy 0, policy_version 954104 (0.0011) [2024-06-15 22:51:40,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 50244.3, 300 sec: 49318.6). Total num frames: 1954021376. Throughput: 0: 12219.8. Samples: 488544256. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:40,767][1648981] Avg episode reward: [(0, '833.200')] [2024-06-15 22:51:41,971][1651669] Updated weights for policy 0, policy_version 954149 (0.0013) [2024-06-15 22:51:45,769][1648981] Fps is (10 sec: 52414.6, 60 sec: 48603.8, 300 sec: 48873.9). Total num frames: 1954152448. Throughput: 0: 12526.2. Samples: 488619008. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:45,770][1648981] Avg episode reward: [(0, '844.620')] [2024-06-15 22:51:46,269][1651669] Updated weights for policy 0, policy_version 954195 (0.0013) [2024-06-15 22:51:48,116][1651669] Updated weights for policy 0, policy_version 954256 (0.0011) [2024-06-15 22:51:49,192][1651669] Updated weights for policy 0, policy_version 954304 (0.0012) [2024-06-15 22:51:50,748][1651669] Updated weights for policy 0, policy_version 954365 (0.0011) [2024-06-15 22:51:50,766][1648981] Fps is (10 sec: 49152.5, 60 sec: 49705.5, 300 sec: 49207.6). Total num frames: 1954512896. Throughput: 0: 12151.5. Samples: 488688640. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:50,767][1648981] Avg episode reward: [(0, '864.830')] [2024-06-15 22:51:53,249][1651669] Updated weights for policy 0, policy_version 954432 (0.0013) [2024-06-15 22:51:55,766][1648981] Fps is (10 sec: 52442.9, 60 sec: 49698.4, 300 sec: 48985.4). Total num frames: 1954676736. 
Throughput: 0: 12143.2. Samples: 488721408. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:51:55,767][1648981] Avg episode reward: [(0, '875.560')] [2024-06-15 22:51:57,522][1651274] Signal inference workers to stop experience collection... (50100 times) [2024-06-15 22:51:57,608][1651669] InferenceWorker_p0-w0: stopping experience collection (50100 times) [2024-06-15 22:51:57,745][1651274] Signal inference workers to resume experience collection... (50100 times) [2024-06-15 22:51:57,747][1651669] InferenceWorker_p0-w0: resuming experience collection (50100 times) [2024-06-15 22:51:57,932][1651669] Updated weights for policy 0, policy_version 954495 (0.0045) [2024-06-15 22:51:59,393][1651669] Updated weights for policy 0, policy_version 954535 (0.0014) [2024-06-15 22:52:00,766][1648981] Fps is (10 sec: 49151.7, 60 sec: 49152.0, 300 sec: 49207.6). Total num frames: 1955004416. Throughput: 0: 12119.1. Samples: 488808960. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:00,767][1648981] Avg episode reward: [(0, '881.360')] [2024-06-15 22:52:01,136][1651669] Updated weights for policy 0, policy_version 954611 (0.0103) [2024-06-15 22:52:02,431][1651669] Updated weights for policy 0, policy_version 954641 (0.0011) [2024-06-15 22:52:03,469][1651669] Updated weights for policy 0, policy_version 954688 (0.0012) [2024-06-15 22:52:05,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 50790.4, 300 sec: 48985.4). Total num frames: 1955201024. Throughput: 0: 12322.1. Samples: 488879104. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:05,767][1648981] Avg episode reward: [(0, '846.740')] [2024-06-15 22:52:08,314][1651669] Updated weights for policy 0, policy_version 954748 (0.0087) [2024-06-15 22:52:10,198][1651669] Updated weights for policy 0, policy_version 954808 (0.0011) [2024-06-15 22:52:10,766][1648981] Fps is (10 sec: 45875.0, 60 sec: 48069.0, 300 sec: 49207.5). Total num frames: 1955463168. Throughput: 0: 12225.1. 
Samples: 488919040. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:10,767][1648981] Avg episode reward: [(0, '846.740')] [2024-06-15 22:52:11,353][1651669] Updated weights for policy 0, policy_version 954849 (0.0012) [2024-06-15 22:52:12,607][1651669] Updated weights for policy 0, policy_version 954896 (0.0095) [2024-06-15 22:52:15,789][1648981] Fps is (10 sec: 52309.3, 60 sec: 50776.5, 300 sec: 49425.9). Total num frames: 1955725312. Throughput: 0: 12270.4. Samples: 488986624. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:15,790][1648981] Avg episode reward: [(0, '847.020')] [2024-06-15 22:52:17,895][1651669] Updated weights for policy 0, policy_version 954960 (0.0013) [2024-06-15 22:52:18,748][1651669] Updated weights for policy 0, policy_version 955008 (0.0011) [2024-06-15 22:52:20,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 48059.7, 300 sec: 49207.5). Total num frames: 1955987456. Throughput: 0: 12185.6. Samples: 489067008. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:20,767][1648981] Avg episode reward: [(0, '855.810')] [2024-06-15 22:52:21,102][1651669] Updated weights for policy 0, policy_version 955075 (0.0017) [2024-06-15 22:52:22,558][1651669] Updated weights for policy 0, policy_version 955139 (0.0012) [2024-06-15 22:52:23,723][1651669] Updated weights for policy 0, policy_version 955189 (0.0012) [2024-06-15 22:52:25,766][1648981] Fps is (10 sec: 52548.6, 60 sec: 51353.5, 300 sec: 49542.8). Total num frames: 1956249600. Throughput: 0: 12413.2. Samples: 489102848. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:25,767][1648981] Avg episode reward: [(0, '942.370')] [2024-06-15 22:52:28,627][1651669] Updated weights for policy 0, policy_version 955224 (0.0017) [2024-06-15 22:52:29,527][1651669] Updated weights for policy 0, policy_version 955264 (0.0015) [2024-06-15 22:52:30,767][1648981] Fps is (10 sec: 42597.9, 60 sec: 47513.5, 300 sec: 48876.3). 
Total num frames: 1956413440. Throughput: 0: 12448.0. Samples: 489179136. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:30,769][1648981] Avg episode reward: [(0, '931.970')] [2024-06-15 22:52:31,419][1651669] Updated weights for policy 0, policy_version 955327 (0.0010) [2024-06-15 22:52:32,975][1651669] Updated weights for policy 0, policy_version 955387 (0.0013) [2024-06-15 22:52:34,470][1651669] Updated weights for policy 0, policy_version 955456 (0.0011) [2024-06-15 22:52:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 52428.8, 300 sec: 49764.8). Total num frames: 1956773888. Throughput: 0: 12492.8. Samples: 489250816. Policy #0 lag: (min: 47.0, avg: 198.2, max: 287.0) [2024-06-15 22:52:35,767][1648981] Avg episode reward: [(0, '910.740')] [2024-06-15 22:52:38,243][1651274] Signal inference workers to stop experience collection... (50150 times) [2024-06-15 22:52:38,289][1651669] InferenceWorker_p0-w0: stopping experience collection (50150 times) [2024-06-15 22:52:38,521][1651274] Signal inference workers to resume experience collection... (50150 times) [2024-06-15 22:52:38,521][1651669] InferenceWorker_p0-w0: resuming experience collection (50150 times) [2024-06-15 22:52:39,596][1651669] Updated weights for policy 0, policy_version 955511 (0.0013) [2024-06-15 22:52:40,766][1648981] Fps is (10 sec: 49152.6, 60 sec: 48059.7, 300 sec: 49096.8). Total num frames: 1956904960. Throughput: 0: 12686.2. Samples: 489292288. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:52:40,767][1648981] Avg episode reward: [(0, '947.860')] [2024-06-15 22:52:40,989][1651669] Updated weights for policy 0, policy_version 955538 (0.0011) [2024-06-15 22:52:42,669][1651669] Updated weights for policy 0, policy_version 955587 (0.0014) [2024-06-15 22:52:44,065][1651669] Updated weights for policy 0, policy_version 955650 (0.0030) [2024-06-15 22:52:45,131][1651669] Updated weights for policy 0, policy_version 955708 (0.0012) [2024-06-15 22:52:45,766][1648981] Fps is (10 sec: 52428.4, 60 sec: 52431.1, 300 sec: 49762.9). Total num frames: 1957298176. Throughput: 0: 12333.5. Samples: 489363968. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:52:45,767][1648981] Avg episode reward: [(0, '973.420')] [2024-06-15 22:52:50,250][1651669] Updated weights for policy 0, policy_version 955766 (0.0073) [2024-06-15 22:52:50,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 48605.6, 300 sec: 49318.6). Total num frames: 1957429248. Throughput: 0: 12447.2. Samples: 489439232. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:52:50,767][1648981] Avg episode reward: [(0, '921.200')] [2024-06-15 22:52:50,866][1651669] Updated weights for policy 0, policy_version 955792 (0.0013) [2024-06-15 22:52:53,507][1651669] Updated weights for policy 0, policy_version 955845 (0.0011) [2024-06-15 22:52:55,171][1651669] Updated weights for policy 0, policy_version 955909 (0.0014) [2024-06-15 22:52:55,767][1648981] Fps is (10 sec: 45874.7, 60 sec: 51336.4, 300 sec: 49540.8). Total num frames: 1957756928. Throughput: 0: 12401.7. Samples: 489477120. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:52:55,767][1648981] Avg episode reward: [(0, '888.360')] [2024-06-15 22:52:56,008][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000955952_1957789696.pth... 
[2024-06-15 22:52:56,051][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000950080_1945763840.pth [2024-06-15 22:52:56,340][1651669] Updated weights for policy 0, policy_version 955968 (0.0011) [2024-06-15 22:53:00,767][1648981] Fps is (10 sec: 55704.1, 60 sec: 49697.7, 300 sec: 49429.6). Total num frames: 1957986304. Throughput: 0: 12692.5. Samples: 489557504. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:00,768][1648981] Avg episode reward: [(0, '890.120')] [2024-06-15 22:53:00,910][1651669] Updated weights for policy 0, policy_version 956049 (0.0012) [2024-06-15 22:53:01,609][1651669] Updated weights for policy 0, policy_version 956091 (0.0011) [2024-06-15 22:53:05,766][1648981] Fps is (10 sec: 45876.0, 60 sec: 50244.3, 300 sec: 49318.6). Total num frames: 1958215680. Throughput: 0: 12401.8. Samples: 489625088. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:05,767][1648981] Avg episode reward: [(0, '903.470')] [2024-06-15 22:53:05,993][1651669] Updated weights for policy 0, policy_version 956161 (0.0017) [2024-06-15 22:53:07,071][1651669] Updated weights for policy 0, policy_version 956221 (0.0010) [2024-06-15 22:53:10,769][1648981] Fps is (10 sec: 42588.4, 60 sec: 49149.7, 300 sec: 49318.1). Total num frames: 1958412288. Throughput: 0: 12639.9. Samples: 489671680. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:10,770][1648981] Avg episode reward: [(0, '947.530')] [2024-06-15 22:53:11,733][1651669] Updated weights for policy 0, policy_version 956304 (0.0012) [2024-06-15 22:53:14,881][1651669] Updated weights for policy 0, policy_version 956353 (0.0012) [2024-06-15 22:53:15,781][1648981] Fps is (10 sec: 45806.8, 60 sec: 49158.5, 300 sec: 49205.9). Total num frames: 1958674432. Throughput: 0: 12409.1. Samples: 489737728. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:15,782][1648981] Avg episode reward: [(0, '945.870')] [2024-06-15 22:53:16,808][1651669] Updated weights for policy 0, policy_version 956418 (0.0012) [2024-06-15 22:53:17,087][1651274] Signal inference workers to stop experience collection... (50200 times) [2024-06-15 22:53:17,128][1651669] InferenceWorker_p0-w0: stopping experience collection (50200 times) [2024-06-15 22:53:17,226][1651274] Signal inference workers to resume experience collection... (50200 times) [2024-06-15 22:53:17,227][1651669] InferenceWorker_p0-w0: resuming experience collection (50200 times) [2024-06-15 22:53:20,386][1651669] Updated weights for policy 0, policy_version 956496 (0.0013) [2024-06-15 22:53:20,766][1648981] Fps is (10 sec: 49166.0, 60 sec: 48605.9, 300 sec: 49318.6). Total num frames: 1958903808. Throughput: 0: 12617.9. Samples: 489818624. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:20,767][1648981] Avg episode reward: [(0, '944.780')] [2024-06-15 22:53:22,539][1651669] Updated weights for policy 0, policy_version 956576 (0.0011) [2024-06-15 22:53:25,766][1648981] Fps is (10 sec: 45943.6, 60 sec: 48059.7, 300 sec: 49096.5). Total num frames: 1959133184. Throughput: 0: 12208.3. Samples: 489841664. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:25,767][1648981] Avg episode reward: [(0, '964.650')] [2024-06-15 22:53:26,475][1651669] Updated weights for policy 0, policy_version 956642 (0.0018) [2024-06-15 22:53:28,874][1651669] Updated weights for policy 0, policy_version 956720 (0.0012) [2024-06-15 22:53:30,766][1648981] Fps is (10 sec: 49152.0, 60 sec: 49698.2, 300 sec: 49541.4). Total num frames: 1959395328. Throughput: 0: 12288.0. Samples: 489916928. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:30,767][1648981] Avg episode reward: [(0, '972.360')] [2024-06-15 22:53:31,254][1651669] Updated weights for policy 0, policy_version 956752 (0.0014) [2024-06-15 22:53:32,553][1651669] Updated weights for policy 0, policy_version 956806 (0.0102) [2024-06-15 22:53:33,769][1651669] Updated weights for policy 0, policy_version 956861 (0.0015) [2024-06-15 22:53:35,769][1648981] Fps is (10 sec: 52416.2, 60 sec: 48057.8, 300 sec: 49429.3). Total num frames: 1959657472. Throughput: 0: 12310.2. Samples: 489993216. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:35,769][1648981] Avg episode reward: [(0, '961.130')] [2024-06-15 22:53:37,163][1651669] Updated weights for policy 0, policy_version 956912 (0.0020) [2024-06-15 22:53:40,014][1651669] Updated weights for policy 0, policy_version 956960 (0.0013) [2024-06-15 22:53:40,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 50244.2, 300 sec: 49429.7). Total num frames: 1959919616. Throughput: 0: 12197.0. Samples: 490025984. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:40,767][1648981] Avg episode reward: [(0, '982.220')] [2024-06-15 22:53:42,635][1651669] Updated weights for policy 0, policy_version 957008 (0.0041) [2024-06-15 22:53:44,895][1651669] Updated weights for policy 0, policy_version 957104 (0.0011) [2024-06-15 22:53:45,769][1648981] Fps is (10 sec: 52426.2, 60 sec: 48057.4, 300 sec: 49429.2). Total num frames: 1960181760. Throughput: 0: 12014.3. Samples: 490098176. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:45,770][1648981] Avg episode reward: [(0, '945.100')] [2024-06-15 22:53:46,824][1651669] Updated weights for policy 0, policy_version 957123 (0.0017) [2024-06-15 22:53:48,305][1651669] Updated weights for policy 0, policy_version 957181 (0.0011) [2024-06-15 22:53:50,766][1648981] Fps is (10 sec: 45875.3, 60 sec: 49152.2, 300 sec: 49542.8). Total num frames: 1960378368. 
Throughput: 0: 12140.1. Samples: 490171392. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:50,767][1648981] Avg episode reward: [(0, '921.750')] [2024-06-15 22:53:51,027][1651669] Updated weights for policy 0, policy_version 957239 (0.0018) [2024-06-15 22:53:54,258][1651669] Updated weights for policy 0, policy_version 957281 (0.0044) [2024-06-15 22:53:55,690][1651669] Updated weights for policy 0, policy_version 957350 (0.0012) [2024-06-15 22:53:55,781][1648981] Fps is (10 sec: 45820.1, 60 sec: 48047.9, 300 sec: 49427.2). Total num frames: 1960640512. Throughput: 0: 12091.3. Samples: 490215936. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:53:55,782][1648981] Avg episode reward: [(0, '917.110')] [2024-06-15 22:53:57,635][1651669] Updated weights for policy 0, policy_version 957392 (0.0023) [2024-06-15 22:53:58,101][1651274] Signal inference workers to stop experience collection... (50250 times) [2024-06-15 22:53:58,168][1651669] InferenceWorker_p0-w0: stopping experience collection (50250 times) [2024-06-15 22:53:58,368][1651274] Signal inference workers to resume experience collection... (50250 times) [2024-06-15 22:53:58,368][1651669] InferenceWorker_p0-w0: resuming experience collection (50250 times) [2024-06-15 22:53:58,596][1651669] Updated weights for policy 0, policy_version 957433 (0.0011) [2024-06-15 22:54:00,789][1648981] Fps is (10 sec: 45773.1, 60 sec: 47496.3, 300 sec: 49314.9). Total num frames: 1960837120. Throughput: 0: 12058.5. Samples: 490280448. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:00,790][1648981] Avg episode reward: [(0, '937.470')] [2024-06-15 22:54:01,627][1651669] Updated weights for policy 0, policy_version 957488 (0.0030) [2024-06-15 22:54:04,614][1651669] Updated weights for policy 0, policy_version 957552 (0.0012) [2024-06-15 22:54:05,766][1648981] Fps is (10 sec: 49225.7, 60 sec: 48605.9, 300 sec: 49318.6). Total num frames: 1961132032. Throughput: 0: 11980.8. 
Samples: 490357760. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:05,767][1648981] Avg episode reward: [(0, '911.050')] [2024-06-15 22:54:06,408][1651669] Updated weights for policy 0, policy_version 957625 (0.0012) [2024-06-15 22:54:09,081][1651669] Updated weights for policy 0, policy_version 957667 (0.0013) [2024-06-15 22:54:10,766][1648981] Fps is (10 sec: 52545.5, 60 sec: 49154.2, 300 sec: 49318.6). Total num frames: 1961361408. Throughput: 0: 12288.0. Samples: 490394624. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:10,767][1648981] Avg episode reward: [(0, '875.280')] [2024-06-15 22:54:11,909][1651669] Updated weights for policy 0, policy_version 957728 (0.0018) [2024-06-15 22:54:14,606][1651669] Updated weights for policy 0, policy_version 957776 (0.0012) [2024-06-15 22:54:15,766][1648981] Fps is (10 sec: 49151.6, 60 sec: 49164.2, 300 sec: 49318.6). Total num frames: 1961623552. Throughput: 0: 12390.4. Samples: 490474496. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:15,767][1648981] Avg episode reward: [(0, '867.100')] [2024-06-15 22:54:15,936][1651669] Updated weights for policy 0, policy_version 957840 (0.0022) [2024-06-15 22:54:17,068][1651669] Updated weights for policy 0, policy_version 957888 (0.0011) [2024-06-15 22:54:20,497][1651669] Updated weights for policy 0, policy_version 957949 (0.0084) [2024-06-15 22:54:20,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 49698.0, 300 sec: 49318.6). Total num frames: 1961885696. Throughput: 0: 12174.8. Samples: 490541056. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:20,767][1648981] Avg episode reward: [(0, '855.760')] [2024-06-15 22:54:25,637][1651669] Updated weights for policy 0, policy_version 958048 (0.0012) [2024-06-15 22:54:25,769][1648981] Fps is (10 sec: 45866.7, 60 sec: 49150.5, 300 sec: 49096.1). Total num frames: 1962082304. Throughput: 0: 12276.1. Samples: 490578432. 
Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:25,771][1648981] Avg episode reward: [(0, '874.910')] [2024-06-15 22:54:27,013][1651669] Updated weights for policy 0, policy_version 958107 (0.0015) [2024-06-15 22:54:30,437][1651669] Updated weights for policy 0, policy_version 958160 (0.0012) [2024-06-15 22:54:30,766][1648981] Fps is (10 sec: 42599.1, 60 sec: 48605.9, 300 sec: 48985.6). Total num frames: 1962311680. Throughput: 0: 12425.3. Samples: 490657280. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:30,767][1648981] Avg episode reward: [(0, '896.660')] [2024-06-15 22:54:32,972][1651669] Updated weights for policy 0, policy_version 958240 (0.0012) [2024-06-15 22:54:35,782][1648981] Fps is (10 sec: 49083.7, 60 sec: 48595.0, 300 sec: 49204.9). Total num frames: 1962573824. Throughput: 0: 12431.5. Samples: 490731008. Policy #0 lag: (min: 15.0, avg: 105.3, max: 271.0) [2024-06-15 22:54:35,783][1648981] Avg episode reward: [(0, '903.870')] [2024-06-15 22:54:36,009][1651669] Updated weights for policy 0, policy_version 958304 (0.0030) [2024-06-15 22:54:37,656][1651669] Updated weights for policy 0, policy_version 958368 (0.0014) [2024-06-15 22:54:40,767][1648981] Fps is (10 sec: 49149.4, 60 sec: 48059.4, 300 sec: 49096.4). Total num frames: 1962803200. Throughput: 0: 12018.8. Samples: 490756608. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:54:40,768][1648981] Avg episode reward: [(0, '936.160')] [2024-06-15 22:54:41,304][1651274] Signal inference workers to stop experience collection... (50300 times) [2024-06-15 22:54:41,380][1651669] InferenceWorker_p0-w0: stopping experience collection (50300 times) [2024-06-15 22:54:41,525][1651274] Signal inference workers to resume experience collection... 
(50300 times) [2024-06-15 22:54:41,526][1651669] InferenceWorker_p0-w0: resuming experience collection (50300 times) [2024-06-15 22:54:41,707][1651669] Updated weights for policy 0, policy_version 958419 (0.0012) [2024-06-15 22:54:43,763][1651669] Updated weights for policy 0, policy_version 958512 (0.0015) [2024-06-15 22:54:45,766][1648981] Fps is (10 sec: 49229.6, 60 sec: 48062.0, 300 sec: 49207.5). Total num frames: 1963065344. Throughput: 0: 12168.9. Samples: 490827776. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:54:45,767][1648981] Avg episode reward: [(0, '909.310')] [2024-06-15 22:54:47,700][1651669] Updated weights for policy 0, policy_version 958589 (0.0012) [2024-06-15 22:54:49,943][1651669] Updated weights for policy 0, policy_version 958656 (0.0012) [2024-06-15 22:54:50,781][1648981] Fps is (10 sec: 52352.5, 60 sec: 49139.7, 300 sec: 49094.0). Total num frames: 1963327488. Throughput: 0: 11999.5. Samples: 490897920. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:54:50,782][1648981] Avg episode reward: [(0, '900.240')] [2024-06-15 22:54:53,018][1651669] Updated weights for policy 0, policy_version 958713 (0.0013) [2024-06-15 22:54:55,077][1651669] Updated weights for policy 0, policy_version 958771 (0.0013) [2024-06-15 22:54:55,766][1648981] Fps is (10 sec: 52428.9, 60 sec: 49164.2, 300 sec: 49429.7). Total num frames: 1963589632. Throughput: 0: 12026.3. Samples: 490935808. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:54:55,767][1648981] Avg episode reward: [(0, '920.550')] [2024-06-15 22:54:55,771][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000958784_1963589632.pth... 
[2024-06-15 22:54:55,815][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000953008_1951760384.pth [2024-06-15 22:54:58,007][1651669] Updated weights for policy 0, policy_version 958816 (0.0021) [2024-06-15 22:54:59,053][1651669] Updated weights for policy 0, policy_version 958864 (0.0014) [2024-06-15 22:55:00,429][1651669] Updated weights for policy 0, policy_version 958912 (0.0011) [2024-06-15 22:55:00,766][1648981] Fps is (10 sec: 52507.9, 60 sec: 50263.0, 300 sec: 49207.6). Total num frames: 1963851776. Throughput: 0: 12026.3. Samples: 491015680. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:00,767][1648981] Avg episode reward: [(0, '983.870')] [2024-06-15 22:55:03,433][1651669] Updated weights for policy 0, policy_version 958970 (0.0023) [2024-06-15 22:55:05,530][1651669] Updated weights for policy 0, policy_version 959029 (0.0014) [2024-06-15 22:55:05,766][1648981] Fps is (10 sec: 52429.8, 60 sec: 49698.3, 300 sec: 49763.0). Total num frames: 1964113920. Throughput: 0: 12162.9. Samples: 491088384. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:05,767][1648981] Avg episode reward: [(0, '971.340')] [2024-06-15 22:55:08,323][1651669] Updated weights for policy 0, policy_version 959076 (0.0011) [2024-06-15 22:55:09,418][1651669] Updated weights for policy 0, policy_version 959127 (0.0012) [2024-06-15 22:55:10,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 50244.2, 300 sec: 49208.8). Total num frames: 1964376064. Throughput: 0: 12163.3. Samples: 491125760. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:10,769][1648981] Avg episode reward: [(0, '945.330')] [2024-06-15 22:55:13,418][1651669] Updated weights for policy 0, policy_version 959184 (0.0013) [2024-06-15 22:55:15,773][1648981] Fps is (10 sec: 39294.6, 60 sec: 48054.4, 300 sec: 49317.5). Total num frames: 1964507136. Throughput: 0: 12161.0. Samples: 491204608. 
Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:15,774][1648981] Avg episode reward: [(0, '957.330')] [2024-06-15 22:55:16,048][1651669] Updated weights for policy 0, policy_version 959250 (0.0012) [2024-06-15 22:55:17,102][1651669] Updated weights for policy 0, policy_version 959296 (0.0010) [2024-06-15 22:55:19,518][1651669] Updated weights for policy 0, policy_version 959362 (0.0013) [2024-06-15 22:55:20,294][1651274] Signal inference workers to stop experience collection... (50350 times) [2024-06-15 22:55:20,345][1651669] InferenceWorker_p0-w0: stopping experience collection (50350 times) [2024-06-15 22:55:20,580][1651274] Signal inference workers to resume experience collection... (50350 times) [2024-06-15 22:55:20,581][1651669] InferenceWorker_p0-w0: resuming experience collection (50350 times) [2024-06-15 22:55:20,766][1648981] Fps is (10 sec: 52429.6, 60 sec: 50244.4, 300 sec: 49318.7). Total num frames: 1964900352. Throughput: 0: 11894.0. Samples: 491266048. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:20,767][1648981] Avg episode reward: [(0, '980.630')] [2024-06-15 22:55:24,040][1651669] Updated weights for policy 0, policy_version 959440 (0.0011) [2024-06-15 22:55:25,766][1648981] Fps is (10 sec: 52463.9, 60 sec: 49153.5, 300 sec: 49318.6). Total num frames: 1965031424. Throughput: 0: 12401.9. Samples: 491314688. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:25,767][1648981] Avg episode reward: [(0, '963.210')] [2024-06-15 22:55:26,810][1651669] Updated weights for policy 0, policy_version 959505 (0.0012) [2024-06-15 22:55:27,843][1651669] Updated weights for policy 0, policy_version 959552 (0.0013) [2024-06-15 22:55:29,953][1651669] Updated weights for policy 0, policy_version 959619 (0.0108) [2024-06-15 22:55:30,766][1648981] Fps is (10 sec: 45874.8, 60 sec: 50790.3, 300 sec: 49429.7). Total num frames: 1965359104. Throughput: 0: 12356.3. Samples: 491383808. 
Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:30,767][1648981] Avg episode reward: [(0, '960.970')] [2024-06-15 22:55:31,188][1651669] Updated weights for policy 0, policy_version 959675 (0.0012) [2024-06-15 22:55:35,425][1651669] Updated weights for policy 0, policy_version 959728 (0.0025) [2024-06-15 22:55:35,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 49711.0, 300 sec: 49318.6). Total num frames: 1965555712. Throughput: 0: 12451.4. Samples: 491458048. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:35,767][1648981] Avg episode reward: [(0, '987.810')] [2024-06-15 22:55:38,224][1651669] Updated weights for policy 0, policy_version 959778 (0.0013) [2024-06-15 22:55:40,394][1651669] Updated weights for policy 0, policy_version 959840 (0.0012) [2024-06-15 22:55:40,766][1648981] Fps is (10 sec: 42598.4, 60 sec: 49698.5, 300 sec: 49318.6). Total num frames: 1965785088. Throughput: 0: 12299.4. Samples: 491489280. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:40,767][1648981] Avg episode reward: [(0, '1047.750')] [2024-06-15 22:55:42,138][1651669] Updated weights for policy 0, policy_version 959920 (0.0116) [2024-06-15 22:55:45,206][1651669] Updated weights for policy 0, policy_version 959952 (0.0013) [2024-06-15 22:55:45,766][1648981] Fps is (10 sec: 45876.6, 60 sec: 49152.1, 300 sec: 49097.9). Total num frames: 1966014464. Throughput: 0: 12288.0. Samples: 491568640. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:45,767][1648981] Avg episode reward: [(0, '1059.160')] [2024-06-15 22:55:48,001][1651669] Updated weights for policy 0, policy_version 960016 (0.0079) [2024-06-15 22:55:50,766][1648981] Fps is (10 sec: 45875.4, 60 sec: 48618.0, 300 sec: 49318.7). Total num frames: 1966243840. Throughput: 0: 12401.7. Samples: 491646464. 
Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:50,767][1648981] Avg episode reward: [(0, '1072.170')] [2024-06-15 22:55:50,872][1651669] Updated weights for policy 0, policy_version 960082 (0.0106) [2024-06-15 22:55:53,199][1651669] Updated weights for policy 0, policy_version 960176 (0.0011) [2024-06-15 22:55:55,766][1648981] Fps is (10 sec: 49151.5, 60 sec: 48605.8, 300 sec: 48985.4). Total num frames: 1966505984. Throughput: 0: 12003.6. Samples: 491665920. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:55:55,767][1648981] Avg episode reward: [(0, '1041.250')] [2024-06-15 22:55:56,025][1651669] Updated weights for policy 0, policy_version 960224 (0.0013) [2024-06-15 22:55:59,806][1651669] Updated weights for policy 0, policy_version 960290 (0.0012) [2024-06-15 22:56:00,777][1648981] Fps is (10 sec: 49098.6, 60 sec: 48051.0, 300 sec: 49427.9). Total num frames: 1966735360. Throughput: 0: 12264.1. Samples: 491756544. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:56:00,778][1648981] Avg episode reward: [(0, '992.090')] [2024-06-15 22:56:02,926][1651669] Updated weights for policy 0, policy_version 960369 (0.0057) [2024-06-15 22:56:03,222][1651274] Signal inference workers to stop experience collection... (50400 times) [2024-06-15 22:56:03,303][1651669] InferenceWorker_p0-w0: stopping experience collection (50400 times) [2024-06-15 22:56:03,487][1651274] Signal inference workers to resume experience collection... (50400 times) [2024-06-15 22:56:03,488][1651669] InferenceWorker_p0-w0: resuming experience collection (50400 times) [2024-06-15 22:56:04,354][1651669] Updated weights for policy 0, policy_version 960444 (0.0098) [2024-06-15 22:56:05,766][1648981] Fps is (10 sec: 49152.3, 60 sec: 48059.6, 300 sec: 48876.2). Total num frames: 1966997504. Throughput: 0: 12310.8. Samples: 491820032. 
Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:56:05,767][1648981] Avg episode reward: [(0, '985.020')] [2024-06-15 22:56:07,139][1651669] Updated weights for policy 0, policy_version 960501 (0.0012) [2024-06-15 22:56:10,766][1648981] Fps is (10 sec: 42644.6, 60 sec: 46421.4, 300 sec: 49097.5). Total num frames: 1967161344. Throughput: 0: 12140.1. Samples: 491860992. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:56:10,767][1648981] Avg episode reward: [(0, '1017.850')] [2024-06-15 22:56:11,022][1651669] Updated weights for policy 0, policy_version 960544 (0.0011) [2024-06-15 22:56:12,658][1651669] Updated weights for policy 0, policy_version 960592 (0.0012) [2024-06-15 22:56:14,663][1651669] Updated weights for policy 0, policy_version 960658 (0.0028) [2024-06-15 22:56:15,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 50249.9, 300 sec: 48874.3). Total num frames: 1967521792. Throughput: 0: 12071.8. Samples: 491927040. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:56:15,767][1648981] Avg episode reward: [(0, '993.030')] [2024-06-15 22:56:17,875][1651669] Updated weights for policy 0, policy_version 960743 (0.0016) [2024-06-15 22:56:20,766][1648981] Fps is (10 sec: 49152.1, 60 sec: 45875.2, 300 sec: 49099.7). Total num frames: 1967652864. Throughput: 0: 12197.0. Samples: 492006912. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:56:20,767][1648981] Avg episode reward: [(0, '1027.410')] [2024-06-15 22:56:21,371][1651669] Updated weights for policy 0, policy_version 960775 (0.0013) [2024-06-15 22:56:23,444][1651669] Updated weights for policy 0, policy_version 960834 (0.0026) [2024-06-15 22:56:25,114][1651669] Updated weights for policy 0, policy_version 960896 (0.0033) [2024-06-15 22:56:25,766][1648981] Fps is (10 sec: 42598.5, 60 sec: 48605.9, 300 sec: 48763.2). Total num frames: 1967947776. Throughput: 0: 12288.0. Samples: 492042240. 
Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:56:25,767][1648981] Avg episode reward: [(0, '997.310')] [2024-06-15 22:56:26,514][1651669] Updated weights for policy 0, policy_version 960958 (0.0011) [2024-06-15 22:56:29,477][1651669] Updated weights for policy 0, policy_version 961022 (0.0012) [2024-06-15 22:56:30,777][1648981] Fps is (10 sec: 52373.6, 60 sec: 46959.2, 300 sec: 49316.8). Total num frames: 1968177152. Throughput: 0: 11773.2. Samples: 492098560. Policy #0 lag: (min: 47.0, avg: 153.5, max: 303.0) [2024-06-15 22:56:30,778][1648981] Avg episode reward: [(0, '998.880')] [2024-06-15 22:56:34,039][1651669] Updated weights for policy 0, policy_version 961073 (0.0012) [2024-06-15 22:56:35,091][1651669] Updated weights for policy 0, policy_version 961121 (0.0011) [2024-06-15 22:56:35,766][1648981] Fps is (10 sec: 49151.9, 60 sec: 48059.9, 300 sec: 48874.3). Total num frames: 1968439296. Throughput: 0: 11878.4. Samples: 492180992. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:56:35,767][1648981] Avg episode reward: [(0, '991.100')] [2024-06-15 22:56:36,334][1651669] Updated weights for policy 0, policy_version 961184 (0.0016) [2024-06-15 22:56:39,098][1651669] Updated weights for policy 0, policy_version 961232 (0.0030) [2024-06-15 22:56:39,861][1651669] Updated weights for policy 0, policy_version 961268 (0.0017) [2024-06-15 22:56:40,766][1648981] Fps is (10 sec: 52484.3, 60 sec: 48605.9, 300 sec: 49319.1). Total num frames: 1968701440. Throughput: 0: 12208.4. Samples: 492215296. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:56:40,767][1648981] Avg episode reward: [(0, '940.950')] [2024-06-15 22:56:44,728][1651669] Updated weights for policy 0, policy_version 961328 (0.0011) [2024-06-15 22:56:45,441][1651274] Signal inference workers to stop experience collection... 
(50450 times) [2024-06-15 22:56:45,483][1651669] InferenceWorker_p0-w0: stopping experience collection (50450 times) [2024-06-15 22:56:45,735][1651274] Signal inference workers to resume experience collection... (50450 times) [2024-06-15 22:56:45,736][1651669] InferenceWorker_p0-w0: resuming experience collection (50450 times) [2024-06-15 22:56:45,770][1648981] Fps is (10 sec: 42582.2, 60 sec: 47510.6, 300 sec: 48651.5). Total num frames: 1968865280. Throughput: 0: 11982.7. Samples: 492295680. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:56:45,771][1648981] Avg episode reward: [(0, '928.440')] [2024-06-15 22:56:46,796][1651669] Updated weights for policy 0, policy_version 961408 (0.0028) [2024-06-15 22:56:48,138][1651669] Updated weights for policy 0, policy_version 961472 (0.0012) [2024-06-15 22:56:50,768][1648981] Fps is (10 sec: 45867.5, 60 sec: 48604.5, 300 sec: 49096.2). Total num frames: 1969160192. Throughput: 0: 11969.0. Samples: 492358656. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:56:50,769][1648981] Avg episode reward: [(0, '920.560')] [2024-06-15 22:56:51,183][1651669] Updated weights for policy 0, policy_version 961535 (0.0086) [2024-06-15 22:56:55,766][1648981] Fps is (10 sec: 45892.3, 60 sec: 46967.4, 300 sec: 48541.1). Total num frames: 1969324032. Throughput: 0: 11992.2. Samples: 492400640. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:56:55,767][1648981] Avg episode reward: [(0, '929.330')] [2024-06-15 22:56:55,857][1651669] Updated weights for policy 0, policy_version 961587 (0.0014) [2024-06-15 22:56:56,434][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000961616_1969389568.pth... 
[2024-06-15 22:56:56,541][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000955952_1957789696.pth [2024-06-15 22:56:57,408][1651669] Updated weights for policy 0, policy_version 961652 (0.0085) [2024-06-15 22:56:59,024][1651669] Updated weights for policy 0, policy_version 961728 (0.0011) [2024-06-15 22:57:00,766][1648981] Fps is (10 sec: 45883.1, 60 sec: 48068.5, 300 sec: 48874.3). Total num frames: 1969618944. Throughput: 0: 11844.3. Samples: 492460032. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:00,767][1648981] Avg episode reward: [(0, '943.120')] [2024-06-15 22:57:05,780][1648981] Fps is (10 sec: 42542.7, 60 sec: 45865.1, 300 sec: 48427.8). Total num frames: 1969750016. Throughput: 0: 11909.1. Samples: 492542976. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:05,780][1648981] Avg episode reward: [(0, '951.100')] [2024-06-15 22:57:06,068][1651669] Updated weights for policy 0, policy_version 961808 (0.0087) [2024-06-15 22:57:07,262][1651669] Updated weights for policy 0, policy_version 961857 (0.0014) [2024-06-15 22:57:08,440][1651669] Updated weights for policy 0, policy_version 961920 (0.0087) [2024-06-15 22:57:09,605][1651669] Updated weights for policy 0, policy_version 961976 (0.0017) [2024-06-15 22:57:10,766][1648981] Fps is (10 sec: 52428.8, 60 sec: 49698.2, 300 sec: 48878.1). Total num frames: 1970143232. Throughput: 0: 11764.6. Samples: 492571648. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:10,767][1648981] Avg episode reward: [(0, '966.210')] [2024-06-15 22:57:12,186][1651669] Updated weights for policy 0, policy_version 962032 (0.0011) [2024-06-15 22:57:15,523][1651669] Updated weights for policy 0, policy_version 962096 (0.0010) [2024-06-15 22:57:15,767][1648981] Fps is (10 sec: 65620.6, 60 sec: 48059.5, 300 sec: 48874.3). Total num frames: 1970405376. Throughput: 0: 12746.0. Samples: 492672000. 
Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:15,767][1648981] Avg episode reward: [(0, '1008.080')] [2024-06-15 22:57:16,504][1651669] Updated weights for policy 0, policy_version 962149 (0.0011) [2024-06-15 22:57:17,680][1651669] Updated weights for policy 0, policy_version 962211 (0.0032) [2024-06-15 22:57:20,369][1651274] Signal inference workers to stop experience collection... (50500 times) [2024-06-15 22:57:20,400][1651669] InferenceWorker_p0-w0: stopping experience collection (50500 times) [2024-06-15 22:57:20,564][1651274] Signal inference workers to resume experience collection... (50500 times) [2024-06-15 22:57:20,564][1651669] InferenceWorker_p0-w0: resuming experience collection (50500 times) [2024-06-15 22:57:20,566][1651669] Updated weights for policy 0, policy_version 962288 (0.0013) [2024-06-15 22:57:20,766][1648981] Fps is (10 sec: 62259.2, 60 sec: 51882.7, 300 sec: 49207.5). Total num frames: 1970765824. Throughput: 0: 12652.1. Samples: 492750336. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:20,767][1648981] Avg episode reward: [(0, '964.070')] [2024-06-15 22:57:23,967][1651669] Updated weights for policy 0, policy_version 962325 (0.0011) [2024-06-15 22:57:25,402][1651669] Updated weights for policy 0, policy_version 962400 (0.0011) [2024-06-15 22:57:25,771][1648981] Fps is (10 sec: 62229.0, 60 sec: 51332.1, 300 sec: 49539.9). Total num frames: 1971027968. Throughput: 0: 13060.2. Samples: 492803072. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:25,772][1648981] Avg episode reward: [(0, '932.280')] [2024-06-15 22:57:26,795][1651669] Updated weights for policy 0, policy_version 962467 (0.0012) [2024-06-15 22:57:28,856][1651669] Updated weights for policy 0, policy_version 962499 (0.0012) [2024-06-15 22:57:29,822][1651669] Updated weights for policy 0, policy_version 962558 (0.0010) [2024-06-15 22:57:30,766][1648981] Fps is (10 sec: 55706.5, 60 sec: 52438.2, 300 sec: 49318.6). 
Total num frames: 1971322880. Throughput: 0: 13256.3. Samples: 492892160. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:30,766][1648981] Avg episode reward: [(0, '955.480')] [2024-06-15 22:57:33,433][1651669] Updated weights for policy 0, policy_version 962625 (0.0014) [2024-06-15 22:57:34,924][1651669] Updated weights for policy 0, policy_version 962704 (0.0099) [2024-06-15 22:57:35,624][1651669] Updated weights for policy 0, policy_version 962752 (0.0009) [2024-06-15 22:57:35,766][1648981] Fps is (10 sec: 68848.7, 60 sec: 54613.4, 300 sec: 50207.3). Total num frames: 1971716096. Throughput: 0: 13733.5. Samples: 492976640. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:35,767][1648981] Avg episode reward: [(0, '973.700')] [2024-06-15 22:57:38,647][1651669] Updated weights for policy 0, policy_version 962810 (0.0110) [2024-06-15 22:57:40,766][1648981] Fps is (10 sec: 55705.3, 60 sec: 52975.0, 300 sec: 49429.7). Total num frames: 1971879936. Throughput: 0: 13983.3. Samples: 493029888. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:40,767][1648981] Avg episode reward: [(0, '1013.540')] [2024-06-15 22:57:41,610][1651669] Updated weights for policy 0, policy_version 962883 (0.0012) [2024-06-15 22:57:42,707][1651669] Updated weights for policy 0, policy_version 962946 (0.0010) [2024-06-15 22:57:43,667][1651669] Updated weights for policy 0, policy_version 962998 (0.0020) [2024-06-15 22:57:45,766][1648981] Fps is (10 sec: 52428.6, 60 sec: 56255.3, 300 sec: 50207.3). Total num frames: 1972240384. Throughput: 0: 14768.4. Samples: 493124608. 
Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:45,767][1648981] Avg episode reward: [(0, '1018.150')] [2024-06-15 22:57:46,874][1651669] Updated weights for policy 0, policy_version 963047 (0.0010) [2024-06-15 22:57:48,309][1651669] Updated weights for policy 0, policy_version 963092 (0.0011) [2024-06-15 22:57:49,622][1651669] Updated weights for policy 0, policy_version 963156 (0.0012) [2024-06-15 22:57:50,685][1651274] Signal inference workers to stop experience collection... (50550 times) [2024-06-15 22:57:50,723][1651669] InferenceWorker_p0-w0: stopping experience collection (50550 times) [2024-06-15 22:57:50,766][1648981] Fps is (10 sec: 75365.7, 60 sec: 57891.8, 300 sec: 50429.4). Total num frames: 1972633600. Throughput: 0: 14977.5. Samples: 493216768. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:50,767][1648981] Avg episode reward: [(0, '1021.910')] [2024-06-15 22:57:50,861][1651274] Signal inference workers to resume experience collection... (50550 times) [2024-06-15 22:57:50,862][1651669] InferenceWorker_p0-w0: resuming experience collection (50550 times) [2024-06-15 22:57:50,864][1651669] Updated weights for policy 0, policy_version 963216 (0.0012) [2024-06-15 22:57:54,276][1651669] Updated weights for policy 0, policy_version 963268 (0.0011) [2024-06-15 22:57:55,360][1651669] Updated weights for policy 0, policy_version 963324 (0.0017) [2024-06-15 22:57:55,766][1648981] Fps is (10 sec: 65535.9, 60 sec: 59528.6, 300 sec: 50540.6). Total num frames: 1972895744. Throughput: 0: 15678.6. Samples: 493277184. 
Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:57:55,767][1648981] Avg episode reward: [(0, '1106.730')] [2024-06-15 22:57:57,455][1651669] Updated weights for policy 0, policy_version 963376 (0.0010) [2024-06-15 22:57:58,712][1651669] Updated weights for policy 0, policy_version 963451 (0.0014) [2024-06-15 22:57:59,814][1651669] Updated weights for policy 0, policy_version 963504 (0.0012) [2024-06-15 22:58:00,766][1648981] Fps is (10 sec: 65535.7, 60 sec: 61166.9, 300 sec: 51095.9). Total num frames: 1973288960. Throughput: 0: 15394.2. Samples: 493364736. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:58:00,767][1648981] Avg episode reward: [(0, '1142.610')] [2024-06-15 22:58:02,712][1651669] Updated weights for policy 0, policy_version 963536 (0.0013) [2024-06-15 22:58:05,501][1651669] Updated weights for policy 0, policy_version 963616 (0.0011) [2024-06-15 22:58:05,766][1648981] Fps is (10 sec: 62259.1, 60 sec: 62819.1, 300 sec: 51207.4). Total num frames: 1973518336. Throughput: 0: 15735.5. Samples: 493458432. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:58:05,767][1648981] Avg episode reward: [(0, '1134.080')] [2024-06-15 22:58:06,920][1651669] Updated weights for policy 0, policy_version 963696 (0.0120) [2024-06-15 22:58:07,873][1651669] Updated weights for policy 0, policy_version 963728 (0.0018) [2024-06-15 22:58:10,800][1648981] Fps is (10 sec: 52254.3, 60 sec: 61132.9, 300 sec: 51314.8). Total num frames: 1973813248. Throughput: 0: 15498.2. Samples: 493500928. 
Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:58:10,800][1648981] Avg episode reward: [(0, '1136.590')] [2024-06-15 22:58:11,655][1651669] Updated weights for policy 0, policy_version 963795 (0.0013) [2024-06-15 22:58:12,337][1651669] Updated weights for policy 0, policy_version 963840 (0.0010) [2024-06-15 22:58:14,967][1651669] Updated weights for policy 0, policy_version 963920 (0.0012) [2024-06-15 22:58:15,766][1648981] Fps is (10 sec: 65535.7, 60 sec: 62805.6, 300 sec: 51762.3). Total num frames: 1974173696. Throughput: 0: 15553.4. Samples: 493592064. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:58:15,767][1648981] Avg episode reward: [(0, '1092.970')] [2024-06-15 22:58:17,013][1651669] Updated weights for policy 0, policy_version 963969 (0.0013) [2024-06-15 22:58:17,826][1651669] Updated weights for policy 0, policy_version 964024 (0.0010) [2024-06-15 22:58:20,121][1651669] Updated weights for policy 0, policy_version 964082 (0.0015) [2024-06-15 22:58:20,766][1648981] Fps is (10 sec: 65755.8, 60 sec: 61713.1, 300 sec: 51984.5). Total num frames: 1974468608. Throughput: 0: 15872.0. Samples: 493690880. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:58:20,767][1648981] Avg episode reward: [(0, '1105.350')] [2024-06-15 22:58:22,243][1651669] Updated weights for policy 0, policy_version 964128 (0.0012) [2024-06-15 22:58:22,855][1651274] Signal inference workers to stop experience collection... (50600 times) [2024-06-15 22:58:22,877][1651669] InferenceWorker_p0-w0: stopping experience collection (50600 times) [2024-06-15 22:58:23,028][1651274] Signal inference workers to resume experience collection... (50600 times) [2024-06-15 22:58:23,028][1651669] InferenceWorker_p0-w0: resuming experience collection (50600 times) [2024-06-15 22:58:23,558][1651669] Updated weights for policy 0, policy_version 964195 (0.0149) [2024-06-15 22:58:25,767][1648981] Fps is (10 sec: 55704.2, 60 sec: 61718.1, 300 sec: 51984.4). 
Total num frames: 1974730752. Throughput: 0: 15587.4. Samples: 493731328. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:58:25,768][1648981] Avg episode reward: [(0, '1160.580')] [2024-06-15 22:58:26,059][1651669] Updated weights for policy 0, policy_version 964240 (0.0011) [2024-06-15 22:58:26,821][1651669] Updated weights for policy 0, policy_version 964288 (0.0011) [2024-06-15 22:58:29,315][1651669] Updated weights for policy 0, policy_version 964352 (0.0011) [2024-06-15 22:58:30,766][1648981] Fps is (10 sec: 58982.1, 60 sec: 62259.0, 300 sec: 52207.1). Total num frames: 1975058432. Throughput: 0: 15633.0. Samples: 493828096. Policy #0 lag: (min: 63.0, avg: 145.8, max: 319.0) [2024-06-15 22:58:30,767][1648981] Avg episode reward: [(0, '1123.380')] [2024-06-15 22:58:31,117][1651669] Updated weights for policy 0, policy_version 964400 (0.0012) [2024-06-15 22:58:32,075][1651669] Updated weights for policy 0, policy_version 964449 (0.0014) [2024-06-15 22:58:34,239][1651669] Updated weights for policy 0, policy_version 964489 (0.0010) [2024-06-15 22:58:35,170][1651669] Updated weights for policy 0, policy_version 964542 (0.0014) [2024-06-15 22:58:35,766][1648981] Fps is (10 sec: 65538.1, 60 sec: 61166.9, 300 sec: 52428.8). Total num frames: 1975386112. Throughput: 0: 15644.5. Samples: 493920768. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:58:35,767][1648981] Avg episode reward: [(0, '1100.910')] [2024-06-15 22:58:37,756][1651669] Updated weights for policy 0, policy_version 964599 (0.0012) [2024-06-15 22:58:39,215][1651669] Updated weights for policy 0, policy_version 964656 (0.0083) [2024-06-15 22:58:40,471][1651669] Updated weights for policy 0, policy_version 964707 (0.0011) [2024-06-15 22:58:40,766][1648981] Fps is (10 sec: 68813.1, 60 sec: 64443.6, 300 sec: 52762.6). Total num frames: 1975746560. Throughput: 0: 15371.4. Samples: 493968896. 
Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:58:40,767][1648981] Avg episode reward: [(0, '1098.820')] [2024-06-15 22:58:43,433][1651669] Updated weights for policy 0, policy_version 964752 (0.0011) [2024-06-15 22:58:44,179][1651669] Updated weights for policy 0, policy_version 964795 (0.0011) [2024-06-15 22:58:45,770][1648981] Fps is (10 sec: 52408.4, 60 sec: 61163.0, 300 sec: 52650.3). Total num frames: 1975910400. Throughput: 0: 15301.8. Samples: 494053376. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:58:45,771][1648981] Avg episode reward: [(0, '1111.540')] [2024-06-15 22:58:46,629][1651669] Updated weights for policy 0, policy_version 964840 (0.0012) [2024-06-15 22:58:47,947][1651669] Updated weights for policy 0, policy_version 964912 (0.0014) [2024-06-15 22:58:49,355][1651669] Updated weights for policy 0, policy_version 964980 (0.0091) [2024-06-15 22:58:50,766][1648981] Fps is (10 sec: 55705.8, 60 sec: 61167.0, 300 sec: 53098.0). Total num frames: 1976303616. Throughput: 0: 15121.1. Samples: 494138880. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:58:50,767][1648981] Avg episode reward: [(0, '1079.420')] [2024-06-15 22:58:52,530][1651669] Updated weights for policy 0, policy_version 965025 (0.0015) [2024-06-15 22:58:55,171][1651669] Updated weights for policy 0, policy_version 965061 (0.0020) [2024-06-15 22:58:55,766][1648981] Fps is (10 sec: 62283.0, 60 sec: 60620.8, 300 sec: 53210.4). Total num frames: 1976532992. Throughput: 0: 15303.1. Samples: 494189056. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:58:55,767][1648981] Avg episode reward: [(0, '1054.730')] [2024-06-15 22:58:56,042][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000965120_1976565760.pth... [2024-06-15 22:58:56,119][1651274] Signal inference workers to stop experience collection... 
(50650 times) [2024-06-15 22:58:56,152][1651669] InferenceWorker_p0-w0: stopping experience collection (50650 times) [2024-06-15 22:58:56,153][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000958784_1963589632.pth [2024-06-15 22:58:56,357][1651274] Signal inference workers to resume experience collection... (50650 times) [2024-06-15 22:58:56,357][1651669] InferenceWorker_p0-w0: resuming experience collection (50650 times) [2024-06-15 22:58:56,494][1651669] Updated weights for policy 0, policy_version 965137 (0.0093) [2024-06-15 22:58:57,568][1651669] Updated weights for policy 0, policy_version 965202 (0.0014) [2024-06-15 22:59:00,072][1651669] Updated weights for policy 0, policy_version 965250 (0.0012) [2024-06-15 22:59:00,766][1648981] Fps is (10 sec: 58982.3, 60 sec: 60074.7, 300 sec: 53428.5). Total num frames: 1976893440. Throughput: 0: 15314.5. Samples: 494281216. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:00,767][1648981] Avg episode reward: [(0, '1033.140')] [2024-06-15 22:59:00,924][1651669] Updated weights for policy 0, policy_version 965303 (0.0012) [2024-06-15 22:59:04,441][1651669] Updated weights for policy 0, policy_version 965364 (0.0013) [2024-06-15 22:59:05,766][1648981] Fps is (10 sec: 65536.4, 60 sec: 61167.0, 300 sec: 53650.7). Total num frames: 1977188352. Throughput: 0: 15030.1. Samples: 494367232. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:05,767][1648981] Avg episode reward: [(0, '1018.190')] [2024-06-15 22:59:05,997][1651669] Updated weights for policy 0, policy_version 965447 (0.0092) [2024-06-15 22:59:06,811][1651669] Updated weights for policy 0, policy_version 965502 (0.0010) [2024-06-15 22:59:10,055][1651669] Updated weights for policy 0, policy_version 965568 (0.0013) [2024-06-15 22:59:10,766][1648981] Fps is (10 sec: 58982.2, 60 sec: 61201.0, 300 sec: 53761.7). Total num frames: 1977483264. Throughput: 0: 15189.4. Samples: 494414848. 
Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:10,767][1648981] Avg episode reward: [(0, '963.760')] [2024-06-15 22:59:13,129][1651669] Updated weights for policy 0, policy_version 965616 (0.0013) [2024-06-15 22:59:14,458][1651669] Updated weights for policy 0, policy_version 965680 (0.0011) [2024-06-15 22:59:15,674][1651669] Updated weights for policy 0, policy_version 965750 (0.0019) [2024-06-15 22:59:15,766][1648981] Fps is (10 sec: 65535.7, 60 sec: 61167.0, 300 sec: 54095.0). Total num frames: 1977843712. Throughput: 0: 15018.7. Samples: 494503936. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:15,767][1648981] Avg episode reward: [(0, '906.480')] [2024-06-15 22:59:18,069][1651669] Updated weights for policy 0, policy_version 965808 (0.0011) [2024-06-15 22:59:20,766][1648981] Fps is (10 sec: 52428.7, 60 sec: 58982.4, 300 sec: 53984.2). Total num frames: 1978007552. Throughput: 0: 15052.8. Samples: 494598144. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:20,767][1648981] Avg episode reward: [(0, '863.130')] [2024-06-15 22:59:21,907][1651669] Updated weights for policy 0, policy_version 965872 (0.0011) [2024-06-15 22:59:22,948][1651669] Updated weights for policy 0, policy_version 965922 (0.0010) [2024-06-15 22:59:24,292][1651669] Updated weights for policy 0, policy_version 965972 (0.0014) [2024-06-15 22:59:25,767][1648981] Fps is (10 sec: 55704.0, 60 sec: 61166.9, 300 sec: 54539.2). Total num frames: 1978400768. Throughput: 0: 14813.8. Samples: 494635520. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:25,768][1648981] Avg episode reward: [(0, '859.420')] [2024-06-15 22:59:26,722][1651669] Updated weights for policy 0, policy_version 966032 (0.0026) [2024-06-15 22:59:27,110][1651274] Signal inference workers to stop experience collection... 
(50700 times) [2024-06-15 22:59:27,139][1651669] InferenceWorker_p0-w0: stopping experience collection (50700 times) [2024-06-15 22:59:27,347][1651274] Signal inference workers to resume experience collection... (50700 times) [2024-06-15 22:59:27,347][1651669] InferenceWorker_p0-w0: resuming experience collection (50700 times) [2024-06-15 22:59:27,523][1651669] Updated weights for policy 0, policy_version 966074 (0.0010) [2024-06-15 22:59:30,767][1648981] Fps is (10 sec: 52427.6, 60 sec: 57889.9, 300 sec: 54097.8). Total num frames: 1978531840. Throughput: 0: 14951.6. Samples: 494726144. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:30,767][1648981] Avg episode reward: [(0, '941.900')] [2024-06-15 22:59:31,331][1651669] Updated weights for policy 0, policy_version 966128 (0.0017) [2024-06-15 22:59:33,042][1651669] Updated weights for policy 0, policy_version 966208 (0.0024) [2024-06-15 22:59:34,197][1651669] Updated weights for policy 0, policy_version 966270 (0.0011) [2024-06-15 22:59:35,768][1648981] Fps is (10 sec: 55696.6, 60 sec: 59526.6, 300 sec: 54761.2). Total num frames: 1978957824. Throughput: 0: 14836.0. Samples: 494806528. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:35,769][1648981] Avg episode reward: [(0, '911.750')] [2024-06-15 22:59:36,498][1651669] Updated weights for policy 0, policy_version 966325 (0.0014) [2024-06-15 22:59:40,187][1651669] Updated weights for policy 0, policy_version 966371 (0.0011) [2024-06-15 22:59:40,770][1648981] Fps is (10 sec: 65511.8, 60 sec: 57340.2, 300 sec: 54649.6). Total num frames: 1979187200. Throughput: 0: 14812.6. Samples: 494855680. 
Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:40,771][1648981] Avg episode reward: [(0, '936.170')] [2024-06-15 22:59:41,509][1651669] Updated weights for policy 0, policy_version 966435 (0.0064) [2024-06-15 22:59:42,687][1651669] Updated weights for policy 0, policy_version 966496 (0.0015) [2024-06-15 22:59:44,818][1651669] Updated weights for policy 0, policy_version 966544 (0.0011) [2024-06-15 22:59:45,674][1651669] Updated weights for policy 0, policy_version 966592 (0.0012) [2024-06-15 22:59:45,766][1648981] Fps is (10 sec: 62271.5, 60 sec: 61170.9, 300 sec: 55097.5). Total num frames: 1979580416. Throughput: 0: 14631.8. Samples: 494939648. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:45,767][1648981] Avg episode reward: [(0, '863.150')] [2024-06-15 22:59:49,000][1651669] Updated weights for policy 0, policy_version 966646 (0.0011) [2024-06-15 22:59:49,715][1651669] Updated weights for policy 0, policy_version 966676 (0.0072) [2024-06-15 22:59:50,767][1648981] Fps is (10 sec: 65560.0, 60 sec: 58982.1, 300 sec: 55094.6). Total num frames: 1979842560. Throughput: 0: 14631.7. Samples: 495025664. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:50,768][1648981] Avg episode reward: [(0, '862.530')] [2024-06-15 22:59:51,209][1651669] Updated weights for policy 0, policy_version 966739 (0.0013) [2024-06-15 22:59:53,661][1651669] Updated weights for policy 0, policy_version 966800 (0.0011) [2024-06-15 22:59:55,767][1648981] Fps is (10 sec: 52427.7, 60 sec: 59528.4, 300 sec: 55094.6). Total num frames: 1980104704. Throughput: 0: 14711.4. Samples: 495076864. 
Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 22:59:55,767][1648981] Avg episode reward: [(0, '848.040')] [2024-06-15 22:59:56,642][1651669] Updated weights for policy 0, policy_version 966850 (0.0011) [2024-06-15 22:59:57,707][1651669] Updated weights for policy 0, policy_version 966912 (0.0011) [2024-06-15 22:59:58,839][1651669] Updated weights for policy 0, policy_version 966976 (0.0013) [2024-06-15 23:00:00,053][1651274] Signal inference workers to stop experience collection... (50750 times) [2024-06-15 23:00:00,133][1651669] InferenceWorker_p0-w0: stopping experience collection (50750 times) [2024-06-15 23:00:00,309][1651274] Signal inference workers to resume experience collection... (50750 times) [2024-06-15 23:00:00,310][1651669] InferenceWorker_p0-w0: resuming experience collection (50750 times) [2024-06-15 23:00:00,653][1651669] Updated weights for policy 0, policy_version 967024 (0.0020) [2024-06-15 23:00:00,767][1648981] Fps is (10 sec: 62259.4, 60 sec: 59528.3, 300 sec: 55427.8). Total num frames: 1980465152. Throughput: 0: 14665.9. Samples: 495163904. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 23:00:00,767][1648981] Avg episode reward: [(0, '812.130')] [2024-06-15 23:00:03,040][1651669] Updated weights for policy 0, policy_version 967100 (0.0013) [2024-06-15 23:00:05,766][1648981] Fps is (10 sec: 58983.4, 60 sec: 58436.3, 300 sec: 55316.9). Total num frames: 1980694528. Throughput: 0: 14666.0. Samples: 495258112. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 23:00:05,767][1648981] Avg episode reward: [(0, '856.390')] [2024-06-15 23:00:06,280][1651669] Updated weights for policy 0, policy_version 967158 (0.0011) [2024-06-15 23:00:07,512][1651669] Updated weights for policy 0, policy_version 967229 (0.0010) [2024-06-15 23:00:09,725][1651669] Updated weights for policy 0, policy_version 967289 (0.0012) [2024-06-15 23:00:10,766][1648981] Fps is (10 sec: 55706.8, 60 sec: 58982.4, 300 sec: 55984.6). 
Total num frames: 1981022208. Throughput: 0: 14814.0. Samples: 495302144. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 23:00:10,767][1648981] Avg episode reward: [(0, '879.890')] [2024-06-15 23:00:12,030][1651669] Updated weights for policy 0, policy_version 967344 (0.0014) [2024-06-15 23:00:14,259][1651669] Updated weights for policy 0, policy_version 967377 (0.0014) [2024-06-15 23:00:15,022][1651669] Updated weights for policy 0, policy_version 967424 (0.0012) [2024-06-15 23:00:15,767][1648981] Fps is (10 sec: 65532.2, 60 sec: 58435.7, 300 sec: 55761.0). Total num frames: 1981349888. Throughput: 0: 14734.1. Samples: 495389184. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 23:00:15,768][1648981] Avg episode reward: [(0, '881.360')] [2024-06-15 23:00:15,952][1651669] Updated weights for policy 0, policy_version 967476 (0.0011) [2024-06-15 23:00:18,362][1651669] Updated weights for policy 0, policy_version 967536 (0.0012) [2024-06-15 23:00:20,060][1651669] Updated weights for policy 0, policy_version 967559 (0.0010) [2024-06-15 23:00:20,766][1648981] Fps is (10 sec: 58982.0, 60 sec: 60074.6, 300 sec: 56205.4). Total num frames: 1981612032. Throughput: 0: 14985.1. Samples: 495480832. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 23:00:20,767][1648981] Avg episode reward: [(0, '869.300')] [2024-06-15 23:00:21,185][1651669] Updated weights for policy 0, policy_version 967611 (0.0029) [2024-06-15 23:00:23,174][1651669] Updated weights for policy 0, policy_version 967670 (0.0011) [2024-06-15 23:00:24,054][1651669] Updated weights for policy 0, policy_version 967712 (0.0015) [2024-06-15 23:00:25,768][1648981] Fps is (10 sec: 58974.0, 60 sec: 58980.7, 300 sec: 56205.1). Total num frames: 1981939712. Throughput: 0: 14985.2. Samples: 495529984. 
Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 23:00:25,769][1648981] Avg episode reward: [(0, '880.300')] [2024-06-15 23:00:26,708][1651669] Updated weights for policy 0, policy_version 967776 (0.0011) [2024-06-15 23:00:29,072][1651669] Updated weights for policy 0, policy_version 967840 (0.0011) [2024-06-15 23:00:30,767][1648981] Fps is (10 sec: 58981.7, 60 sec: 61167.0, 300 sec: 56427.6). Total num frames: 1982201856. Throughput: 0: 15075.5. Samples: 495618048. Policy #0 lag: (min: 111.0, avg: 213.5, max: 367.0) [2024-06-15 23:00:30,767][1648981] Avg episode reward: [(0, '907.080')] [2024-06-15 23:00:31,571][1651669] Updated weights for policy 0, policy_version 967910 (0.0025) [2024-06-15 23:00:32,788][1651669] Updated weights for policy 0, policy_version 967955 (0.0012) [2024-06-15 23:00:34,749][1651669] Updated weights for policy 0, policy_version 968002 (0.0011) [2024-06-15 23:00:34,973][1651274] Signal inference workers to stop experience collection... (50800 times) [2024-06-15 23:00:35,017][1651669] InferenceWorker_p0-w0: stopping experience collection (50800 times) [2024-06-15 23:00:35,170][1651274] Signal inference workers to resume experience collection... (50800 times) [2024-06-15 23:00:35,171][1651669] InferenceWorker_p0-w0: resuming experience collection (50800 times) [2024-06-15 23:00:35,766][1648981] Fps is (10 sec: 65549.1, 60 sec: 60622.7, 300 sec: 56983.0). Total num frames: 1982595072. Throughput: 0: 15200.8. Samples: 495709696. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:00:35,767][1648981] Avg episode reward: [(0, '907.060')] [2024-06-15 23:00:37,766][1651669] Updated weights for policy 0, policy_version 968080 (0.0013) [2024-06-15 23:00:38,503][1651669] Updated weights for policy 0, policy_version 968120 (0.0011) [2024-06-15 23:00:40,744][1651669] Updated weights for policy 0, policy_version 968176 (0.0013) [2024-06-15 23:00:40,766][1648981] Fps is (10 sec: 62261.0, 60 sec: 60624.9, 300 sec: 56983.0). 
Total num frames: 1982824448. Throughput: 0: 15007.4. Samples: 495752192. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:00:40,767][1648981] Avg episode reward: [(0, '931.920')] [2024-06-15 23:00:41,903][1651669] Updated weights for policy 0, policy_version 968242 (0.0013) [2024-06-15 23:00:44,585][1651669] Updated weights for policy 0, policy_version 968288 (0.0094) [2024-06-15 23:00:45,024][1651669] Updated weights for policy 0, policy_version 968320 (0.0011) [2024-06-15 23:00:45,767][1648981] Fps is (10 sec: 52428.0, 60 sec: 58982.2, 300 sec: 57205.1). Total num frames: 1983119360. Throughput: 0: 15087.0. Samples: 495842816. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:00:45,767][1648981] Avg episode reward: [(0, '932.710')] [2024-06-15 23:00:47,046][1651669] Updated weights for policy 0, policy_version 968377 (0.0091) [2024-06-15 23:00:50,661][1651669] Updated weights for policy 0, policy_version 968464 (0.0012) [2024-06-15 23:00:50,767][1648981] Fps is (10 sec: 58975.2, 60 sec: 59527.7, 300 sec: 57316.0). Total num frames: 1983414272. Throughput: 0: 14756.6. Samples: 495922176. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:00:50,768][1648981] Avg episode reward: [(0, '978.580')] [2024-06-15 23:00:53,393][1651669] Updated weights for policy 0, policy_version 968519 (0.0015) [2024-06-15 23:00:54,304][1651669] Updated weights for policy 0, policy_version 968571 (0.0020) [2024-06-15 23:00:55,766][1648981] Fps is (10 sec: 55705.9, 60 sec: 59528.6, 300 sec: 57429.4). Total num frames: 1983676416. Throughput: 0: 14825.2. Samples: 495969280. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:00:55,767][1648981] Avg episode reward: [(0, '1029.000')] [2024-06-15 23:00:55,971][1651669] Updated weights for policy 0, policy_version 968611 (0.0015) [2024-06-15 23:00:56,107][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000968624_1983741952.pth... 
[2024-06-15 23:00:56,191][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000961616_1969389568.pth [2024-06-15 23:00:57,725][1651669] Updated weights for policy 0, policy_version 968672 (0.0024) [2024-06-15 23:00:58,830][1651669] Updated weights for policy 0, policy_version 968725 (0.0012) [2024-06-15 23:01:00,769][1648981] Fps is (10 sec: 62247.4, 60 sec: 59525.8, 300 sec: 57759.9). Total num frames: 1984036864. Throughput: 0: 14949.6. Samples: 496061952. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:00,770][1648981] Avg episode reward: [(0, '1028.340')] [2024-06-15 23:01:02,311][1651669] Updated weights for policy 0, policy_version 968800 (0.0012) [2024-06-15 23:01:04,425][1651669] Updated weights for policy 0, policy_version 968837 (0.0025) [2024-06-15 23:01:05,095][1651669] Updated weights for policy 0, policy_version 968886 (0.0016) [2024-06-15 23:01:05,767][1648981] Fps is (10 sec: 62255.9, 60 sec: 60074.0, 300 sec: 58093.7). Total num frames: 1984299008. Throughput: 0: 14893.3. Samples: 496151040. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:05,768][1648981] Avg episode reward: [(0, '1055.090')] [2024-06-15 23:01:07,244][1651669] Updated weights for policy 0, policy_version 968960 (0.0014) [2024-06-15 23:01:08,479][1651669] Updated weights for policy 0, policy_version 969008 (0.0013) [2024-06-15 23:01:10,180][1651274] Signal inference workers to stop experience collection... (50850 times) [2024-06-15 23:01:10,265][1651669] InferenceWorker_p0-w0: stopping experience collection (50850 times) [2024-06-15 23:01:10,318][1651274] Signal inference workers to resume experience collection... 
(50850 times) [2024-06-15 23:01:10,318][1651669] InferenceWorker_p0-w0: resuming experience collection (50850 times) [2024-06-15 23:01:10,519][1651669] Updated weights for policy 0, policy_version 969047 (0.0132) [2024-06-15 23:01:10,766][1648981] Fps is (10 sec: 59000.0, 60 sec: 60074.7, 300 sec: 57982.7). Total num frames: 1984626688. Throughput: 0: 14814.5. Samples: 496196608. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:10,767][1648981] Avg episode reward: [(0, '1069.270')] [2024-06-15 23:01:13,134][1651669] Updated weights for policy 0, policy_version 969105 (0.0011) [2024-06-15 23:01:14,595][1651669] Updated weights for policy 0, policy_version 969156 (0.0012) [2024-06-15 23:01:15,766][1648981] Fps is (10 sec: 62263.2, 60 sec: 59529.1, 300 sec: 58538.1). Total num frames: 1984921600. Throughput: 0: 14882.2. Samples: 496287744. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:15,767][1648981] Avg episode reward: [(0, '1091.970')] [2024-06-15 23:01:15,834][1651669] Updated weights for policy 0, policy_version 969216 (0.0012) [2024-06-15 23:01:17,804][1651669] Updated weights for policy 0, policy_version 969280 (0.0026) [2024-06-15 23:01:19,915][1651669] Updated weights for policy 0, policy_version 969342 (0.0010) [2024-06-15 23:01:20,766][1648981] Fps is (10 sec: 58983.0, 60 sec: 60074.8, 300 sec: 58538.1). Total num frames: 1985216512. Throughput: 0: 14870.8. Samples: 496378880. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:20,767][1648981] Avg episode reward: [(0, '1134.710')] [2024-06-15 23:01:22,401][1651669] Updated weights for policy 0, policy_version 969392 (0.0012) [2024-06-15 23:01:23,809][1651669] Updated weights for policy 0, policy_version 969444 (0.0012) [2024-06-15 23:01:25,767][1648981] Fps is (10 sec: 58980.9, 60 sec: 59530.3, 300 sec: 58762.3). Total num frames: 1985511424. Throughput: 0: 14882.0. Samples: 496421888. 
Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:25,767][1648981] Avg episode reward: [(0, '1128.020')] [2024-06-15 23:01:25,819][1651669] Updated weights for policy 0, policy_version 969494 (0.0011) [2024-06-15 23:01:27,865][1651669] Updated weights for policy 0, policy_version 969562 (0.0011) [2024-06-15 23:01:28,505][1651669] Updated weights for policy 0, policy_version 969599 (0.0010) [2024-06-15 23:01:30,683][1651669] Updated weights for policy 0, policy_version 969653 (0.0012) [2024-06-15 23:01:30,766][1648981] Fps is (10 sec: 62259.2, 60 sec: 60621.1, 300 sec: 58982.4). Total num frames: 1985839104. Throughput: 0: 15064.2. Samples: 496520704. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:30,767][1648981] Avg episode reward: [(0, '1140.380')] [2024-06-15 23:01:32,027][1651669] Updated weights for policy 0, policy_version 969696 (0.0075) [2024-06-15 23:01:34,467][1651669] Updated weights for policy 0, policy_version 969748 (0.0011) [2024-06-15 23:01:35,350][1651669] Updated weights for policy 0, policy_version 969792 (0.0011) [2024-06-15 23:01:35,768][1648981] Fps is (10 sec: 62253.4, 60 sec: 58981.2, 300 sec: 59093.2). Total num frames: 1986134016. Throughput: 0: 15234.8. Samples: 496607744. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:35,768][1648981] Avg episode reward: [(0, '1127.570')] [2024-06-15 23:01:36,736][1651669] Updated weights for policy 0, policy_version 969850 (0.0014) [2024-06-15 23:01:39,104][1651669] Updated weights for policy 0, policy_version 969889 (0.0010) [2024-06-15 23:01:40,769][1648981] Fps is (10 sec: 58967.5, 60 sec: 60072.1, 300 sec: 59538.1). Total num frames: 1986428928. Throughput: 0: 15256.8. Samples: 496655872. 
Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:40,769][1648981] Avg episode reward: [(0, '1132.350')] [2024-06-15 23:01:41,258][1651669] Updated weights for policy 0, policy_version 969977 (0.0015) [2024-06-15 23:01:44,205][1651669] Updated weights for policy 0, policy_version 970039 (0.0011) [2024-06-15 23:01:45,481][1651669] Updated weights for policy 0, policy_version 970080 (0.0012) [2024-06-15 23:01:45,584][1651274] Signal inference workers to stop experience collection... (50900 times) [2024-06-15 23:01:45,651][1651669] InferenceWorker_p0-w0: stopping experience collection (50900 times) [2024-06-15 23:01:45,766][1648981] Fps is (10 sec: 58989.7, 60 sec: 60074.9, 300 sec: 59538.1). Total num frames: 1986723840. Throughput: 0: 15008.3. Samples: 496737280. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:45,767][1648981] Avg episode reward: [(0, '1115.090')] [2024-06-15 23:01:45,796][1651274] Signal inference workers to resume experience collection... (50900 times) [2024-06-15 23:01:45,797][1651669] InferenceWorker_p0-w0: resuming experience collection (50900 times) [2024-06-15 23:01:46,151][1651669] Updated weights for policy 0, policy_version 970112 (0.0017) [2024-06-15 23:01:47,835][1651669] Updated weights for policy 0, policy_version 970172 (0.0011) [2024-06-15 23:01:49,330][1651669] Updated weights for policy 0, policy_version 970212 (0.0011) [2024-06-15 23:01:50,774][1648981] Fps is (10 sec: 62226.6, 60 sec: 60614.2, 300 sec: 60091.6). Total num frames: 1987051520. Throughput: 0: 15073.2. Samples: 496829440. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:50,775][1648981] Avg episode reward: [(0, '1120.960')] [2024-06-15 23:01:53,105][1651669] Updated weights for policy 0, policy_version 970301 (0.0012) [2024-06-15 23:01:54,692][1651669] Updated weights for policy 0, policy_version 970359 (0.0012) [2024-06-15 23:01:55,766][1648981] Fps is (10 sec: 58982.6, 60 sec: 60621.0, 300 sec: 59982.1). 
Total num frames: 1987313664. Throughput: 0: 15155.2. Samples: 496878592. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:01:55,767][1648981] Avg episode reward: [(0, '1147.680')] [2024-06-15 23:01:56,795][1651669] Updated weights for policy 0, policy_version 970427 (0.0011) [2024-06-15 23:01:58,064][1651669] Updated weights for policy 0, policy_version 970496 (0.0018) [2024-06-15 23:02:00,767][1648981] Fps is (10 sec: 52468.1, 60 sec: 58985.2, 300 sec: 60429.1). Total num frames: 1987575808. Throughput: 0: 15121.0. Samples: 496968192. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:02:00,767][1648981] Avg episode reward: [(0, '1173.400')] [2024-06-15 23:02:01,786][1651669] Updated weights for policy 0, policy_version 970549 (0.0011) [2024-06-15 23:02:02,699][1651669] Updated weights for policy 0, policy_version 970593 (0.0125) [2024-06-15 23:02:04,371][1651669] Updated weights for policy 0, policy_version 970628 (0.0012) [2024-06-15 23:02:05,498][1651669] Updated weights for policy 0, policy_version 970682 (0.0030) [2024-06-15 23:02:05,766][1648981] Fps is (10 sec: 65535.5, 60 sec: 61167.6, 300 sec: 60426.4). Total num frames: 1987969024. Throughput: 0: 15018.7. Samples: 497054720. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:02:05,767][1648981] Avg episode reward: [(0, '1199.610')] [2024-06-15 23:02:06,609][1651669] Updated weights for policy 0, policy_version 970736 (0.0010) [2024-06-15 23:02:09,654][1651669] Updated weights for policy 0, policy_version 970784 (0.0117) [2024-06-15 23:02:10,766][1648981] Fps is (10 sec: 68813.5, 60 sec: 60620.7, 300 sec: 60537.5). Total num frames: 1988263936. Throughput: 0: 15348.7. Samples: 497112576. 
Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:02:10,767][1648981] Avg episode reward: [(0, '1141.510')] [2024-06-15 23:02:11,139][1651669] Updated weights for policy 0, policy_version 970850 (0.0009) [2024-06-15 23:02:12,656][1651669] Updated weights for policy 0, policy_version 970890 (0.0021) [2024-06-15 23:02:14,471][1651669] Updated weights for policy 0, policy_version 970976 (0.0011) [2024-06-15 23:02:15,766][1648981] Fps is (10 sec: 65536.1, 60 sec: 61713.1, 300 sec: 60537.5). Total num frames: 1988624384. Throughput: 0: 15086.9. Samples: 497199616. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:02:15,767][1648981] Avg episode reward: [(0, '1164.360')] [2024-06-15 23:02:18,338][1651669] Updated weights for policy 0, policy_version 971011 (0.0012) [2024-06-15 23:02:19,487][1651669] Updated weights for policy 0, policy_version 971077 (0.0073) [2024-06-15 23:02:19,726][1651274] Signal inference workers to stop experience collection... (50950 times) [2024-06-15 23:02:19,773][1651669] InferenceWorker_p0-w0: stopping experience collection (50950 times) [2024-06-15 23:02:19,906][1651274] Signal inference workers to resume experience collection... (50950 times) [2024-06-15 23:02:19,907][1651669] InferenceWorker_p0-w0: resuming experience collection (50950 times) [2024-06-15 23:02:20,343][1651669] Updated weights for policy 0, policy_version 971123 (0.0013) [2024-06-15 23:02:20,766][1648981] Fps is (10 sec: 62260.3, 60 sec: 61166.9, 300 sec: 60538.6). Total num frames: 1988886528. Throughput: 0: 15235.3. Samples: 497293312. 
Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:02:20,767][1648981] Avg episode reward: [(0, '1210.230')] [2024-06-15 23:02:21,491][1651669] Updated weights for policy 0, policy_version 971168 (0.0011) [2024-06-15 23:02:22,715][1651669] Updated weights for policy 0, policy_version 971217 (0.0012) [2024-06-15 23:02:23,321][1651669] Updated weights for policy 0, policy_version 971263 (0.0012) [2024-06-15 23:02:25,766][1648981] Fps is (10 sec: 52428.3, 60 sec: 60621.0, 300 sec: 60426.4). Total num frames: 1989148672. Throughput: 0: 15178.8. Samples: 497338880. Policy #0 lag: (min: 0.0, avg: 109.4, max: 256.0) [2024-06-15 23:02:25,767][1648981] Avg episode reward: [(0, '1209.470')] [2024-06-15 23:02:27,697][1651669] Updated weights for policy 0, policy_version 971316 (0.0014) [2024-06-15 23:02:28,853][1651669] Updated weights for policy 0, policy_version 971391 (0.0011) [2024-06-15 23:02:30,770][1648981] Fps is (10 sec: 62234.7, 60 sec: 61162.9, 300 sec: 60314.5). Total num frames: 1989509120. Throughput: 0: 15392.8. Samples: 497430016. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:02:30,771][1648981] Avg episode reward: [(0, '1207.710')] [2024-06-15 23:02:31,035][1651669] Updated weights for policy 0, policy_version 971455 (0.0214) [2024-06-15 23:02:32,124][1651669] Updated weights for policy 0, policy_version 971510 (0.0014) [2024-06-15 23:02:35,766][1648981] Fps is (10 sec: 52429.1, 60 sec: 58983.6, 300 sec: 60315.3). Total num frames: 1989672960. Throughput: 0: 15544.7. Samples: 497528832. 
Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:02:35,767][1648981] Avg episode reward: [(0, '1126.240')] [2024-06-15 23:02:36,493][1651669] Updated weights for policy 0, policy_version 971568 (0.0014) [2024-06-15 23:02:37,457][1651669] Updated weights for policy 0, policy_version 971617 (0.0011) [2024-06-15 23:02:38,680][1651669] Updated weights for policy 0, policy_version 971664 (0.0139) [2024-06-15 23:02:39,618][1651669] Updated weights for policy 0, policy_version 971714 (0.0017) [2024-06-15 23:02:40,510][1651669] Updated weights for policy 0, policy_version 971776 (0.0012) [2024-06-15 23:02:40,785][1648981] Fps is (10 sec: 68710.5, 60 sec: 62788.3, 300 sec: 60866.8). Total num frames: 1990197248. Throughput: 0: 15285.3. Samples: 497566720. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:02:40,786][1648981] Avg episode reward: [(0, '1126.390')] [2024-06-15 23:02:45,293][1651669] Updated weights for policy 0, policy_version 971826 (0.0009) [2024-06-15 23:02:45,766][1648981] Fps is (10 sec: 65536.5, 60 sec: 60074.7, 300 sec: 59982.1). Total num frames: 1990328320. Throughput: 0: 15326.0. Samples: 497657856. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:02:45,767][1648981] Avg episode reward: [(0, '1087.130')] [2024-06-15 23:02:46,651][1651669] Updated weights for policy 0, policy_version 971897 (0.0011) [2024-06-15 23:02:48,155][1651669] Updated weights for policy 0, policy_version 971937 (0.0010) [2024-06-15 23:02:48,956][1651669] Updated weights for policy 0, policy_version 971986 (0.0012) [2024-06-15 23:02:49,605][1651669] Updated weights for policy 0, policy_version 972032 (0.0012) [2024-06-15 23:02:50,766][1648981] Fps is (10 sec: 52527.2, 60 sec: 61174.8, 300 sec: 60426.4). Total num frames: 1990721536. Throughput: 0: 15394.1. Samples: 497747456. 
Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:02:50,767][1648981] Avg episode reward: [(0, '1069.890')] [2024-06-15 23:02:53,512][1651274] Signal inference workers to stop experience collection... (51000 times) [2024-06-15 23:02:53,543][1651669] InferenceWorker_p0-w0: stopping experience collection (51000 times) [2024-06-15 23:02:53,709][1651274] Signal inference workers to resume experience collection... (51000 times) [2024-06-15 23:02:53,710][1651669] InferenceWorker_p0-w0: resuming experience collection (51000 times) [2024-06-15 23:02:54,064][1651669] Updated weights for policy 0, policy_version 972096 (0.0010) [2024-06-15 23:02:55,273][1651669] Updated weights for policy 0, policy_version 972151 (0.0012) [2024-06-15 23:02:55,770][1648981] Fps is (10 sec: 65510.6, 60 sec: 61163.0, 300 sec: 59981.3). Total num frames: 1990983680. Throughput: 0: 15131.2. Samples: 497793536. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:02:55,771][1648981] Avg episode reward: [(0, '1059.770')] [2024-06-15 23:02:55,775][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000972160_1990983680.pth... [2024-06-15 23:02:55,835][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000965120_1976565760.pth [2024-06-15 23:02:57,408][1651669] Updated weights for policy 0, policy_version 972195 (0.0012) [2024-06-15 23:02:58,690][1651669] Updated weights for policy 0, policy_version 972272 (0.0011) [2024-06-15 23:03:00,766][1648981] Fps is (10 sec: 52429.2, 60 sec: 61167.2, 300 sec: 60093.2). Total num frames: 1991245824. Throughput: 0: 14904.9. Samples: 497870336. 
Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:00,767][1648981] Avg episode reward: [(0, '1064.390')] [2024-06-15 23:03:02,402][1651669] Updated weights for policy 0, policy_version 972309 (0.0011) [2024-06-15 23:03:04,176][1651669] Updated weights for policy 0, policy_version 972389 (0.0131) [2024-06-15 23:03:05,775][1648981] Fps is (10 sec: 52404.0, 60 sec: 58974.0, 300 sec: 59987.2). Total num frames: 1991507968. Throughput: 0: 14867.9. Samples: 497962496. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:05,775][1648981] Avg episode reward: [(0, '1093.280')] [2024-06-15 23:03:05,907][1651669] Updated weights for policy 0, policy_version 972420 (0.0011) [2024-06-15 23:03:07,034][1651669] Updated weights for policy 0, policy_version 972480 (0.0014) [2024-06-15 23:03:08,273][1651669] Updated weights for policy 0, policy_version 972541 (0.0010) [2024-06-15 23:03:10,767][1648981] Fps is (10 sec: 52428.3, 60 sec: 58436.3, 300 sec: 59648.9). Total num frames: 1991770112. Throughput: 0: 14688.7. Samples: 497999872. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:10,768][1648981] Avg episode reward: [(0, '1146.750')] [2024-06-15 23:03:12,128][1651669] Updated weights for policy 0, policy_version 972593 (0.0012) [2024-06-15 23:03:13,471][1651669] Updated weights for policy 0, policy_version 972665 (0.0009) [2024-06-15 23:03:15,766][1648981] Fps is (10 sec: 59032.8, 60 sec: 57890.1, 300 sec: 59759.9). Total num frames: 1992097792. Throughput: 0: 14724.1. Samples: 498092544. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:15,767][1648981] Avg episode reward: [(0, '1138.730')] [2024-06-15 23:03:16,385][1651669] Updated weights for policy 0, policy_version 972745 (0.0016) [2024-06-15 23:03:19,776][1651669] Updated weights for policy 0, policy_version 972801 (0.0012) [2024-06-15 23:03:20,773][1648981] Fps is (10 sec: 62216.4, 60 sec: 58429.5, 300 sec: 59869.7). Total num frames: 1992392704. 
Throughput: 0: 14629.6. Samples: 498187264. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:20,774][1648981] Avg episode reward: [(0, '1101.630')] [2024-06-15 23:03:21,317][1651669] Updated weights for policy 0, policy_version 972883 (0.0012) [2024-06-15 23:03:22,040][1651669] Updated weights for policy 0, policy_version 972926 (0.0010) [2024-06-15 23:03:24,556][1651274] Signal inference workers to stop experience collection... (51050 times) [2024-06-15 23:03:24,589][1651669] InferenceWorker_p0-w0: stopping experience collection (51050 times) [2024-06-15 23:03:24,667][1651669] Updated weights for policy 0, policy_version 972981 (0.0105) [2024-06-15 23:03:24,784][1651274] Signal inference workers to resume experience collection... (51050 times) [2024-06-15 23:03:24,785][1651669] InferenceWorker_p0-w0: resuming experience collection (51050 times) [2024-06-15 23:03:25,787][1648981] Fps is (10 sec: 68668.7, 60 sec: 60599.7, 300 sec: 60088.9). Total num frames: 1992785920. Throughput: 0: 14813.1. Samples: 498233344. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:25,788][1648981] Avg episode reward: [(0, '1075.320')] [2024-06-15 23:03:25,800][1651669] Updated weights for policy 0, policy_version 973055 (0.0013) [2024-06-15 23:03:29,501][1651669] Updated weights for policy 0, policy_version 973107 (0.0010) [2024-06-15 23:03:30,563][1651669] Updated weights for policy 0, policy_version 973168 (0.0010) [2024-06-15 23:03:30,766][1648981] Fps is (10 sec: 65581.6, 60 sec: 58986.2, 300 sec: 59871.0). Total num frames: 1993048064. Throughput: 0: 14677.3. Samples: 498318336. 
Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:30,767][1648981] Avg episode reward: [(0, '1062.770')] [2024-06-15 23:03:32,984][1651669] Updated weights for policy 0, policy_version 973220 (0.0027) [2024-06-15 23:03:34,641][1651669] Updated weights for policy 0, policy_version 973303 (0.0103) [2024-06-15 23:03:35,791][1648981] Fps is (10 sec: 55683.9, 60 sec: 61141.6, 300 sec: 59643.8). Total num frames: 1993342976. Throughput: 0: 14578.3. Samples: 498403840. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:35,792][1648981] Avg episode reward: [(0, '1024.180')] [2024-06-15 23:03:38,051][1651669] Updated weights for policy 0, policy_version 973350 (0.0012) [2024-06-15 23:03:38,726][1651669] Updated weights for policy 0, policy_version 973396 (0.0010) [2024-06-15 23:03:39,478][1651669] Updated weights for policy 0, policy_version 973440 (0.0040) [2024-06-15 23:03:40,798][1648981] Fps is (10 sec: 55529.2, 60 sec: 56785.6, 300 sec: 59976.4). Total num frames: 1993605120. Throughput: 0: 14634.1. Samples: 498452480. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:40,799][1648981] Avg episode reward: [(0, '993.940')] [2024-06-15 23:03:42,175][1651669] Updated weights for policy 0, policy_version 973507 (0.0012) [2024-06-15 23:03:42,917][1651669] Updated weights for policy 0, policy_version 973557 (0.0011) [2024-06-15 23:03:45,766][1648981] Fps is (10 sec: 55845.0, 60 sec: 59528.5, 300 sec: 59648.9). Total num frames: 1993900032. Throughput: 0: 15087.0. Samples: 498549248. 
Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:45,767][1648981] Avg episode reward: [(0, '1020.040')] [2024-06-15 23:03:46,331][1651669] Updated weights for policy 0, policy_version 973603 (0.0012) [2024-06-15 23:03:47,991][1651669] Updated weights for policy 0, policy_version 973687 (0.0014) [2024-06-15 23:03:50,376][1651669] Updated weights for policy 0, policy_version 973746 (0.0013) [2024-06-15 23:03:50,770][1648981] Fps is (10 sec: 65719.9, 60 sec: 58978.7, 300 sec: 60092.4). Total num frames: 1994260480. Throughput: 0: 15031.6. Samples: 498638848. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:50,771][1648981] Avg episode reward: [(0, '1073.810')] [2024-06-15 23:03:51,756][1651669] Updated weights for policy 0, policy_version 973813 (0.0011) [2024-06-15 23:03:55,446][1651669] Updated weights for policy 0, policy_version 973873 (0.0011) [2024-06-15 23:03:55,711][1651274] Signal inference workers to stop experience collection... (51100 times) [2024-06-15 23:03:55,736][1651669] InferenceWorker_p0-w0: stopping experience collection (51100 times) [2024-06-15 23:03:55,766][1648981] Fps is (10 sec: 62258.2, 60 sec: 58986.1, 300 sec: 59759.9). Total num frames: 1994522624. Throughput: 0: 15064.2. Samples: 498677760. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:03:55,767][1648981] Avg episode reward: [(0, '1076.990')] [2024-06-15 23:03:55,893][1651274] Signal inference workers to resume experience collection... (51100 times) [2024-06-15 23:03:55,894][1651669] InferenceWorker_p0-w0: resuming experience collection (51100 times) [2024-06-15 23:03:56,833][1651669] Updated weights for policy 0, policy_version 973948 (0.0011) [2024-06-15 23:03:58,512][1651669] Updated weights for policy 0, policy_version 974005 (0.0011) [2024-06-15 23:04:00,064][1651669] Updated weights for policy 0, policy_version 974048 (0.0011) [2024-06-15 23:04:00,766][1648981] Fps is (10 sec: 65561.0, 60 sec: 61166.9, 300 sec: 60093.2). 
Total num frames: 1994915840. Throughput: 0: 15030.1. Samples: 498768896. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:04:00,767][1648981] Avg episode reward: [(0, '1086.970')] [2024-06-15 23:04:03,133][1651669] Updated weights for policy 0, policy_version 974086 (0.0013) [2024-06-15 23:04:04,401][1651669] Updated weights for policy 0, policy_version 974145 (0.0010) [2024-06-15 23:04:05,305][1651669] Updated weights for policy 0, policy_version 974205 (0.0011) [2024-06-15 23:04:05,767][1648981] Fps is (10 sec: 65532.2, 60 sec: 61175.0, 300 sec: 59982.0). Total num frames: 1995177984. Throughput: 0: 15020.8. Samples: 498863104. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:04:05,769][1648981] Avg episode reward: [(0, '1105.850')] [2024-06-15 23:04:06,625][1651669] Updated weights for policy 0, policy_version 974256 (0.0013) [2024-06-15 23:04:08,749][1651669] Updated weights for policy 0, policy_version 974304 (0.0014) [2024-06-15 23:04:10,774][1648981] Fps is (10 sec: 52387.7, 60 sec: 61159.0, 300 sec: 59647.3). Total num frames: 1995440128. Throughput: 0: 14988.9. Samples: 498907648. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:04:10,775][1648981] Avg episode reward: [(0, '1073.410')] [2024-06-15 23:04:11,691][1651669] Updated weights for policy 0, policy_version 974352 (0.0016) [2024-06-15 23:04:12,488][1651669] Updated weights for policy 0, policy_version 974400 (0.0013) [2024-06-15 23:04:13,621][1651669] Updated weights for policy 0, policy_version 974456 (0.0023) [2024-06-15 23:04:14,889][1651669] Updated weights for policy 0, policy_version 974512 (0.0013) [2024-06-15 23:04:15,766][1648981] Fps is (10 sec: 65540.6, 60 sec: 62259.2, 300 sec: 60426.4). Total num frames: 1995833344. Throughput: 0: 15257.6. Samples: 499004928. 
Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:04:15,767][1648981] Avg episode reward: [(0, '1136.550')] [2024-06-15 23:04:16,670][1651669] Updated weights for policy 0, policy_version 974548 (0.0012) [2024-06-15 23:04:17,243][1651669] Updated weights for policy 0, policy_version 974592 (0.0015) [2024-06-15 23:04:20,766][1648981] Fps is (10 sec: 55749.2, 60 sec: 60081.6, 300 sec: 59648.9). Total num frames: 1995997184. Throughput: 0: 15573.4. Samples: 499104256. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:04:20,767][1648981] Avg episode reward: [(0, '1152.800')] [2024-06-15 23:04:22,026][1651669] Updated weights for policy 0, policy_version 974688 (0.0105) [2024-06-15 23:04:23,831][1651669] Updated weights for policy 0, policy_version 974782 (0.0013) [2024-06-15 23:04:25,766][1648981] Fps is (10 sec: 52428.2, 60 sec: 59549.3, 300 sec: 60426.4). Total num frames: 1996357632. Throughput: 0: 15177.3. Samples: 499134976. Policy #0 lag: (min: 13.0, avg: 133.6, max: 269.0) [2024-06-15 23:04:25,767][1648981] Avg episode reward: [(0, '1141.730')] [2024-06-15 23:04:26,322][1651669] Updated weights for policy 0, policy_version 974835 (0.0081) [2024-06-15 23:04:29,802][1651274] Signal inference workers to stop experience collection... (51150 times) [2024-06-15 23:04:29,857][1651669] InferenceWorker_p0-w0: stopping experience collection (51150 times) [2024-06-15 23:04:30,021][1651274] Signal inference workers to resume experience collection... (51150 times) [2024-06-15 23:04:30,022][1651669] InferenceWorker_p0-w0: resuming experience collection (51150 times) [2024-06-15 23:04:30,766][1648981] Fps is (10 sec: 58982.3, 60 sec: 58982.4, 300 sec: 59760.3). Total num frames: 1996587008. Throughput: 0: 15121.0. Samples: 499229696. 
Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:04:30,767][1648981] Avg episode reward: [(0, '1112.030')] [2024-06-15 23:04:30,821][1651669] Updated weights for policy 0, policy_version 974897 (0.0080) [2024-06-15 23:04:31,842][1651669] Updated weights for policy 0, policy_version 974960 (0.0010) [2024-06-15 23:04:35,596][1651669] Updated weights for policy 0, policy_version 975044 (0.0012) [2024-06-15 23:04:35,766][1648981] Fps is (10 sec: 55706.0, 60 sec: 59553.2, 300 sec: 60094.0). Total num frames: 1996914688. Throughput: 0: 14883.4. Samples: 499308544. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:04:35,767][1648981] Avg episode reward: [(0, '1118.190')] [2024-06-15 23:04:36,329][1651669] Updated weights for policy 0, policy_version 975089 (0.0011) [2024-06-15 23:04:38,520][1651669] Updated weights for policy 0, policy_version 975125 (0.0012) [2024-06-15 23:04:40,136][1651669] Updated weights for policy 0, policy_version 975200 (0.0012) [2024-06-15 23:04:40,767][1648981] Fps is (10 sec: 68812.1, 60 sec: 61199.2, 300 sec: 59982.1). Total num frames: 1997275136. Throughput: 0: 15280.3. Samples: 499365376. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:04:40,768][1648981] Avg episode reward: [(0, '1108.370')] [2024-06-15 23:04:41,533][1651669] Updated weights for policy 0, policy_version 975280 (0.0090) [2024-06-15 23:04:44,380][1651669] Updated weights for policy 0, policy_version 975316 (0.0010) [2024-06-15 23:04:45,774][1648981] Fps is (10 sec: 62210.3, 60 sec: 60612.8, 300 sec: 59980.6). Total num frames: 1997537280. Throughput: 0: 15163.9. Samples: 499451392. 
Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:04:45,775][1648981] Avg episode reward: [(0, '1163.010')] [2024-06-15 23:04:47,545][1651669] Updated weights for policy 0, policy_version 975379 (0.0011) [2024-06-15 23:04:49,061][1651669] Updated weights for policy 0, policy_version 975456 (0.0013) [2024-06-15 23:04:50,547][1651669] Updated weights for policy 0, policy_version 975528 (0.0013) [2024-06-15 23:04:50,767][1648981] Fps is (10 sec: 62258.7, 60 sec: 60624.4, 300 sec: 60315.3). Total num frames: 1997897728. Throughput: 0: 14609.2. Samples: 499520512. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:04:50,768][1648981] Avg episode reward: [(0, '1179.340')] [2024-06-15 23:04:53,333][1651669] Updated weights for policy 0, policy_version 975554 (0.0025) [2024-06-15 23:04:54,321][1651669] Updated weights for policy 0, policy_version 975616 (0.0011) [2024-06-15 23:04:55,766][1648981] Fps is (10 sec: 52469.7, 60 sec: 58982.4, 300 sec: 59648.9). Total num frames: 1998061568. Throughput: 0: 14998.5. Samples: 499582464. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:04:55,767][1648981] Avg episode reward: [(0, '1169.940')] [2024-06-15 23:04:56,050][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000975632_1998094336.pth... [2024-06-15 23:04:56,161][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000968624_1983741952.pth [2024-06-15 23:04:57,480][1651669] Updated weights for policy 0, policy_version 975703 (0.0012) [2024-06-15 23:04:58,743][1651274] Signal inference workers to stop experience collection... (51200 times) [2024-06-15 23:04:58,809][1651669] InferenceWorker_p0-w0: stopping experience collection (51200 times) [2024-06-15 23:04:59,019][1651274] Signal inference workers to resume experience collection... 
(51200 times) [2024-06-15 23:04:59,019][1651669] InferenceWorker_p0-w0: resuming experience collection (51200 times) [2024-06-15 23:04:59,199][1651669] Updated weights for policy 0, policy_version 975780 (0.0019) [2024-06-15 23:05:00,766][1648981] Fps is (10 sec: 55706.2, 60 sec: 58982.3, 300 sec: 60204.2). Total num frames: 1998454784. Throughput: 0: 14449.7. Samples: 499655168. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:05:00,767][1648981] Avg episode reward: [(0, '1133.670')] [2024-06-15 23:05:02,516][1651669] Updated weights for policy 0, policy_version 975810 (0.0011) [2024-06-15 23:05:03,388][1651669] Updated weights for policy 0, policy_version 975870 (0.0013) [2024-06-15 23:05:05,450][1651669] Updated weights for policy 0, policy_version 975937 (0.0090) [2024-06-15 23:05:05,771][1648981] Fps is (10 sec: 68783.5, 60 sec: 59524.9, 300 sec: 60092.3). Total num frames: 1998749696. Throughput: 0: 14573.5. Samples: 499760128. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:05:05,771][1648981] Avg episode reward: [(0, '1088.530')] [2024-06-15 23:05:06,692][1651669] Updated weights for policy 0, policy_version 975995 (0.0089) [2024-06-15 23:05:07,595][1651669] Updated weights for policy 0, policy_version 976054 (0.0012) [2024-06-15 23:05:10,766][1648981] Fps is (10 sec: 52429.3, 60 sec: 58990.1, 300 sec: 59760.1). Total num frames: 1998979072. Throughput: 0: 14825.3. Samples: 499802112. 
Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:05:10,767][1648981] Avg episode reward: [(0, '1082.630')] [2024-06-15 23:05:11,847][1651669] Updated weights for policy 0, policy_version 976103 (0.0013) [2024-06-15 23:05:13,109][1651669] Updated weights for policy 0, policy_version 976148 (0.0010) [2024-06-15 23:05:14,179][1651669] Updated weights for policy 0, policy_version 976208 (0.0013) [2024-06-15 23:05:15,319][1651669] Updated weights for policy 0, policy_version 976259 (0.0015) [2024-06-15 23:05:15,766][1648981] Fps is (10 sec: 68842.2, 60 sec: 60074.6, 300 sec: 60426.4). Total num frames: 1999437824. Throughput: 0: 14950.4. Samples: 499902464. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:05:15,767][1648981] Avg episode reward: [(0, '1069.170')] [2024-06-15 23:05:16,124][1651669] Updated weights for policy 0, policy_version 976317 (0.0009) [2024-06-15 23:05:20,033][1651669] Updated weights for policy 0, policy_version 976380 (0.0011) [2024-06-15 23:05:20,769][1648981] Fps is (10 sec: 65520.6, 60 sec: 60618.4, 300 sec: 59982.0). Total num frames: 1999634432. Throughput: 0: 15063.4. Samples: 499986432. Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:05:20,771][1648981] Avg episode reward: [(0, '1086.500')] [2024-06-15 23:05:22,390][1651669] Updated weights for policy 0, policy_version 976421 (0.0011) [2024-06-15 23:05:23,235][1651669] Updated weights for policy 0, policy_version 976450 (0.0048) [2024-06-15 23:05:24,333][1651669] Updated weights for policy 0, policy_version 976516 (0.0071) [2024-06-15 23:05:25,247][1651669] Updated weights for policy 0, policy_version 976569 (0.0009) [2024-06-15 23:05:25,766][1648981] Fps is (10 sec: 58982.5, 60 sec: 61167.0, 300 sec: 60426.4). Total num frames: 2000027648. Throughput: 0: 14859.4. Samples: 500034048. 
Policy #0 lag: (min: 15.0, avg: 83.3, max: 271.0) [2024-06-15 23:05:25,767][1648981] Avg episode reward: [(0, '1110.960')] [2024-06-15 23:05:27,342][1648981] Component RolloutWorker_w1 stopped! [2024-06-15 23:05:27,342][1651670] Stopping RolloutWorker_w1... [2024-06-15 23:05:27,343][1651670] Loop rollout_proc1_evt_loop terminating... [2024-06-15 23:05:27,355][1648981] Component RolloutWorker_w3 stopped! [2024-06-15 23:05:27,354][1651672] Stopping RolloutWorker_w3... [2024-06-15 23:05:27,356][1651672] Loop rollout_proc3_evt_loop terminating... [2024-06-15 23:05:27,388][1648981] Component RolloutWorker_w0 stopped! [2024-06-15 23:05:27,388][1651668] Stopping RolloutWorker_w0... [2024-06-15 23:05:27,395][1651668] Loop rollout_proc0_evt_loop terminating... [2024-06-15 23:05:27,433][1648981] Component RolloutWorker_w2 stopped! [2024-06-15 23:05:27,433][1651671] Stopping RolloutWorker_w2... [2024-06-15 23:05:27,435][1651671] Loop rollout_proc2_evt_loop terminating... [2024-06-15 23:05:27,447][1648981] Component Batcher_0 stopped! [2024-06-15 23:05:27,447][1651274] Stopping Batcher_0... [2024-06-15 23:05:27,447][1651274] Loop batcher_evt_loop terminating... [2024-06-15 23:05:27,463][1651669] Weights refcount: 2 0 [2024-06-15 23:05:27,464][1651669] Stopping InferenceWorker_p0-w0... [2024-06-15 23:05:27,464][1651669] Loop inference_proc0-0_evt_loop terminating... [2024-06-15 23:05:27,464][1648981] Component InferenceWorker_p0-w0 stopped! [2024-06-15 23:05:27,682][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000976608_2000093184.pth... [2024-06-15 23:05:27,747][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000972160_1990983680.pth [2024-06-15 23:05:27,965][1651274] Saving train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000976624_2000125952.pth... 
[2024-06-15 23:05:28,015][1651274] Removing train_dir/atari_2B_atari_airraid_1111/checkpoint_p0/checkpoint_000975632_1998094336.pth [2024-06-15 23:05:28,020][1651274] Stopping LearnerWorker_p0... [2024-06-15 23:05:28,020][1651274] Loop learner_proc0_evt_loop terminating... [2024-06-15 23:05:28,020][1648981] Component LearnerWorker_p0 stopped! [2024-06-15 23:05:28,021][1648981] Waiting for process learner_proc0 to stop... [2024-06-15 23:05:29,750][1648981] Waiting for process inference_proc0-0 to join... [2024-06-15 23:05:29,750][1648981] Waiting for process rollout_proc0 to join... [2024-06-15 23:05:29,751][1648981] Waiting for process rollout_proc1 to join... [2024-06-15 23:05:29,800][1648981] Waiting for process rollout_proc2 to join... [2024-06-15 23:05:29,801][1648981] Waiting for process rollout_proc3 to join... [2024-06-15 23:05:29,801][1648981] Batcher 0 profile tree view: batching: 2385.1467, releasing_batches: 5266.1888 [2024-06-15 23:05:29,802][1648981] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 11407.6646 update_model: 563.1421 weight_update: 0.0011 one_step: 0.0142 handle_policy_step: 21046.0505 deserialize: 18.5345, stack: 3620.3004, obs_to_device_normalize: 11910.2759, forward: 4176.7670, prepare_outputs: 893.4380, send_messages: 167.3306 [2024-06-15 23:05:29,802][1648981] Learner 0 profile tree view: misc: 0.4097, prepare_batch: 6244.0992 train: 16020.0366 epoch_init: 3.8150, minibatch_init: 195.8869, losses_postprocess: 2348.4206, kl_divergence: 1228.6469, update: 6054.8503, after_optimizer: 2975.9107 calculate_losses: 2991.5000 losses_init: 5.8563, forward_head: 1217.6613, bptt_initial: 18.9962, bptt: 27.0197, tail: 622.7827, advantages_returns: 181.4429, losses: 731.4132 [2024-06-15 23:05:29,802][1648981] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.6909, enqueue_policy_requests: 1910.7271, process_policy_outputs: 71.1119, env_step: 20134.3458, finalize_trajectories: 27.0773, complete_rollouts: 
5.0762 post_env_step: 102.4060 process_env_step: 20.0682 [2024-06-15 23:05:29,803][1648981] RolloutWorker_w3 profile tree view: wait_for_trajectories: 0.7218, enqueue_policy_requests: 1870.2922, process_policy_outputs: 68.6937, env_step: 20262.0808, finalize_trajectories: 25.9606, complete_rollouts: 5.3897 post_env_step: 102.7112 process_env_step: 19.8501 [2024-06-15 23:05:29,803][1648981] Loop Runner_EvtLoop terminating... [2024-06-15 23:05:29,803][1648981] Runner profile tree view: main_loop: 41661.1029 [2024-06-15 23:05:29,804][1648981] Collected {0: 2000125952}, FPS: 48009.4