[2023-09-24 14:34:24,946][36951] Saving configuration to ./train_atari/Battlezone/config.json... [2023-09-24 14:34:25,212][36951] Rollout worker 0 uses device cpu [2023-09-24 14:34:25,213][36951] Rollout worker 1 uses device cpu [2023-09-24 14:34:25,213][36951] Rollout worker 2 uses device cpu [2023-09-24 14:34:25,213][36951] Rollout worker 3 uses device cpu [2023-09-24 14:34:25,213][36951] Rollout worker 4 uses device cpu [2023-09-24 14:34:25,214][36951] Rollout worker 5 uses device cpu [2023-09-24 14:34:25,214][36951] Rollout worker 6 uses device cpu [2023-09-24 14:34:25,214][36951] Rollout worker 7 uses device cpu [2023-09-24 14:34:25,214][36951] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2023-09-24 14:34:25,245][36951] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:34:25,245][36951] InferenceWorker_p0-w0: min num requests: 2 [2023-09-24 14:34:25,268][36951] Starting all processes... [2023-09-24 14:34:25,269][36951] Starting process learner_proc0 [2023-09-24 14:34:26,842][36951] Starting all processes... 
[2023-09-24 14:34:26,845][37497] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:34:26,845][37497] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-09-24 14:34:26,850][36951] Starting process inference_proc0-0 [2023-09-24 14:34:26,850][36951] Starting process rollout_proc0 [2023-09-24 14:34:26,850][36951] Starting process rollout_proc1 [2023-09-24 14:34:26,851][36951] Starting process rollout_proc2 [2023-09-24 14:34:26,863][37497] Num visible devices: 1 [2023-09-24 14:34:26,851][36951] Starting process rollout_proc3 [2023-09-24 14:34:26,855][36951] Starting process rollout_proc4 [2023-09-24 14:34:26,862][36951] Starting process rollout_proc5 [2023-09-24 14:34:26,883][37497] Starting seed is not provided [2023-09-24 14:34:26,883][37497] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:34:26,883][37497] Initializing actor-critic model on device cuda:0 [2023-09-24 14:34:26,884][37497] RunningMeanStd input shape: (4, 84, 84) [2023-09-24 14:34:26,862][36951] Starting process rollout_proc6 [2023-09-24 14:34:26,884][37497] RunningMeanStd input shape: (1,) [2023-09-24 14:34:26,863][36951] Starting process rollout_proc7 [2023-09-24 14:34:26,896][37497] ConvEncoder: input_channels=4 [2023-09-24 14:34:27,210][37497] Conv encoder output size: 512 [2023-09-24 14:34:27,212][37497] Created Actor Critic model with architecture: [2023-09-24 14:34:27,212][37497] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): 
RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=18, bias=True) ) ) [2023-09-24 14:34:27,801][37497] Using optimizer [2023-09-24 14:34:27,802][37497] No checkpoints found [2023-09-24 14:34:27,802][37497] Did not load from checkpoint, starting from scratch! [2023-09-24 14:34:27,802][37497] Initialized policy 0 weights for model version 0 [2023-09-24 14:34:27,804][37497] LearnerWorker_p0 finished initialization! 
[2023-09-24 14:34:27,804][37497] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:34:28,728][37840] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:34:28,728][37840] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-09-24 14:34:28,734][37842] Worker 0 uses CPU cores [0, 1, 2, 3] [2023-09-24 14:34:28,746][37840] Num visible devices: 1 [2023-09-24 14:34:28,787][37897] Worker 6 uses CPU cores [24, 25, 26, 27] [2023-09-24 14:34:28,789][37895] Worker 5 uses CPU cores [20, 21, 22, 23] [2023-09-24 14:34:28,796][37892] Worker 2 uses CPU cores [8, 9, 10, 11] [2023-09-24 14:34:28,799][37896] Worker 4 uses CPU cores [16, 17, 18, 19] [2023-09-24 14:34:28,799][37898] Worker 7 uses CPU cores [28, 29, 30, 31] [2023-09-24 14:34:28,892][37841] Worker 1 uses CPU cores [4, 5, 6, 7] [2023-09-24 14:34:28,905][37894] Worker 3 uses CPU cores [12, 13, 14, 15] [2023-09-24 14:34:29,334][37840] RunningMeanStd input shape: (4, 84, 84) [2023-09-24 14:34:29,335][37840] RunningMeanStd input shape: (1,) [2023-09-24 14:34:29,346][37840] ConvEncoder: input_channels=4 [2023-09-24 14:34:29,441][37840] Conv encoder output size: 512 [2023-09-24 14:34:29,447][36951] Inference worker 0-0 is ready! [2023-09-24 14:34:29,447][36951] All inference workers are ready! Signal rollout workers to start! [2023-09-24 14:34:29,914][37896] Decorrelating experience for 0 frames... [2023-09-24 14:34:29,915][37841] Decorrelating experience for 0 frames... [2023-09-24 14:34:29,916][37898] Decorrelating experience for 0 frames... [2023-09-24 14:34:29,919][37842] Decorrelating experience for 0 frames... [2023-09-24 14:34:29,921][37895] Decorrelating experience for 0 frames... [2023-09-24 14:34:29,922][37894] Decorrelating experience for 0 frames... [2023-09-24 14:34:30,006][37897] Decorrelating experience for 0 frames... [2023-09-24 14:34:30,021][37892] Decorrelating experience for 0 frames... 
[2023-09-24 14:34:31,271][36951] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-09-24 14:34:36,271][36951] Fps is (10 sec: 2457.7, 60 sec: 2457.7, 300 sec: 2457.7). Total num frames: 12288. Throughput: 0: 614.4. Samples: 3072. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:34:36,271][36951] Avg episode reward: [(0, '0.438')] [2023-09-24 14:34:40,518][36951] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 36951], exiting... [2023-09-24 14:34:40,519][37897] Stopping RolloutWorker_w6... [2023-09-24 14:34:40,519][37841] Stopping RolloutWorker_w1... [2023-09-24 14:34:40,519][37896] Stopping RolloutWorker_w4... [2023-09-24 14:34:40,519][37897] Loop rollout_proc6_evt_loop terminating... [2023-09-24 14:34:40,519][37892] Stopping RolloutWorker_w2... [2023-09-24 14:34:40,519][37895] Stopping RolloutWorker_w5... [2023-09-24 14:34:40,519][37841] Loop rollout_proc1_evt_loop terminating... [2023-09-24 14:34:40,519][36951] Runner profile tree view: main_loop: 15.2510 [2023-09-24 14:34:40,520][37896] Loop rollout_proc4_evt_loop terminating... [2023-09-24 14:34:40,520][37892] Loop rollout_proc2_evt_loop terminating... [2023-09-24 14:34:40,520][37842] Stopping RolloutWorker_w0... [2023-09-24 14:34:40,520][37895] Loop rollout_proc5_evt_loop terminating... [2023-09-24 14:34:40,520][37497] Stopping Batcher_0... [2023-09-24 14:34:40,520][36951] Collected {0: 32768}, FPS: 2148.6 [2023-09-24 14:34:40,520][37842] Loop rollout_proc0_evt_loop terminating... [2023-09-24 14:34:40,520][37497] Loop batcher_evt_loop terminating... [2023-09-24 14:34:40,521][37894] Stopping RolloutWorker_w3... [2023-09-24 14:34:40,521][37894] Loop rollout_proc3_evt_loop terminating... [2023-09-24 14:34:40,521][37898] Stopping RolloutWorker_w7... [2023-09-24 14:34:40,521][37898] Loop rollout_proc7_evt_loop terminating... 
[2023-09-24 14:34:40,521][37497] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000000128_32768.pth... [2023-09-24 14:34:40,535][37840] Weights refcount: 2 0 [2023-09-24 14:34:40,536][37840] Stopping InferenceWorker_p0-w0... [2023-09-24 14:34:40,537][37840] Loop inference_proc0-0_evt_loop terminating... [2023-09-24 14:34:40,556][37497] Stopping LearnerWorker_p0... [2023-09-24 14:34:40,556][37497] Loop learner_proc0_evt_loop terminating... [2023-09-24 14:35:23,366][42771] Saving configuration to ./train_atari/Battlezone/config.json... [2023-09-24 14:35:23,641][42771] Rollout worker 0 uses device cpu [2023-09-24 14:35:23,642][42771] Rollout worker 1 uses device cpu [2023-09-24 14:35:23,643][42771] Rollout worker 2 uses device cpu [2023-09-24 14:35:23,643][42771] Rollout worker 3 uses device cpu [2023-09-24 14:35:23,644][42771] Rollout worker 4 uses device cpu [2023-09-24 14:35:23,644][42771] Rollout worker 5 uses device cpu [2023-09-24 14:35:23,645][42771] Rollout worker 6 uses device cpu [2023-09-24 14:35:23,646][42771] Rollout worker 7 uses device cpu [2023-09-24 14:35:23,646][42771] In synchronous mode, we only accumulate one batch. Setting num_batches_to_accumulate to 1 [2023-09-24 14:35:23,691][42771] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:35:23,691][42771] InferenceWorker_p0-w0: min num requests: 1 [2023-09-24 14:35:23,694][42771] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-09-24 14:35:23,695][42771] InferenceWorker_p1-w0: min num requests: 1 [2023-09-24 14:35:23,717][42771] Starting all processes... 
[2023-09-24 14:35:23,717][42771] Starting process learner_proc0 [2023-09-24 14:35:25,310][42771] Starting process learner_proc1 [2023-09-24 14:35:25,313][43303] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:35:25,313][43303] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2023-09-24 14:35:25,331][43303] Num visible devices: 1 [2023-09-24 14:35:25,349][43303] Starting seed is not provided [2023-09-24 14:35:25,349][43303] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:35:25,349][43303] Initializing actor-critic model on device cuda:0 [2023-09-24 14:35:25,349][43303] RunningMeanStd input shape: (4, 84, 84) [2023-09-24 14:35:25,350][43303] RunningMeanStd input shape: (1,) [2023-09-24 14:35:25,361][43303] ConvEncoder: input_channels=4 [2023-09-24 14:35:25,544][43303] Conv encoder output size: 512 [2023-09-24 14:35:25,546][43303] Created Actor Critic model with architecture: [2023-09-24 14:35:25,546][43303] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) 
(critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=18, bias=True) ) ) [2023-09-24 14:35:26,101][43303] Using optimizer [2023-09-24 14:35:26,102][43303] Loading state from checkpoint ./train_atari/Battlezone/checkpoint_p0/checkpoint_000000128_32768.pth... [2023-09-24 14:35:26,118][43303] Loading model from checkpoint [2023-09-24 14:35:26,121][43303] Loaded experiment state at self.train_step=128, self.env_steps=32768 [2023-09-24 14:35:26,121][43303] Initialized policy 0 weights for model version 128 [2023-09-24 14:35:26,123][43303] LearnerWorker_p0 finished initialization! [2023-09-24 14:35:26,123][43303] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:35:26,916][42771] Starting all processes... [2023-09-24 14:35:26,920][43474] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-09-24 14:35:26,920][43474] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for learning process 1 [2023-09-24 14:35:26,923][42771] Starting process inference_proc0-0 [2023-09-24 14:35:26,923][42771] Starting process inference_proc1-0 [2023-09-24 14:35:26,938][43474] Num visible devices: 1 [2023-09-24 14:35:26,923][42771] Starting process rollout_proc0 [2023-09-24 14:35:26,923][42771] Starting process rollout_proc1 [2023-09-24 14:35:26,924][42771] Starting process rollout_proc2 [2023-09-24 14:35:26,984][43474] Starting seed is not provided [2023-09-24 14:35:26,985][43474] Using GPUs [0] for process 1 (actually maps to GPUs [1]) [2023-09-24 14:35:26,985][43474] Initializing actor-critic model on device cuda:0 [2023-09-24 14:35:26,985][43474] RunningMeanStd input shape: (4, 84, 84) [2023-09-24 14:35:26,924][42771] Starting process rollout_proc3 [2023-09-24 14:35:26,986][43474] RunningMeanStd input shape: (1,) [2023-09-24 14:35:26,924][42771] Starting process rollout_proc4 [2023-09-24 14:35:26,926][42771] 
Starting process rollout_proc5 [2023-09-24 14:35:26,963][42771] Starting process rollout_proc6 [2023-09-24 14:35:26,964][42771] Starting process rollout_proc7 [2023-09-24 14:35:26,998][43474] ConvEncoder: input_channels=4 [2023-09-24 14:35:27,367][43474] Conv encoder output size: 512 [2023-09-24 14:35:27,369][43474] Created Actor Critic model with architecture: [2023-09-24 14:35:27,370][43474] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): MultiInputEncoder( (encoders): ModuleDict( (obs): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ReLU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ReLU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ReLU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ReLU) ) ) ) ) ) (core): ModelCoreIdentity() (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=18, bias=True) ) ) [2023-09-24 14:35:28,050][43474] Using optimizer [2023-09-24 14:35:28,050][43474] No checkpoints found [2023-09-24 14:35:28,050][43474] Did not load from checkpoint, starting from scratch! [2023-09-24 14:35:28,051][43474] Initialized policy 1 weights for model version 0 [2023-09-24 14:35:28,052][43474] LearnerWorker_p1 finished initialization! 
[2023-09-24 14:35:28,052][43474] Using GPUs [0] for process 1 (actually maps to GPUs [1]) [2023-09-24 14:35:28,871][43669] Worker 3 uses CPU cores [12, 13, 14, 15] [2023-09-24 14:35:28,872][43679] Worker 5 uses CPU cores [20, 21, 22, 23] [2023-09-24 14:35:28,884][43659] Worker 1 uses CPU cores [4, 5, 6, 7] [2023-09-24 14:35:28,886][43681] Worker 7 uses CPU cores [28, 29, 30, 31] [2023-09-24 14:35:28,896][43671] Worker 4 uses CPU cores [16, 17, 18, 19] [2023-09-24 14:35:28,914][43616] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2023-09-24 14:35:28,914][43616] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2023-09-24 14:35:28,923][43680] Worker 6 uses CPU cores [24, 25, 26, 27] [2023-09-24 14:35:28,933][43616] Num visible devices: 1 [2023-09-24 14:35:28,948][43667] Worker 2 uses CPU cores [8, 9, 10, 11] [2023-09-24 14:35:29,054][43665] Worker 0 uses CPU cores [0, 1, 2, 3] [2023-09-24 14:35:29,074][43653] Using GPUs [1] for process 1 (actually maps to GPUs [1]) [2023-09-24 14:35:29,074][43653] Set environment var CUDA_VISIBLE_DEVICES to '1' (GPU indices [1]) for inference process 1 [2023-09-24 14:35:29,092][43653] Num visible devices: 1 [2023-09-24 14:35:29,543][43616] RunningMeanStd input shape: (4, 84, 84) [2023-09-24 14:35:29,544][43616] RunningMeanStd input shape: (1,) [2023-09-24 14:35:29,555][43616] ConvEncoder: input_channels=4 [2023-09-24 14:35:29,647][43653] RunningMeanStd input shape: (4, 84, 84) [2023-09-24 14:35:29,648][43653] RunningMeanStd input shape: (1,) [2023-09-24 14:35:29,651][43616] Conv encoder output size: 512 [2023-09-24 14:35:29,657][42771] Inference worker 0-0 is ready! [2023-09-24 14:35:29,658][42771] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 32768. Throughput: 0: nan, 1: nan. Samples: 0. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2023-09-24 14:35:29,659][43653] ConvEncoder: input_channels=4 [2023-09-24 14:35:29,757][43653] Conv encoder output size: 512 [2023-09-24 14:35:29,763][42771] Inference worker 1-0 is ready! [2023-09-24 14:35:29,763][42771] All inference workers are ready! Signal rollout workers to start! [2023-09-24 14:35:30,240][43671] Decorrelating experience for 0 frames... [2023-09-24 14:35:30,242][43680] Decorrelating experience for 0 frames... [2023-09-24 14:35:30,242][43667] Decorrelating experience for 0 frames... [2023-09-24 14:35:30,242][43681] Decorrelating experience for 0 frames... [2023-09-24 14:35:30,243][43669] Decorrelating experience for 0 frames... [2023-09-24 14:35:30,260][43665] Decorrelating experience for 0 frames... [2023-09-24 14:35:30,272][43659] Decorrelating experience for 0 frames... [2023-09-24 14:35:30,274][43679] Decorrelating experience for 0 frames... [2023-09-24 14:35:34,658][42771] Fps is (10 sec: 1638.5, 60 sec: 1638.5, 300 sec: 1638.5). Total num frames: 40960. Throughput: 0: 204.8, 1: 204.8. Samples: 2048. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:35:34,659][42771] Avg episode reward: [(0, '0.400'), (1, '0.429')] [2023-09-24 14:35:39,658][42771] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 65536. Throughput: 0: 394.5, 1: 403.4. Samples: 7979. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:35:39,659][42771] Avg episode reward: [(0, '0.238'), (1, '0.158')] [2023-09-24 14:35:43,678][42771] Heartbeat connected on Batcher_0 [2023-09-24 14:35:43,681][42771] Heartbeat connected on LearnerWorker_p0 [2023-09-24 14:35:43,684][42771] Heartbeat connected on Batcher_1 [2023-09-24 14:35:43,687][42771] Heartbeat connected on LearnerWorker_p1 [2023-09-24 14:35:43,693][42771] Heartbeat connected on InferenceWorker_p0-w0 [2023-09-24 14:35:43,697][42771] Heartbeat connected on InferenceWorker_p1-w0 [2023-09-24 14:35:43,698][42771] Heartbeat connected on RolloutWorker_w0 [2023-09-24 14:35:43,701][42771] Heartbeat connected on RolloutWorker_w1 [2023-09-24 14:35:43,703][42771] Heartbeat connected on RolloutWorker_w2 [2023-09-24 14:35:43,706][42771] Heartbeat connected on RolloutWorker_w3 [2023-09-24 14:35:43,708][42771] Heartbeat connected on RolloutWorker_w4 [2023-09-24 14:35:43,711][42771] Heartbeat connected on RolloutWorker_w5 [2023-09-24 14:35:43,714][42771] Heartbeat connected on RolloutWorker_w6 [2023-09-24 14:35:43,716][42771] Heartbeat connected on RolloutWorker_w7 [2023-09-24 14:35:44,658][42771] Fps is (10 sec: 5734.3, 60 sec: 4369.1, 300 sec: 4369.1). Total num frames: 98304. Throughput: 0: 409.6, 1: 409.6. Samples: 12288. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:35:44,659][42771] Avg episode reward: [(0, '0.333'), (1, '0.355')] [2023-09-24 14:35:47,212][43616] Updated weights for policy 0, policy_version 288 (0.0017) [2023-09-24 14:35:47,212][43653] Updated weights for policy 1, policy_version 160 (0.0017) [2023-09-24 14:35:49,658][42771] Fps is (10 sec: 5734.5, 60 sec: 4505.6, 300 sec: 4505.6). Total num frames: 122880. Throughput: 0: 534.7, 1: 538.8. Samples: 21469. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 14:35:49,659][42771] Avg episode reward: [(0, '0.357'), (1, '0.392')] [2023-09-24 14:35:54,659][42771] Fps is (10 sec: 5734.4, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 155648. Throughput: 0: 615.8, 1: 620.1. Samples: 30898. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:35:54,659][42771] Avg episode reward: [(0, '0.435'), (1, '0.420')] [2023-09-24 14:35:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 5188.3, 300 sec: 5188.3). Total num frames: 188416. Throughput: 0: 597.9, 1: 601.9. Samples: 35996. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:35:59,659][42771] Avg episode reward: [(0, '0.402'), (1, '0.476')] [2023-09-24 14:35:59,660][43303] Saving new best policy, reward=0.402! [2023-09-24 14:35:59,920][43616] Updated weights for policy 0, policy_version 448 (0.0019) [2023-09-24 14:35:59,920][43653] Updated weights for policy 1, policy_version 320 (0.0018) [2023-09-24 14:36:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 5383.3, 300 sec: 5383.3). Total num frames: 221184. Throughput: 0: 645.2, 1: 648.2. Samples: 45269. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:36:04,659][42771] Avg episode reward: [(0, '0.400'), (1, '0.515')] [2023-09-24 14:36:09,658][42771] Fps is (10 sec: 6553.5, 60 sec: 5529.6, 300 sec: 5529.6). Total num frames: 253952. Throughput: 0: 688.6, 1: 691.2. Samples: 55192. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:36:09,659][42771] Avg episode reward: [(0, '0.460'), (1, '0.600')] [2023-09-24 14:36:09,661][43303] Saving new best policy, reward=0.460! [2023-09-24 14:36:09,661][43474] Saving new best policy, reward=0.600! [2023-09-24 14:36:13,046][43616] Updated weights for policy 0, policy_version 608 (0.0017) [2023-09-24 14:36:13,047][43653] Updated weights for policy 1, policy_version 480 (0.0019) [2023-09-24 14:36:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 5643.4, 300 sec: 5643.4). 
Total num frames: 286720. Throughput: 0: 660.0, 1: 661.1. Samples: 59449. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:36:14,659][42771] Avg episode reward: [(0, '0.570'), (1, '0.670')] [2023-09-24 14:36:14,660][43474] Saving new best policy, reward=0.670! [2023-09-24 14:36:14,660][43303] Saving new best policy, reward=0.570! [2023-09-24 14:36:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 5734.4, 300 sec: 5734.4). Total num frames: 319488. Throughput: 0: 742.5, 1: 745.0. Samples: 68985. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:36:19,659][42771] Avg episode reward: [(0, '0.620'), (1, '0.820')] [2023-09-24 14:36:19,666][43474] Saving new best policy, reward=0.820! [2023-09-24 14:36:19,666][43303] Saving new best policy, reward=0.620! [2023-09-24 14:36:24,658][42771] Fps is (10 sec: 5734.3, 60 sec: 5659.9, 300 sec: 5659.9). Total num frames: 344064. Throughput: 0: 781.8, 1: 781.6. Samples: 78331. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:36:24,659][42771] Avg episode reward: [(0, '0.620'), (1, '0.810')] [2023-09-24 14:36:26,070][43616] Updated weights for policy 0, policy_version 768 (0.0017) [2023-09-24 14:36:26,071][43653] Updated weights for policy 1, policy_version 640 (0.0016) [2023-09-24 14:36:29,658][42771] Fps is (10 sec: 5734.4, 60 sec: 5734.4, 300 sec: 5734.4). Total num frames: 376832. Throughput: 0: 786.6, 1: 787.8. Samples: 83134. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:36:29,659][42771] Avg episode reward: [(0, '0.690'), (1, '0.810')] [2023-09-24 14:36:29,661][43303] Saving new best policy, reward=0.690! [2023-09-24 14:36:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6144.0, 300 sec: 5797.4). Total num frames: 409600. Throughput: 0: 786.4, 1: 785.0. Samples: 92182. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:36:34,659][42771] Avg episode reward: [(0, '0.780'), (1, '0.830')] [2023-09-24 14:36:34,667][43303] Saving new best policy, reward=0.780! 
[2023-09-24 14:36:34,669][43474] Saving new best policy, reward=0.830! [2023-09-24 14:36:39,222][43616] Updated weights for policy 0, policy_version 928 (0.0018) [2023-09-24 14:36:39,222][43653] Updated weights for policy 1, policy_version 800 (0.0016) [2023-09-24 14:36:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 5851.4). Total num frames: 442368. Throughput: 0: 790.3, 1: 788.4. Samples: 101942. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:36:39,659][42771] Avg episode reward: [(0, '0.830'), (1, '0.860')] [2023-09-24 14:36:39,660][43303] Saving new best policy, reward=0.830! [2023-09-24 14:36:39,661][43474] Saving new best policy, reward=0.860! [2023-09-24 14:36:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 5898.2). Total num frames: 475136. Throughput: 0: 784.7, 1: 782.0. Samples: 106496. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 14:36:44,659][42771] Avg episode reward: [(0, '0.820'), (1, '0.770')] [2023-09-24 14:36:49,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 5939.2). Total num frames: 507904. Throughput: 0: 787.7, 1: 787.3. Samples: 116143. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 14:36:49,659][42771] Avg episode reward: [(0, '0.870'), (1, '0.730')] [2023-09-24 14:36:49,667][43303] Saving new best policy, reward=0.870! [2023-09-24 14:36:52,022][43616] Updated weights for policy 0, policy_version 1088 (0.0018) [2023-09-24 14:36:52,023][43653] Updated weights for policy 1, policy_version 960 (0.0019) [2023-09-24 14:36:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 5975.4). Total num frames: 540672. Throughput: 0: 782.8, 1: 782.5. Samples: 125629. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:36:54,659][42771] Avg episode reward: [(0, '0.940'), (1, '0.820')] [2023-09-24 14:36:54,660][43303] Saving new best policy, reward=0.940! [2023-09-24 14:36:59,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 5916.5). 
Total num frames: 565248. Throughput: 0: 790.6, 1: 790.7. Samples: 130604. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:36:59,659][42771] Avg episode reward: [(0, '0.950'), (1, '0.810')] [2023-09-24 14:36:59,717][43303] Saving new best policy, reward=0.950! [2023-09-24 14:37:04,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 5950.0). Total num frames: 598016. Throughput: 0: 789.3, 1: 789.0. Samples: 140009. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:04,659][42771] Avg episode reward: [(0, '1.050'), (1, '0.880')] [2023-09-24 14:37:04,667][43474] Saving new best policy, reward=0.880! [2023-09-24 14:37:04,838][43303] Saving new best policy, reward=1.050! [2023-09-24 14:37:04,906][43653] Updated weights for policy 1, policy_version 1120 (0.0017) [2023-09-24 14:37:04,906][43616] Updated weights for policy 0, policy_version 1248 (0.0017) [2023-09-24 14:37:09,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 5980.2). Total num frames: 630784. Throughput: 0: 791.7, 1: 790.0. Samples: 149509. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:09,659][42771] Avg episode reward: [(0, '1.090'), (1, '0.940')] [2023-09-24 14:37:09,661][43474] Saving new best policy, reward=0.940! [2023-09-24 14:37:09,661][43303] Saving new best policy, reward=1.090! [2023-09-24 14:37:14,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6007.5). Total num frames: 663552. Throughput: 0: 791.5, 1: 792.7. Samples: 154422. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:37:14,659][42771] Avg episode reward: [(0, '1.060'), (1, '1.100')] [2023-09-24 14:37:14,661][43474] Saving new best policy, reward=1.100! [2023-09-24 14:37:17,737][43616] Updated weights for policy 0, policy_version 1408 (0.0015) [2023-09-24 14:37:17,738][43653] Updated weights for policy 1, policy_version 1280 (0.0017) [2023-09-24 14:37:19,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6032.3). 
Total num frames: 696320. Throughput: 0: 796.4, 1: 796.4. Samples: 163858. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:19,659][42771] Avg episode reward: [(0, '1.120'), (1, '1.180')] [2023-09-24 14:37:19,666][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000001296_331776.pth... [2023-09-24 14:37:19,666][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000001424_364544.pth... [2023-09-24 14:37:19,702][43474] Saving new best policy, reward=1.180! [2023-09-24 14:37:19,702][43303] Saving new best policy, reward=1.120! [2023-09-24 14:37:24,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6055.0). Total num frames: 729088. Throughput: 0: 798.0, 1: 800.2. Samples: 173858. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:24,660][42771] Avg episode reward: [(0, '1.060'), (1, '1.270')] [2023-09-24 14:37:24,661][43474] Saving new best policy, reward=1.270! [2023-09-24 14:37:29,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6075.7). Total num frames: 761856. Throughput: 0: 798.7, 1: 801.1. Samples: 178488. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:29,659][42771] Avg episode reward: [(0, '1.030'), (1, '1.210')] [2023-09-24 14:37:30,486][43653] Updated weights for policy 1, policy_version 1440 (0.0016) [2023-09-24 14:37:30,486][43616] Updated weights for policy 0, policy_version 1568 (0.0018) [2023-09-24 14:37:34,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6094.8). Total num frames: 794624. Throughput: 0: 801.1, 1: 800.4. Samples: 188209. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:37:34,659][42771] Avg episode reward: [(0, '0.900'), (1, '1.300')] [2023-09-24 14:37:34,668][43474] Saving new best policy, reward=1.300! [2023-09-24 14:37:39,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6112.5). Total num frames: 827392. Throughput: 0: 800.4, 1: 801.3. Samples: 197707. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:39,659][42771] Avg episode reward: [(0, '0.840'), (1, '1.230')] [2023-09-24 14:37:43,344][43616] Updated weights for policy 0, policy_version 1728 (0.0018) [2023-09-24 14:37:43,344][43653] Updated weights for policy 1, policy_version 1600 (0.0017) [2023-09-24 14:37:44,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6068.1). Total num frames: 851968. Throughput: 0: 799.6, 1: 801.0. Samples: 202633. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:37:44,659][42771] Avg episode reward: [(0, '0.770'), (1, '1.230')] [2023-09-24 14:37:49,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6085.5). Total num frames: 884736. Throughput: 0: 796.3, 1: 796.7. Samples: 211693. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:49,659][42771] Avg episode reward: [(0, '0.830'), (1, '1.180')] [2023-09-24 14:37:54,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6101.6). Total num frames: 917504. Throughput: 0: 796.5, 1: 797.1. Samples: 221220. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:37:54,659][42771] Avg episode reward: [(0, '0.900'), (1, '1.180')] [2023-09-24 14:37:56,394][43616] Updated weights for policy 0, policy_version 1888 (0.0017) [2023-09-24 14:37:56,394][43653] Updated weights for policy 1, policy_version 1760 (0.0019) [2023-09-24 14:37:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6116.7). Total num frames: 950272. Throughput: 0: 794.3, 1: 794.3. Samples: 225908. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:37:59,659][42771] Avg episode reward: [(0, '0.860'), (1, '1.130')] [2023-09-24 14:38:04,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6130.8). Total num frames: 983040. Throughput: 0: 795.1, 1: 796.0. Samples: 235460. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:04,659][42771] Avg episode reward: [(0, '0.960'), (1, '1.280')] [2023-09-24 14:38:09,285][43616] Updated weights for policy 0, policy_version 2048 (0.0019) [2023-09-24 14:38:09,286][43653] Updated weights for policy 1, policy_version 1920 (0.0019) [2023-09-24 14:38:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6144.0). Total num frames: 1015808. Throughput: 0: 791.5, 1: 791.0. Samples: 245069. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:09,659][42771] Avg episode reward: [(0, '1.080'), (1, '1.450')] [2023-09-24 14:38:09,661][43474] Saving new best policy, reward=1.450! [2023-09-24 14:38:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6156.4). Total num frames: 1048576. Throughput: 0: 794.2, 1: 791.8. Samples: 249856. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:38:14,659][42771] Avg episode reward: [(0, '1.130'), (1, '1.470')] [2023-09-24 14:38:14,659][43303] Saving new best policy, reward=1.130! [2023-09-24 14:38:14,659][43474] Saving new best policy, reward=1.470! [2023-09-24 14:38:19,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6168.1). Total num frames: 1081344. Throughput: 0: 791.1, 1: 790.9. Samples: 259399. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:19,659][42771] Avg episode reward: [(0, '1.320'), (1, '1.510')] [2023-09-24 14:38:19,671][43303] Saving new best policy, reward=1.320! [2023-09-24 14:38:19,671][43474] Saving new best policy, reward=1.510! [2023-09-24 14:38:22,215][43653] Updated weights for policy 1, policy_version 2080 (0.0017) [2023-09-24 14:38:22,217][43616] Updated weights for policy 0, policy_version 2208 (0.0019) [2023-09-24 14:38:24,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6132.3). Total num frames: 1105920. Throughput: 0: 788.7, 1: 788.2. Samples: 268668. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 14:38:24,659][42771] Avg episode reward: [(0, '1.250'), (1, '1.650')] [2023-09-24 14:38:24,760][43474] Saving new best policy, reward=1.650! [2023-09-24 14:38:29,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6144.0). Total num frames: 1138688. Throughput: 0: 787.9, 1: 787.7. Samples: 273535. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:29,659][42771] Avg episode reward: [(0, '1.250'), (1, '1.690')] [2023-09-24 14:38:29,660][43474] Saving new best policy, reward=1.690! [2023-09-24 14:38:34,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6155.1). Total num frames: 1171456. Throughput: 0: 789.4, 1: 788.1. Samples: 282680. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:34,659][42771] Avg episode reward: [(0, '1.310'), (1, '1.590')] [2023-09-24 14:38:35,416][43653] Updated weights for policy 1, policy_version 2240 (0.0017) [2023-09-24 14:38:35,416][43616] Updated weights for policy 0, policy_version 2368 (0.0018) [2023-09-24 14:38:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6165.6). Total num frames: 1204224. Throughput: 0: 788.1, 1: 790.8. Samples: 292268. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:39,659][42771] Avg episode reward: [(0, '1.280'), (1, '1.470')] [2023-09-24 14:38:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6175.5). Total num frames: 1236992. Throughput: 0: 790.1, 1: 788.3. Samples: 296934. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:38:44,659][42771] Avg episode reward: [(0, '1.310'), (1, '1.470')] [2023-09-24 14:38:48,510][43616] Updated weights for policy 0, policy_version 2528 (0.0019) [2023-09-24 14:38:48,512][43653] Updated weights for policy 1, policy_version 2400 (0.0016) [2023-09-24 14:38:49,658][42771] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6144.0). Total num frames: 1261568. Throughput: 0: 784.0, 1: 784.9. Samples: 306061. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:49,659][42771] Avg episode reward: [(0, '1.220'), (1, '1.420')] [2023-09-24 14:38:54,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6154.0). Total num frames: 1294336. Throughput: 0: 782.4, 1: 781.1. Samples: 315430. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:38:54,659][42771] Avg episode reward: [(0, '1.230'), (1, '1.230')] [2023-09-24 14:38:59,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6163.5). Total num frames: 1327104. Throughput: 0: 780.4, 1: 783.2. Samples: 320218. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:38:59,659][42771] Avg episode reward: [(0, '1.260'), (1, '1.280')] [2023-09-24 14:39:01,656][43653] Updated weights for policy 1, policy_version 2560 (0.0018) [2023-09-24 14:39:01,656][43616] Updated weights for policy 0, policy_version 2688 (0.0018) [2023-09-24 14:39:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6172.6). Total num frames: 1359872. Throughput: 0: 781.0, 1: 780.9. Samples: 329683. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:39:04,659][42771] Avg episode reward: [(0, '1.360'), (1, '1.510')] [2023-09-24 14:39:04,671][43303] Saving new best policy, reward=1.360! [2023-09-24 14:39:09,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6181.2). Total num frames: 1392640. Throughput: 0: 782.7, 1: 782.6. Samples: 339106. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:39:09,659][42771] Avg episode reward: [(0, '1.430'), (1, '1.580')] [2023-09-24 14:39:09,661][43303] Saving new best policy, reward=1.430! [2023-09-24 14:39:14,607][43653] Updated weights for policy 1, policy_version 2720 (0.0017) [2023-09-24 14:39:14,608][43616] Updated weights for policy 0, policy_version 2848 (0.0019) [2023-09-24 14:39:14,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6189.5). Total num frames: 1425408. Throughput: 0: 782.2, 1: 782.4. Samples: 343942. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:39:14,659][42771] Avg episode reward: [(0, '1.300'), (1, '1.540')] [2023-09-24 14:39:19,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6144.0, 300 sec: 6161.8). Total num frames: 1449984. Throughput: 0: 784.5, 1: 785.6. Samples: 353332. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 14:39:19,659][42771] Avg episode reward: [(0, '1.160'), (1, '1.590')] [2023-09-24 14:39:19,671][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000002784_712704.pth... [2023-09-24 14:39:19,710][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000002912_745472.pth... [2023-09-24 14:39:19,739][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000000128_32768.pth [2023-09-24 14:39:24,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6170.1). Total num frames: 1482752. Throughput: 0: 782.0, 1: 779.3. Samples: 362529. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:39:24,659][42771] Avg episode reward: [(0, '1.120'), (1, '1.590')] [2023-09-24 14:39:27,871][43616] Updated weights for policy 0, policy_version 3008 (0.0019) [2023-09-24 14:39:27,871][43653] Updated weights for policy 1, policy_version 2880 (0.0017) [2023-09-24 14:39:29,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6178.1). Total num frames: 1515520. Throughput: 0: 775.3, 1: 777.2. Samples: 366795. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 14:39:29,659][42771] Avg episode reward: [(0, '1.030'), (1, '1.450')] [2023-09-24 14:39:34,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6185.8). Total num frames: 1548288. Throughput: 0: 782.9, 1: 783.0. Samples: 376523. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 14:39:34,659][42771] Avg episode reward: [(0, '1.040'), (1, '1.230')] [2023-09-24 14:39:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6193.2). Total num frames: 1581056. Throughput: 0: 780.7, 1: 782.6. 
Samples: 385781. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:39:39,659][42771] Avg episode reward: [(0, '0.990'), (1, '1.400')] [2023-09-24 14:39:41,020][43653] Updated weights for policy 1, policy_version 3040 (0.0019) [2023-09-24 14:39:41,020][43616] Updated weights for policy 0, policy_version 3168 (0.0018) [2023-09-24 14:39:44,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6168.1). Total num frames: 1605632. Throughput: 0: 780.9, 1: 780.0. Samples: 390460. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 14:39:44,659][42771] Avg episode reward: [(0, '0.950'), (1, '1.490')] [2023-09-24 14:39:49,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6175.5). Total num frames: 1638400. Throughput: 0: 778.7, 1: 780.1. Samples: 399828. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:39:49,659][42771] Avg episode reward: [(0, '0.940'), (1, '1.640')] [2023-09-24 14:39:54,109][43653] Updated weights for policy 1, policy_version 3200 (0.0018) [2023-09-24 14:39:54,110][43616] Updated weights for policy 0, policy_version 3328 (0.0016) [2023-09-24 14:39:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6182.6). Total num frames: 1671168. Throughput: 0: 780.0, 1: 781.0. Samples: 409352. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:39:54,659][42771] Avg episode reward: [(0, '0.930'), (1, '1.670')] [2023-09-24 14:39:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6189.5). Total num frames: 1703936. Throughput: 0: 776.5, 1: 775.8. Samples: 413794. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:39:59,659][42771] Avg episode reward: [(0, '1.030'), (1, '1.670')] [2023-09-24 14:40:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6196.1). Total num frames: 1736704. Throughput: 0: 784.3, 1: 783.2. Samples: 423873. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:40:04,659][42771] Avg episode reward: [(0, '1.050'), (1, '1.880')] [2023-09-24 14:40:04,669][43474] Saving new best policy, reward=1.880! [2023-09-24 14:40:06,906][43653] Updated weights for policy 1, policy_version 3360 (0.0016) [2023-09-24 14:40:06,906][43616] Updated weights for policy 0, policy_version 3488 (0.0018) [2023-09-24 14:40:09,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6202.5). Total num frames: 1769472. Throughput: 0: 782.8, 1: 784.3. Samples: 433049. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:40:09,659][42771] Avg episode reward: [(0, '1.010'), (1, '1.820')] [2023-09-24 14:40:14,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6179.9). Total num frames: 1794048. Throughput: 0: 791.2, 1: 790.9. Samples: 437987. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:40:14,659][42771] Avg episode reward: [(0, '0.930'), (1, '1.960')] [2023-09-24 14:40:14,683][43474] Saving new best policy, reward=1.960! [2023-09-24 14:40:19,658][42771] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6186.4). Total num frames: 1826816. Throughput: 0: 784.2, 1: 784.1. Samples: 447098. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:40:19,659][42771] Avg episode reward: [(0, '0.990'), (1, '1.840')] [2023-09-24 14:40:19,901][43653] Updated weights for policy 1, policy_version 3520 (0.0016) [2023-09-24 14:40:19,901][43616] Updated weights for policy 0, policy_version 3648 (0.0017) [2023-09-24 14:40:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6192.6). Total num frames: 1859584. Throughput: 0: 789.4, 1: 787.7. Samples: 456750. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:40:24,659][42771] Avg episode reward: [(0, '1.020'), (1, '1.870')] [2023-09-24 14:40:29,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 1892352. Throughput: 0: 789.8, 1: 790.2. Samples: 461560. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:40:29,659][42771] Avg episode reward: [(0, '0.950'), (1, '1.900')] [2023-09-24 14:40:32,701][43653] Updated weights for policy 1, policy_version 3680 (0.0018) [2023-09-24 14:40:32,701][43616] Updated weights for policy 0, policy_version 3808 (0.0018) [2023-09-24 14:40:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 1925120. Throughput: 0: 792.5, 1: 791.4. Samples: 471100. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:40:34,659][42771] Avg episode reward: [(0, '1.040'), (1, '1.950')] [2023-09-24 14:40:39,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 1957888. Throughput: 0: 793.7, 1: 793.0. Samples: 480751. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 14:40:39,659][42771] Avg episode reward: [(0, '1.060'), (1, '2.030')] [2023-09-24 14:40:39,659][43474] Saving new best policy, reward=2.030! [2023-09-24 14:40:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 1990656. Throughput: 0: 796.3, 1: 794.4. Samples: 485376. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:40:44,659][42771] Avg episode reward: [(0, '1.160'), (1, '2.130')] [2023-09-24 14:40:44,661][43474] Saving new best policy, reward=2.130! [2023-09-24 14:40:45,736][43653] Updated weights for policy 1, policy_version 3840 (0.0019) [2023-09-24 14:40:45,736][43616] Updated weights for policy 0, policy_version 3968 (0.0018) [2023-09-24 14:40:49,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2023424. Throughput: 0: 788.3, 1: 788.8. Samples: 494846. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:40:49,659][42771] Avg episode reward: [(0, '1.130'), (1, '2.120')] [2023-09-24 14:40:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2056192. Throughput: 0: 793.4, 1: 793.7. Samples: 504471. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:40:54,659][42771] Avg episode reward: [(0, '1.100'), (1, '2.180')] [2023-09-24 14:40:54,661][43474] Saving new best policy, reward=2.180! [2023-09-24 14:40:58,447][43653] Updated weights for policy 1, policy_version 4000 (0.0015) [2023-09-24 14:40:58,447][43616] Updated weights for policy 0, policy_version 4128 (0.0018) [2023-09-24 14:40:59,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2080768. Throughput: 0: 794.6, 1: 794.2. Samples: 509483. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:40:59,659][42771] Avg episode reward: [(0, '1.200'), (1, '2.180')] [2023-09-24 14:41:04,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2113536. Throughput: 0: 799.1, 1: 798.5. Samples: 518991. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:41:04,659][42771] Avg episode reward: [(0, '1.170'), (1, '2.210')] [2023-09-24 14:41:04,800][43474] Saving new best policy, reward=2.210! [2023-09-24 14:41:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2146304. Throughput: 0: 796.4, 1: 795.9. Samples: 528406. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:41:09,659][42771] Avg episode reward: [(0, '1.310'), (1, '2.160')] [2023-09-24 14:41:11,250][43653] Updated weights for policy 1, policy_version 4160 (0.0016) [2023-09-24 14:41:11,251][43616] Updated weights for policy 0, policy_version 4288 (0.0017) [2023-09-24 14:41:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 2179072. Throughput: 0: 798.3, 1: 797.9. Samples: 533386. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:41:14,659][42771] Avg episode reward: [(0, '1.390'), (1, '2.000')] [2023-09-24 14:41:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 2211840. Throughput: 0: 796.5, 1: 797.0. Samples: 542808. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:41:19,659][42771] Avg episode reward: [(0, '1.400'), (1, '1.910')] [2023-09-24 14:41:19,670][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000004384_1122304.pth... [2023-09-24 14:41:19,670][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000004256_1089536.pth... [2023-09-24 14:41:19,703][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000001296_331776.pth [2023-09-24 14:41:19,705][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000001424_364544.pth [2023-09-24 14:41:24,028][43653] Updated weights for policy 1, policy_version 4320 (0.0015) [2023-09-24 14:41:24,029][43616] Updated weights for policy 0, policy_version 4448 (0.0017) [2023-09-24 14:41:24,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2244608. Throughput: 0: 800.8, 1: 800.4. Samples: 552807. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:41:24,659][42771] Avg episode reward: [(0, '1.460'), (1, '2.030')] [2023-09-24 14:41:24,661][43303] Saving new best policy, reward=1.460! [2023-09-24 14:41:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2277376. Throughput: 0: 796.4, 1: 796.4. Samples: 557056. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:41:29,659][42771] Avg episode reward: [(0, '1.430'), (1, '2.140')] [2023-09-24 14:41:34,659][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2301952. Throughput: 0: 792.5, 1: 794.8. Samples: 566277. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:41:34,659][42771] Avg episode reward: [(0, '1.470'), (1, '2.200')] [2023-09-24 14:41:34,833][43303] Saving new best policy, reward=1.470! 
[2023-09-24 14:41:37,354][43653] Updated weights for policy 1, policy_version 4480 (0.0016) [2023-09-24 14:41:37,354][43616] Updated weights for policy 0, policy_version 4608 (0.0017) [2023-09-24 14:41:39,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2334720. Throughput: 0: 791.4, 1: 791.3. Samples: 575693. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:41:39,659][42771] Avg episode reward: [(0, '1.520'), (1, '2.230')] [2023-09-24 14:41:39,660][43303] Saving new best policy, reward=1.520! [2023-09-24 14:41:39,661][43474] Saving new best policy, reward=2.230! [2023-09-24 14:41:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 2367488. Throughput: 0: 788.2, 1: 789.0. Samples: 580458. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:41:44,659][42771] Avg episode reward: [(0, '1.470'), (1, '2.230')] [2023-09-24 14:41:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2400256. Throughput: 0: 788.0, 1: 788.4. Samples: 589930. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:41:49,660][42771] Avg episode reward: [(0, '1.500'), (1, '2.270')] [2023-09-24 14:41:49,671][43474] Saving new best policy, reward=2.270! [2023-09-24 14:41:50,251][43653] Updated weights for policy 1, policy_version 4640 (0.0018) [2023-09-24 14:41:50,251][43616] Updated weights for policy 0, policy_version 4768 (0.0019) [2023-09-24 14:41:54,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 2433024. Throughput: 0: 793.2, 1: 795.0. Samples: 599873. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:41:54,659][42771] Avg episode reward: [(0, '1.590'), (1, '2.370')] [2023-09-24 14:41:54,661][43474] Saving new best policy, reward=2.370! [2023-09-24 14:41:54,661][43303] Saving new best policy, reward=1.590! [2023-09-24 14:41:59,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). 
Total num frames: 2465792. Throughput: 0: 789.1, 1: 789.6. Samples: 604426. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:41:59,659][42771] Avg episode reward: [(0, '1.640'), (1, '2.150')] [2023-09-24 14:41:59,660][43303] Saving new best policy, reward=1.640! [2023-09-24 14:42:03,003][43653] Updated weights for policy 1, policy_version 4800 (0.0017) [2023-09-24 14:42:03,003][43616] Updated weights for policy 0, policy_version 4928 (0.0018) [2023-09-24 14:42:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2498560. Throughput: 0: 794.1, 1: 794.5. Samples: 614294. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:42:04,659][42771] Avg episode reward: [(0, '1.570'), (1, '2.110')] [2023-09-24 14:42:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2531328. Throughput: 0: 790.6, 1: 791.2. Samples: 623992. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:42:09,659][42771] Avg episode reward: [(0, '1.570'), (1, '1.950')] [2023-09-24 14:42:14,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2564096. Throughput: 0: 794.5, 1: 795.1. Samples: 628588. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:42:14,659][42771] Avg episode reward: [(0, '1.550'), (1, '1.890')] [2023-09-24 14:42:15,932][43653] Updated weights for policy 1, policy_version 4960 (0.0016) [2023-09-24 14:42:15,932][43616] Updated weights for policy 0, policy_version 5088 (0.0017) [2023-09-24 14:42:19,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2588672. Throughput: 0: 797.5, 1: 795.0. Samples: 637940. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:42:19,659][42771] Avg episode reward: [(0, '1.740'), (1, '1.890')] [2023-09-24 14:42:19,750][43303] Saving new best policy, reward=1.740! [2023-09-24 14:42:24,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). 
Total num frames: 2621440. Throughput: 0: 797.0, 1: 797.0. Samples: 647426. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:42:24,659][42771] Avg episode reward: [(0, '1.730'), (1, '1.890')] [2023-09-24 14:42:28,928][43616] Updated weights for policy 0, policy_version 5248 (0.0018) [2023-09-24 14:42:28,928][43653] Updated weights for policy 1, policy_version 5120 (0.0017) [2023-09-24 14:42:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2654208. Throughput: 0: 797.1, 1: 797.3. Samples: 652206. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:42:29,659][42771] Avg episode reward: [(0, '1.830'), (1, '1.910')] [2023-09-24 14:42:29,661][43303] Saving new best policy, reward=1.830! [2023-09-24 14:42:34,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 2686976. Throughput: 0: 796.1, 1: 794.3. Samples: 661497. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 14:42:34,659][42771] Avg episode reward: [(0, '1.970'), (1, '1.930')] [2023-09-24 14:42:34,671][43303] Saving new best policy, reward=1.970! [2023-09-24 14:42:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2719744. Throughput: 0: 789.3, 1: 791.2. Samples: 670994. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:42:39,659][42771] Avg episode reward: [(0, '2.070'), (1, '2.160')] [2023-09-24 14:42:39,660][43303] Saving new best policy, reward=2.070! [2023-09-24 14:42:41,930][43653] Updated weights for policy 1, policy_version 5280 (0.0017) [2023-09-24 14:42:41,931][43616] Updated weights for policy 0, policy_version 5408 (0.0017) [2023-09-24 14:42:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2752512. Throughput: 0: 793.8, 1: 792.3. Samples: 675798. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:42:44,659][42771] Avg episode reward: [(0, '2.030'), (1, '2.170')] [2023-09-24 14:42:49,658][42771] Fps is (10 sec: 6143.9, 60 sec: 6348.8, 300 sec: 6317.6). Total num frames: 2781184. Throughput: 0: 790.8, 1: 788.9. Samples: 685380. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:42:49,659][42771] Avg episode reward: [(0, '2.080'), (1, '2.090')] [2023-09-24 14:42:49,671][43303] Saving new best policy, reward=2.080! [2023-09-24 14:42:54,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2809856. Throughput: 0: 783.1, 1: 782.5. Samples: 694443. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:42:54,659][42771] Avg episode reward: [(0, '2.010'), (1, '2.240')] [2023-09-24 14:42:54,899][43653] Updated weights for policy 1, policy_version 5440 (0.0017) [2023-09-24 14:42:54,899][43616] Updated weights for policy 0, policy_version 5568 (0.0018) [2023-09-24 14:42:59,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2842624. Throughput: 0: 784.7, 1: 787.0. Samples: 699312. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:42:59,659][42771] Avg episode reward: [(0, '1.940'), (1, '2.340')] [2023-09-24 14:43:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2875392. Throughput: 0: 786.0, 1: 785.9. Samples: 708672. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:43:04,659][42771] Avg episode reward: [(0, '1.850'), (1, '2.250')] [2023-09-24 14:43:07,958][43653] Updated weights for policy 1, policy_version 5600 (0.0018) [2023-09-24 14:43:07,958][43616] Updated weights for policy 0, policy_version 5728 (0.0018) [2023-09-24 14:43:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2908160. Throughput: 0: 787.6, 1: 787.5. Samples: 718303. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:43:09,659][42771] Avg episode reward: [(0, '1.620'), (1, '2.350')] [2023-09-24 14:43:14,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 2940928. Throughput: 0: 787.4, 1: 784.6. Samples: 722944. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:43:14,659][42771] Avg episode reward: [(0, '1.740'), (1, '2.260')] [2023-09-24 14:43:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 2973696. Throughput: 0: 790.7, 1: 793.0. Samples: 732761. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:43:19,659][42771] Avg episode reward: [(0, '1.550'), (1, '2.340')] [2023-09-24 14:43:19,669][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000005744_1470464.pth... [2023-09-24 14:43:19,669][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000005872_1503232.pth... [2023-09-24 14:43:19,704][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000002912_745472.pth [2023-09-24 14:43:19,709][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000002784_712704.pth [2023-09-24 14:43:20,784][43616] Updated weights for policy 0, policy_version 5888 (0.0018) [2023-09-24 14:43:20,784][43653] Updated weights for policy 1, policy_version 5760 (0.0019) [2023-09-24 14:43:24,659][42771] Fps is (10 sec: 6143.8, 60 sec: 6348.8, 300 sec: 6317.5). Total num frames: 3002368. Throughput: 0: 789.7, 1: 788.2. Samples: 742001. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:43:24,659][42771] Avg episode reward: [(0, '1.430'), (1, '2.290')] [2023-09-24 14:43:29,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3031040. Throughput: 0: 785.7, 1: 787.2. Samples: 746577. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:43:29,659][42771] Avg episode reward: [(0, '1.270'), (1, '2.250')] [2023-09-24 14:43:33,861][43616] Updated weights for policy 0, policy_version 6048 (0.0018) [2023-09-24 14:43:33,862][43653] Updated weights for policy 1, policy_version 5920 (0.0018) [2023-09-24 14:43:34,658][42771] Fps is (10 sec: 6144.3, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 3063808. Throughput: 0: 782.3, 1: 784.2. Samples: 755872. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:43:34,659][42771] Avg episode reward: [(0, '1.300'), (1, '2.350')] [2023-09-24 14:43:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3096576. Throughput: 0: 792.2, 1: 790.4. Samples: 765662. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:43:39,659][42771] Avg episode reward: [(0, '1.330'), (1, '2.340')] [2023-09-24 14:43:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 3129344. Throughput: 0: 787.4, 1: 784.6. Samples: 770052. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:43:44,659][42771] Avg episode reward: [(0, '1.230'), (1, '2.390')] [2023-09-24 14:43:44,660][43474] Saving new best policy, reward=2.390! [2023-09-24 14:43:46,841][43653] Updated weights for policy 1, policy_version 6080 (0.0018) [2023-09-24 14:43:46,841][43616] Updated weights for policy 0, policy_version 6208 (0.0017) [2023-09-24 14:43:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6331.5). Total num frames: 3162112. Throughput: 0: 792.1, 1: 793.1. Samples: 780006. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:43:49,659][42771] Avg episode reward: [(0, '1.400'), (1, '2.260')] [2023-09-24 14:43:54,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3186688. Throughput: 0: 787.1, 1: 787.0. Samples: 789137. 
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 14:43:54,659][42771] Avg episode reward: [(0, '1.400'), (1, '2.190')] [2023-09-24 14:43:59,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3219456. Throughput: 0: 788.5, 1: 789.9. Samples: 793973. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:43:59,659][42771] Avg episode reward: [(0, '1.560'), (1, '2.260')] [2023-09-24 14:43:59,866][43616] Updated weights for policy 0, policy_version 6368 (0.0017) [2023-09-24 14:43:59,866][43653] Updated weights for policy 1, policy_version 6240 (0.0019) [2023-09-24 14:44:04,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3252224. Throughput: 0: 784.8, 1: 784.4. Samples: 803377. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:44:04,659][42771] Avg episode reward: [(0, '1.670'), (1, '2.190')] [2023-09-24 14:44:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3284992. Throughput: 0: 790.9, 1: 789.1. Samples: 813101. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:44:09,659][42771] Avg episode reward: [(0, '1.710'), (1, '2.280')] [2023-09-24 14:44:12,595][43653] Updated weights for policy 1, policy_version 6400 (0.0016) [2023-09-24 14:44:12,596][43616] Updated weights for policy 0, policy_version 6528 (0.0017) [2023-09-24 14:44:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6331.5). Total num frames: 3317760. Throughput: 0: 793.3, 1: 793.4. Samples: 817977. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:44:14,659][42771] Avg episode reward: [(0, '1.750'), (1, '1.930')] [2023-09-24 14:44:19,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3350528. Throughput: 0: 795.8, 1: 794.7. Samples: 827446. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:44:19,659][42771] Avg episode reward: [(0, '1.750'), (1, '2.080')] [2023-09-24 14:44:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.9, 300 sec: 6331.5). Total num frames: 3383296. Throughput: 0: 794.4, 1: 796.5. Samples: 837255. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:44:24,659][42771] Avg episode reward: [(0, '1.790'), (1, '2.380')] [2023-09-24 14:44:25,415][43616] Updated weights for policy 0, policy_version 6688 (0.0017) [2023-09-24 14:44:25,415][43653] Updated weights for policy 1, policy_version 6560 (0.0019) [2023-09-24 14:44:29,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 3416064. Throughput: 0: 796.6, 1: 798.3. Samples: 841819. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:44:29,659][42771] Avg episode reward: [(0, '1.670'), (1, '2.370')] [2023-09-24 14:44:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 3448832. Throughput: 0: 795.8, 1: 795.1. Samples: 851596. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:44:34,659][42771] Avg episode reward: [(0, '1.590'), (1, '2.450')] [2023-09-24 14:44:34,666][43474] Saving new best policy, reward=2.450! [2023-09-24 14:44:38,287][43616] Updated weights for policy 0, policy_version 6848 (0.0016) [2023-09-24 14:44:38,288][43653] Updated weights for policy 1, policy_version 6720 (0.0016) [2023-09-24 14:44:39,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 3481600. Throughput: 0: 798.5, 1: 797.1. Samples: 860937. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 14:44:39,659][42771] Avg episode reward: [(0, '1.780'), (1, '2.560')] [2023-09-24 14:44:39,661][43474] Saving new best policy, reward=2.560! [2023-09-24 14:44:44,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3506176. Throughput: 0: 796.9, 1: 796.2. Samples: 865661. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:44:44,659][42771] Avg episode reward: [(0, '1.840'), (1, '2.640')] [2023-09-24 14:44:44,753][43474] Saving new best policy, reward=2.640! [2023-09-24 14:44:49,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3538944. Throughput: 0: 796.8, 1: 796.7. Samples: 875084. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:44:49,659][42771] Avg episode reward: [(0, '1.920'), (1, '2.780')] [2023-09-24 14:44:49,666][43474] Saving new best policy, reward=2.780! [2023-09-24 14:44:51,241][43616] Updated weights for policy 0, policy_version 7008 (0.0016) [2023-09-24 14:44:51,241][43653] Updated weights for policy 1, policy_version 6880 (0.0018) [2023-09-24 14:44:54,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 3571712. Throughput: 0: 796.4, 1: 795.5. Samples: 884737. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:44:54,659][42771] Avg episode reward: [(0, '2.080'), (1, '2.690')] [2023-09-24 14:44:59,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 3604480. Throughput: 0: 792.4, 1: 791.3. Samples: 889242. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:44:59,659][42771] Avg episode reward: [(0, '2.150'), (1, '2.620')] [2023-09-24 14:44:59,661][43303] Saving new best policy, reward=2.150! [2023-09-24 14:45:04,574][43653] Updated weights for policy 1, policy_version 7040 (0.0016) [2023-09-24 14:45:04,574][43616] Updated weights for policy 0, policy_version 7168 (0.0017) [2023-09-24 14:45:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 3637248. Throughput: 0: 792.0, 1: 792.1. Samples: 898730. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:45:04,659][42771] Avg episode reward: [(0, '2.180'), (1, '2.720')] [2023-09-24 14:45:04,671][43303] Saving new best policy, reward=2.180! 
[2023-09-24 14:45:09,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3661824. Throughput: 0: 781.1, 1: 780.7. Samples: 907538. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:45:09,659][42771] Avg episode reward: [(0, '2.150'), (1, '2.590')] [2023-09-24 14:45:14,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3694592. Throughput: 0: 784.8, 1: 785.6. Samples: 912485. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:45:14,660][42771] Avg episode reward: [(0, '1.970'), (1, '2.390')] [2023-09-24 14:45:17,477][43616] Updated weights for policy 0, policy_version 7328 (0.0018) [2023-09-24 14:45:17,477][43653] Updated weights for policy 1, policy_version 7200 (0.0016) [2023-09-24 14:45:19,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 3727360. Throughput: 0: 782.4, 1: 782.6. Samples: 922024. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 14:45:19,659][42771] Avg episode reward: [(0, '1.760'), (1, '2.410')] [2023-09-24 14:45:19,665][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000007216_1847296.pth... [2023-09-24 14:45:19,665][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000007344_1880064.pth... [2023-09-24 14:45:19,702][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000004256_1089536.pth [2023-09-24 14:45:19,703][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000004384_1122304.pth [2023-09-24 14:45:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3760128. Throughput: 0: 784.5, 1: 784.7. Samples: 931554. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:45:24,659][42771] Avg episode reward: [(0, '1.810'), (1, '2.480')] [2023-09-24 14:45:29,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3792896. Throughput: 0: 781.2, 1: 782.1. Samples: 936011. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:45:29,660][42771] Avg episode reward: [(0, '1.660'), (1, '2.500')] [2023-09-24 14:45:30,560][43616] Updated weights for policy 0, policy_version 7488 (0.0018) [2023-09-24 14:45:30,560][43653] Updated weights for policy 1, policy_version 7360 (0.0017) [2023-09-24 14:45:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3825664. Throughput: 0: 785.3, 1: 786.0. Samples: 945791. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:45:34,659][42771] Avg episode reward: [(0, '1.590'), (1, '2.380')] [2023-09-24 14:45:39,658][42771] Fps is (10 sec: 6144.1, 60 sec: 6212.3, 300 sec: 6317.6). Total num frames: 3854336. Throughput: 0: 779.8, 1: 781.9. Samples: 955011. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:45:39,659][42771] Avg episode reward: [(0, '1.700'), (1, '2.270')] [2023-09-24 14:45:43,474][43653] Updated weights for policy 1, policy_version 7520 (0.0019) [2023-09-24 14:45:43,474][43616] Updated weights for policy 0, policy_version 7648 (0.0018) [2023-09-24 14:45:44,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3883008. Throughput: 0: 785.2, 1: 786.3. Samples: 959957. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:45:44,659][42771] Avg episode reward: [(0, '1.820'), (1, '2.500')] [2023-09-24 14:45:49,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 3915776. Throughput: 0: 778.8, 1: 779.8. Samples: 968864. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:45:49,659][42771] Avg episode reward: [(0, '1.860'), (1, '2.400')] [2023-09-24 14:45:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3948544. Throughput: 0: 789.8, 1: 789.5. Samples: 978608. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:45:54,659][42771] Avg episode reward: [(0, '1.880'), (1, '2.510')] [2023-09-24 14:45:56,633][43653] Updated weights for policy 1, policy_version 7680 (0.0017) [2023-09-24 14:45:56,634][43616] Updated weights for policy 0, policy_version 7808 (0.0016) [2023-09-24 14:45:59,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 3981312. Throughput: 0: 785.6, 1: 785.5. Samples: 983186. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:45:59,659][42771] Avg episode reward: [(0, '1.710'), (1, '2.590')] [2023-09-24 14:46:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 4014080. Throughput: 0: 788.4, 1: 787.6. Samples: 992947. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:46:04,659][42771] Avg episode reward: [(0, '1.710'), (1, '2.590')] [2023-09-24 14:46:09,616][43616] Updated weights for policy 0, policy_version 7968 (0.0020) [2023-09-24 14:46:09,616][43653] Updated weights for policy 1, policy_version 7840 (0.0020) [2023-09-24 14:46:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 4046848. Throughput: 0: 785.6, 1: 786.2. Samples: 1002289. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:46:09,660][42771] Avg episode reward: [(0, '1.690'), (1, '2.520')] [2023-09-24 14:46:14,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4071424. Throughput: 0: 787.7, 1: 788.3. Samples: 1006929. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 14:46:14,659][42771] Avg episode reward: [(0, '1.500'), (1, '2.480')] [2023-09-24 14:46:19,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4104192. Throughput: 0: 779.2, 1: 776.7. Samples: 1015808. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:46:19,659][42771] Avg episode reward: [(0, '1.520'), (1, '2.530')] [2023-09-24 14:46:23,000][43653] Updated weights for policy 1, policy_version 8000 (0.0018) [2023-09-24 14:46:23,000][43616] Updated weights for policy 0, policy_version 8128 (0.0019) [2023-09-24 14:46:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4136960. Throughput: 0: 781.6, 1: 781.2. Samples: 1025337. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:46:24,659][42771] Avg episode reward: [(0, '1.590'), (1, '2.530')] [2023-09-24 14:46:29,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6331.5). Total num frames: 4169728. Throughput: 0: 781.1, 1: 778.6. Samples: 1030144. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:46:29,659][42771] Avg episode reward: [(0, '1.660'), (1, '2.590')] [2023-09-24 14:46:34,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6303.7). Total num frames: 4194304. Throughput: 0: 781.8, 1: 781.1. Samples: 1039196. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:46:34,659][42771] Avg episode reward: [(0, '1.690'), (1, '2.640')] [2023-09-24 14:46:36,107][43616] Updated weights for policy 0, policy_version 8288 (0.0017) [2023-09-24 14:46:36,108][43653] Updated weights for policy 1, policy_version 8160 (0.0017) [2023-09-24 14:46:39,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6212.3, 300 sec: 6303.7). Total num frames: 4227072. Throughput: 0: 778.3, 1: 776.7. Samples: 1048581. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:46:39,659][42771] Avg episode reward: [(0, '1.650'), (1, '2.610')] [2023-09-24 14:46:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4259840. Throughput: 0: 781.5, 1: 781.7. Samples: 1053527. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:46:44,659][42771] Avg episode reward: [(0, '1.540'), (1, '2.790')] [2023-09-24 14:46:44,661][43474] Saving new best policy, reward=2.790! [2023-09-24 14:46:49,069][43616] Updated weights for policy 0, policy_version 8448 (0.0016) [2023-09-24 14:46:49,069][43653] Updated weights for policy 1, policy_version 8320 (0.0016) [2023-09-24 14:46:49,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4292608. Throughput: 0: 778.0, 1: 776.9. Samples: 1062917. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 14:46:49,659][42771] Avg episode reward: [(0, '1.500'), (1, '2.860')] [2023-09-24 14:46:49,666][43474] Saving new best policy, reward=2.860! [2023-09-24 14:46:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4325376. Throughput: 0: 781.2, 1: 782.9. Samples: 1072674. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:46:54,659][42771] Avg episode reward: [(0, '1.530'), (1, '2.810')] [2023-09-24 14:46:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 4358144. Throughput: 0: 782.4, 1: 780.4. Samples: 1077252. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:46:59,659][42771] Avg episode reward: [(0, '1.450'), (1, '2.740')] [2023-09-24 14:47:02,053][43616] Updated weights for policy 0, policy_version 8608 (0.0017) [2023-09-24 14:47:02,054][43653] Updated weights for policy 1, policy_version 8480 (0.0018) [2023-09-24 14:47:04,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4390912. Throughput: 0: 786.0, 1: 788.5. Samples: 1086657. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:04,659][42771] Avg episode reward: [(0, '1.420'), (1, '2.680')] [2023-09-24 14:47:09,658][42771] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 4415488. Throughput: 0: 785.4, 1: 786.4. 
Samples: 1096065. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:09,660][42771] Avg episode reward: [(0, '1.430'), (1, '2.650')] [2023-09-24 14:47:14,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4448256. Throughput: 0: 784.7, 1: 786.9. Samples: 1100870. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:14,659][42771] Avg episode reward: [(0, '1.410'), (1, '2.650')] [2023-09-24 14:47:15,094][43653] Updated weights for policy 1, policy_version 8640 (0.0017) [2023-09-24 14:47:15,095][43616] Updated weights for policy 0, policy_version 8768 (0.0016) [2023-09-24 14:47:19,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 4481024. Throughput: 0: 788.9, 1: 789.8. Samples: 1110238. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:19,659][42771] Avg episode reward: [(0, '1.470'), (1, '2.410')] [2023-09-24 14:47:19,668][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000008816_2256896.pth... [2023-09-24 14:47:19,668][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000008688_2224128.pth... [2023-09-24 14:47:19,703][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000005744_1470464.pth [2023-09-24 14:47:19,704][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000005872_1503232.pth [2023-09-24 14:47:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4513792. Throughput: 0: 793.3, 1: 796.2. Samples: 1120109. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:24,659][42771] Avg episode reward: [(0, '1.460'), (1, '2.450')] [2023-09-24 14:47:27,946][43616] Updated weights for policy 0, policy_version 8928 (0.0018) [2023-09-24 14:47:27,947][43653] Updated weights for policy 1, policy_version 8800 (0.0019) [2023-09-24 14:47:29,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4546560. 
Throughput: 0: 788.4, 1: 787.8. Samples: 1124455. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 14:47:29,659][42771] Avg episode reward: [(0, '1.430'), (1, '2.390')] [2023-09-24 14:47:34,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 4579328. Throughput: 0: 791.9, 1: 793.9. Samples: 1134280. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:47:34,659][42771] Avg episode reward: [(0, '1.460'), (1, '2.480')] [2023-09-24 14:47:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 4612096. Throughput: 0: 788.1, 1: 786.8. Samples: 1143545. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:47:39,659][42771] Avg episode reward: [(0, '1.450'), (1, '2.520')] [2023-09-24 14:47:40,945][43616] Updated weights for policy 0, policy_version 9088 (0.0016) [2023-09-24 14:47:40,946][43653] Updated weights for policy 1, policy_version 8960 (0.0016) [2023-09-24 14:47:44,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6289.8). Total num frames: 4636672. Throughput: 0: 789.7, 1: 792.1. Samples: 1148436. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:47:44,659][42771] Avg episode reward: [(0, '1.410'), (1, '2.460')] [2023-09-24 14:47:49,659][42771] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4669440. Throughput: 0: 791.0, 1: 790.4. Samples: 1157819. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:49,660][42771] Avg episode reward: [(0, '1.380'), (1, '2.510')] [2023-09-24 14:47:53,842][43653] Updated weights for policy 1, policy_version 9120 (0.0018) [2023-09-24 14:47:53,849][43616] Updated weights for policy 0, policy_version 9248 (0.0018) [2023-09-24 14:47:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4702208. Throughput: 0: 793.6, 1: 790.8. Samples: 1167360. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:54,659][42771] Avg episode reward: [(0, '1.340'), (1, '2.720')] [2023-09-24 14:47:59,658][42771] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4734976. Throughput: 0: 787.9, 1: 788.0. Samples: 1171786. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:47:59,659][42771] Avg episode reward: [(0, '1.350'), (1, '2.650')] [2023-09-24 14:48:04,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4767744. Throughput: 0: 794.4, 1: 792.7. Samples: 1181659. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:48:04,659][42771] Avg episode reward: [(0, '1.370'), (1, '2.690')] [2023-09-24 14:48:07,203][43616] Updated weights for policy 0, policy_version 9408 (0.0018) [2023-09-24 14:48:07,203][43653] Updated weights for policy 1, policy_version 9280 (0.0018) [2023-09-24 14:48:09,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 4792320. Throughput: 0: 781.6, 1: 780.8. Samples: 1190417. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:48:09,659][42771] Avg episode reward: [(0, '1.390'), (1, '2.630')] [2023-09-24 14:48:14,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 4825088. Throughput: 0: 786.5, 1: 787.0. Samples: 1195263. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:48:14,659][42771] Avg episode reward: [(0, '1.390'), (1, '2.550')] [2023-09-24 14:48:19,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6289.8). Total num frames: 4857856. Throughput: 0: 783.0, 1: 783.3. Samples: 1204765. 
Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:48:19,659][42771] Avg episode reward: [(0, '1.470'), (1, '2.540')] [2023-09-24 14:48:20,151][43616] Updated weights for policy 0, policy_version 9568 (0.0013) [2023-09-24 14:48:20,152][43653] Updated weights for policy 1, policy_version 9440 (0.0017) [2023-09-24 14:48:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 4890624. Throughput: 0: 786.4, 1: 784.6. Samples: 1214243. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:48:24,659][42771] Avg episode reward: [(0, '1.550'), (1, '2.470')] [2023-09-24 14:48:29,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 4923392. Throughput: 0: 780.5, 1: 778.9. Samples: 1218610. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:48:29,659][42771] Avg episode reward: [(0, '1.580'), (1, '2.540')] [2023-09-24 14:48:33,381][43653] Updated weights for policy 1, policy_version 9600 (0.0017) [2023-09-24 14:48:33,381][43616] Updated weights for policy 0, policy_version 9728 (0.0017) [2023-09-24 14:48:34,658][42771] Fps is (10 sec: 6143.9, 60 sec: 6212.2, 300 sec: 6289.8). Total num frames: 4952064. Throughput: 0: 778.7, 1: 779.5. Samples: 1227940. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:48:34,659][42771] Avg episode reward: [(0, '1.590'), (1, '2.610')] [2023-09-24 14:48:39,658][42771] Fps is (10 sec: 5734.2, 60 sec: 6144.0, 300 sec: 6275.9). Total num frames: 4980736. Throughput: 0: 777.2, 1: 778.9. Samples: 1237385. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:48:39,659][42771] Avg episode reward: [(0, '1.600'), (1, '2.700')] [2023-09-24 14:48:44,659][42771] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 5013504. Throughput: 0: 783.7, 1: 784.6. Samples: 1242361. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:48:44,659][42771] Avg episode reward: [(0, '1.550'), (1, '2.570')] [2023-09-24 14:48:46,114][43653] Updated weights for policy 1, policy_version 9760 (0.0017) [2023-09-24 14:48:46,115][43616] Updated weights for policy 0, policy_version 9888 (0.0018) [2023-09-24 14:48:49,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 5046272. Throughput: 0: 779.0, 1: 780.9. Samples: 1251853. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:48:49,659][42771] Avg episode reward: [(0, '1.590'), (1, '2.710')] [2023-09-24 14:48:54,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5079040. Throughput: 0: 791.7, 1: 789.4. Samples: 1261568. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:48:54,659][42771] Avg episode reward: [(0, '1.600'), (1, '2.610')] [2023-09-24 14:48:58,935][43653] Updated weights for policy 1, policy_version 9920 (0.0015) [2023-09-24 14:48:58,936][43616] Updated weights for policy 0, policy_version 10048 (0.0019) [2023-09-24 14:48:59,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5111808. Throughput: 0: 789.1, 1: 789.2. Samples: 1266285. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:48:59,659][42771] Avg episode reward: [(0, '1.600'), (1, '2.330')] [2023-09-24 14:49:04,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5144576. Throughput: 0: 791.7, 1: 789.2. Samples: 1275904. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:49:04,659][42771] Avg episode reward: [(0, '1.550'), (1, '2.230')] [2023-09-24 14:49:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5177344. Throughput: 0: 791.5, 1: 793.6. Samples: 1285571. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:49:09,659][42771] Avg episode reward: [(0, '1.600'), (1, '2.300')] [2023-09-24 14:49:11,819][43653] Updated weights for policy 1, policy_version 10080 (0.0019) [2023-09-24 14:49:11,819][43616] Updated weights for policy 0, policy_version 10208 (0.0018) [2023-09-24 14:49:14,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5210112. Throughput: 0: 796.4, 1: 795.4. Samples: 1290241. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:49:14,659][42771] Avg episode reward: [(0, '1.660'), (1, '2.410')] [2023-09-24 14:49:19,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5242880. Throughput: 0: 802.0, 1: 801.8. Samples: 1300110. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:49:19,659][42771] Avg episode reward: [(0, '1.770'), (1, '2.200')] [2023-09-24 14:49:19,670][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000010304_2637824.pth... [2023-09-24 14:49:19,670][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000010176_2605056.pth... [2023-09-24 14:49:19,698][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000007344_1880064.pth [2023-09-24 14:49:19,705][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000007216_1847296.pth [2023-09-24 14:49:24,514][43653] Updated weights for policy 1, policy_version 10240 (0.0016) [2023-09-24 14:49:24,516][43616] Updated weights for policy 0, policy_version 10368 (0.0018) [2023-09-24 14:49:24,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5275648. Throughput: 0: 801.5, 1: 802.1. Samples: 1309549. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:49:24,659][42771] Avg episode reward: [(0, '1.840'), (1, '2.270')] [2023-09-24 14:49:29,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6275.9). Total num frames: 5300224. 
Throughput: 0: 802.0, 1: 802.4. Samples: 1314558. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:49:29,659][42771] Avg episode reward: [(0, '1.800'), (1, '2.350')] [2023-09-24 14:49:34,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6348.8, 300 sec: 6275.9). Total num frames: 5332992. Throughput: 0: 798.2, 1: 798.1. Samples: 1323687. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 14:49:34,659][42771] Avg episode reward: [(0, '1.890'), (1, '2.460')] [2023-09-24 14:49:37,459][43653] Updated weights for policy 1, policy_version 10400 (0.0017) [2023-09-24 14:49:37,460][43616] Updated weights for policy 0, policy_version 10528 (0.0016) [2023-09-24 14:49:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5365760. Throughput: 0: 796.5, 1: 797.0. Samples: 1333276. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 14:49:39,659][42771] Avg episode reward: [(0, '1.940'), (1, '2.310')] [2023-09-24 14:49:44,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5398528. Throughput: 0: 798.0, 1: 797.0. Samples: 1338062. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:49:44,659][42771] Avg episode reward: [(0, '1.930'), (1, '2.270')] [2023-09-24 14:49:49,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5431296. Throughput: 0: 796.4, 1: 796.5. Samples: 1347588. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:49:49,659][42771] Avg episode reward: [(0, '1.920'), (1, '2.330')] [2023-09-24 14:49:50,365][43653] Updated weights for policy 1, policy_version 10560 (0.0015) [2023-09-24 14:49:50,366][43616] Updated weights for policy 0, policy_version 10688 (0.0018) [2023-09-24 14:49:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5464064. Throughput: 0: 798.1, 1: 798.2. Samples: 1357404. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:49:54,659][42771] Avg episode reward: [(0, '1.940'), (1, '2.360')] [2023-09-24 14:49:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5496832. Throughput: 0: 796.5, 1: 798.5. Samples: 1362016. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:49:59,659][42771] Avg episode reward: [(0, '1.930'), (1, '2.370')] [2023-09-24 14:50:03,258][43653] Updated weights for policy 1, policy_version 10720 (0.0016) [2023-09-24 14:50:03,259][43616] Updated weights for policy 0, policy_version 10848 (0.0016) [2023-09-24 14:50:04,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6331.5). Total num frames: 5529600. Throughput: 0: 795.2, 1: 793.0. Samples: 1371576. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:04,659][42771] Avg episode reward: [(0, '2.080'), (1, '2.120')] [2023-09-24 14:50:09,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5554176. Throughput: 0: 792.2, 1: 792.4. Samples: 1380854. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:50:09,659][42771] Avg episode reward: [(0, '2.020'), (1, '2.200')] [2023-09-24 14:50:14,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5586944. Throughput: 0: 790.3, 1: 788.9. Samples: 1385622. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:50:14,659][42771] Avg episode reward: [(0, '1.960'), (1, '2.310')] [2023-09-24 14:50:16,251][43616] Updated weights for policy 0, policy_version 11008 (0.0019) [2023-09-24 14:50:16,251][43653] Updated weights for policy 1, policy_version 10880 (0.0019) [2023-09-24 14:50:19,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5619712. Throughput: 0: 793.4, 1: 792.6. Samples: 1395053. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:19,659][42771] Avg episode reward: [(0, '2.040'), (1, '2.270')] [2023-09-24 14:50:24,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5652480. Throughput: 0: 791.6, 1: 794.0. Samples: 1404628. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:24,659][42771] Avg episode reward: [(0, '2.070'), (1, '2.380')] [2023-09-24 14:50:29,241][43653] Updated weights for policy 1, policy_version 11040 (0.0017) [2023-09-24 14:50:29,242][43616] Updated weights for policy 0, policy_version 11168 (0.0017) [2023-09-24 14:50:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 5685248. Throughput: 0: 789.6, 1: 790.3. Samples: 1409159. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:29,659][42771] Avg episode reward: [(0, '2.300'), (1, '2.340')] [2023-09-24 14:50:29,659][43303] Saving new best policy, reward=2.300! [2023-09-24 14:50:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6317.6). Total num frames: 5718016. Throughput: 0: 787.7, 1: 789.0. Samples: 1418537. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:34,659][42771] Avg episode reward: [(0, '2.220'), (1, '2.490')] [2023-09-24 14:50:39,658][42771] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5742592. Throughput: 0: 783.0, 1: 783.3. Samples: 1427886. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:39,659][42771] Avg episode reward: [(0, '2.180'), (1, '2.500')] [2023-09-24 14:50:42,500][43653] Updated weights for policy 1, policy_version 11200 (0.0019) [2023-09-24 14:50:42,500][43616] Updated weights for policy 0, policy_version 11328 (0.0019) [2023-09-24 14:50:44,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5775360. Throughput: 0: 782.8, 1: 783.4. Samples: 1432494. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:44,659][42771] Avg episode reward: [(0, '2.140'), (1, '2.290')] [2023-09-24 14:50:49,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5808128. Throughput: 0: 780.4, 1: 780.0. Samples: 1441795. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:50:49,659][42771] Avg episode reward: [(0, '2.070'), (1, '2.320')] [2023-09-24 14:50:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 5840896. Throughput: 0: 787.2, 1: 787.7. Samples: 1451724. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 14:50:54,659][42771] Avg episode reward: [(0, '2.180'), (1, '2.240')] [2023-09-24 14:50:55,408][43653] Updated weights for policy 1, policy_version 11360 (0.0018) [2023-09-24 14:50:55,408][43616] Updated weights for policy 0, policy_version 11488 (0.0016) [2023-09-24 14:50:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5873664. Throughput: 0: 784.6, 1: 784.2. Samples: 1456214. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 14:50:59,659][42771] Avg episode reward: [(0, '1.970'), (1, '2.170')] [2023-09-24 14:51:04,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5906432. Throughput: 0: 788.2, 1: 789.5. Samples: 1466050. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:51:04,660][42771] Avg episode reward: [(0, '2.100'), (1, '2.090')] [2023-09-24 14:51:08,263][43616] Updated weights for policy 0, policy_version 11648 (0.0018) [2023-09-24 14:51:08,264][43653] Updated weights for policy 1, policy_version 11520 (0.0015) [2023-09-24 14:51:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 5939200. Throughput: 0: 787.2, 1: 786.0. Samples: 1475425. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:51:09,659][42771] Avg episode reward: [(0, '2.250'), (1, '2.080')] [2023-09-24 14:51:14,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5963776. Throughput: 0: 787.7, 1: 785.1. Samples: 1479936. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:51:14,659][42771] Avg episode reward: [(0, '2.390'), (1, '2.290')] [2023-09-24 14:51:14,660][43303] Saving new best policy, reward=2.390! [2023-09-24 14:51:19,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 5996544. Throughput: 0: 784.6, 1: 785.8. Samples: 1489202. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:51:19,659][42771] Avg episode reward: [(0, '2.530'), (1, '2.250')] [2023-09-24 14:51:19,672][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000011648_2981888.pth... [2023-09-24 14:51:19,672][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000011776_3014656.pth... [2023-09-24 14:51:19,710][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000008688_2224128.pth [2023-09-24 14:51:19,713][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000008816_2256896.pth [2023-09-24 14:51:19,718][43303] Saving new best policy, reward=2.530! [2023-09-24 14:51:21,503][43616] Updated weights for policy 0, policy_version 11808 (0.0017) [2023-09-24 14:51:21,503][43653] Updated weights for policy 1, policy_version 11680 (0.0017) [2023-09-24 14:51:24,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6029312. Throughput: 0: 788.1, 1: 788.1. Samples: 1498816. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 14:51:24,659][42771] Avg episode reward: [(0, '2.500'), (1, '2.360')] [2023-09-24 14:51:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 6062080. Throughput: 0: 787.8, 1: 787.5. Samples: 1503381. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 14:51:29,659][42771] Avg episode reward: [(0, '2.610'), (1, '2.430')] [2023-09-24 14:51:29,661][43303] Saving new best policy, reward=2.610! [2023-09-24 14:51:34,452][43653] Updated weights for policy 1, policy_version 11840 (0.0020) [2023-09-24 14:51:34,452][43616] Updated weights for policy 0, policy_version 11968 (0.0018) [2023-09-24 14:51:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 6094848. Throughput: 0: 789.2, 1: 793.8. Samples: 1513029. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:51:34,659][42771] Avg episode reward: [(0, '2.570'), (1, '2.580')] [2023-09-24 14:51:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 6127616. Throughput: 0: 787.6, 1: 787.1. Samples: 1522582. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 14:51:39,659][42771] Avg episode reward: [(0, '2.520'), (1, '2.590')] [2023-09-24 14:51:44,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6152192. Throughput: 0: 790.3, 1: 791.4. Samples: 1527387. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:51:44,659][42771] Avg episode reward: [(0, '2.500'), (1, '2.580')] [2023-09-24 14:51:47,340][43653] Updated weights for policy 1, policy_version 12000 (0.0015) [2023-09-24 14:51:47,341][43616] Updated weights for policy 0, policy_version 12128 (0.0018) [2023-09-24 14:51:49,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 6184960. Throughput: 0: 784.9, 1: 785.0. Samples: 1536695. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:51:49,659][42771] Avg episode reward: [(0, '2.290'), (1, '2.590')] [2023-09-24 14:51:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6217728. Throughput: 0: 787.7, 1: 786.0. Samples: 1546240. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:51:54,659][42771] Avg episode reward: [(0, '2.330'), (1, '2.480')] [2023-09-24 14:51:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 6250496. Throughput: 0: 784.8, 1: 787.0. Samples: 1550668. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:51:59,659][42771] Avg episode reward: [(0, '2.310'), (1, '2.550')] [2023-09-24 14:52:00,299][43653] Updated weights for policy 1, policy_version 12160 (0.0017) [2023-09-24 14:52:00,299][43616] Updated weights for policy 0, policy_version 12288 (0.0017) [2023-09-24 14:52:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 6283264. Throughput: 0: 794.3, 1: 791.8. Samples: 1560576. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:52:04,659][42771] Avg episode reward: [(0, '2.210'), (1, '2.680')] [2023-09-24 14:52:09,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 6316032. Throughput: 0: 789.7, 1: 790.0. Samples: 1569902. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:52:09,659][42771] Avg episode reward: [(0, '2.190'), (1, '2.800')] [2023-09-24 14:52:13,455][43653] Updated weights for policy 1, policy_version 12320 (0.0019) [2023-09-24 14:52:13,455][43616] Updated weights for policy 0, policy_version 12448 (0.0019) [2023-09-24 14:52:14,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6340608. Throughput: 0: 789.5, 1: 789.8. Samples: 1574450. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:52:14,659][42771] Avg episode reward: [(0, '2.210'), (1, '2.770')] [2023-09-24 14:52:19,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 6373376. Throughput: 0: 785.7, 1: 782.9. Samples: 1583616. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:52:19,659][42771] Avg episode reward: [(0, '2.260'), (1, '2.750')] [2023-09-24 14:52:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6406144. Throughput: 0: 785.2, 1: 784.9. Samples: 1593238. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:52:24,659][42771] Avg episode reward: [(0, '2.250'), (1, '2.690')] [2023-09-24 14:52:26,514][43616] Updated weights for policy 0, policy_version 12608 (0.0016) [2023-09-24 14:52:26,514][43653] Updated weights for policy 1, policy_version 12480 (0.0018) [2023-09-24 14:52:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 6438912. Throughput: 0: 782.9, 1: 782.4. Samples: 1597825. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:52:29,659][42771] Avg episode reward: [(0, '2.220'), (1, '2.690')] [2023-09-24 14:52:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6471680. Throughput: 0: 788.6, 1: 786.7. Samples: 1607582. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:52:34,659][42771] Avg episode reward: [(0, '2.150'), (1, '2.500')] [2023-09-24 14:52:39,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6144.0, 300 sec: 6303.7). Total num frames: 6496256. Throughput: 0: 777.0, 1: 779.5. Samples: 1616286. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:52:39,659][42771] Avg episode reward: [(0, '2.110'), (1, '2.630')] [2023-09-24 14:52:39,774][43653] Updated weights for policy 1, policy_version 12640 (0.0016) [2023-09-24 14:52:39,774][43616] Updated weights for policy 0, policy_version 12768 (0.0016) [2023-09-24 14:52:44,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6529024. Throughput: 0: 783.0, 1: 782.8. Samples: 1621128. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:52:44,659][42771] Avg episode reward: [(0, '2.250'), (1, '2.640')] [2023-09-24 14:52:49,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6561792. Throughput: 0: 776.8, 1: 778.6. Samples: 1630567. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:52:49,659][42771] Avg episode reward: [(0, '2.180'), (1, '2.790')] [2023-09-24 14:52:52,671][43616] Updated weights for policy 0, policy_version 12928 (0.0017) [2023-09-24 14:52:52,671][43653] Updated weights for policy 1, policy_version 12800 (0.0016) [2023-09-24 14:52:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 6594560. Throughput: 0: 784.3, 1: 782.4. Samples: 1640402. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:52:54,659][42771] Avg episode reward: [(0, '2.240'), (1, '2.760')] [2023-09-24 14:52:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6627328. Throughput: 0: 781.7, 1: 781.6. Samples: 1644798. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:52:59,659][42771] Avg episode reward: [(0, '2.250'), (1, '2.780')] [2023-09-24 14:53:04,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 6660096. Throughput: 0: 786.6, 1: 789.8. Samples: 1654553. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 14:53:04,659][42771] Avg episode reward: [(0, '2.180'), (1, '2.960')] [2023-09-24 14:53:04,671][43474] Saving new best policy, reward=2.960! [2023-09-24 14:53:05,658][43616] Updated weights for policy 0, policy_version 13088 (0.0015) [2023-09-24 14:53:05,658][43653] Updated weights for policy 1, policy_version 12960 (0.0018) [2023-09-24 14:53:09,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 6692864. Throughput: 0: 784.0, 1: 784.0. Samples: 1663799. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:53:09,660][42771] Avg episode reward: [(0, '2.210'), (1, '2.990')] [2023-09-24 14:53:09,661][43474] Saving new best policy, reward=2.990! [2023-09-24 14:53:14,658][42771] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 6717440. Throughput: 0: 788.5, 1: 788.4. Samples: 1668785. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:53:14,659][42771] Avg episode reward: [(0, '2.310'), (1, '2.840')] [2023-09-24 14:53:18,478][43653] Updated weights for policy 1, policy_version 13120 (0.0017) [2023-09-24 14:53:18,479][43616] Updated weights for policy 0, policy_version 13248 (0.0018) [2023-09-24 14:53:19,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6750208. Throughput: 0: 785.6, 1: 786.8. Samples: 1678341. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:53:19,660][42771] Avg episode reward: [(0, '2.150'), (1, '2.900')] [2023-09-24 14:53:19,711][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000013136_3362816.pth... [2023-09-24 14:53:19,732][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000013264_3395584.pth... [2023-09-24 14:53:19,744][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000010176_2605056.pth [2023-09-24 14:53:19,759][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000010304_2637824.pth [2023-09-24 14:53:24,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6782976. Throughput: 0: 794.1, 1: 793.9. Samples: 1687746. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:53:24,659][42771] Avg episode reward: [(0, '2.080'), (1, '2.750')] [2023-09-24 14:53:29,658][42771] Fps is (10 sec: 6553.9, 60 sec: 6280.5, 300 sec: 6317.6). Total num frames: 6815744. Throughput: 0: 796.6, 1: 795.9. Samples: 1692792. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:53:29,659][42771] Avg episode reward: [(0, '2.080'), (1, '2.680')] [2023-09-24 14:53:31,459][43653] Updated weights for policy 1, policy_version 13280 (0.0018) [2023-09-24 14:53:31,459][43616] Updated weights for policy 0, policy_version 13408 (0.0018) [2023-09-24 14:53:34,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 6848512. Throughput: 0: 793.4, 1: 791.5. Samples: 1701888. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:53:34,660][42771] Avg episode reward: [(0, '2.240'), (1, '2.500')] [2023-09-24 14:53:39,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 6881280. Throughput: 0: 789.4, 1: 790.2. Samples: 1711481. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:53:39,659][42771] Avg episode reward: [(0, '2.370'), (1, '2.480')] [2023-09-24 14:53:44,638][43653] Updated weights for policy 1, policy_version 13440 (0.0017) [2023-09-24 14:53:44,639][43616] Updated weights for policy 0, policy_version 13568 (0.0016) [2023-09-24 14:53:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 6914048. Throughput: 0: 792.4, 1: 792.4. Samples: 1716117. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:53:44,659][42771] Avg episode reward: [(0, '2.380'), (1, '2.530')] [2023-09-24 14:53:49,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6938624. Throughput: 0: 788.5, 1: 785.8. Samples: 1725397. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 14:53:49,660][42771] Avg episode reward: [(0, '2.500'), (1, '2.450')] [2023-09-24 14:53:54,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 6971392. Throughput: 0: 789.2, 1: 789.2. Samples: 1734824. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 14:53:54,659][42771] Avg episode reward: [(0, '2.560'), (1, '2.650')] [2023-09-24 14:53:57,393][43653] Updated weights for policy 1, policy_version 13600 (0.0017) [2023-09-24 14:53:57,393][43616] Updated weights for policy 0, policy_version 13728 (0.0017) [2023-09-24 14:53:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7004160. Throughput: 0: 789.8, 1: 790.1. Samples: 1739880. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:53:59,659][42771] Avg episode reward: [(0, '2.470'), (1, '2.560')] [2023-09-24 14:54:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7036928. Throughput: 0: 786.5, 1: 786.3. Samples: 1749116. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:54:04,659][42771] Avg episode reward: [(0, '2.700'), (1, '2.620')] [2023-09-24 14:54:04,671][43303] Saving new best policy, reward=2.700! [2023-09-24 14:54:09,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 7069696. Throughput: 0: 787.1, 1: 787.0. Samples: 1758580. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 14:54:09,659][42771] Avg episode reward: [(0, '2.530'), (1, '2.700')] [2023-09-24 14:54:10,579][43616] Updated weights for policy 0, policy_version 13888 (0.0018) [2023-09-24 14:54:10,579][43653] Updated weights for policy 1, policy_version 13760 (0.0016) [2023-09-24 14:54:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7102464. Throughput: 0: 784.3, 1: 783.2. Samples: 1763328. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 14:54:14,659][42771] Avg episode reward: [(0, '2.460'), (1, '2.710')] [2023-09-24 14:54:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7135232. Throughput: 0: 787.9, 1: 790.8. Samples: 1772930. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:54:19,659][42771] Avg episode reward: [(0, '2.360'), (1, '2.720')] [2023-09-24 14:54:23,676][43616] Updated weights for policy 0, policy_version 14048 (0.0018) [2023-09-24 14:54:23,676][43653] Updated weights for policy 1, policy_version 13920 (0.0019) [2023-09-24 14:54:24,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7159808. Throughput: 0: 781.9, 1: 781.7. Samples: 1781842. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:54:24,659][42771] Avg episode reward: [(0, '2.380'), (1, '2.740')] [2023-09-24 14:54:29,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7192576. Throughput: 0: 780.8, 1: 781.7. Samples: 1786432. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:54:29,659][42771] Avg episode reward: [(0, '2.220'), (1, '2.610')] [2023-09-24 14:54:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7225344. Throughput: 0: 786.7, 1: 784.5. Samples: 1796101. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:54:34,660][42771] Avg episode reward: [(0, '2.080'), (1, '2.800')] [2023-09-24 14:54:36,617][43616] Updated weights for policy 0, policy_version 14208 (0.0018) [2023-09-24 14:54:36,617][43653] Updated weights for policy 1, policy_version 14080 (0.0017) [2023-09-24 14:54:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7258112. Throughput: 0: 790.2, 1: 790.7. Samples: 1805966. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:54:39,659][42771] Avg episode reward: [(0, '2.140'), (1, '2.910')] [2023-09-24 14:54:44,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7290880. Throughput: 0: 785.1, 1: 783.7. Samples: 1810479. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:54:44,659][42771] Avg episode reward: [(0, '2.100'), (1, '2.790')] [2023-09-24 14:54:49,222][43653] Updated weights for policy 1, policy_version 14240 (0.0019) [2023-09-24 14:54:49,222][43616] Updated weights for policy 0, policy_version 14368 (0.0019) [2023-09-24 14:54:49,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7323648. Throughput: 0: 794.5, 1: 793.9. Samples: 1820593. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:54:49,659][42771] Avg episode reward: [(0, '2.140'), (1, '2.780')] [2023-09-24 14:54:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7356416. Throughput: 0: 792.6, 1: 792.3. Samples: 1829898. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:54:54,659][42771] Avg episode reward: [(0, '2.050'), (1, '2.830')] [2023-09-24 14:54:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7389184. Throughput: 0: 794.2, 1: 796.4. Samples: 1834909. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:54:59,659][42771] Avg episode reward: [(0, '1.990'), (1, '2.650')] [2023-09-24 14:55:02,084][43616] Updated weights for policy 0, policy_version 14528 (0.0017) [2023-09-24 14:55:02,085][43653] Updated weights for policy 1, policy_version 14400 (0.0018) [2023-09-24 14:55:04,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7413760. Throughput: 0: 794.5, 1: 792.6. Samples: 1844348. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:55:04,659][42771] Avg episode reward: [(0, '2.220'), (1, '2.670')] [2023-09-24 14:55:09,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7446528. Throughput: 0: 797.0, 1: 797.8. Samples: 1853608. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:55:09,659][42771] Avg episode reward: [(0, '2.300'), (1, '2.430')] [2023-09-24 14:55:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7479296. Throughput: 0: 803.2, 1: 802.2. Samples: 1858678. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:55:14,659][42771] Avg episode reward: [(0, '2.390'), (1, '2.420')] [2023-09-24 14:55:14,899][43653] Updated weights for policy 1, policy_version 14560 (0.0016) [2023-09-24 14:55:14,899][43616] Updated weights for policy 0, policy_version 14688 (0.0016) [2023-09-24 14:55:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7512064. Throughput: 0: 802.4, 1: 804.8. Samples: 1868424. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:55:19,659][42771] Avg episode reward: [(0, '2.480'), (1, '2.530')] [2023-09-24 14:55:19,671][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000014736_3772416.pth... [2023-09-24 14:55:19,704][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000011776_3014656.pth [2023-09-24 14:55:19,859][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000014624_3743744.pth... [2023-09-24 14:55:19,886][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000011648_2981888.pth [2023-09-24 14:55:24,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7544832. Throughput: 0: 802.1, 1: 801.0. Samples: 1878106. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:55:24,659][42771] Avg episode reward: [(0, '2.500'), (1, '2.570')] [2023-09-24 14:55:27,751][43616] Updated weights for policy 0, policy_version 14848 (0.0017) [2023-09-24 14:55:27,752][43653] Updated weights for policy 1, policy_version 14720 (0.0018) [2023-09-24 14:55:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7577600. 
Throughput: 0: 802.4, 1: 802.5. Samples: 1882700. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:55:29,659][42771] Avg episode reward: [(0, '2.540'), (1, '2.620')] [2023-09-24 14:55:34,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 7610368. Throughput: 0: 798.2, 1: 796.4. Samples: 1892351. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 14:55:34,659][42771] Avg episode reward: [(0, '2.740'), (1, '2.670')] [2023-09-24 14:55:34,669][43303] Saving new best policy, reward=2.740! [2023-09-24 14:55:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 7643136. Throughput: 0: 793.6, 1: 794.5. Samples: 1901361. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:55:39,659][42771] Avg episode reward: [(0, '2.650'), (1, '2.600')] [2023-09-24 14:55:40,813][43653] Updated weights for policy 1, policy_version 14880 (0.0018) [2023-09-24 14:55:40,814][43616] Updated weights for policy 0, policy_version 15008 (0.0018) [2023-09-24 14:55:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 7675904. Throughput: 0: 794.4, 1: 795.1. Samples: 1906435. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:55:44,659][42771] Avg episode reward: [(0, '2.610'), (1, '2.860')] [2023-09-24 14:55:49,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7700480. Throughput: 0: 792.0, 1: 794.0. Samples: 1915718. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:55:49,660][42771] Avg episode reward: [(0, '2.510'), (1, '2.760')] [2023-09-24 14:55:53,659][43653] Updated weights for policy 1, policy_version 15040 (0.0018) [2023-09-24 14:55:53,659][43616] Updated weights for policy 0, policy_version 15168 (0.0018) [2023-09-24 14:55:54,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7733248. Throughput: 0: 795.9, 1: 795.5. Samples: 1925222. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:55:54,659][42771] Avg episode reward: [(0, '2.540'), (1, '2.770')] [2023-09-24 14:55:59,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7766016. Throughput: 0: 795.0, 1: 795.2. Samples: 1930238. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:55:59,659][42771] Avg episode reward: [(0, '2.450'), (1, '2.810')] [2023-09-24 14:56:04,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 7798784. Throughput: 0: 790.6, 1: 790.0. Samples: 1939547. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:56:04,659][42771] Avg episode reward: [(0, '2.550'), (1, '2.650')] [2023-09-24 14:56:06,506][43616] Updated weights for policy 0, policy_version 15328 (0.0018) [2023-09-24 14:56:06,507][43653] Updated weights for policy 1, policy_version 15200 (0.0014) [2023-09-24 14:56:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 7831552. Throughput: 0: 790.6, 1: 792.3. Samples: 1949335. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:56:09,659][42771] Avg episode reward: [(0, '2.560'), (1, '2.750')] [2023-09-24 14:56:14,658][42771] Fps is (10 sec: 6143.8, 60 sec: 6348.8, 300 sec: 6317.6). Total num frames: 7860224. Throughput: 0: 790.1, 1: 789.3. Samples: 1953776. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:56:14,659][42771] Avg episode reward: [(0, '2.690'), (1, '2.660')] [2023-09-24 14:56:19,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7888896. Throughput: 0: 781.0, 1: 781.8. Samples: 1962681. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:56:19,659][42771] Avg episode reward: [(0, '2.830'), (1, '2.590')] [2023-09-24 14:56:19,668][43303] Saving new best policy, reward=2.830! 
[2023-09-24 14:56:20,064][43653] Updated weights for policy 1, policy_version 15360 (0.0017) [2023-09-24 14:56:20,064][43616] Updated weights for policy 0, policy_version 15488 (0.0020) [2023-09-24 14:56:24,658][42771] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7921664. Throughput: 0: 786.0, 1: 785.9. Samples: 1972099. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:56:24,659][42771] Avg episode reward: [(0, '2.810'), (1, '2.530')] [2023-09-24 14:56:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7954432. Throughput: 0: 780.2, 1: 779.7. Samples: 1976632. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 14:56:29,659][42771] Avg episode reward: [(0, '2.880'), (1, '2.360')] [2023-09-24 14:56:29,661][43303] Saving new best policy, reward=2.880! [2023-09-24 14:56:33,203][43616] Updated weights for policy 0, policy_version 15648 (0.0019) [2023-09-24 14:56:33,204][43653] Updated weights for policy 1, policy_version 15520 (0.0020) [2023-09-24 14:56:34,659][42771] Fps is (10 sec: 6553.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 7987200. Throughput: 0: 781.4, 1: 780.3. Samples: 1985993. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 14:56:34,660][42771] Avg episode reward: [(0, '2.960'), (1, '2.450')] [2023-09-24 14:56:34,669][43303] Saving new best policy, reward=2.960! [2023-09-24 14:56:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8019968. Throughput: 0: 783.1, 1: 783.5. Samples: 1995721. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:56:39,659][42771] Avg episode reward: [(0, '2.820'), (1, '2.550')] [2023-09-24 14:56:44,659][42771] Fps is (10 sec: 5734.5, 60 sec: 6144.0, 300 sec: 6303.7). Total num frames: 8044544. Throughput: 0: 781.2, 1: 781.9. Samples: 2000576. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:56:44,659][42771] Avg episode reward: [(0, '2.870'), (1, '2.530')] [2023-09-24 14:56:46,199][43653] Updated weights for policy 1, policy_version 15680 (0.0018) [2023-09-24 14:56:46,199][43616] Updated weights for policy 0, policy_version 15808 (0.0018) [2023-09-24 14:56:49,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 8077312. Throughput: 0: 777.6, 1: 777.6. Samples: 2009530. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:56:49,659][42771] Avg episode reward: [(0, '2.740'), (1, '2.460')] [2023-09-24 14:56:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 8110080. Throughput: 0: 778.3, 1: 776.0. Samples: 2019276. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:56:54,659][42771] Avg episode reward: [(0, '2.640'), (1, '2.580')] [2023-09-24 14:56:59,150][43616] Updated weights for policy 0, policy_version 15968 (0.0018) [2023-09-24 14:56:59,150][43653] Updated weights for policy 1, policy_version 15840 (0.0018) [2023-09-24 14:56:59,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 8142848. Throughput: 0: 776.0, 1: 778.0. Samples: 2023705. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:56:59,659][42771] Avg episode reward: [(0, '2.660'), (1, '2.540')] [2023-09-24 14:57:04,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 8175616. Throughput: 0: 788.9, 1: 788.3. Samples: 2033654. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:04,659][42771] Avg episode reward: [(0, '2.690'), (1, '2.600')] [2023-09-24 14:57:09,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8208384. Throughput: 0: 791.0, 1: 790.4. Samples: 2043259. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:09,659][42771] Avg episode reward: [(0, '2.600'), (1, '2.830')] [2023-09-24 14:57:11,820][43616] Updated weights for policy 0, policy_version 16128 (0.0016) [2023-09-24 14:57:11,820][43653] Updated weights for policy 1, policy_version 16000 (0.0017) [2023-09-24 14:57:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6331.4). Total num frames: 8241152. Throughput: 0: 794.1, 1: 791.9. Samples: 2048002. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:14,659][42771] Avg episode reward: [(0, '2.720'), (1, '2.810')] [2023-09-24 14:57:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 8273920. Throughput: 0: 798.9, 1: 798.3. Samples: 2057868. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:19,659][42771] Avg episode reward: [(0, '3.000'), (1, '2.980')] [2023-09-24 14:57:19,667][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000016224_4153344.pth... [2023-09-24 14:57:19,668][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000016096_4120576.pth... [2023-09-24 14:57:19,699][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000013264_3395584.pth [2023-09-24 14:57:19,702][43303] Saving new best policy, reward=3.000! [2023-09-24 14:57:19,703][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000013136_3362816.pth [2023-09-24 14:57:24,569][43653] Updated weights for policy 1, policy_version 16160 (0.0017) [2023-09-24 14:57:24,569][43616] Updated weights for policy 0, policy_version 16288 (0.0016) [2023-09-24 14:57:24,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 8306688. Throughput: 0: 796.0, 1: 795.1. Samples: 2067321. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:24,659][42771] Avg episode reward: [(0, '2.930'), (1, '2.900')] [2023-09-24 14:57:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 8339456. Throughput: 0: 797.8, 1: 796.8. Samples: 2072335. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:29,659][42771] Avg episode reward: [(0, '2.890'), (1, '2.980')] [2023-09-24 14:57:34,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 8364032. Throughput: 0: 802.2, 1: 802.5. Samples: 2081740. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:34,659][42771] Avg episode reward: [(0, '3.030'), (1, '3.130')] [2023-09-24 14:57:34,687][43303] Saving new best policy, reward=3.030! [2023-09-24 14:57:34,769][43474] Saving new best policy, reward=3.130! [2023-09-24 14:57:37,328][43653] Updated weights for policy 1, policy_version 16320 (0.0016) [2023-09-24 14:57:37,328][43616] Updated weights for policy 0, policy_version 16448 (0.0016) [2023-09-24 14:57:39,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8396800. Throughput: 0: 799.0, 1: 800.0. Samples: 2091228. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:39,659][42771] Avg episode reward: [(0, '2.750'), (1, '3.150')] [2023-09-24 14:57:39,660][43474] Saving new best policy, reward=3.150! [2023-09-24 14:57:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 8429568. Throughput: 0: 804.3, 1: 803.4. Samples: 2096051. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:44,659][42771] Avg episode reward: [(0, '2.700'), (1, '2.980')] [2023-09-24 14:57:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 8462336. Throughput: 0: 798.8, 1: 801.0. Samples: 2105644. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:49,659][42771] Avg episode reward: [(0, '2.610'), (1, '2.750')] [2023-09-24 14:57:50,174][43653] Updated weights for policy 1, policy_version 16480 (0.0017) [2023-09-24 14:57:50,175][43616] Updated weights for policy 0, policy_version 16608 (0.0017) [2023-09-24 14:57:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 8495104. Throughput: 0: 802.2, 1: 802.5. Samples: 2115472. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:54,659][42771] Avg episode reward: [(0, '2.530'), (1, '2.740')] [2023-09-24 14:57:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 8527872. Throughput: 0: 799.2, 1: 801.8. Samples: 2120047. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:57:59,659][42771] Avg episode reward: [(0, '2.440'), (1, '2.890')] [2023-09-24 14:58:02,865][43653] Updated weights for policy 1, policy_version 16640 (0.0015) [2023-09-24 14:58:02,865][43616] Updated weights for policy 0, policy_version 16768 (0.0017) [2023-09-24 14:58:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 8560640. Throughput: 0: 801.2, 1: 799.9. Samples: 2129920. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:58:04,659][42771] Avg episode reward: [(0, '2.440'), (1, '2.840')] [2023-09-24 14:58:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 8593408. Throughput: 0: 798.3, 1: 799.9. Samples: 2139238. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:58:09,659][42771] Avg episode reward: [(0, '2.330'), (1, '2.920')] [2023-09-24 14:58:14,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6345.3). Total num frames: 8622080. Throughput: 0: 797.9, 1: 797.2. Samples: 2144115. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 14:58:14,659][42771] Avg episode reward: [(0, '2.320'), (1, '2.810')] [2023-09-24 14:58:15,972][43616] Updated weights for policy 0, policy_version 16928 (0.0018) [2023-09-24 14:58:15,972][43653] Updated weights for policy 1, policy_version 16800 (0.0018) [2023-09-24 14:58:19,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8650752. Throughput: 0: 795.5, 1: 795.6. Samples: 2153339. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:58:19,659][42771] Avg episode reward: [(0, '2.450'), (1, '2.960')] [2023-09-24 14:58:24,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 8683520. Throughput: 0: 795.1, 1: 793.3. Samples: 2162707. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:58:24,659][42771] Avg episode reward: [(0, '2.420'), (1, '3.070')] [2023-09-24 14:58:28,859][43653] Updated weights for policy 1, policy_version 16960 (0.0017) [2023-09-24 14:58:28,859][43616] Updated weights for policy 0, policy_version 17088 (0.0017) [2023-09-24 14:58:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8716288. Throughput: 0: 794.5, 1: 795.0. Samples: 2167577. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 14:58:29,659][42771] Avg episode reward: [(0, '2.510'), (1, '3.160')] [2023-09-24 14:58:29,661][43474] Saving new best policy, reward=3.160! [2023-09-24 14:58:34,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 8749056. Throughput: 0: 794.3, 1: 791.9. Samples: 2177024. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:58:34,659][42771] Avg episode reward: [(0, '2.570'), (1, '3.020')] [2023-09-24 14:58:39,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6331.5). Total num frames: 8781824. Throughput: 0: 788.8, 1: 791.0. Samples: 2186562. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:58:39,659][42771] Avg episode reward: [(0, '2.660'), (1, '3.130')] [2023-09-24 14:58:41,970][43616] Updated weights for policy 0, policy_version 17248 (0.0017) [2023-09-24 14:58:41,971][43653] Updated weights for policy 1, policy_version 17120 (0.0017) [2023-09-24 14:58:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 8814592. Throughput: 0: 792.0, 1: 791.1. Samples: 2191286. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:58:44,659][42771] Avg episode reward: [(0, '3.030'), (1, '3.060')] [2023-09-24 14:58:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 8847360. Throughput: 0: 787.3, 1: 788.9. Samples: 2200849. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:58:49,659][42771] Avg episode reward: [(0, '2.850'), (1, '2.890')] [2023-09-24 14:58:54,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8871936. Throughput: 0: 788.6, 1: 788.1. Samples: 2210186. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:58:54,659][42771] Avg episode reward: [(0, '2.980'), (1, '3.030')] [2023-09-24 14:58:54,810][43653] Updated weights for policy 1, policy_version 17280 (0.0015) [2023-09-24 14:58:54,811][43616] Updated weights for policy 0, policy_version 17408 (0.0017) [2023-09-24 14:58:59,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8904704. Throughput: 0: 788.2, 1: 790.0. Samples: 2215132. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:58:59,659][42771] Avg episode reward: [(0, '2.900'), (1, '3.010')] [2023-09-24 14:59:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 8937472. Throughput: 0: 791.4, 1: 791.5. Samples: 2224569. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 14:59:04,660][42771] Avg episode reward: [(0, '2.940'), (1, '2.850')] [2023-09-24 14:59:07,520][43616] Updated weights for policy 0, policy_version 17568 (0.0017) [2023-09-24 14:59:07,520][43653] Updated weights for policy 1, policy_version 17440 (0.0016) [2023-09-24 14:59:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 8970240. Throughput: 0: 796.4, 1: 796.0. Samples: 2234368. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:59:09,659][42771] Avg episode reward: [(0, '2.920'), (1, '2.740')] [2023-09-24 14:59:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6331.4). Total num frames: 9003008. Throughput: 0: 794.6, 1: 793.9. Samples: 2239062. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:59:14,659][42771] Avg episode reward: [(0, '2.740'), (1, '2.780')] [2023-09-24 14:59:19,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9035776. Throughput: 0: 796.4, 1: 796.5. Samples: 2248708. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 14:59:19,659][42771] Avg episode reward: [(0, '2.810'), (1, '2.850')] [2023-09-24 14:59:19,670][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000017584_4501504.pth... [2023-09-24 14:59:19,671][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000017712_4534272.pth... [2023-09-24 14:59:19,707][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000014736_3772416.pth [2023-09-24 14:59:19,711][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000014624_3743744.pth [2023-09-24 14:59:20,350][43653] Updated weights for policy 1, policy_version 17600 (0.0018) [2023-09-24 14:59:20,350][43616] Updated weights for policy 0, policy_version 17728 (0.0018) [2023-09-24 14:59:24,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9068544. Throughput: 0: 798.3, 1: 796.6. 
Samples: 2258333. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:59:24,659][42771] Avg episode reward: [(0, '2.660'), (1, '2.790')] [2023-09-24 14:59:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9101312. Throughput: 0: 797.8, 1: 796.4. Samples: 2263025. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 14:59:29,659][42771] Avg episode reward: [(0, '2.700'), (1, '2.740')] [2023-09-24 14:59:33,417][43653] Updated weights for policy 1, policy_version 17760 (0.0013) [2023-09-24 14:59:33,418][43616] Updated weights for policy 0, policy_version 17888 (0.0018) [2023-09-24 14:59:34,665][42771] Fps is (10 sec: 6549.4, 60 sec: 6416.4, 300 sec: 6359.1). Total num frames: 9134080. Throughput: 0: 795.0, 1: 794.9. Samples: 2272408. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 14:59:34,666][42771] Avg episode reward: [(0, '2.680'), (1, '2.860')] [2023-09-24 14:59:39,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 9158656. Throughput: 0: 795.7, 1: 795.4. Samples: 2281784. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 14:59:39,659][42771] Avg episode reward: [(0, '2.700'), (1, '2.890')] [2023-09-24 14:59:44,658][42771] Fps is (10 sec: 5738.1, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 9191424. Throughput: 0: 796.7, 1: 795.6. Samples: 2286787. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 14:59:44,659][42771] Avg episode reward: [(0, '2.840'), (1, '2.930')] [2023-09-24 14:59:46,353][43616] Updated weights for policy 0, policy_version 18048 (0.0016) [2023-09-24 14:59:46,354][43653] Updated weights for policy 1, policy_version 17920 (0.0016) [2023-09-24 14:59:49,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 9224192. Throughput: 0: 794.5, 1: 794.4. Samples: 2296068. 
Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:59:49,660][42771] Avg episode reward: [(0, '2.790'), (1, '2.950')] [2023-09-24 14:59:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 9256960. Throughput: 0: 796.4, 1: 796.4. Samples: 2306048. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:59:54,659][42771] Avg episode reward: [(0, '2.890'), (1, '2.920')] [2023-09-24 14:59:59,138][43616] Updated weights for policy 0, policy_version 18208 (0.0016) [2023-09-24 14:59:59,138][43653] Updated weights for policy 1, policy_version 18080 (0.0016) [2023-09-24 14:59:59,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 9289728. Throughput: 0: 793.7, 1: 794.5. Samples: 2310532. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 14:59:59,660][42771] Avg episode reward: [(0, '2.970'), (1, '2.940')] [2023-09-24 15:00:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9322496. Throughput: 0: 794.6, 1: 796.4. Samples: 2320299. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:00:04,659][42771] Avg episode reward: [(0, '2.990'), (1, '3.100')] [2023-09-24 15:00:09,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9355264. Throughput: 0: 797.3, 1: 795.1. Samples: 2329992. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:00:09,659][42771] Avg episode reward: [(0, '2.820'), (1, '2.980')] [2023-09-24 15:00:11,972][43653] Updated weights for policy 1, policy_version 18240 (0.0016) [2023-09-24 15:00:11,972][43616] Updated weights for policy 0, policy_version 18368 (0.0017) [2023-09-24 15:00:14,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9388032. Throughput: 0: 795.0, 1: 796.4. Samples: 2334640. 
Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 15:00:14,660][42771] Avg episode reward: [(0, '2.930'), (1, '3.020')] [2023-09-24 15:00:19,659][42771] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 9412608. Throughput: 0: 793.4, 1: 794.2. Samples: 2343843. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 15:00:19,659][42771] Avg episode reward: [(0, '2.760'), (1, '3.050')] [2023-09-24 15:00:24,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 9445376. Throughput: 0: 795.1, 1: 795.2. Samples: 2353349. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 15:00:24,659][42771] Avg episode reward: [(0, '2.780'), (1, '3.200')] [2023-09-24 15:00:24,842][43474] Saving new best policy, reward=3.200! [2023-09-24 15:00:24,848][43616] Updated weights for policy 0, policy_version 18528 (0.0015) [2023-09-24 15:00:24,848][43653] Updated weights for policy 1, policy_version 18400 (0.0017) [2023-09-24 15:00:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 9478144. Throughput: 0: 795.5, 1: 795.9. Samples: 2358398. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:00:29,659][42771] Avg episode reward: [(0, '2.780'), (1, '3.320')] [2023-09-24 15:00:29,661][43474] Saving new best policy, reward=3.320! [2023-09-24 15:00:34,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6281.2, 300 sec: 6331.4). Total num frames: 9510912. Throughput: 0: 794.8, 1: 794.1. Samples: 2367570. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:00:34,659][42771] Avg episode reward: [(0, '2.770'), (1, '3.190')] [2023-09-24 15:00:37,824][43616] Updated weights for policy 0, policy_version 18688 (0.0018) [2023-09-24 15:00:37,825][43653] Updated weights for policy 1, policy_version 18560 (0.0018) [2023-09-24 15:00:39,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 9543680. Throughput: 0: 792.4, 1: 794.8. 
Samples: 2377471. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:00:39,659][42771] Avg episode reward: [(0, '2.690'), (1, '3.110')] [2023-09-24 15:00:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9576448. Throughput: 0: 793.3, 1: 792.6. Samples: 2381897. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 15:00:44,659][42771] Avg episode reward: [(0, '2.840'), (1, '3.150')] [2023-09-24 15:00:49,659][42771] Fps is (10 sec: 6553.3, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9609216. Throughput: 0: 791.8, 1: 792.1. Samples: 2391574. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 15:00:49,660][42771] Avg episode reward: [(0, '2.750'), (1, '3.210')] [2023-09-24 15:00:50,753][43616] Updated weights for policy 0, policy_version 18848 (0.0016) [2023-09-24 15:00:50,753][43653] Updated weights for policy 1, policy_version 18720 (0.0018) [2023-09-24 15:00:54,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 9641984. Throughput: 0: 789.2, 1: 790.4. Samples: 2401077. Policy #0 lag: (min: 10.0, avg: 10.0, max: 10.0) [2023-09-24 15:00:54,659][42771] Avg episode reward: [(0, '2.930'), (1, '3.070')] [2023-09-24 15:00:59,658][42771] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 9666560. Throughput: 0: 793.4, 1: 792.8. Samples: 2406019. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 15:00:59,659][42771] Avg episode reward: [(0, '3.120'), (1, '2.980')] [2023-09-24 15:00:59,700][43303] Saving new best policy, reward=3.120! [2023-09-24 15:01:03,596][43616] Updated weights for policy 0, policy_version 19008 (0.0018) [2023-09-24 15:01:03,596][43653] Updated weights for policy 1, policy_version 18880 (0.0015) [2023-09-24 15:01:04,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 9699328. Throughput: 0: 795.8, 1: 794.6. Samples: 2415412. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 15:01:04,660][42771] Avg episode reward: [(0, '2.880'), (1, '2.970')] [2023-09-24 15:01:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6345.3). Total num frames: 9732096. Throughput: 0: 795.4, 1: 793.4. Samples: 2424847. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 15:01:09,659][42771] Avg episode reward: [(0, '2.780'), (1, '3.060')] [2023-09-24 15:01:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 9764864. Throughput: 0: 792.9, 1: 793.0. Samples: 2429761. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:14,659][42771] Avg episode reward: [(0, '2.890'), (1, '3.160')] [2023-09-24 15:01:16,435][43653] Updated weights for policy 1, policy_version 19040 (0.0018) [2023-09-24 15:01:16,435][43616] Updated weights for policy 0, policy_version 19168 (0.0018) [2023-09-24 15:01:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9797632. Throughput: 0: 796.5, 1: 796.9. Samples: 2439271. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:19,659][42771] Avg episode reward: [(0, '2.790'), (1, '3.120')] [2023-09-24 15:01:19,671][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000019200_4915200.pth... [2023-09-24 15:01:19,672][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000019072_4882432.pth... [2023-09-24 15:01:19,708][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000016096_4120576.pth [2023-09-24 15:01:19,709][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000016224_4153344.pth [2023-09-24 15:01:24,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9830400. Throughput: 0: 793.9, 1: 793.8. Samples: 2448919. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:24,659][42771] Avg episode reward: [(0, '2.860'), (1, '3.160')] [2023-09-24 15:01:29,278][43616] Updated weights for policy 0, policy_version 19328 (0.0018) [2023-09-24 15:01:29,279][43653] Updated weights for policy 1, policy_version 19200 (0.0015) [2023-09-24 15:01:29,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 9863168. Throughput: 0: 796.4, 1: 796.1. Samples: 2453559. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:29,659][42771] Avg episode reward: [(0, '2.800'), (1, '3.150')] [2023-09-24 15:01:34,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 9895936. Throughput: 0: 797.6, 1: 797.8. Samples: 2463366. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:34,660][42771] Avg episode reward: [(0, '2.760'), (1, '3.160')] [2023-09-24 15:01:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 9928704. Throughput: 0: 799.1, 1: 799.4. Samples: 2473009. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:39,659][42771] Avg episode reward: [(0, '2.680'), (1, '3.100')] [2023-09-24 15:01:41,981][43653] Updated weights for policy 1, policy_version 19360 (0.0016) [2023-09-24 15:01:41,981][43616] Updated weights for policy 0, policy_version 19488 (0.0016) [2023-09-24 15:01:44,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 9961472. Throughput: 0: 799.4, 1: 800.0. Samples: 2477992. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:44,659][42771] Avg episode reward: [(0, '2.930'), (1, '2.970')] [2023-09-24 15:01:49,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 9994240. Throughput: 0: 801.5, 1: 802.2. Samples: 2487577. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:49,659][42771] Avg episode reward: [(0, '2.900'), (1, '3.030')] [2023-09-24 15:01:54,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10018816. Throughput: 0: 796.6, 1: 798.3. Samples: 2496616. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:54,659][42771] Avg episode reward: [(0, '3.070'), (1, '3.130')] [2023-09-24 15:01:54,916][43616] Updated weights for policy 0, policy_version 19648 (0.0017) [2023-09-24 15:01:54,916][43653] Updated weights for policy 1, policy_version 19520 (0.0016) [2023-09-24 15:01:59,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10051584. Throughput: 0: 799.9, 1: 800.0. Samples: 2501755. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:01:59,659][42771] Avg episode reward: [(0, '3.040'), (1, '3.320')] [2023-09-24 15:02:04,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10084352. Throughput: 0: 799.5, 1: 800.3. Samples: 2511260. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:02:04,660][42771] Avg episode reward: [(0, '2.900'), (1, '3.240')] [2023-09-24 15:02:07,477][43653] Updated weights for policy 1, policy_version 19680 (0.0018) [2023-09-24 15:02:07,477][43616] Updated weights for policy 0, policy_version 19808 (0.0018) [2023-09-24 15:02:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10117120. Throughput: 0: 803.0, 1: 801.2. Samples: 2521111. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:02:09,659][42771] Avg episode reward: [(0, '2.860'), (1, '3.410')] [2023-09-24 15:02:09,661][43474] Saving new best policy, reward=3.410! [2023-09-24 15:02:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10149888. Throughput: 0: 802.1, 1: 803.3. Samples: 2525800. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:02:14,659][42771] Avg episode reward: [(0, '2.940'), (1, '3.310')] [2023-09-24 15:02:19,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10182656. Throughput: 0: 800.4, 1: 799.1. Samples: 2535344. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:02:19,660][42771] Avg episode reward: [(0, '2.790'), (1, '3.240')] [2023-09-24 15:02:20,685][43616] Updated weights for policy 0, policy_version 19968 (0.0018) [2023-09-24 15:02:20,685][43653] Updated weights for policy 1, policy_version 19840 (0.0018) [2023-09-24 15:02:24,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6345.3). Total num frames: 10211328. Throughput: 0: 791.2, 1: 791.1. Samples: 2544211. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:02:24,659][42771] Avg episode reward: [(0, '2.850'), (1, '3.380')] [2023-09-24 15:02:29,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10240000. Throughput: 0: 791.4, 1: 792.4. Samples: 2549263. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:02:29,659][42771] Avg episode reward: [(0, '2.740'), (1, '3.090')] [2023-09-24 15:02:33,889][43653] Updated weights for policy 1, policy_version 20000 (0.0018) [2023-09-24 15:02:33,890][43616] Updated weights for policy 0, policy_version 20128 (0.0017) [2023-09-24 15:02:34,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 10272768. Throughput: 0: 784.4, 1: 784.9. Samples: 2558197. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:02:34,659][42771] Avg episode reward: [(0, '2.650'), (1, '3.040')] [2023-09-24 15:02:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10305536. Throughput: 0: 795.7, 1: 794.2. Samples: 2568164. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:02:39,659][42771] Avg episode reward: [(0, '2.690'), (1, '3.180')] [2023-09-24 15:02:44,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10338304. Throughput: 0: 789.3, 1: 788.6. Samples: 2572762. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:02:44,659][42771] Avg episode reward: [(0, '2.640'), (1, '3.130')] [2023-09-24 15:02:46,629][43653] Updated weights for policy 1, policy_version 20160 (0.0016) [2023-09-24 15:02:46,629][43616] Updated weights for policy 0, policy_version 20288 (0.0016) [2023-09-24 15:02:49,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10371072. Throughput: 0: 791.4, 1: 790.4. Samples: 2582441. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 15:02:49,659][42771] Avg episode reward: [(0, '2.720'), (1, '3.110')] [2023-09-24 15:02:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10403840. Throughput: 0: 787.1, 1: 790.1. Samples: 2592084. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 15:02:54,659][42771] Avg episode reward: [(0, '2.770'), (1, '3.110')] [2023-09-24 15:02:59,546][43616] Updated weights for policy 0, policy_version 20448 (0.0018) [2023-09-24 15:02:59,547][43653] Updated weights for policy 1, policy_version 20320 (0.0017) [2023-09-24 15:02:59,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10436608. Throughput: 0: 790.5, 1: 788.4. Samples: 2596851. Policy #0 lag: (min: 13.0, avg: 13.0, max: 13.0) [2023-09-24 15:02:59,659][42771] Avg episode reward: [(0, '2.730'), (1, '3.010')] [2023-09-24 15:03:04,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 10461184. Throughput: 0: 783.6, 1: 784.5. Samples: 2605908. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:04,660][42771] Avg episode reward: [(0, '2.790'), (1, '3.000')] [2023-09-24 15:03:09,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6345.3). Total num frames: 10493952. Throughput: 0: 791.0, 1: 790.2. Samples: 2615367. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:09,659][42771] Avg episode reward: [(0, '2.950'), (1, '3.190')] [2023-09-24 15:03:12,631][43616] Updated weights for policy 0, policy_version 20608 (0.0017) [2023-09-24 15:03:12,632][43653] Updated weights for policy 1, policy_version 20480 (0.0016) [2023-09-24 15:03:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10526720. Throughput: 0: 787.3, 1: 786.8. Samples: 2620096. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:14,659][42771] Avg episode reward: [(0, '2.900'), (1, '3.050')] [2023-09-24 15:03:19,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10559488. Throughput: 0: 794.9, 1: 793.8. Samples: 2629690. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:19,659][42771] Avg episode reward: [(0, '2.890'), (1, '3.100')] [2023-09-24 15:03:19,671][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000020560_5263360.pth... [2023-09-24 15:03:19,671][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000020688_5296128.pth... [2023-09-24 15:03:19,706][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000017712_4534272.pth [2023-09-24 15:03:19,706][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000017584_4501504.pth [2023-09-24 15:03:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6359.2). Total num frames: 10592256. Throughput: 0: 791.6, 1: 793.5. Samples: 2639493. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:24,659][42771] Avg episode reward: [(0, '2.920'), (1, '2.990')] [2023-09-24 15:03:25,475][43653] Updated weights for policy 1, policy_version 20640 (0.0017) [2023-09-24 15:03:25,475][43616] Updated weights for policy 0, policy_version 20768 (0.0016) [2023-09-24 15:03:29,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 10625024. Throughput: 0: 792.3, 1: 790.1. Samples: 2643968. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:29,659][42771] Avg episode reward: [(0, '2.730'), (1, '3.150')] [2023-09-24 15:03:34,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 10657792. Throughput: 0: 787.8, 1: 787.9. Samples: 2653348. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:03:34,659][42771] Avg episode reward: [(0, '2.750'), (1, '3.150')] [2023-09-24 15:03:38,463][43616] Updated weights for policy 0, policy_version 20928 (0.0018) [2023-09-24 15:03:38,463][43653] Updated weights for policy 1, policy_version 20800 (0.0019) [2023-09-24 15:03:39,658][42771] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 10682368. Throughput: 0: 787.2, 1: 786.3. Samples: 2662889. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:03:39,659][42771] Avg episode reward: [(0, '2.600'), (1, '3.050')] [2023-09-24 15:03:44,658][42771] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 10715136. Throughput: 0: 785.0, 1: 788.0. Samples: 2667635. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:03:44,659][42771] Avg episode reward: [(0, '2.630'), (1, '2.990')] [2023-09-24 15:03:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10747904. Throughput: 0: 787.3, 1: 786.0. Samples: 2676704. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:49,659][42771] Avg episode reward: [(0, '2.600'), (1, '2.950')] [2023-09-24 15:03:51,822][43653] Updated weights for policy 1, policy_version 20960 (0.0018) [2023-09-24 15:03:51,822][43616] Updated weights for policy 0, policy_version 21088 (0.0019) [2023-09-24 15:03:54,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 10780672. Throughput: 0: 784.2, 1: 785.9. Samples: 2686022. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:54,659][42771] Avg episode reward: [(0, '2.570'), (1, '2.870')] [2023-09-24 15:03:59,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6212.3, 300 sec: 6345.3). Total num frames: 10809344. Throughput: 0: 786.1, 1: 787.3. Samples: 2690899. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:03:59,659][42771] Avg episode reward: [(0, '2.600'), (1, '2.970')] [2023-09-24 15:04:04,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 10838016. Throughput: 0: 781.5, 1: 784.3. Samples: 2700148. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:04:04,660][42771] Avg episode reward: [(0, '2.670'), (1, '2.900')] [2023-09-24 15:04:04,878][43616] Updated weights for policy 0, policy_version 21248 (0.0018) [2023-09-24 15:04:04,878][43653] Updated weights for policy 1, policy_version 21120 (0.0017) [2023-09-24 15:04:09,658][42771] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 10870784. Throughput: 0: 779.2, 1: 776.6. Samples: 2709504. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:04:09,659][42771] Avg episode reward: [(0, '2.610'), (1, '2.900')] [2023-09-24 15:04:14,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 10903552. Throughput: 0: 778.0, 1: 780.6. Samples: 2714107. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:04:14,659][42771] Avg episode reward: [(0, '2.610'), (1, '3.100')] [2023-09-24 15:04:17,790][43653] Updated weights for policy 1, policy_version 21280 (0.0016) [2023-09-24 15:04:17,791][43616] Updated weights for policy 0, policy_version 21408 (0.0018) [2023-09-24 15:04:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 10936320. Throughput: 0: 784.3, 1: 782.3. Samples: 2723844. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:04:19,659][42771] Avg episode reward: [(0, '2.670'), (1, '3.070')] [2023-09-24 15:04:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 10969088. Throughput: 0: 786.5, 1: 786.1. Samples: 2733656. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:04:24,659][42771] Avg episode reward: [(0, '2.800'), (1, '3.060')] [2023-09-24 15:04:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.6). Total num frames: 11001856. Throughput: 0: 785.5, 1: 782.2. Samples: 2738181. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:04:29,659][42771] Avg episode reward: [(0, '2.860'), (1, '2.920')] [2023-09-24 15:04:30,617][43653] Updated weights for policy 1, policy_version 21440 (0.0017) [2023-09-24 15:04:30,617][43616] Updated weights for policy 0, policy_version 21568 (0.0017) [2023-09-24 15:04:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 11034624. Throughput: 0: 789.6, 1: 790.1. Samples: 2747789. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:04:34,659][42771] Avg episode reward: [(0, '2.850'), (1, '2.990')] [2023-09-24 15:04:39,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11059200. Throughput: 0: 789.4, 1: 788.6. Samples: 2757029. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:04:39,659][42771] Avg episode reward: [(0, '2.890'), (1, '3.060')] [2023-09-24 15:04:43,904][43616] Updated weights for policy 0, policy_version 21728 (0.0015) [2023-09-24 15:04:43,905][43653] Updated weights for policy 1, policy_version 21600 (0.0017) [2023-09-24 15:04:44,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11091968. Throughput: 0: 789.0, 1: 786.7. Samples: 2761802. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:04:44,659][42771] Avg episode reward: [(0, '2.750'), (1, '3.180')] [2023-09-24 15:04:49,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11124736. Throughput: 0: 788.6, 1: 784.6. Samples: 2770944. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:04:49,659][42771] Avg episode reward: [(0, '2.810'), (1, '3.230')] [2023-09-24 15:04:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6331.5). Total num frames: 11157504. Throughput: 0: 787.6, 1: 790.7. Samples: 2780526. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:04:54,659][42771] Avg episode reward: [(0, '2.740'), (1, '3.250')] [2023-09-24 15:04:56,773][43653] Updated weights for policy 1, policy_version 21760 (0.0015) [2023-09-24 15:04:56,774][43616] Updated weights for policy 0, policy_version 21888 (0.0018) [2023-09-24 15:04:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6331.4). Total num frames: 11190272. Throughput: 0: 792.1, 1: 789.6. Samples: 2785284. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:04:59,659][42771] Avg episode reward: [(0, '2.660'), (1, '3.180')] [2023-09-24 15:05:04,658][42771] Fps is (10 sec: 6143.8, 60 sec: 6348.8, 300 sec: 6317.6). Total num frames: 11218944. Throughput: 0: 786.8, 1: 788.4. Samples: 2794729. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:04,659][42771] Avg episode reward: [(0, '2.720'), (1, '3.200')] [2023-09-24 15:05:09,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11247616. Throughput: 0: 780.8, 1: 780.8. Samples: 2803931. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:09,659][42771] Avg episode reward: [(0, '2.630'), (1, '3.210')] [2023-09-24 15:05:09,935][43653] Updated weights for policy 1, policy_version 21920 (0.0018) [2023-09-24 15:05:09,935][43616] Updated weights for policy 0, policy_version 22048 (0.0018) [2023-09-24 15:05:14,658][42771] Fps is (10 sec: 6144.1, 60 sec: 6280.6, 300 sec: 6331.5). Total num frames: 11280384. Throughput: 0: 782.5, 1: 784.4. Samples: 2808690. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:14,659][42771] Avg episode reward: [(0, '2.720'), (1, '3.240')] [2023-09-24 15:05:19,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11313152. Throughput: 0: 781.9, 1: 783.1. Samples: 2818214. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:19,660][42771] Avg episode reward: [(0, '2.650'), (1, '3.080')] [2023-09-24 15:05:19,670][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000022032_5640192.pth... [2023-09-24 15:05:19,671][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000022160_5672960.pth... [2023-09-24 15:05:19,708][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000019072_4882432.pth [2023-09-24 15:05:19,711][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000019200_4915200.pth [2023-09-24 15:05:22,901][43616] Updated weights for policy 0, policy_version 22208 (0.0018) [2023-09-24 15:05:22,901][43653] Updated weights for policy 1, policy_version 22080 (0.0015) [2023-09-24 15:05:24,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11345920. 
Throughput: 0: 787.6, 1: 786.3. Samples: 2827853. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:24,659][42771] Avg episode reward: [(0, '2.670'), (1, '3.250')] [2023-09-24 15:05:29,659][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11378688. Throughput: 0: 785.0, 1: 784.2. Samples: 2832415. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:29,660][42771] Avg episode reward: [(0, '2.810'), (1, '3.150')] [2023-09-24 15:05:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11411456. Throughput: 0: 792.2, 1: 794.6. Samples: 2842351. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:34,659][42771] Avg episode reward: [(0, '2.880'), (1, '3.030')] [2023-09-24 15:05:35,602][43653] Updated weights for policy 1, policy_version 22240 (0.0017) [2023-09-24 15:05:35,602][43616] Updated weights for policy 0, policy_version 22368 (0.0016) [2023-09-24 15:05:39,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 11444224. Throughput: 0: 791.3, 1: 790.8. Samples: 2851722. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:39,659][42771] Avg episode reward: [(0, '2.840'), (1, '3.180')] [2023-09-24 15:05:44,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6317.6). Total num frames: 11472896. Throughput: 0: 793.5, 1: 795.9. Samples: 2856804. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:44,659][42771] Avg episode reward: [(0, '2.840'), (1, '3.110')] [2023-09-24 15:05:48,736][43616] Updated weights for policy 0, policy_version 22528 (0.0017) [2023-09-24 15:05:48,736][43653] Updated weights for policy 1, policy_version 22400 (0.0017) [2023-09-24 15:05:49,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 11501568. Throughput: 0: 787.4, 1: 788.0. Samples: 2865620. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:05:49,659][42771] Avg episode reward: [(0, '2.880'), (1, '2.850')] [2023-09-24 15:05:54,658][42771] Fps is (10 sec: 6144.1, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 11534336. Throughput: 0: 792.3, 1: 791.4. Samples: 2875197. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:05:54,659][42771] Avg episode reward: [(0, '2.990'), (1, '2.730')] [2023-09-24 15:05:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 11567104. Throughput: 0: 787.7, 1: 786.6. Samples: 2879532. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:05:59,659][42771] Avg episode reward: [(0, '2.980'), (1, '2.800')] [2023-09-24 15:06:01,971][43653] Updated weights for policy 1, policy_version 22560 (0.0017) [2023-09-24 15:06:01,971][43616] Updated weights for policy 0, policy_version 22688 (0.0018) [2023-09-24 15:06:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6348.8, 300 sec: 6331.4). Total num frames: 11599872. Throughput: 0: 788.2, 1: 787.9. Samples: 2889139. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:06:04,659][42771] Avg episode reward: [(0, '2.970'), (1, '2.660')] [2023-09-24 15:06:09,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 11632640. Throughput: 0: 787.1, 1: 788.4. Samples: 2898752. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:06:09,659][42771] Avg episode reward: [(0, '2.890'), (1, '2.750')] [2023-09-24 15:06:14,659][42771] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11657216. Throughput: 0: 790.2, 1: 792.3. Samples: 2903629. 
Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:06:14,660][42771] Avg episode reward: [(0, '2.800'), (1, '2.690')] [2023-09-24 15:06:14,887][43616] Updated weights for policy 0, policy_version 22848 (0.0017) [2023-09-24 15:06:14,888][43653] Updated weights for policy 1, policy_version 22720 (0.0017) [2023-09-24 15:06:19,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 11689984. Throughput: 0: 782.1, 1: 782.9. Samples: 2912773. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:06:19,659][42771] Avg episode reward: [(0, '3.010'), (1, '2.740')] [2023-09-24 15:06:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11722752. Throughput: 0: 787.5, 1: 785.0. Samples: 2922487. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:06:24,659][42771] Avg episode reward: [(0, '2.830'), (1, '2.860')] [2023-09-24 15:06:27,706][43653] Updated weights for policy 1, policy_version 22880 (0.0015) [2023-09-24 15:06:27,706][43616] Updated weights for policy 0, policy_version 23008 (0.0017) [2023-09-24 15:06:29,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11755520. Throughput: 0: 781.6, 1: 781.7. Samples: 2927149. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:06:29,659][42771] Avg episode reward: [(0, '2.790'), (1, '3.020')] [2023-09-24 15:06:34,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11788288. Throughput: 0: 792.3, 1: 790.1. Samples: 2936829. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:06:34,660][42771] Avg episode reward: [(0, '3.040'), (1, '3.090')] [2023-09-24 15:06:39,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11821056. Throughput: 0: 786.0, 1: 786.6. Samples: 2945965. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:06:39,660][42771] Avg episode reward: [(0, '2.950'), (1, '3.090')] [2023-09-24 15:06:40,882][43653] Updated weights for policy 1, policy_version 23040 (0.0018) [2023-09-24 15:06:40,882][43616] Updated weights for policy 0, policy_version 23168 (0.0019) [2023-09-24 15:06:44,659][42771] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6275.9). Total num frames: 11845632. Throughput: 0: 790.5, 1: 790.8. Samples: 2950691. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:06:44,659][42771] Avg episode reward: [(0, '3.070'), (1, '3.310')] [2023-09-24 15:06:49,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11878400. Throughput: 0: 789.8, 1: 790.2. Samples: 2960238. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:06:49,659][42771] Avg episode reward: [(0, '2.990'), (1, '3.400')] [2023-09-24 15:06:53,906][43653] Updated weights for policy 1, policy_version 23200 (0.0016) [2023-09-24 15:06:53,906][43616] Updated weights for policy 0, policy_version 23328 (0.0017) [2023-09-24 15:06:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11911168. Throughput: 0: 788.3, 1: 786.0. Samples: 2969596. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:06:54,659][42771] Avg episode reward: [(0, '2.830'), (1, '3.400')] [2023-09-24 15:06:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11943936. Throughput: 0: 784.1, 1: 783.4. Samples: 2974165. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:06:59,659][42771] Avg episode reward: [(0, '3.000'), (1, '3.370')] [2023-09-24 15:07:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 11976704. Throughput: 0: 792.3, 1: 789.1. Samples: 2983936. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:07:04,659][42771] Avg episode reward: [(0, '2.980'), (1, '3.120')] [2023-09-24 15:07:06,692][43653] Updated weights for policy 1, policy_version 23360 (0.0017) [2023-09-24 15:07:06,692][43616] Updated weights for policy 0, policy_version 23488 (0.0018) [2023-09-24 15:07:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 12009472. Throughput: 0: 788.2, 1: 790.7. Samples: 2993539. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:07:09,659][42771] Avg episode reward: [(0, '3.070'), (1, '3.300')] [2023-09-24 15:07:14,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12042240. Throughput: 0: 791.6, 1: 789.0. Samples: 2998276. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:07:14,659][42771] Avg episode reward: [(0, '2.960'), (1, '3.030')] [2023-09-24 15:07:19,529][43653] Updated weights for policy 1, policy_version 23520 (0.0017) [2023-09-24 15:07:19,529][43616] Updated weights for policy 0, policy_version 23648 (0.0016) [2023-09-24 15:07:19,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6317.6). Total num frames: 12075008. Throughput: 0: 790.3, 1: 793.0. Samples: 3008074. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:07:19,659][42771] Avg episode reward: [(0, '3.070'), (1, '2.900')] [2023-09-24 15:07:19,672][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000023648_6053888.pth... [2023-09-24 15:07:19,672][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000023520_6021120.pth... [2023-09-24 15:07:19,702][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000020688_5296128.pth [2023-09-24 15:07:19,708][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000020560_5263360.pth [2023-09-24 15:07:24,659][42771] Fps is (10 sec: 5734.2, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12099584. 
Throughput: 0: 789.7, 1: 790.1. Samples: 3017059. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:07:24,660][42771] Avg episode reward: [(0, '3.110'), (1, '2.880')] [2023-09-24 15:07:29,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 12132352. Throughput: 0: 792.6, 1: 794.0. Samples: 3022091. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:07:29,659][42771] Avg episode reward: [(0, '3.140'), (1, '2.890')] [2023-09-24 15:07:29,660][43303] Saving new best policy, reward=3.140! [2023-09-24 15:07:32,545][43616] Updated weights for policy 0, policy_version 23808 (0.0017) [2023-09-24 15:07:32,545][43653] Updated weights for policy 1, policy_version 23680 (0.0017) [2023-09-24 15:07:34,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12165120. Throughput: 0: 790.4, 1: 789.5. Samples: 3031335. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:07:34,659][42771] Avg episode reward: [(0, '3.240'), (1, '2.990')] [2023-09-24 15:07:34,669][43303] Saving new best policy, reward=3.240! [2023-09-24 15:07:39,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12197888. Throughput: 0: 793.3, 1: 795.4. Samples: 3041091. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:07:39,659][42771] Avg episode reward: [(0, '3.300'), (1, '3.120')] [2023-09-24 15:07:39,660][43303] Saving new best policy, reward=3.300! [2023-09-24 15:07:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12230656. Throughput: 0: 793.3, 1: 793.5. Samples: 3045573. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:07:44,659][42771] Avg episode reward: [(0, '3.210'), (1, '2.910')] [2023-09-24 15:07:45,447][43616] Updated weights for policy 0, policy_version 23968 (0.0015) [2023-09-24 15:07:45,448][43653] Updated weights for policy 1, policy_version 23840 (0.0018) [2023-09-24 15:07:49,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12263424. Throughput: 0: 793.2, 1: 796.1. Samples: 3055456. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:07:49,659][42771] Avg episode reward: [(0, '3.200'), (1, '3.210')] [2023-09-24 15:07:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12296192. Throughput: 0: 793.9, 1: 793.8. Samples: 3064986. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:07:54,659][42771] Avg episode reward: [(0, '3.190'), (1, '3.210')] [2023-09-24 15:07:58,183][43653] Updated weights for policy 1, policy_version 24000 (0.0017) [2023-09-24 15:07:58,184][43616] Updated weights for policy 0, policy_version 24128 (0.0016) [2023-09-24 15:07:59,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 12328960. Throughput: 0: 795.7, 1: 796.4. Samples: 3069917. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:07:59,659][42771] Avg episode reward: [(0, '2.970'), (1, '3.120')] [2023-09-24 15:08:04,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 12361728. Throughput: 0: 793.3, 1: 792.4. Samples: 3079434. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:08:04,660][42771] Avg episode reward: [(0, '2.990'), (1, '3.200')] [2023-09-24 15:08:09,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6317.6). Total num frames: 12390400. Throughput: 0: 798.4, 1: 799.4. Samples: 3088961. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:08:09,659][42771] Avg episode reward: [(0, '2.820'), (1, '3.300')] [2023-09-24 15:08:10,911][43616] Updated weights for policy 0, policy_version 24288 (0.0017) [2023-09-24 15:08:10,912][43653] Updated weights for policy 1, policy_version 24160 (0.0018) [2023-09-24 15:08:14,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12419072. Throughput: 0: 800.5, 1: 799.3. Samples: 3094082. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:08:14,659][42771] Avg episode reward: [(0, '3.060'), (1, '3.370')] [2023-09-24 15:08:19,659][42771] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 12451840. Throughput: 0: 801.0, 1: 801.8. Samples: 3103460. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:08:19,659][42771] Avg episode reward: [(0, '3.040'), (1, '3.350')] [2023-09-24 15:08:23,773][43653] Updated weights for policy 1, policy_version 24320 (0.0014) [2023-09-24 15:08:23,774][43616] Updated weights for policy 0, policy_version 24448 (0.0018) [2023-09-24 15:08:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12484608. Throughput: 0: 799.6, 1: 797.6. Samples: 3112965. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:08:24,659][42771] Avg episode reward: [(0, '2.980'), (1, '3.250')] [2023-09-24 15:08:29,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6303.7). Total num frames: 12517376. Throughput: 0: 801.9, 1: 802.0. Samples: 3117749. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:08:29,659][42771] Avg episode reward: [(0, '3.020'), (1, '3.270')] [2023-09-24 15:08:34,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 12550144. Throughput: 0: 799.6, 1: 796.8. Samples: 3127296. 
Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:08:34,659][42771] Avg episode reward: [(0, '3.010'), (1, '3.270')] [2023-09-24 15:08:36,741][43616] Updated weights for policy 0, policy_version 24608 (0.0018) [2023-09-24 15:08:36,741][43653] Updated weights for policy 1, policy_version 24480 (0.0017) [2023-09-24 15:08:39,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 12582912. Throughput: 0: 798.2, 1: 798.4. Samples: 3136830. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:08:39,659][42771] Avg episode reward: [(0, '3.270'), (1, '3.360')] [2023-09-24 15:08:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 12615680. Throughput: 0: 796.8, 1: 796.4. Samples: 3141615. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:08:44,659][42771] Avg episode reward: [(0, '3.170'), (1, '3.020')] [2023-09-24 15:08:49,528][43616] Updated weights for policy 0, policy_version 24768 (0.0015) [2023-09-24 15:08:49,529][43653] Updated weights for policy 1, policy_version 24640 (0.0018) [2023-09-24 15:08:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 12648448. Throughput: 0: 797.3, 1: 797.6. Samples: 3151203. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:08:49,659][42771] Avg episode reward: [(0, '2.980'), (1, '3.100')] [2023-09-24 15:08:54,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6317.6). Total num frames: 12673024. Throughput: 0: 795.1, 1: 793.6. Samples: 3160454. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:08:54,660][42771] Avg episode reward: [(0, '3.170'), (1, '3.130')] [2023-09-24 15:08:59,658][42771] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6331.5). Total num frames: 12705792. Throughput: 0: 787.2, 1: 788.5. Samples: 3164991. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:08:59,659][42771] Avg episode reward: [(0, '2.970'), (1, '3.110')] [2023-09-24 15:09:02,707][43653] Updated weights for policy 1, policy_version 24800 (0.0017) [2023-09-24 15:09:02,707][43616] Updated weights for policy 0, policy_version 24928 (0.0016) [2023-09-24 15:09:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 12738560. Throughput: 0: 789.6, 1: 788.1. Samples: 3174453. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:09:04,659][42771] Avg episode reward: [(0, '2.900'), (1, '2.950')] [2023-09-24 15:09:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6331.4). Total num frames: 12771328. Throughput: 0: 793.9, 1: 796.0. Samples: 3184514. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:09:09,659][42771] Avg episode reward: [(0, '2.860'), (1, '3.100')] [2023-09-24 15:09:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 12804096. Throughput: 0: 790.8, 1: 790.2. Samples: 3188895. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:09:14,659][42771] Avg episode reward: [(0, '2.770'), (1, '3.000')] [2023-09-24 15:09:15,639][43653] Updated weights for policy 1, policy_version 24960 (0.0018) [2023-09-24 15:09:15,640][43616] Updated weights for policy 0, policy_version 25088 (0.0018) [2023-09-24 15:09:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.5). Total num frames: 12836864. Throughput: 0: 790.9, 1: 793.2. Samples: 3198581. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:09:19,659][42771] Avg episode reward: [(0, '2.630'), (1, '2.950')] [2023-09-24 15:09:19,666][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000025008_6402048.pth... [2023-09-24 15:09:19,667][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000025136_6434816.pth... 
[2023-09-24 15:09:19,701][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000022160_5672960.pth [2023-09-24 15:09:19,707][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000022032_5640192.pth [2023-09-24 15:09:24,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 12869632. Throughput: 0: 792.2, 1: 792.0. Samples: 3208123. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:09:24,659][42771] Avg episode reward: [(0, '2.530'), (1, '2.950')] [2023-09-24 15:09:28,352][43616] Updated weights for policy 0, policy_version 25248 (0.0018) [2023-09-24 15:09:28,352][43653] Updated weights for policy 1, policy_version 25120 (0.0015) [2023-09-24 15:09:29,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 12902400. Throughput: 0: 792.4, 1: 795.2. Samples: 3213057. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:09:29,659][42771] Avg episode reward: [(0, '2.700'), (1, '3.030')] [2023-09-24 15:09:34,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 12926976. Throughput: 0: 788.7, 1: 789.5. Samples: 3222220. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:09:34,659][42771] Avg episode reward: [(0, '2.700'), (1, '2.950')] [2023-09-24 15:09:39,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 12959744. Throughput: 0: 793.1, 1: 791.2. Samples: 3231745. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:09:39,659][42771] Avg episode reward: [(0, '2.740'), (1, '3.080')] [2023-09-24 15:09:41,466][43616] Updated weights for policy 0, policy_version 25408 (0.0017) [2023-09-24 15:09:41,466][43653] Updated weights for policy 1, policy_version 25280 (0.0017) [2023-09-24 15:09:44,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 12992512. Throughput: 0: 792.2, 1: 792.7. Samples: 3236312. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:09:44,659][42771] Avg episode reward: [(0, '2.720'), (1, '3.200')] [2023-09-24 15:09:49,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13025280. Throughput: 0: 795.7, 1: 795.4. Samples: 3246051. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:09:49,659][42771] Avg episode reward: [(0, '2.840'), (1, '3.120')] [2023-09-24 15:09:54,466][43616] Updated weights for policy 0, policy_version 25568 (0.0018) [2023-09-24 15:09:54,466][43653] Updated weights for policy 1, policy_version 25440 (0.0017) [2023-09-24 15:09:54,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 13058048. Throughput: 0: 787.2, 1: 785.6. Samples: 3255288. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:09:54,659][42771] Avg episode reward: [(0, '2.850'), (1, '3.050')] [2023-09-24 15:09:59,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6317.6). Total num frames: 13082624. Throughput: 0: 790.0, 1: 790.5. Samples: 3260016. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:09:59,659][42771] Avg episode reward: [(0, '2.890'), (1, '3.130')] [2023-09-24 15:10:04,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13115392. Throughput: 0: 786.8, 1: 786.8. Samples: 3269391. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:04,659][42771] Avg episode reward: [(0, '2.740'), (1, '2.870')] [2023-09-24 15:10:07,373][43616] Updated weights for policy 0, policy_version 25728 (0.0018) [2023-09-24 15:10:07,373][43653] Updated weights for policy 1, policy_version 25600 (0.0018) [2023-09-24 15:10:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13148160. Throughput: 0: 787.2, 1: 787.0. Samples: 3278962. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:09,659][42771] Avg episode reward: [(0, '2.560'), (1, '2.970')] [2023-09-24 15:10:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6331.5). Total num frames: 13180928. Throughput: 0: 788.4, 1: 787.9. Samples: 3283989. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:14,659][42771] Avg episode reward: [(0, '2.600'), (1, '2.880')] [2023-09-24 15:10:19,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13213696. Throughput: 0: 794.3, 1: 793.6. Samples: 3293676. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:10:19,660][42771] Avg episode reward: [(0, '2.620'), (1, '2.800')] [2023-09-24 15:10:20,020][43616] Updated weights for policy 0, policy_version 25888 (0.0015) [2023-09-24 15:10:20,021][43653] Updated weights for policy 1, policy_version 25760 (0.0017) [2023-09-24 15:10:24,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13246464. Throughput: 0: 796.4, 1: 796.4. Samples: 3303424. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:10:24,659][42771] Avg episode reward: [(0, '2.660'), (1, '2.850')] [2023-09-24 15:10:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13279232. Throughput: 0: 800.2, 1: 799.8. Samples: 3308314. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:10:29,659][42771] Avg episode reward: [(0, '2.780'), (1, '2.930')] [2023-09-24 15:10:32,937][43616] Updated weights for policy 0, policy_version 26048 (0.0011) [2023-09-24 15:10:32,937][43653] Updated weights for policy 1, policy_version 25920 (0.0017) [2023-09-24 15:10:34,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 13312000. Throughput: 0: 797.1, 1: 796.4. Samples: 3317760. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:10:34,659][42771] Avg episode reward: [(0, '2.720'), (1, '3.010')] [2023-09-24 15:10:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6345.3). Total num frames: 13344768. Throughput: 0: 800.4, 1: 801.6. Samples: 3327380. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:39,659][42771] Avg episode reward: [(0, '2.760'), (1, '2.950')] [2023-09-24 15:10:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 13377536. Throughput: 0: 802.1, 1: 799.6. Samples: 3332096. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:44,659][42771] Avg episode reward: [(0, '2.810'), (1, '2.990')] [2023-09-24 15:10:45,683][43653] Updated weights for policy 1, policy_version 26080 (0.0018) [2023-09-24 15:10:45,683][43616] Updated weights for policy 0, policy_version 26208 (0.0018) [2023-09-24 15:10:49,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 13410304. Throughput: 0: 803.9, 1: 803.9. Samples: 3341740. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:49,660][42771] Avg episode reward: [(0, '2.840'), (1, '2.960')] [2023-09-24 15:10:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 13443072. Throughput: 0: 802.0, 1: 802.1. Samples: 3351146. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:54,659][42771] Avg episode reward: [(0, '2.910'), (1, '2.870')] [2023-09-24 15:10:58,677][43616] Updated weights for policy 0, policy_version 26368 (0.0019) [2023-09-24 15:10:58,677][43653] Updated weights for policy 1, policy_version 26240 (0.0020) [2023-09-24 15:10:59,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 13467648. Throughput: 0: 803.0, 1: 800.6. Samples: 3356149. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:10:59,659][42771] Avg episode reward: [(0, '2.810'), (1, '3.180')] [2023-09-24 15:11:04,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 13500416. Throughput: 0: 793.6, 1: 793.8. Samples: 3365110. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:11:04,659][42771] Avg episode reward: [(0, '2.830'), (1, '3.210')] [2023-09-24 15:11:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 13533184. Throughput: 0: 796.4, 1: 796.4. Samples: 3375101. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:11:09,659][42771] Avg episode reward: [(0, '2.860'), (1, '3.290')] [2023-09-24 15:11:11,543][43653] Updated weights for policy 1, policy_version 26400 (0.0018) [2023-09-24 15:11:11,543][43616] Updated weights for policy 0, policy_version 26528 (0.0017) [2023-09-24 15:11:14,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 13565952. Throughput: 0: 791.3, 1: 791.0. Samples: 3379518. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:11:14,659][42771] Avg episode reward: [(0, '2.870'), (1, '3.060')] [2023-09-24 15:11:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 13598720. Throughput: 0: 793.4, 1: 794.9. Samples: 3389230. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:11:19,659][42771] Avg episode reward: [(0, '2.780'), (1, '3.260')] [2023-09-24 15:11:19,669][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000026496_6782976.pth... [2023-09-24 15:11:19,669][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000026624_6815744.pth... 
[2023-09-24 15:11:19,704][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000023648_6053888.pth [2023-09-24 15:11:19,709][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000023520_6021120.pth [2023-09-24 15:11:24,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6345.3). Total num frames: 13627392. Throughput: 0: 788.7, 1: 788.6. Samples: 3398358. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:11:24,659][42771] Avg episode reward: [(0, '2.820'), (1, '3.390')] [2023-09-24 15:11:24,671][43653] Updated weights for policy 1, policy_version 26560 (0.0016) [2023-09-24 15:11:24,672][43616] Updated weights for policy 0, policy_version 26688 (0.0017) [2023-09-24 15:11:29,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13656064. Throughput: 0: 789.2, 1: 791.6. Samples: 3403228. Policy #0 lag: (min: 1.0, avg: 1.0, max: 1.0) [2023-09-24 15:11:29,659][42771] Avg episode reward: [(0, '2.780'), (1, '3.450')] [2023-09-24 15:11:29,790][43474] Saving new best policy, reward=3.450! [2023-09-24 15:11:34,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13688832. Throughput: 0: 787.7, 1: 787.9. Samples: 3412643. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:11:34,659][42771] Avg episode reward: [(0, '2.690'), (1, '3.280')] [2023-09-24 15:11:37,495][43616] Updated weights for policy 0, policy_version 26848 (0.0015) [2023-09-24 15:11:37,496][43653] Updated weights for policy 1, policy_version 26720 (0.0018) [2023-09-24 15:11:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 13721600. Throughput: 0: 790.7, 1: 788.5. Samples: 3422213. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:11:39,659][42771] Avg episode reward: [(0, '2.740'), (1, '3.260')] [2023-09-24 15:11:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 13754368. 
Throughput: 0: 786.8, 1: 787.7. Samples: 3426998. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:11:44,659][42771] Avg episode reward: [(0, '2.730'), (1, '3.320')] [2023-09-24 15:11:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 13787136. Throughput: 0: 795.0, 1: 792.4. Samples: 3436544. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:11:49,659][42771] Avg episode reward: [(0, '2.910'), (1, '3.380')] [2023-09-24 15:11:50,459][43616] Updated weights for policy 0, policy_version 27008 (0.0017) [2023-09-24 15:11:50,459][43653] Updated weights for policy 1, policy_version 26880 (0.0017) [2023-09-24 15:11:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 13819904. Throughput: 0: 788.2, 1: 791.0. Samples: 3446162. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:11:54,659][42771] Avg episode reward: [(0, '2.870'), (1, '3.350')] [2023-09-24 15:11:59,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 13852672. Throughput: 0: 794.2, 1: 791.8. Samples: 3450886. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:11:59,659][42771] Avg episode reward: [(0, '2.880'), (1, '3.220')] [2023-09-24 15:12:03,240][43616] Updated weights for policy 0, policy_version 27168 (0.0018) [2023-09-24 15:12:03,240][43653] Updated weights for policy 1, policy_version 27040 (0.0013) [2023-09-24 15:12:04,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 13885440. Throughput: 0: 791.8, 1: 792.7. Samples: 3460530. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:04,659][42771] Avg episode reward: [(0, '2.930'), (1, '3.280')] [2023-09-24 15:12:09,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 13910016. Throughput: 0: 793.0, 1: 793.0. Samples: 3469730. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:12:09,659][42771] Avg episode reward: [(0, '2.870'), (1, '3.090')] [2023-09-24 15:12:14,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.6, 300 sec: 6331.5). Total num frames: 13942784. Throughput: 0: 796.7, 1: 797.1. Samples: 3474950. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:12:14,659][42771] Avg episode reward: [(0, '2.900'), (1, '3.060')] [2023-09-24 15:12:15,962][43616] Updated weights for policy 0, policy_version 27328 (0.0018) [2023-09-24 15:12:15,962][43653] Updated weights for policy 1, policy_version 27200 (0.0018) [2023-09-24 15:12:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 13975552. Throughput: 0: 799.6, 1: 799.5. Samples: 3484601. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:12:19,659][42771] Avg episode reward: [(0, '2.860'), (1, '3.080')] [2023-09-24 15:12:24,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6348.8, 300 sec: 6359.2). Total num frames: 14008320. Throughput: 0: 797.8, 1: 800.0. Samples: 3494114. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:12:24,659][42771] Avg episode reward: [(0, '2.650'), (1, '3.110')] [2023-09-24 15:12:28,672][43616] Updated weights for policy 0, policy_version 27488 (0.0018) [2023-09-24 15:12:28,672][43653] Updated weights for policy 1, policy_version 27360 (0.0017) [2023-09-24 15:12:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14041088. Throughput: 0: 800.4, 1: 801.8. Samples: 3499098. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:29,659][42771] Avg episode reward: [(0, '2.720'), (1, '3.120')] [2023-09-24 15:12:34,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14073856. Throughput: 0: 798.5, 1: 800.5. Samples: 3508499. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:34,659][42771] Avg episode reward: [(0, '2.640'), (1, '3.030')] [2023-09-24 15:12:39,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14106624. Throughput: 0: 804.8, 1: 801.9. Samples: 3518464. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:39,659][42771] Avg episode reward: [(0, '2.600'), (1, '3.070')] [2023-09-24 15:12:41,392][43616] Updated weights for policy 0, policy_version 27648 (0.0017) [2023-09-24 15:12:41,392][43653] Updated weights for policy 1, policy_version 27520 (0.0018) [2023-09-24 15:12:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14139392. Throughput: 0: 801.5, 1: 804.5. Samples: 3523153. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:44,659][42771] Avg episode reward: [(0, '2.690'), (1, '3.070')] [2023-09-24 15:12:49,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 14172160. Throughput: 0: 803.5, 1: 801.8. Samples: 3532767. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:49,659][42771] Avg episode reward: [(0, '2.710'), (1, '3.160')] [2023-09-24 15:12:54,262][43653] Updated weights for policy 1, policy_version 27680 (0.0017) [2023-09-24 15:12:54,263][43616] Updated weights for policy 0, policy_version 27808 (0.0018) [2023-09-24 15:12:54,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 14204928. Throughput: 0: 807.2, 1: 807.3. Samples: 3542382. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:54,659][42771] Avg episode reward: [(0, '2.630'), (1, '3.110')] [2023-09-24 15:12:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14237696. Throughput: 0: 803.5, 1: 800.7. Samples: 3547140. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:12:59,659][42771] Avg episode reward: [(0, '2.810'), (1, '3.020')] [2023-09-24 15:13:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6373.1). Total num frames: 14270464. Throughput: 0: 800.6, 1: 799.9. Samples: 3556624. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:13:04,659][42771] Avg episode reward: [(0, '2.850'), (1, '2.930')] [2023-09-24 15:13:07,341][43616] Updated weights for policy 0, policy_version 27968 (0.0014) [2023-09-24 15:13:07,342][43653] Updated weights for policy 1, policy_version 27840 (0.0016) [2023-09-24 15:13:09,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14295040. Throughput: 0: 796.3, 1: 796.6. Samples: 3565794. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:13:09,659][42771] Avg episode reward: [(0, '2.800'), (1, '2.910')] [2023-09-24 15:13:14,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 14327808. Throughput: 0: 797.8, 1: 797.6. Samples: 3570887. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:13:14,659][42771] Avg episode reward: [(0, '2.950'), (1, '3.110')] [2023-09-24 15:13:19,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14360576. Throughput: 0: 798.8, 1: 798.9. Samples: 3580395. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:13:19,659][42771] Avg episode reward: [(0, '2.870'), (1, '3.070')] [2023-09-24 15:13:19,671][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000028112_7196672.pth... [2023-09-24 15:13:19,671][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000027984_7163904.pth... 
[2023-09-24 15:13:19,702][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000025008_6402048.pth [2023-09-24 15:13:19,709][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000025136_6434816.pth [2023-09-24 15:13:20,075][43616] Updated weights for policy 0, policy_version 28128 (0.0012) [2023-09-24 15:13:20,076][43653] Updated weights for policy 1, policy_version 28000 (0.0017) [2023-09-24 15:13:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14393344. Throughput: 0: 792.7, 1: 795.2. Samples: 3589918. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:13:24,659][42771] Avg episode reward: [(0, '2.810'), (1, '3.010')] [2023-09-24 15:13:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14426112. Throughput: 0: 793.8, 1: 793.0. Samples: 3594562. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:13:29,659][42771] Avg episode reward: [(0, '2.910'), (1, '3.130')] [2023-09-24 15:13:33,002][43616] Updated weights for policy 0, policy_version 28288 (0.0017) [2023-09-24 15:13:33,002][43653] Updated weights for policy 1, policy_version 28160 (0.0016) [2023-09-24 15:13:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 14458880. Throughput: 0: 795.3, 1: 796.4. Samples: 3604394. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:13:34,659][42771] Avg episode reward: [(0, '2.920'), (1, '3.260')] [2023-09-24 15:13:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14491648. Throughput: 0: 794.7, 1: 794.8. Samples: 3613911. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:13:39,659][42771] Avg episode reward: [(0, '2.870'), (1, '3.400')] [2023-09-24 15:13:44,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6345.3). Total num frames: 14520320. Throughput: 0: 795.7, 1: 795.2. Samples: 3618729. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:13:44,659][42771] Avg episode reward: [(0, '3.020'), (1, '3.470')] [2023-09-24 15:13:44,661][43474] Saving new best policy, reward=3.470! [2023-09-24 15:13:46,058][43616] Updated weights for policy 0, policy_version 28448 (0.0017) [2023-09-24 15:13:46,058][43653] Updated weights for policy 1, policy_version 28320 (0.0017) [2023-09-24 15:13:49,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 14548992. Throughput: 0: 790.1, 1: 790.6. Samples: 3627754. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:13:49,659][42771] Avg episode reward: [(0, '3.100'), (1, '3.380')] [2023-09-24 15:13:54,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 14581760. Throughput: 0: 796.4, 1: 795.6. Samples: 3637432. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:13:54,659][42771] Avg episode reward: [(0, '3.030'), (1, '3.420')] [2023-09-24 15:13:58,911][43616] Updated weights for policy 0, policy_version 28608 (0.0018) [2023-09-24 15:13:58,912][43653] Updated weights for policy 1, policy_version 28480 (0.0017) [2023-09-24 15:13:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 14614528. Throughput: 0: 792.7, 1: 794.1. Samples: 3642292. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:13:59,659][42771] Avg episode reward: [(0, '3.160'), (1, '3.350')] [2023-09-24 15:14:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 14647296. Throughput: 0: 792.0, 1: 789.9. Samples: 3651584. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:14:04,659][42771] Avg episode reward: [(0, '3.120'), (1, '3.390')] [2023-09-24 15:14:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14680064. Throughput: 0: 794.5, 1: 793.2. Samples: 3661365. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:14:09,659][42771] Avg episode reward: [(0, '3.020'), (1, '3.360')] [2023-09-24 15:14:11,818][43653] Updated weights for policy 1, policy_version 28640 (0.0016) [2023-09-24 15:14:11,818][43616] Updated weights for policy 0, policy_version 28768 (0.0018) [2023-09-24 15:14:14,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14712832. Throughput: 0: 794.0, 1: 791.7. Samples: 3665921. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:14:14,659][42771] Avg episode reward: [(0, '3.250'), (1, '3.290')] [2023-09-24 15:14:19,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14745600. Throughput: 0: 791.3, 1: 790.6. Samples: 3675579. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:14:19,660][42771] Avg episode reward: [(0, '3.360'), (1, '3.160')] [2023-09-24 15:14:19,669][43303] Saving new best policy, reward=3.360! [2023-09-24 15:14:24,579][43653] Updated weights for policy 1, policy_version 28800 (0.0017) [2023-09-24 15:14:24,579][43616] Updated weights for policy 0, policy_version 28928 (0.0017) [2023-09-24 15:14:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14778368. Throughput: 0: 791.2, 1: 790.9. Samples: 3685108. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:14:24,659][42771] Avg episode reward: [(0, '3.210'), (1, '3.130')] [2023-09-24 15:14:29,659][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 14802944. Throughput: 0: 792.7, 1: 796.6. Samples: 3690245. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:14:29,660][42771] Avg episode reward: [(0, '3.260'), (1, '3.150')] [2023-09-24 15:14:34,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 14835712. Throughput: 0: 790.6, 1: 790.5. Samples: 3698901. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:14:34,662][42771] Avg episode reward: [(0, '3.280'), (1, '3.180')] [2023-09-24 15:14:37,905][43653] Updated weights for policy 1, policy_version 28960 (0.0017) [2023-09-24 15:14:37,905][43616] Updated weights for policy 0, policy_version 29088 (0.0017) [2023-09-24 15:14:39,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 14868480. Throughput: 0: 789.4, 1: 790.2. Samples: 3708513. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:14:39,659][42771] Avg episode reward: [(0, '3.410'), (1, '3.180')] [2023-09-24 15:14:39,660][43303] Saving new best policy, reward=3.410! [2023-09-24 15:14:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6348.8, 300 sec: 6359.2). Total num frames: 14901248. Throughput: 0: 787.9, 1: 786.5. Samples: 3713140. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 15:14:44,659][42771] Avg episode reward: [(0, '3.540'), (1, '3.440')] [2023-09-24 15:14:44,659][43303] Saving new best policy, reward=3.540! [2023-09-24 15:14:49,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 14934016. Throughput: 0: 795.2, 1: 796.4. Samples: 3723208. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 15:14:49,659][42771] Avg episode reward: [(0, '3.250'), (1, '3.270')] [2023-09-24 15:14:50,654][43616] Updated weights for policy 0, policy_version 29248 (0.0017) [2023-09-24 15:14:50,654][43653] Updated weights for policy 1, policy_version 29120 (0.0015) [2023-09-24 15:14:54,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 14966784. Throughput: 0: 788.1, 1: 790.2. Samples: 3732389. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 15:14:54,659][42771] Avg episode reward: [(0, '3.460'), (1, '3.120')] [2023-09-24 15:14:59,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.0, 300 sec: 6387.0). Total num frames: 14999552. Throughput: 0: 792.9, 1: 795.0. 
Samples: 3737378. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) [2023-09-24 15:14:59,659][42771] Avg episode reward: [(0, '3.430'), (1, '3.130')] [2023-09-24 15:15:03,404][43653] Updated weights for policy 1, policy_version 29280 (0.0016) [2023-09-24 15:15:03,404][43616] Updated weights for policy 0, policy_version 29408 (0.0017) [2023-09-24 15:15:04,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6373.1). Total num frames: 15028224. Throughput: 0: 792.0, 1: 793.2. Samples: 3746913. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:15:04,659][42771] Avg episode reward: [(0, '3.460'), (1, '2.970')] [2023-09-24 15:15:09,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15056896. Throughput: 0: 792.9, 1: 793.4. Samples: 3756492. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:15:09,659][42771] Avg episode reward: [(0, '3.430'), (1, '3.160')] [2023-09-24 15:15:14,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15089664. Throughput: 0: 787.2, 1: 787.7. Samples: 3761118. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:15:14,659][42771] Avg episode reward: [(0, '3.380'), (1, '2.950')] [2023-09-24 15:15:16,468][43616] Updated weights for policy 0, policy_version 29568 (0.0017) [2023-09-24 15:15:16,468][43653] Updated weights for policy 1, policy_version 29440 (0.0018) [2023-09-24 15:15:19,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15122432. Throughput: 0: 795.2, 1: 793.0. Samples: 3770369. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:15:19,659][42771] Avg episode reward: [(0, '3.390'), (1, '2.980')] [2023-09-24 15:15:19,668][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000029600_7577600.pth... [2023-09-24 15:15:19,669][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000029472_7544832.pth... 
[2023-09-24 15:15:19,698][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000026624_6815744.pth [2023-09-24 15:15:19,704][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000026496_6782976.pth [2023-09-24 15:15:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 15155200. Throughput: 0: 792.1, 1: 791.9. Samples: 3779790. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:15:24,659][42771] Avg episode reward: [(0, '3.200'), (1, '2.980')] [2023-09-24 15:15:29,489][43653] Updated weights for policy 1, policy_version 29600 (0.0017) [2023-09-24 15:15:29,489][43616] Updated weights for policy 0, policy_version 29728 (0.0018) [2023-09-24 15:15:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 15187968. Throughput: 0: 795.9, 1: 794.0. Samples: 3784685. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:15:29,659][42771] Avg episode reward: [(0, '3.020'), (1, '2.990')] [2023-09-24 15:15:34,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15212544. Throughput: 0: 785.5, 1: 786.6. Samples: 3793956. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:15:34,659][42771] Avg episode reward: [(0, '2.830'), (1, '2.940')] [2023-09-24 15:15:39,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15245312. Throughput: 0: 789.2, 1: 788.3. Samples: 3803376. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:15:39,659][42771] Avg episode reward: [(0, '2.970'), (1, '3.040')] [2023-09-24 15:15:42,417][43616] Updated weights for policy 0, policy_version 29888 (0.0017) [2023-09-24 15:15:42,417][43653] Updated weights for policy 1, policy_version 29760 (0.0017) [2023-09-24 15:15:44,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15278080. Throughput: 0: 788.7, 1: 789.8. Samples: 3808407. 
Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:15:44,659][42771] Avg episode reward: [(0, '2.920'), (1, '3.080')] [2023-09-24 15:15:49,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15310848. Throughput: 0: 788.0, 1: 787.7. Samples: 3817818. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:15:49,659][42771] Avg episode reward: [(0, '2.810'), (1, '3.140')] [2023-09-24 15:15:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15343616. Throughput: 0: 789.0, 1: 790.2. Samples: 3827557. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:15:54,659][42771] Avg episode reward: [(0, '2.930'), (1, '3.020')] [2023-09-24 15:15:55,321][43653] Updated weights for policy 1, policy_version 29920 (0.0017) [2023-09-24 15:15:55,321][43616] Updated weights for policy 0, policy_version 30048 (0.0016) [2023-09-24 15:15:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15376384. Throughput: 0: 787.6, 1: 785.6. Samples: 3831914. Policy #0 lag: (min: 7.0, avg: 7.0, max: 7.0) [2023-09-24 15:15:59,659][42771] Avg episode reward: [(0, '2.900'), (1, '2.960')] [2023-09-24 15:16:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6348.8, 300 sec: 6359.2). Total num frames: 15409152. Throughput: 0: 793.9, 1: 793.9. Samples: 3841819. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:16:04,659][42771] Avg episode reward: [(0, '2.940'), (1, '2.920')] [2023-09-24 15:16:08,412][43653] Updated weights for policy 1, policy_version 30080 (0.0019) [2023-09-24 15:16:08,412][43616] Updated weights for policy 0, policy_version 30208 (0.0018) [2023-09-24 15:16:09,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6345.3). Total num frames: 15437824. Throughput: 0: 788.7, 1: 788.7. Samples: 3850775. 
Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:16:09,659][42771] Avg episode reward: [(0, '2.990'), (1, '3.000')] [2023-09-24 15:16:14,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15466496. Throughput: 0: 788.6, 1: 792.1. Samples: 3855816. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:16:14,660][42771] Avg episode reward: [(0, '3.020'), (1, '2.860')] [2023-09-24 15:16:19,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6345.3). Total num frames: 15499264. Throughput: 0: 791.5, 1: 791.7. Samples: 3865201. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:16:19,659][42771] Avg episode reward: [(0, '3.070'), (1, '2.730')] [2023-09-24 15:16:21,419][43616] Updated weights for policy 0, policy_version 30368 (0.0015) [2023-09-24 15:16:21,419][43653] Updated weights for policy 1, policy_version 30240 (0.0018) [2023-09-24 15:16:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15532032. Throughput: 0: 793.4, 1: 792.6. Samples: 3874746. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:24,659][42771] Avg episode reward: [(0, '3.160'), (1, '2.910')] [2023-09-24 15:16:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15564800. Throughput: 0: 786.3, 1: 785.6. Samples: 3879144. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:29,659][42771] Avg episode reward: [(0, '3.020'), (1, '2.990')] [2023-09-24 15:16:34,500][43653] Updated weights for policy 1, policy_version 30400 (0.0018) [2023-09-24 15:16:34,500][43616] Updated weights for policy 0, policy_version 30528 (0.0017) [2023-09-24 15:16:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 15597568. Throughput: 0: 789.1, 1: 789.2. Samples: 3888842. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:34,659][42771] Avg episode reward: [(0, '3.180'), (1, '3.200')] [2023-09-24 15:16:39,658][42771] Fps is (10 sec: 6144.1, 60 sec: 6348.8, 300 sec: 6345.3). Total num frames: 15626240. Throughput: 0: 782.9, 1: 782.5. Samples: 3898002. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:39,659][42771] Avg episode reward: [(0, '3.190'), (1, '3.180')] [2023-09-24 15:16:44,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15654912. Throughput: 0: 786.4, 1: 787.0. Samples: 3902717. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:44,659][42771] Avg episode reward: [(0, '3.230'), (1, '3.160')] [2023-09-24 15:16:47,744][43653] Updated weights for policy 1, policy_version 30560 (0.0016) [2023-09-24 15:16:47,745][43616] Updated weights for policy 0, policy_version 30688 (0.0017) [2023-09-24 15:16:49,659][42771] Fps is (10 sec: 6143.9, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15687680. Throughput: 0: 776.2, 1: 776.6. Samples: 3911695. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:49,659][42771] Avg episode reward: [(0, '3.230'), (1, '3.130')] [2023-09-24 15:16:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15720448. Throughput: 0: 785.3, 1: 785.1. Samples: 3921444. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:54,659][42771] Avg episode reward: [(0, '3.230'), (1, '3.080')] [2023-09-24 15:16:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15753216. Throughput: 0: 782.0, 1: 778.1. Samples: 3926020. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:16:59,659][42771] Avg episode reward: [(0, '3.330'), (1, '3.120')] [2023-09-24 15:17:00,569][43653] Updated weights for policy 1, policy_version 30720 (0.0016) [2023-09-24 15:17:00,570][43616] Updated weights for policy 0, policy_version 30848 (0.0018) [2023-09-24 15:17:04,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 15785984. Throughput: 0: 785.0, 1: 785.3. Samples: 3935864. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:04,659][42771] Avg episode reward: [(0, '3.250'), (1, '3.100')] [2023-09-24 15:17:09,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6212.3, 300 sec: 6331.4). Total num frames: 15810560. Throughput: 0: 778.8, 1: 779.6. Samples: 3944874. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:17:09,659][42771] Avg episode reward: [(0, '3.150'), (1, '2.920')] [2023-09-24 15:17:13,591][43653] Updated weights for policy 1, policy_version 30880 (0.0018) [2023-09-24 15:17:13,591][43616] Updated weights for policy 0, policy_version 31008 (0.0018) [2023-09-24 15:17:14,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15843328. Throughput: 0: 785.1, 1: 785.0. Samples: 3949799. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:17:14,659][42771] Avg episode reward: [(0, '3.170'), (1, '2.990')] [2023-09-24 15:17:19,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15876096. Throughput: 0: 783.1, 1: 782.8. Samples: 3959309. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:17:19,660][42771] Avg episode reward: [(0, '3.190'), (1, '2.950')] [2023-09-24 15:17:19,672][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000031072_7954432.pth... [2023-09-24 15:17:19,673][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000030944_7921664.pth... 
[2023-09-24 15:17:19,702][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000028112_7196672.pth [2023-09-24 15:17:19,709][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000027984_7163904.pth [2023-09-24 15:17:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15908864. Throughput: 0: 790.6, 1: 787.6. Samples: 3969025. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:17:24,659][42771] Avg episode reward: [(0, '3.100'), (1, '3.030')] [2023-09-24 15:17:26,353][43653] Updated weights for policy 1, policy_version 31040 (0.0016) [2023-09-24 15:17:26,353][43616] Updated weights for policy 0, policy_version 31168 (0.0018) [2023-09-24 15:17:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15941632. Throughput: 0: 790.4, 1: 789.9. Samples: 3973830. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:29,659][42771] Avg episode reward: [(0, '3.070'), (1, '2.960')] [2023-09-24 15:17:34,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 15974400. Throughput: 0: 795.0, 1: 796.1. Samples: 3983294. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:34,659][42771] Avg episode reward: [(0, '2.910'), (1, '2.840')] [2023-09-24 15:17:39,558][43653] Updated weights for policy 1, policy_version 31200 (0.0017) [2023-09-24 15:17:39,558][43616] Updated weights for policy 0, policy_version 31328 (0.0016) [2023-09-24 15:17:39,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6348.8, 300 sec: 6331.4). Total num frames: 16007168. Throughput: 0: 788.1, 1: 788.1. Samples: 3992374. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:39,659][42771] Avg episode reward: [(0, '3.050'), (1, '3.090')] [2023-09-24 15:17:44,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16039936. Throughput: 0: 792.4, 1: 794.5. Samples: 3997433. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:44,659][42771] Avg episode reward: [(0, '3.100'), (1, '3.050')] [2023-09-24 15:17:49,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 16064512. Throughput: 0: 788.4, 1: 787.7. Samples: 4006788. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:49,659][42771] Avg episode reward: [(0, '3.060'), (1, '3.190')] [2023-09-24 15:17:52,454][43616] Updated weights for policy 0, policy_version 31488 (0.0018) [2023-09-24 15:17:52,455][43653] Updated weights for policy 1, policy_version 31360 (0.0018) [2023-09-24 15:17:54,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16097280. Throughput: 0: 793.0, 1: 792.1. Samples: 4016203. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:54,660][42771] Avg episode reward: [(0, '3.020'), (1, '3.270')] [2023-09-24 15:17:59,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16130048. Throughput: 0: 794.0, 1: 793.5. Samples: 4021235. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:17:59,659][42771] Avg episode reward: [(0, '2.940'), (1, '3.430')] [2023-09-24 15:18:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 16162816. Throughput: 0: 792.7, 1: 793.5. Samples: 4030687. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:18:04,659][42771] Avg episode reward: [(0, '3.140'), (1, '3.580')] [2023-09-24 15:18:04,668][43474] Saving new best policy, reward=3.580! [2023-09-24 15:18:05,114][43653] Updated weights for policy 1, policy_version 31520 (0.0017) [2023-09-24 15:18:05,114][43616] Updated weights for policy 0, policy_version 31648 (0.0018) [2023-09-24 15:18:09,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6331.5). Total num frames: 16195584. Throughput: 0: 796.4, 1: 796.4. Samples: 4040701. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:18:09,659][42771] Avg episode reward: [(0, '2.930'), (1, '3.690')] [2023-09-24 15:18:09,659][43474] Saving new best policy, reward=3.690! [2023-09-24 15:18:14,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16228352. Throughput: 0: 795.0, 1: 795.2. Samples: 4045387. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:18:14,659][42771] Avg episode reward: [(0, '2.980'), (1, '3.390')] [2023-09-24 15:18:17,755][43653] Updated weights for policy 1, policy_version 31680 (0.0018) [2023-09-24 15:18:17,755][43616] Updated weights for policy 0, policy_version 31808 (0.0017) [2023-09-24 15:18:19,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16261120. Throughput: 0: 797.9, 1: 796.6. Samples: 4055046. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:18:19,659][42771] Avg episode reward: [(0, '2.940'), (1, '3.580')] [2023-09-24 15:18:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16293888. Throughput: 0: 804.0, 1: 807.2. Samples: 4064878. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:18:24,659][42771] Avg episode reward: [(0, '3.000'), (1, '3.510')] [2023-09-24 15:18:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16326656. Throughput: 0: 800.5, 1: 799.5. Samples: 4069431. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:18:29,660][42771] Avg episode reward: [(0, '3.020'), (1, '3.500')] [2023-09-24 15:18:30,480][43653] Updated weights for policy 1, policy_version 31840 (0.0018) [2023-09-24 15:18:30,480][43616] Updated weights for policy 0, policy_version 31968 (0.0018) [2023-09-24 15:18:34,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16359424. Throughput: 0: 805.4, 1: 804.9. Samples: 4079252. 
Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:18:34,659][42771] Avg episode reward: [(0, '3.110'), (1, '3.380')] [2023-09-24 15:18:39,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6345.3). Total num frames: 16392192. Throughput: 0: 803.4, 1: 804.0. Samples: 4088534. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:18:39,659][42771] Avg episode reward: [(0, '3.130'), (1, '3.210')] [2023-09-24 15:18:43,540][43616] Updated weights for policy 0, policy_version 32128 (0.0018) [2023-09-24 15:18:43,540][43653] Updated weights for policy 1, policy_version 32000 (0.0017) [2023-09-24 15:18:44,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 16416768. Throughput: 0: 801.2, 1: 800.6. Samples: 4093318. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:18:44,659][42771] Avg episode reward: [(0, '3.170'), (1, '3.380')] [2023-09-24 15:18:49,658][42771] Fps is (10 sec: 5734.2, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 16449536. Throughput: 0: 802.1, 1: 802.2. Samples: 4102881. Policy #0 lag: (min: 12.0, avg: 12.0, max: 12.0) [2023-09-24 15:18:49,659][42771] Avg episode reward: [(0, '3.270'), (1, '3.340')] [2023-09-24 15:18:54,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16482304. Throughput: 0: 795.4, 1: 795.7. Samples: 4112297. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:18:54,659][42771] Avg episode reward: [(0, '3.350'), (1, '3.490')] [2023-09-24 15:18:56,601][43653] Updated weights for policy 1, policy_version 32160 (0.0018) [2023-09-24 15:18:56,601][43616] Updated weights for policy 0, policy_version 32288 (0.0016) [2023-09-24 15:18:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16515072. Throughput: 0: 792.2, 1: 792.3. Samples: 4116690. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:18:59,659][42771] Avg episode reward: [(0, '3.320'), (1, '3.570')] [2023-09-24 15:19:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16547840. Throughput: 0: 796.4, 1: 796.3. Samples: 4126720. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:19:04,659][42771] Avg episode reward: [(0, '3.360'), (1, '3.500')] [2023-09-24 15:19:09,375][43653] Updated weights for policy 1, policy_version 32320 (0.0016) [2023-09-24 15:19:09,375][43616] Updated weights for policy 0, policy_version 32448 (0.0018) [2023-09-24 15:19:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 16580608. Throughput: 0: 793.4, 1: 790.4. Samples: 4136146. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:19:09,659][42771] Avg episode reward: [(0, '3.330'), (1, '3.460')] [2023-09-24 15:19:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.5). Total num frames: 16613376. Throughput: 0: 794.0, 1: 795.2. Samples: 4140949. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:19:14,659][42771] Avg episode reward: [(0, '3.440'), (1, '3.390')] [2023-09-24 15:19:19,658][42771] Fps is (10 sec: 5734.6, 60 sec: 6280.6, 300 sec: 6303.7). Total num frames: 16637952. Throughput: 0: 788.4, 1: 788.9. Samples: 4150231. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:19:19,659][42771] Avg episode reward: [(0, '3.360'), (1, '3.610')] [2023-09-24 15:19:19,668][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000032432_8302592.pth... [2023-09-24 15:19:19,668][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000032560_8335360.pth... 
[2023-09-24 15:19:19,704][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000029600_7577600.pth [2023-09-24 15:19:19,705][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000029472_7544832.pth [2023-09-24 15:19:22,555][43653] Updated weights for policy 1, policy_version 32480 (0.0019) [2023-09-24 15:19:22,555][43616] Updated weights for policy 0, policy_version 32608 (0.0019) [2023-09-24 15:19:24,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.5). Total num frames: 16670720. Throughput: 0: 789.4, 1: 787.4. Samples: 4159489. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:19:24,659][42771] Avg episode reward: [(0, '3.500'), (1, '3.440')] [2023-09-24 15:19:29,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 16703488. Throughput: 0: 788.1, 1: 789.1. Samples: 4164289. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:19:29,660][42771] Avg episode reward: [(0, '3.440'), (1, '3.320')] [2023-09-24 15:19:34,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 16736256. Throughput: 0: 789.6, 1: 786.9. Samples: 4173824. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:19:34,659][42771] Avg episode reward: [(0, '3.450'), (1, '3.450')] [2023-09-24 15:19:35,360][43653] Updated weights for policy 1, policy_version 32640 (0.0015) [2023-09-24 15:19:35,360][43616] Updated weights for policy 0, policy_version 32768 (0.0017) [2023-09-24 15:19:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 16769024. Throughput: 0: 791.4, 1: 793.2. Samples: 4183606. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:19:39,659][42771] Avg episode reward: [(0, '3.460'), (1, '3.370')] [2023-09-24 15:19:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 16801792. Throughput: 0: 795.4, 1: 793.1. Samples: 4188172. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:19:44,659][42771] Avg episode reward: [(0, '3.350'), (1, '3.530')] [2023-09-24 15:19:48,340][43616] Updated weights for policy 0, policy_version 32928 (0.0017) [2023-09-24 15:19:48,340][43653] Updated weights for policy 1, policy_version 32800 (0.0014) [2023-09-24 15:19:49,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 16834560. Throughput: 0: 790.3, 1: 789.6. Samples: 4197818. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:19:49,660][42771] Avg episode reward: [(0, '3.390'), (1, '3.480')] [2023-09-24 15:19:54,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 16859136. Throughput: 0: 788.1, 1: 788.1. Samples: 4207076. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:19:54,659][42771] Avg episode reward: [(0, '3.300'), (1, '3.390')] [2023-09-24 15:19:59,658][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6317.6). Total num frames: 16891904. Throughput: 0: 790.6, 1: 790.6. Samples: 4212104. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:19:59,659][42771] Avg episode reward: [(0, '3.260'), (1, '3.380')] [2023-09-24 15:20:01,142][43616] Updated weights for policy 0, policy_version 33088 (0.0015) [2023-09-24 15:20:01,142][43653] Updated weights for policy 1, policy_version 32960 (0.0018) [2023-09-24 15:20:04,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 16924672. Throughput: 0: 790.6, 1: 791.4. Samples: 4221424. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:04,659][42771] Avg episode reward: [(0, '3.280'), (1, '3.680')] [2023-09-24 15:20:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 16957440. Throughput: 0: 796.4, 1: 796.4. Samples: 4231168. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:09,659][42771] Avg episode reward: [(0, '3.270'), (1, '3.480')] [2023-09-24 15:20:14,116][43653] Updated weights for policy 1, policy_version 33120 (0.0014) [2023-09-24 15:20:14,118][43616] Updated weights for policy 0, policy_version 33248 (0.0018) [2023-09-24 15:20:14,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 16990208. Throughput: 0: 793.9, 1: 793.8. Samples: 4235737. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:14,659][42771] Avg episode reward: [(0, '3.030'), (1, '3.490')] [2023-09-24 15:20:19,659][42771] Fps is (10 sec: 6553.3, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 17022976. Throughput: 0: 794.3, 1: 796.4. Samples: 4245408. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:19,660][42771] Avg episode reward: [(0, '3.170'), (1, '3.380')] [2023-09-24 15:20:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6331.4). Total num frames: 17055744. Throughput: 0: 790.7, 1: 790.3. Samples: 4254751. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:24,659][42771] Avg episode reward: [(0, '3.270'), (1, '3.510')] [2023-09-24 15:20:27,110][43616] Updated weights for policy 0, policy_version 33408 (0.0016) [2023-09-24 15:20:27,110][43653] Updated weights for policy 1, policy_version 33280 (0.0016) [2023-09-24 15:20:29,659][42771] Fps is (10 sec: 5734.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17080320. Throughput: 0: 793.6, 1: 794.6. Samples: 4259643. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:29,660][42771] Avg episode reward: [(0, '3.450'), (1, '3.400')] [2023-09-24 15:20:34,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17113088. Throughput: 0: 785.8, 1: 788.4. Samples: 4268659. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:34,659][42771] Avg episode reward: [(0, '3.180'), (1, '3.530')] [2023-09-24 15:20:39,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.5, 300 sec: 6331.5). Total num frames: 17145856. Throughput: 0: 792.2, 1: 789.9. Samples: 4278272. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:20:39,659][42771] Avg episode reward: [(0, '3.210'), (1, '3.450')] [2023-09-24 15:20:40,140][43653] Updated weights for policy 1, policy_version 33440 (0.0016) [2023-09-24 15:20:40,141][43616] Updated weights for policy 0, policy_version 33568 (0.0018) [2023-09-24 15:20:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17178624. Throughput: 0: 786.0, 1: 786.6. Samples: 4282870. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:20:44,659][42771] Avg episode reward: [(0, '3.250'), (1, '3.320')] [2023-09-24 15:20:49,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 17211392. Throughput: 0: 791.6, 1: 789.5. Samples: 4292576. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:20:49,659][42771] Avg episode reward: [(0, '3.280'), (1, '3.300')] [2023-09-24 15:20:53,319][43653] Updated weights for policy 1, policy_version 33600 (0.0017) [2023-09-24 15:20:53,320][43616] Updated weights for policy 0, policy_version 33728 (0.0014) [2023-09-24 15:20:54,659][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 17244160. Throughput: 0: 780.7, 1: 783.5. Samples: 4301559. Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:20:54,659][42771] Avg episode reward: [(0, '3.310'), (1, '3.460')] [2023-09-24 15:20:59,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6303.7). Total num frames: 17268736. Throughput: 0: 786.0, 1: 784.2. Samples: 4306393. 
Policy #0 lag: (min: 3.0, avg: 3.0, max: 3.0) [2023-09-24 15:20:59,659][42771] Avg episode reward: [(0, '3.320'), (1, '3.400')] [2023-09-24 15:21:04,658][42771] Fps is (10 sec: 5734.6, 60 sec: 6280.5, 300 sec: 6317.6). Total num frames: 17301504. Throughput: 0: 779.6, 1: 780.3. Samples: 4315605. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:04,659][42771] Avg episode reward: [(0, '3.120'), (1, '3.420')] [2023-09-24 15:21:06,274][43616] Updated weights for policy 0, policy_version 33888 (0.0018) [2023-09-24 15:21:06,274][43653] Updated weights for policy 1, policy_version 33760 (0.0017) [2023-09-24 15:21:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17334272. Throughput: 0: 785.7, 1: 783.9. Samples: 4325380. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:09,659][42771] Avg episode reward: [(0, '3.300'), (1, '3.290')] [2023-09-24 15:21:14,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17367040. Throughput: 0: 780.6, 1: 782.7. Samples: 4329992. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:14,659][42771] Avg episode reward: [(0, '3.350'), (1, '3.450')] [2023-09-24 15:21:19,128][43653] Updated weights for policy 1, policy_version 33920 (0.0015) [2023-09-24 15:21:19,129][43616] Updated weights for policy 0, policy_version 34048 (0.0016) [2023-09-24 15:21:19,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 17399808. Throughput: 0: 790.4, 1: 788.5. Samples: 4339712. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:19,659][42771] Avg episode reward: [(0, '3.210'), (1, '3.580')] [2023-09-24 15:21:19,667][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000034048_8716288.pth... [2023-09-24 15:21:19,667][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000033920_8683520.pth... 
[2023-09-24 15:21:19,695][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000031072_7954432.pth [2023-09-24 15:21:19,702][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000030944_7921664.pth [2023-09-24 15:21:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17432576. Throughput: 0: 786.7, 1: 787.6. Samples: 4349114. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:24,659][42771] Avg episode reward: [(0, '3.370'), (1, '3.320')] [2023-09-24 15:21:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 17465344. Throughput: 0: 789.9, 1: 789.2. Samples: 4353930. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:29,659][42771] Avg episode reward: [(0, '3.440'), (1, '3.360')] [2023-09-24 15:21:32,104][43653] Updated weights for policy 1, policy_version 34080 (0.0016) [2023-09-24 15:21:32,104][43616] Updated weights for policy 0, policy_version 34208 (0.0018) [2023-09-24 15:21:34,659][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6317.6). Total num frames: 17489920. Throughput: 0: 786.5, 1: 787.7. Samples: 4363414. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:34,659][42771] Avg episode reward: [(0, '3.310'), (1, '3.320')] [2023-09-24 15:21:39,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17522688. Throughput: 0: 792.3, 1: 792.0. Samples: 4372854. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:39,659][42771] Avg episode reward: [(0, '3.320'), (1, '3.410')] [2023-09-24 15:21:44,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6331.4). Total num frames: 17555456. Throughput: 0: 793.1, 1: 794.8. Samples: 4377847. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:44,659][42771] Avg episode reward: [(0, '3.280'), (1, '3.290')] [2023-09-24 15:21:44,845][43616] Updated weights for policy 0, policy_version 34368 (0.0020) [2023-09-24 15:21:44,845][43653] Updated weights for policy 1, policy_version 34240 (0.0018) [2023-09-24 15:21:49,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17588224. Throughput: 0: 795.4, 1: 794.8. Samples: 4387164. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:49,659][42771] Avg episode reward: [(0, '3.420'), (1, '3.420')] [2023-09-24 15:21:54,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 17620992. Throughput: 0: 796.3, 1: 796.4. Samples: 4397049. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:54,659][42771] Avg episode reward: [(0, '3.460'), (1, '3.300')] [2023-09-24 15:21:57,768][43616] Updated weights for policy 0, policy_version 34528 (0.0017) [2023-09-24 15:21:57,768][43653] Updated weights for policy 1, policy_version 34400 (0.0016) [2023-09-24 15:21:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 17653760. Throughput: 0: 795.3, 1: 794.2. Samples: 4401520. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:21:59,659][42771] Avg episode reward: [(0, '3.430'), (1, '3.420')] [2023-09-24 15:22:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 17686528. Throughput: 0: 796.4, 1: 796.4. Samples: 4411392. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:04,659][42771] Avg episode reward: [(0, '3.350'), (1, '3.370')] [2023-09-24 15:22:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 17719296. Throughput: 0: 798.5, 1: 800.4. Samples: 4421062. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:09,659][42771] Avg episode reward: [(0, '3.290'), (1, '3.430')] [2023-09-24 15:22:10,558][43616] Updated weights for policy 0, policy_version 34688 (0.0018) [2023-09-24 15:22:10,559][43653] Updated weights for policy 1, policy_version 34560 (0.0017) [2023-09-24 15:22:14,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 17752064. Throughput: 0: 799.0, 1: 796.6. Samples: 4425730. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:14,659][42771] Avg episode reward: [(0, '3.420'), (1, '3.390')] [2023-09-24 15:22:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 17784832. Throughput: 0: 800.9, 1: 801.0. Samples: 4435500. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:19,659][42771] Avg episode reward: [(0, '3.320'), (1, '3.360')] [2023-09-24 15:22:23,314][43616] Updated weights for policy 0, policy_version 34848 (0.0018) [2023-09-24 15:22:23,314][43653] Updated weights for policy 1, policy_version 34720 (0.0017) [2023-09-24 15:22:24,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 17817600. Throughput: 0: 800.8, 1: 800.5. Samples: 4444912. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:24,659][42771] Avg episode reward: [(0, '3.090'), (1, '3.310')] [2023-09-24 15:22:29,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 17850368. Throughput: 0: 801.0, 1: 801.1. Samples: 4449939. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:29,659][42771] Avg episode reward: [(0, '3.150'), (1, '3.370')] [2023-09-24 15:22:34,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 17874944. Throughput: 0: 802.1, 1: 802.1. Samples: 4459355. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:34,659][42771] Avg episode reward: [(0, '3.060'), (1, '3.220')] [2023-09-24 15:22:36,123][43653] Updated weights for policy 1, policy_version 34880 (0.0017) [2023-09-24 15:22:36,123][43616] Updated weights for policy 0, policy_version 35008 (0.0013) [2023-09-24 15:22:39,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 17907712. Throughput: 0: 796.6, 1: 797.4. Samples: 4468782. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:22:39,660][42771] Avg episode reward: [(0, '3.160'), (1, '3.350')] [2023-09-24 15:22:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 17940480. Throughput: 0: 801.5, 1: 801.8. Samples: 4473671. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:22:44,659][42771] Avg episode reward: [(0, '3.190'), (1, '3.320')] [2023-09-24 15:22:48,867][43616] Updated weights for policy 0, policy_version 35168 (0.0019) [2023-09-24 15:22:48,867][43653] Updated weights for policy 1, policy_version 35040 (0.0019) [2023-09-24 15:22:49,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 17973248. Throughput: 0: 797.4, 1: 799.6. Samples: 4483257. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:22:49,659][42771] Avg episode reward: [(0, '3.130'), (1, '3.350')] [2023-09-24 15:22:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18006016. Throughput: 0: 801.2, 1: 800.6. Samples: 4493144. Policy #0 lag: (min: 6.0, avg: 6.0, max: 6.0) [2023-09-24 15:22:54,659][42771] Avg episode reward: [(0, '3.190'), (1, '3.380')] [2023-09-24 15:22:59,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18038784. Throughput: 0: 798.3, 1: 800.9. Samples: 4497693. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:22:59,659][42771] Avg episode reward: [(0, '3.400'), (1, '3.420')] [2023-09-24 15:23:01,696][43653] Updated weights for policy 1, policy_version 35200 (0.0019) [2023-09-24 15:23:01,696][43616] Updated weights for policy 0, policy_version 35328 (0.0018) [2023-09-24 15:23:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18071552. Throughput: 0: 801.6, 1: 800.6. Samples: 4507599. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:04,659][42771] Avg episode reward: [(0, '3.240'), (1, '3.370')] [2023-09-24 15:23:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18104320. Throughput: 0: 800.1, 1: 800.0. Samples: 4516917. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:09,659][42771] Avg episode reward: [(0, '3.390'), (1, '3.320')] [2023-09-24 15:23:14,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6348.8, 300 sec: 6345.3). Total num frames: 18132992. Throughput: 0: 798.2, 1: 798.6. Samples: 4521795. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:14,659][42771] Avg episode reward: [(0, '3.380'), (1, '3.280')] [2023-09-24 15:23:14,668][43653] Updated weights for policy 1, policy_version 35360 (0.0017) [2023-09-24 15:23:14,668][43616] Updated weights for policy 0, policy_version 35488 (0.0018) [2023-09-24 15:23:19,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 18161664. Throughput: 0: 798.3, 1: 798.4. Samples: 4531206. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:19,660][42771] Avg episode reward: [(0, '3.240'), (1, '3.330')] [2023-09-24 15:23:19,738][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000035552_9101312.pth... 
[2023-09-24 15:23:19,768][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000032560_8335360.pth [2023-09-24 15:23:19,812][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000035424_9068544.pth... [2023-09-24 15:23:19,850][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000032432_8302592.pth [2023-09-24 15:23:24,658][42771] Fps is (10 sec: 6144.0, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 18194432. Throughput: 0: 797.1, 1: 798.6. Samples: 4540587. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:23:24,659][42771] Avg episode reward: [(0, '3.310'), (1, '3.390')] [2023-09-24 15:23:27,464][43616] Updated weights for policy 0, policy_version 35648 (0.0017) [2023-09-24 15:23:27,465][43653] Updated weights for policy 1, policy_version 35520 (0.0015) [2023-09-24 15:23:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.5, 300 sec: 6331.4). Total num frames: 18227200. Throughput: 0: 797.9, 1: 798.4. Samples: 4545504. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:23:29,659][42771] Avg episode reward: [(0, '3.380'), (1, '3.270')] [2023-09-24 15:23:34,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6331.4). Total num frames: 18259968. Throughput: 0: 797.4, 1: 797.1. Samples: 4555007. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:23:34,659][42771] Avg episode reward: [(0, '3.260'), (1, '3.440')] [2023-09-24 15:23:39,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18292736. Throughput: 0: 796.9, 1: 797.2. Samples: 4564876. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:23:39,659][42771] Avg episode reward: [(0, '3.350'), (1, '3.370')] [2023-09-24 15:23:40,302][43616] Updated weights for policy 0, policy_version 35808 (0.0012) [2023-09-24 15:23:40,302][43653] Updated weights for policy 1, policy_version 35680 (0.0017) [2023-09-24 15:23:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). 
Total num frames: 18325504. Throughput: 0: 794.7, 1: 794.0. Samples: 4569184. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:44,659][42771] Avg episode reward: [(0, '3.320'), (1, '3.380')] [2023-09-24 15:23:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 18358272. Throughput: 0: 793.7, 1: 794.5. Samples: 4579067. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:49,659][42771] Avg episode reward: [(0, '3.120'), (1, '3.490')] [2023-09-24 15:23:53,139][43616] Updated weights for policy 0, policy_version 35968 (0.0016) [2023-09-24 15:23:53,140][43653] Updated weights for policy 1, policy_version 35840 (0.0017) [2023-09-24 15:23:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18391040. Throughput: 0: 797.2, 1: 797.6. Samples: 4588686. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:54,659][42771] Avg episode reward: [(0, '3.200'), (1, '3.350')] [2023-09-24 15:23:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18423808. Throughput: 0: 799.6, 1: 797.2. Samples: 4593649. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:23:59,659][42771] Avg episode reward: [(0, '3.350'), (1, '3.330')] [2023-09-24 15:24:04,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18456576. Throughput: 0: 799.2, 1: 798.9. Samples: 4603119. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:24:04,659][42771] Avg episode reward: [(0, '3.200'), (1, '3.320')] [2023-09-24 15:24:05,910][43616] Updated weights for policy 0, policy_version 36128 (0.0017) [2023-09-24 15:24:05,910][43653] Updated weights for policy 1, policy_version 36000 (0.0017) [2023-09-24 15:24:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18489344. Throughput: 0: 800.8, 1: 800.8. Samples: 4612661. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:24:09,659][42771] Avg episode reward: [(0, '3.240'), (1, '3.270')] [2023-09-24 15:24:14,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6348.8, 300 sec: 6359.2). Total num frames: 18513920. Throughput: 0: 802.5, 1: 801.6. Samples: 4617690. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:24:14,660][42771] Avg episode reward: [(0, '3.340'), (1, '3.120')] [2023-09-24 15:24:18,580][43653] Updated weights for policy 1, policy_version 36160 (0.0017) [2023-09-24 15:24:18,580][43616] Updated weights for policy 0, policy_version 36288 (0.0016) [2023-09-24 15:24:19,659][42771] Fps is (10 sec: 5734.3, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18546688. Throughput: 0: 802.0, 1: 802.8. Samples: 4627226. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:24:19,659][42771] Avg episode reward: [(0, '3.290'), (1, '3.110')] [2023-09-24 15:24:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18579456. Throughput: 0: 799.1, 1: 798.3. Samples: 4636757. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:24:24,659][42771] Avg episode reward: [(0, '3.490'), (1, '2.960')] [2023-09-24 15:24:29,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18612224. Throughput: 0: 799.9, 1: 800.8. Samples: 4641218. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2023-09-24 15:24:29,659][42771] Avg episode reward: [(0, '3.630'), (1, '2.950')] [2023-09-24 15:24:29,659][43303] Saving new best policy, reward=3.630! [2023-09-24 15:24:31,511][43616] Updated weights for policy 0, policy_version 36448 (0.0015) [2023-09-24 15:24:31,512][43653] Updated weights for policy 1, policy_version 36320 (0.0016) [2023-09-24 15:24:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6359.2). Total num frames: 18644992. Throughput: 0: 800.3, 1: 798.4. Samples: 4651008. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:24:34,659][42771] Avg episode reward: [(0, '3.520'), (1, '2.950')] [2023-09-24 15:24:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18677760. Throughput: 0: 800.7, 1: 799.9. Samples: 4660714. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:24:39,659][42771] Avg episode reward: [(0, '3.660'), (1, '3.020')] [2023-09-24 15:24:39,659][43303] Saving new best policy, reward=3.660! [2023-09-24 15:24:44,373][43616] Updated weights for policy 0, policy_version 36608 (0.0016) [2023-09-24 15:24:44,374][43653] Updated weights for policy 1, policy_version 36480 (0.0017) [2023-09-24 15:24:44,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18710528. Throughput: 0: 796.8, 1: 796.5. Samples: 4665346. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:24:44,659][42771] Avg episode reward: [(0, '3.510'), (1, '3.080')] [2023-09-24 15:24:49,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 18743296. Throughput: 0: 799.0, 1: 799.1. Samples: 4675034. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:24:49,659][42771] Avg episode reward: [(0, '3.660'), (1, '3.150')] [2023-09-24 15:24:54,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 18767872. Throughput: 0: 796.3, 1: 796.2. Samples: 4684323. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:24:54,659][42771] Avg episode reward: [(0, '3.720'), (1, '3.360')] [2023-09-24 15:24:54,668][43303] Saving new best policy, reward=3.720! [2023-09-24 15:24:57,211][43616] Updated weights for policy 0, policy_version 36768 (0.0017) [2023-09-24 15:24:57,211][43653] Updated weights for policy 1, policy_version 36640 (0.0017) [2023-09-24 15:24:59,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 18800640. Throughput: 0: 796.3, 1: 796.8. 
Samples: 4689382. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:24:59,659][42771] Avg episode reward: [(0, '3.640'), (1, '3.440')] [2023-09-24 15:25:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 18833408. Throughput: 0: 790.3, 1: 789.8. Samples: 4698332. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:25:04,659][42771] Avg episode reward: [(0, '3.530'), (1, '3.360')] [2023-09-24 15:25:09,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 18866176. Throughput: 0: 792.7, 1: 793.6. Samples: 4708140. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:25:09,659][42771] Avg episode reward: [(0, '3.460'), (1, '3.350')] [2023-09-24 15:25:10,424][43616] Updated weights for policy 0, policy_version 36928 (0.0014) [2023-09-24 15:25:10,424][43653] Updated weights for policy 1, policy_version 36800 (0.0016) [2023-09-24 15:25:14,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18898944. Throughput: 0: 792.9, 1: 790.5. Samples: 4712471. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:25:14,659][42771] Avg episode reward: [(0, '3.490'), (1, '3.290')] [2023-09-24 15:25:19,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 18931712. Throughput: 0: 788.4, 1: 791.3. Samples: 4722093. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:25:19,659][42771] Avg episode reward: [(0, '3.500'), (1, '3.250')] [2023-09-24 15:25:19,669][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000036912_9449472.pth... [2023-09-24 15:25:19,669][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000037040_9482240.pth... 
[2023-09-24 15:25:19,703][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000034048_8716288.pth [2023-09-24 15:25:19,704][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000033920_8683520.pth [2023-09-24 15:25:23,556][43653] Updated weights for policy 1, policy_version 36960 (0.0017) [2023-09-24 15:25:23,557][43616] Updated weights for policy 0, policy_version 37088 (0.0018) [2023-09-24 15:25:24,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 18956288. Throughput: 0: 782.7, 1: 782.9. Samples: 4731168. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:25:24,659][42771] Avg episode reward: [(0, '3.430'), (1, '3.170')] [2023-09-24 15:25:29,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 18989056. Throughput: 0: 784.6, 1: 787.2. Samples: 4736075. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:25:29,659][42771] Avg episode reward: [(0, '3.420'), (1, '3.160')] [2023-09-24 15:25:34,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19021824. Throughput: 0: 781.0, 1: 781.0. Samples: 4745320. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:25:34,659][42771] Avg episode reward: [(0, '3.330'), (1, '3.110')] [2023-09-24 15:25:36,462][43653] Updated weights for policy 1, policy_version 37120 (0.0019) [2023-09-24 15:25:36,462][43616] Updated weights for policy 0, policy_version 37248 (0.0018) [2023-09-24 15:25:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19054592. Throughput: 0: 789.5, 1: 788.9. Samples: 4755352. Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:25:39,659][42771] Avg episode reward: [(0, '3.360'), (1, '3.290')] [2023-09-24 15:25:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19087360. Throughput: 0: 781.7, 1: 781.2. Samples: 4759712. 
Policy #0 lag: (min: 14.0, avg: 14.0, max: 14.0) [2023-09-24 15:25:44,659][42771] Avg episode reward: [(0, '3.320'), (1, '3.360')] [2023-09-24 15:25:49,204][43616] Updated weights for policy 0, policy_version 37408 (0.0019) [2023-09-24 15:25:49,205][43653] Updated weights for policy 1, policy_version 37280 (0.0018) [2023-09-24 15:25:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19120128. Throughput: 0: 793.9, 1: 792.9. Samples: 4769739. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 15:25:49,659][42771] Avg episode reward: [(0, '3.380'), (1, '3.470')] [2023-09-24 15:25:54,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 19152896. Throughput: 0: 791.3, 1: 790.7. Samples: 4779327. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 15:25:54,659][42771] Avg episode reward: [(0, '3.360'), (1, '3.380')] [2023-09-24 15:25:59,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6387.0). Total num frames: 19185664. Throughput: 0: 796.4, 1: 796.0. Samples: 4784132. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 15:25:59,659][42771] Avg episode reward: [(0, '3.270'), (1, '3.330')] [2023-09-24 15:26:01,917][43653] Updated weights for policy 1, policy_version 37440 (0.0019) [2023-09-24 15:26:01,917][43616] Updated weights for policy 0, policy_version 37568 (0.0018) [2023-09-24 15:26:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 19218432. Throughput: 0: 796.3, 1: 797.7. Samples: 4793822. Policy #0 lag: (min: 5.0, avg: 5.0, max: 5.0) [2023-09-24 15:26:04,659][42771] Avg episode reward: [(0, '3.310'), (1, '3.320')] [2023-09-24 15:26:09,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.0, 300 sec: 6387.0). Total num frames: 19251200. Throughput: 0: 802.2, 1: 802.2. Samples: 4803366. 
Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:26:09,659][42771] Avg episode reward: [(0, '3.410'), (1, '3.290')] [2023-09-24 15:26:14,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19275776. Throughput: 0: 802.4, 1: 802.2. Samples: 4808281. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:26:14,659][42771] Avg episode reward: [(0, '3.320'), (1, '3.190')] [2023-09-24 15:26:14,712][43616] Updated weights for policy 0, policy_version 37728 (0.0018) [2023-09-24 15:26:14,712][43653] Updated weights for policy 1, policy_version 37600 (0.0018) [2023-09-24 15:26:19,659][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19308544. Throughput: 0: 805.4, 1: 805.6. Samples: 4817815. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:26:19,659][42771] Avg episode reward: [(0, '3.440'), (1, '3.210')] [2023-09-24 15:26:24,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19341312. Throughput: 0: 798.8, 1: 799.0. Samples: 4827253. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:26:24,659][42771] Avg episode reward: [(0, '3.470'), (1, '3.290')] [2023-09-24 15:26:27,557][43653] Updated weights for policy 1, policy_version 37760 (0.0018) [2023-09-24 15:26:27,557][43616] Updated weights for policy 0, policy_version 37888 (0.0016) [2023-09-24 15:26:29,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 19374080. Throughput: 0: 804.9, 1: 805.3. Samples: 4832169. Policy #0 lag: (min: 8.0, avg: 8.0, max: 8.0) [2023-09-24 15:26:29,659][42771] Avg episode reward: [(0, '3.650'), (1, '3.260')] [2023-09-24 15:26:34,658][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 19406848. Throughput: 0: 797.6, 1: 796.5. Samples: 4841475. 
Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:26:34,659][42771] Avg episode reward: [(0, '3.570'), (1, '3.330')] [2023-09-24 15:26:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6387.0). Total num frames: 19439616. Throughput: 0: 794.9, 1: 795.8. Samples: 4850909. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:26:39,659][42771] Avg episode reward: [(0, '3.580'), (1, '3.370')] [2023-09-24 15:26:40,789][43653] Updated weights for policy 1, policy_version 37920 (0.0018) [2023-09-24 15:26:40,789][43616] Updated weights for policy 0, policy_version 38048 (0.0017) [2023-09-24 15:26:44,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19464192. Throughput: 0: 791.8, 1: 794.3. Samples: 4855507. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:26:44,659][42771] Avg episode reward: [(0, '3.560'), (1, '3.400')] [2023-09-24 15:26:49,659][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19496960. Throughput: 0: 790.6, 1: 788.9. Samples: 4864900. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:26:49,660][42771] Avg episode reward: [(0, '3.680'), (1, '3.300')] [2023-09-24 15:26:53,865][43653] Updated weights for policy 1, policy_version 38080 (0.0014) [2023-09-24 15:26:53,865][43616] Updated weights for policy 0, policy_version 38208 (0.0015) [2023-09-24 15:26:54,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19529728. Throughput: 0: 788.5, 1: 786.5. Samples: 4874240. Policy #0 lag: (min: 9.0, avg: 9.0, max: 9.0) [2023-09-24 15:26:54,659][42771] Avg episode reward: [(0, '3.700'), (1, '3.090')] [2023-09-24 15:26:59,659][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19562496. Throughput: 0: 783.3, 1: 783.3. Samples: 4878779. 
Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:26:59,659][42771] Avg episode reward: [(0, '3.770'), (1, '3.350')] [2023-09-24 15:26:59,661][43303] Saving new best policy, reward=3.770! [2023-09-24 15:27:04,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19595264. Throughput: 0: 787.2, 1: 785.1. Samples: 4888571. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:27:04,659][42771] Avg episode reward: [(0, '3.570'), (1, '3.170')] [2023-09-24 15:27:06,677][43616] Updated weights for policy 0, policy_version 38368 (0.0015) [2023-09-24 15:27:06,678][43653] Updated weights for policy 1, policy_version 38240 (0.0017) [2023-09-24 15:27:09,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6280.6, 300 sec: 6359.2). Total num frames: 19628032. Throughput: 0: 789.0, 1: 789.3. Samples: 4898278. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:27:09,659][42771] Avg episode reward: [(0, '3.780'), (1, '3.300')] [2023-09-24 15:27:09,660][43303] Saving new best policy, reward=3.780! [2023-09-24 15:27:14,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19660800. Throughput: 0: 787.3, 1: 784.8. Samples: 4902913. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:27:14,659][42771] Avg episode reward: [(0, '3.660'), (1, '3.150')] [2023-09-24 15:27:19,377][43653] Updated weights for policy 1, policy_version 38400 (0.0018) [2023-09-24 15:27:19,377][43616] Updated weights for policy 0, policy_version 38528 (0.0016) [2023-09-24 15:27:19,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19693568. Throughput: 0: 792.2, 1: 793.2. Samples: 4912821. Policy #0 lag: (min: 11.0, avg: 11.0, max: 11.0) [2023-09-24 15:27:19,659][42771] Avg episode reward: [(0, '3.710'), (1, '3.140')] [2023-09-24 15:27:19,668][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000038400_9830400.pth... 
[2023-09-24 15:27:19,669][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000038528_9863168.pth... [2023-09-24 15:27:19,702][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000035424_9068544.pth [2023-09-24 15:27:19,712][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000035552_9101312.pth [2023-09-24 15:27:24,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19726336. Throughput: 0: 793.2, 1: 793.8. Samples: 4922322. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:27:24,659][42771] Avg episode reward: [(0, '3.530'), (1, '3.280')] [2023-09-24 15:27:29,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19750912. Throughput: 0: 797.9, 1: 797.6. Samples: 4927305. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:27:29,659][42771] Avg episode reward: [(0, '3.540'), (1, '3.440')] [2023-09-24 15:27:32,321][43653] Updated weights for policy 1, policy_version 38560 (0.0015) [2023-09-24 15:27:32,323][43616] Updated weights for policy 0, policy_version 38688 (0.0018) [2023-09-24 15:27:34,658][42771] Fps is (10 sec: 5734.3, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19783680. Throughput: 0: 794.8, 1: 794.9. Samples: 4936437. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:27:34,659][42771] Avg episode reward: [(0, '3.450'), (1, '3.320')] [2023-09-24 15:27:39,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 19816448. Throughput: 0: 796.6, 1: 798.2. Samples: 4946002. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:27:39,659][42771] Avg episode reward: [(0, '3.510'), (1, '3.370')] [2023-09-24 15:27:44,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19849216. Throughput: 0: 801.1, 1: 801.0. Samples: 4950875. 
Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:27:44,659][42771] Avg episode reward: [(0, '3.510'), (1, '3.370')] [2023-09-24 15:27:45,167][43653] Updated weights for policy 1, policy_version 38720 (0.0017) [2023-09-24 15:27:45,167][43616] Updated weights for policy 0, policy_version 38848 (0.0016) [2023-09-24 15:27:49,658][42771] Fps is (10 sec: 6553.6, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19881984. Throughput: 0: 796.6, 1: 798.2. Samples: 4960339. Policy #0 lag: (min: 4.0, avg: 4.0, max: 4.0) [2023-09-24 15:27:49,659][42771] Avg episode reward: [(0, '3.460'), (1, '3.180')] [2023-09-24 15:27:54,658][42771] Fps is (10 sec: 6553.8, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19914752. Throughput: 0: 799.6, 1: 800.9. Samples: 4970302. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:27:54,659][42771] Avg episode reward: [(0, '3.620'), (1, '3.180')] [2023-09-24 15:27:57,827][43653] Updated weights for policy 1, policy_version 38880 (0.0017) [2023-09-24 15:27:57,827][43616] Updated weights for policy 0, policy_version 39008 (0.0018) [2023-09-24 15:27:59,658][42771] Fps is (10 sec: 6553.7, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19947520. Throughput: 0: 798.1, 1: 800.4. Samples: 4974843. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:27:59,659][42771] Avg episode reward: [(0, '3.620'), (1, '3.160')] [2023-09-24 15:28:04,659][42771] Fps is (10 sec: 6553.4, 60 sec: 6417.1, 300 sec: 6359.2). Total num frames: 19980288. Throughput: 0: 799.6, 1: 799.6. Samples: 4984785. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:28:04,659][42771] Avg episode reward: [(0, '3.500'), (1, '3.160')] [2023-09-24 15:28:09,658][42771] Fps is (10 sec: 6553.5, 60 sec: 6417.0, 300 sec: 6373.1). Total num frames: 20013056. Throughput: 0: 796.7, 1: 795.8. Samples: 4993985. 
Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:28:09,659][42771] Avg episode reward: [(0, '3.300'), (1, '3.220')] [2023-09-24 15:28:10,825][43616] Updated weights for policy 0, policy_version 39168 (0.0016) [2023-09-24 15:28:10,825][43653] Updated weights for policy 1, policy_version 39040 (0.0015) [2023-09-24 15:28:14,658][42771] Fps is (10 sec: 5734.4, 60 sec: 6280.5, 300 sec: 6359.2). Total num frames: 20037632. Throughput: 0: 793.7, 1: 795.4. Samples: 4998812. Policy #0 lag: (min: 15.0, avg: 15.0, max: 15.0) [2023-09-24 15:28:14,659][42771] Avg episode reward: [(0, '3.230'), (1, '3.240')] [2023-09-24 15:28:14,691][43667] Stopping RolloutWorker_w2... [2023-09-24 15:28:14,691][43659] Stopping RolloutWorker_w1... [2023-09-24 15:28:14,691][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000039216_10039296.pth... [2023-09-24 15:28:14,691][43680] Stopping RolloutWorker_w6... [2023-09-24 15:28:14,691][43679] Stopping RolloutWorker_w5... [2023-09-24 15:28:14,691][43667] Loop rollout_proc2_evt_loop terminating... [2023-09-24 15:28:14,691][43669] Stopping RolloutWorker_w3... [2023-09-24 15:28:14,691][43681] Stopping RolloutWorker_w7... [2023-09-24 15:28:14,691][43671] Stopping RolloutWorker_w4... [2023-09-24 15:28:14,691][43665] Stopping RolloutWorker_w0... [2023-09-24 15:28:14,691][42771] Component RolloutWorker_w1 stopped! [2023-09-24 15:28:14,691][43474] Stopping Batcher_1... [2023-09-24 15:28:14,692][43680] Loop rollout_proc6_evt_loop terminating... [2023-09-24 15:28:14,692][43679] Loop rollout_proc5_evt_loop terminating... [2023-09-24 15:28:14,692][43659] Loop rollout_proc1_evt_loop terminating... [2023-09-24 15:28:14,692][43669] Loop rollout_proc3_evt_loop terminating... [2023-09-24 15:28:14,692][43681] Loop rollout_proc7_evt_loop terminating... [2023-09-24 15:28:14,692][43671] Loop rollout_proc4_evt_loop terminating... [2023-09-24 15:28:14,692][43665] Loop rollout_proc0_evt_loop terminating... 
[2023-09-24 15:28:14,692][42771] Component RolloutWorker_w6 stopped! [2023-09-24 15:28:14,692][43474] Loop batcher_evt_loop terminating... [2023-09-24 15:28:14,693][42771] Component RolloutWorker_w5 stopped! [2023-09-24 15:28:14,694][42771] Component RolloutWorker_w2 stopped! [2023-09-24 15:28:14,695][42771] Component Batcher_0 stopped! [2023-09-24 15:28:14,695][42771] Component RolloutWorker_w3 stopped! [2023-09-24 15:28:14,696][42771] Component RolloutWorker_w4 stopped! [2023-09-24 15:28:14,696][42771] Component RolloutWorker_w7 stopped! [2023-09-24 15:28:14,691][43303] Stopping Batcher_0... [2023-09-24 15:28:14,697][42771] Component Batcher_1 stopped! [2023-09-24 15:28:14,697][42771] Component RolloutWorker_w0 stopped! [2023-09-24 15:28:14,710][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000039088_10006528.pth... [2023-09-24 15:28:14,709][43303] Loop batcher_evt_loop terminating... [2023-09-24 15:28:14,735][43303] Removing ./train_atari/Battlezone/checkpoint_p0/checkpoint_000037040_9482240.pth [2023-09-24 15:28:14,740][43474] Removing ./train_atari/Battlezone/checkpoint_p1/checkpoint_000036912_9449472.pth [2023-09-24 15:28:14,740][43303] Saving ./train_atari/Battlezone/checkpoint_p0/checkpoint_000039216_10039296.pth... [2023-09-24 15:28:14,744][43474] Saving ./train_atari/Battlezone/checkpoint_p1/checkpoint_000039088_10006528.pth... [2023-09-24 15:28:14,749][43653] Weights refcount: 2 0 [2023-09-24 15:28:14,750][43653] Stopping InferenceWorker_p1-w0... [2023-09-24 15:28:14,750][43653] Loop inference_proc1-0_evt_loop terminating... [2023-09-24 15:28:14,750][42771] Component InferenceWorker_p1-w0 stopped! [2023-09-24 15:28:14,756][43616] Weights refcount: 2 0 [2023-09-24 15:28:14,758][43616] Stopping InferenceWorker_p0-w0... [2023-09-24 15:28:14,758][43616] Loop inference_proc0-0_evt_loop terminating... [2023-09-24 15:28:14,758][42771] Component InferenceWorker_p0-w0 stopped! [2023-09-24 15:28:14,780][43474] Stopping LearnerWorker_p1... 
[2023-09-24 15:28:14,780][43474] Loop learner_proc1_evt_loop terminating... [2023-09-24 15:28:14,782][42771] Component LearnerWorker_p1 stopped! [2023-09-24 15:28:14,798][43303] Stopping LearnerWorker_p0... [2023-09-24 15:28:14,798][43303] Loop learner_proc0_evt_loop terminating... [2023-09-24 15:28:14,798][42771] Component LearnerWorker_p0 stopped! [2023-09-24 15:28:14,799][42771] Waiting for process learner_proc0 to stop... [2023-09-24 15:28:15,470][42771] Waiting for process learner_proc1 to stop... [2023-09-24 15:28:15,471][42771] Waiting for process inference_proc0-0 to join... [2023-09-24 15:28:15,471][42771] Waiting for process inference_proc1-0 to join... [2023-09-24 15:28:15,472][42771] Waiting for process rollout_proc0 to join... [2023-09-24 15:28:15,472][42771] Waiting for process rollout_proc1 to join... [2023-09-24 15:28:15,472][42771] Waiting for process rollout_proc2 to join... [2023-09-24 15:28:15,473][42771] Waiting for process rollout_proc3 to join... [2023-09-24 15:28:15,473][42771] Waiting for process rollout_proc4 to join... [2023-09-24 15:28:15,474][42771] Waiting for process rollout_proc5 to join... [2023-09-24 15:28:15,474][42771] Waiting for process rollout_proc6 to join... [2023-09-24 15:28:15,475][42771] Waiting for process rollout_proc7 to join... 
[2023-09-24 15:28:15,475][42771] Batcher 0 profile tree view:
batching: 21.1969, releasing_batches: 1.7765
[2023-09-24 15:28:15,476][42771] Batcher 1 profile tree view:
batching: 21.2830, releasing_batches: 1.7946
[2023-09-24 15:28:15,476][42771] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 658.5055
update_model: 37.7973
  weight_update: 0.0014
one_step: 0.0012
  handle_policy_step: 2265.4355
    deserialize: 67.3988, stack: 16.1799, obs_to_device_normalize: 552.1443, forward: 1090.8714, send_messages: 93.2265
    prepare_outputs: 301.5997
      to_cpu: 151.2445
[2023-09-24 15:28:15,476][42771] InferenceWorker_p1-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 659.4886
update_model: 37.0557
  weight_update: 0.0015
one_step: 0.0011
  handle_policy_step: 2264.4987
    deserialize: 67.5281, stack: 16.1612, obs_to_device_normalize: 550.6961, forward: 1085.8969, send_messages: 93.2708
    prepare_outputs: 304.7813
      to_cpu: 153.6502
[2023-09-24 15:28:15,477][42771] Learner 0 profile tree view:
misc: 0.0157, prepare_batch: 32.0258
train: 467.4151
  epoch_init: 0.1015, minibatch_init: 3.0469, losses_postprocess: 62.8542, kl_divergence: 5.3780, after_optimizer: 11.5488
  calculate_losses: 44.3722
    losses_init: 0.0998, forward_head: 14.0681, bptt_initial: 0.4353, bptt: 0.4863, tail: 10.2143, advantages_returns: 3.0315, losses: 12.5156
  update: 336.1154
    clip: 164.6418
[2023-09-24 15:28:15,477][42771] Learner 1 profile tree view:
misc: 0.0169, prepare_batch: 32.0249
train: 456.5028
  epoch_init: 0.1015, minibatch_init: 3.1270, losses_postprocess: 62.2741, kl_divergence: 5.4612, after_optimizer: 21.8503
  calculate_losses: 44.1883
    losses_init: 0.0985, forward_head: 13.4195, bptt_initial: 0.4592, bptt: 0.4673, tail: 10.3796, advantages_returns: 3.0713, losses: 12.7842
  update: 315.5016
    clip: 162.6884
[2023-09-24 15:28:15,477][42771] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.4137, enqueue_policy_requests: 41.5665, env_step: 1158.2399, overhead: 28.9266, complete_rollouts: 1.0414
save_policy_outputs: 52.5386
  split_output_tensors: 18.2546
[2023-09-24 15:28:15,478][42771] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.4000, enqueue_policy_requests: 41.2557, env_step: 1124.3240, overhead: 28.5050, complete_rollouts: 1.0393
save_policy_outputs: 52.7890
  split_output_tensors: 18.3417
[2023-09-24 15:28:15,478][42771] Loop Runner_EvtLoop terminating...
[2023-09-24 15:28:15,478][42771] Runner profile tree view:
main_loop: 3171.7615
[2023-09-24 15:28:15,479][42771] Collected {0: 10039296, 1: 10006528}, FPS: 6309.8